Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix O_SYNC flag when opening data files #272

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

nuno-faria
Copy link

According to Riak's docs, using the POSIX C API to write files, together with the o_sync strategy, should result in synchronous data writes. However, when looking at the open file descriptors for the Bitcask data, we can see that only the .write.lock files are opened with the O_SYNC flag, e.g.:

COMMAND     PID   TID TASKCMD   USER   FD      TYPE         FILE-FLAG             DEVICE SIZE/OFF    NODE NAME
beam.smp  86800 87045 Eleveldb  root  489u      REG         RW,SYN,LG              0,183       86 2043031 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/bitcask.write.lock
beam.smp  86800 87045 Eleveldb  root  490u      REG          RW,AP,LG              0,183    14767 2043032 /riak/rel/riak/data/bitcask/319703483166135013357056057156686910549735243776/1.bitcask.data
beam.smp  86800 87045 Eleveldb  root  491u      REG          RW,AP,LG              0,183     3036 2043033 /riak/rel/riak/data/bitcask/319703483166135013357056057156686910549735243776/1.bitcask.hint
beam.smp  86800 87045 Eleveldb  root  493u      REG          RW,AP,LG              0,183    13427 2043035 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/1.bitcask.data
beam.smp  86800 87045 Eleveldb  root  494u      REG          RW,AP,LG              0,183     2760 2043036 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/1.bitcask.hint
...

This means writes are still asynchronous, which could lead to data loss even after sending the reply to the client.

The reason for this is that while the get_file_open_flags C function adds the O_SYNC to the bitmask, it replaces it when it receives an ATOM_CREATE option. The easiest fix is to simply add the o_sync option to the end of the list at bitcask_fileops.erl:87, to ensure that it always appears after the create. Now the files are correctly opened with the O_SYNC flag:

COMMAND     PID   TID TASKCMD   USER   FD      TYPE         FILE-FLAG             DEVICE SIZE/OFF    NODE NAME
beam.smp   4080 6414 Eleveldb  root  553u      REG         RW,SYN,LG              0,183       85 2043159 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/bitcask.write.lock
beam.smp   4080 6414 Eleveldb  root  554u      REG      RW,AP,SYN,LG              0,183     5106 2043160 /riak/rel/riak/data/bitcask/319703483166135013357056057156686910549735243776/1.bitcask.data
beam.smp   4080 6414 Eleveldb  root  555u      REG      RW,AP,SYN,LG              0,183     1058 2043161 /riak/rel/riak/data/bitcask/319703483166135013357056057156686910549735243776/1.bitcask.hint
beam.smp   4080 6414 Eleveldb  root  557u      REG      RW,AP,SYN,LG              0,183     3988 2043163 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/1.bitcask.data
beam.smp   4080 6414 Eleveldb  root  558u      REG      RW,AP,SYN,LG              0,183      828 2043164 /riak/rel/riak/data/bitcask/1324485858831130769622089379649131486563188867072/1.bitcask.hint
...

@martinsumner
Copy link
Contributor

@nsaadouni - don't know if you've picked up on this PR, but is this something your team could look at?

@martinsumner
Copy link
Contributor

Note that this is also I believe broken when using the erlang bitcask io_mode. It adds the o_sync flag, when the mode supported in Erlang is to use sync instead.

I think this is an issue dating back to when a bespoke version of the erlang VM was used within Riak that was patched to add support for o_sync, but when the PR was accepted upstream the name was changed to sync.

For a new user of bitcask, I think use of the nif IO mode is questionable. Given the fact that native erlang file IO now uses dirty IO scheduling, it would seem to be obviously safer than using a bespoke nif without dirty-nif support.

Would be interesting to know what the current big users of bitcask actually use in production.

@nuno-faria
Copy link
Author

@martinsumner In our case, the main reason we chose the nif mode was due to the warning provided in the most recent Riak documentation about Bitcask as the backend, stating that:

"Synchronous file I/O via o_sync is supported in Bitcask if io_mode is set to nif and is not supported in the erlang mode."

So we just relied on the nif interface, later discovering the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants