Skip to content
This repository has been archived by the owner on Jul 16, 2022. It is now read-only.

Option to disable fsync #17

Open
untitaker opened this issue Jul 26, 2016 · 16 comments
Open

Option to disable fsync #17

untitaker opened this issue Jul 26, 2016 · 16 comments

Comments

@untitaker
Copy link
Owner

Unsure if anybody needs this. I might need this because in one usecase I'm writing a lot of files (to different filenames), and only need a guarantee that a SIGKILL won't leave a partially written file (at the target location, tmpfiles are irrelevant).

Also this might be a problem with SSDs, as mentioned in #6

@untitaker
Copy link
Owner Author

Probably should be done through subclassing, but this is not cleanly possible at the moment.

@untitaker
Copy link
Owner Author

Counterargument: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/#atomicity-not-durability

@mozillazg
Copy link

I have the same usecase recently. After disable fsync, the program that writing 900+ files has 8x speedup(from 8s to 1s) on Linux.

@untitaker
Copy link
Owner Author

untitaker commented Jun 26, 2018

@mozillazg removing the fsync for the file breaks atomicity, which is the point of this library. I think you removed this one, right?

We can add an option to remove the fsync for the directory though.

@mozillazg
Copy link

mozillazg commented Jun 26, 2018

@untitaker Yes, you are right, I removed that(both for file and directory).

Does fsync for file is must operation? Not sure whether below document is useful, does rename work as read after write?:

After a write() to a regular file has successfully returned:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

I'll test just remove the fsync for the directory.

@untitaker
Copy link
Owner Author

@mozillazg read after write without fsync will work fine.

The fsync is for when your computer looses power or gets a bluescreen: it minimizes the chance of a partially written file (either old or new version will be there)

@mozillazg
Copy link

@untitaker Thanks for your reply.

On ext4, when auto_da_alloc is enabled(default on) it looks like removing the fsync for file is safe too:

auto_da_alloc(*)	Many broken applications don't use fsync() when 
noauto_da_alloc		replacing existing files via patterns such as
			fd = open("foo.new")/write(fd,..)/close(fd)/
			rename("foo.new", "foo"), or worse yet,
			fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
			If auto_da_alloc is enabled, ext4 will detect
			the replace-via-rename and replace-via-truncate
			patterns and force that any delayed allocation
			blocks are allocated such that at the next
			journal commit, in the default data=ordered
			mode, the data blocks of the new file are forced
			to disk before the rename() operation is
			committed.  This provides roughly the same level
			of guarantees as ext3, and avoids the
			"zero-length" problem that can happen when a
			system crashes before the delayed allocation
			blocks are forced to disk.

https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

@untitaker
Copy link
Owner Author

untitaker commented Jun 27, 2018 via email

@blueyed
Copy link

blueyed commented Nov 7, 2019

The _sync_directory used with _replace_atomic on Unix is causing a huge slowdown (collection time for pytest does up by 3s from 5s to 8s, using ~60-70 calls to it IIRC), and is not really necessary. I think that if you care of outages due to loss of power you should call it manually probably?
How do you feel about having a keyword argument that defaults to False (change in behavior, but results in a speedup by default; allowing to keep the old behavior)?
Alternatively it could provide an opt-in for the new behavior (fsync_dir=True by default).

But it seems like doing the extra fsync-dir (only for Unix) is a bit out of scope for the library in the first place?

@untitaker
Copy link
Owner Author

I would like to preserve the current default behavior. I am happy with having an option.

@blueyed
Copy link

blueyed commented Nov 10, 2019

@untitaker
Cool. As for myself I've removed usage of atomicwrites in pytest for non-Windows (pytest-dev/pytest#6147), so I have no use for this myself - i.e. I will not create a PR for it.

@jongiddy
Copy link

I have a use-case for both ways:

  1. I am using atomic files to record the state of a procedure. If the machine crashes, I want to be sure that I see what state we were in before reboot, so I want fsync.
  2. I am using atomic write to create files during service startup. As long as the file is in the memory buffers, the service will see them OK, even if they aren't synced. If the machine crashes, the service startup will rewrite the files. So I don't need fsync.

@pirate
Copy link

pirate commented Aug 21, 2020

I need both options as well for https://github.com/pirate/ArchiveBox to be able to support network storage drive that don't implement FSYNC, while still retaining FSYNC on drives that do support it.

@untitaker
Copy link
Owner Author

Does this proposal make sense to y'all? untitaker/rust-atomicwrites#45

I believe we could allow one to opt out of directory fsync, but the file fsync needs to stay

@pirate
Copy link

pirate commented Jun 11, 2021

Dunno about others but for ArchiveBox I need to be able to disable file fsync more than directory fsync.

@nyanpasu64
Copy link

nyanpasu64 commented Jun 11, 2021

I've written my own fork of rust-atomicwrites for another application, with an added enum to toggle directory fsync: https://github.com/nyanpasu64/handlr/blob/atomic-save/src/common/atomic_save.rs. Note that I haven't used python-atomicwrites in a real-world application.

I feel my chosen API is fairly uncontroversial on Linux (aside from picking a more informative name for the enum cases, and the API break on Rust, which can be mitigated with a default argument in Python). But I don't know how to make it behave on Windows, where directory fsync isn't an option provided by the OS.

  • Should Durability::SyncDir be removed on Windows, resulting in an error if you even try to access it?
  • Should passing it into the "atomic save" operation return an Err or panic in Rust (or raise an exception in Python)?
  • Should passing it in instead silently not fsync the directory, with or without a warning printed to stderr or returned to the user somehow?
  • Should there be separate Durability::SyncDirIfPossible and SyncDirOrFail enum values? I feel this is too much API complexity, and it's better to make a decision one way or another, and tell users about it.

I haven't explored other options aside from an added parameter.

EDIT: Maybe Durability::SyncDir should not be present on Windows, and there should be a const SYNC_DIR_IF_POSSIBLE: Durability which is SyncDir on Linux and DontSyncDir on Windows, either standalone or as a const in the impl block (which is possible unlike a type alias).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants