Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The new SeparateBodyFileCache design makes updates/reads non-atomic #324

Open
itamarst opened this issue Oct 18, 2023 · 4 comments
Open
Labels

Comments

@itamarst
Copy link
Contributor

itamarst commented Oct 18, 2023

The problem

Scenario 1

Program crashes after writing metadata but before writing body. Now the cache thinks the body is empty.

Scenario 2

Multiple processes using the same cache directory. Process A writes metadata, but not yet body. Process B gets metadata, then gets body, and now it thinks the body is empty.

Workarounds

For pip I added logic that basically pretends a cache entry is missing if it doesn't have both body and metadata files.

Solutions

Probably need both a new API that involves writing and reading both at once, and an implementation that makes sure that's atomic. See also the somewhat related #325.

@woodruffw woodruffw added the bug label Oct 18, 2023
@itamarst
Copy link
Contributor Author

This is my fault, by the way... 😢

We discussed various solutions on the pip side here: pypa/pip#12361

@woodruffw
Copy link
Member

Thanks for the report!

Probably need both a new API that involves writing and reading both at once, and an implementation that makes sure that's atomic. See also the somewhat related #325.

This makes sense to me -- if I'm understanding correctly, this would mean replacing get + get_body with a single get, correct? Along with internal changes to ensure that we synchronize/atomize the two I/O operations?

@itamarst
Copy link
Contributor Author

Yeah, plus have a single write API function.

For cross-file atomicity, writing the body file first might do the trick, if reading code reads the metadata file first. This would allow the file format to stay the same, at least.

(Would also be nice if write API didn't take the full bytes, since that means you have to keep full response in memory, but that maybe a bit too much work. Or could use the mmap hack again to work around that and accepts buffer API-implementing objects.)

@itamarst
Copy link
Contributor Author

Note that a combination of fixing this and #235 in the ways suggested in previous comment, plus os.replace(), might allow removing the need for locking altogether.

pip doesn't use locking because historic reliance on deprecated lockfile. They'd vendor the newer filelock if need be, but in general seem happier with fewer dependencies since they need to vendor everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants