Fix data race between ReloadIfChanged and read-only usage of layerStore #1322
Conversation
Sanitizer output:
The explanation: #898 (comment) sometimes fails because the Modified function has three cases in which it can return true:
And it is possible for ReloadIfChanged to execute concurrently with read-only access to layerStore's internals. I understand that my patch doesn't provide thread safety in all cases, but in my view a complete solution would require a mutex throughout layers.go, and that could lead to performance degradation (see #278).
@@ -1924,7 +1931,6 @@ func (r *layerStore) Modified() (bool, error) {
 	}
 	if info != nil {
 		tmodified = info.ModTime() != r.layerspathModified
-		r.layerspathModified = info.ModTime()
Changing the state of an object in a conceptually const function is a bit strange. From my point of view, such a function should return the same value when called repeatedly; that was not the case before this fix.
aa6c22a
to
bdbc88d
Compare
There was a race condition if a goroutine accessed layerStore's public methods under the RO lock while ReloadIfChanged was called at the same time. In real life, this can occur when there are two concurrent PlayKube requests in Podman. Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
bdbc88d
to
a360de2
Compare
@saschagrunert the code was added by you with f637c5e. Would this PR break CRI-O?
the change LGTM
Maybe; the corresponding CRI-O test never made it into our suite: https://github.com/saschagrunert/cri-o/blob/e1d2a3cb4961c2af8bf8bb2d2fb09857c54c638f/test/image.bats#L305-L327 I can add it so that we can catch the regression when updating c/storage.
LGTM
@tyler92 @saschagrunert @giuseppe @nalind Actually, what does this fix, and how, at all? Looking at the trace:
The read lock is shared, right? So there is no exclusion of […]. If it doesn't change that, what does it change? AFAICT the write to […] (I do agree that a […]) I just can't see how calling […] I'm almost certainly missing something essential — what is it?
It's a very, very non-obvious PR. I realize that, and I think this module should be rewritten to avoid such things. What happened here: Goroutine 1:
Goroutine 2:
What does my PR do? In step (3) from Goroutine 1 we change layerspathModified, so the condition in step (2) from Goroutine 2 becomes false. Once again: this PR is just a patch that avoids the data race; without refactoring this module I don't know how to make a good solution. In my opinion it's a bad idea to call Load under a read lock. Maybe something like an upgradable lock is required here.
Ah, so […]
I have also filed #1332 for the […]
There was a race condition if a goroutine accessed layerStore's public
methods under the RO lock while ReloadIfChanged was called at the same
time. In real life, this can occur when there are two concurrent PlayKube
requests in Podman.
Signed-off-by: Mikhail Khachayants tyler92@inbox.ru