Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lockfile.Locked() must not be used to make any decisions #1388

Closed
mtrmac opened this issue Oct 14, 2022 · 1 comment · Fixed by #1399
Closed

lockfile.Locked() must not be used to make any decisions #1388

mtrmac opened this issue Oct 14, 2022 · 1 comment · Fixed by #1399

Comments

@mtrmac
Copy link
Collaborator

mtrmac commented Oct 14, 2022

Looking into how #1324 crashed:

github.com/containers/storage/pkg/lockfile.(*lockfile).Touch
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/pkg/lockfile/lockfile_unix.go:247 +0x198

failed because layerStore.lockfile was not held.

But we got there via

github.com/containers/storage.(*layerStore).saveLayers
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:498 +0x1d4
github.com/containers/storage.(*layerStore).Load
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:422 +0x534

which can only happen if layerStore.lockfile.Locked()!

My guess at what has happened (apart from the mystery of graphLock, #1324 (comment) ):

  • There was a layerStore instance for the path in question.
  • Somehow, we got another layerStore instance. Most likely, store.graphLock.TouchedSince(…) triggered a re-load, which abandons the current store.layerStore and created a new instance, without writing for the old one to go quiescent. That’s should be fine, the code is supposed to be reliable against out-of-process writers, so two independent in-process writers is easier.
  • The two layerStore instances get layerStore.lockfile via GetLockfile(path), and both receive the same pkg/lockfile.lockfile instance. That’s as expected, and fine.
  • But now, Locked() is no longer useful! It shows that one of the two owners of that lock object has locked it for writing, but it does not say that it’s the current one.

So, with A and B, both living in the same process, owning the same lock, this becomes possible:

  • A locks the lock
  • B checks .Locked() and thinks that B owns it
  • B starts to make destructive changes
  • At this point A and B are racing and creating invalid state, although out-of-process readers and writers are still excluded
  • A can unlock the lock
  • B continues to make destructive changes, racing against other processes.

AFAICS Locked is fundamentally not useful in Go (at least not without a concept of a goroutine ID, which is intentionally not available).

So, following the precedent of #1376 , I plan to break the API and remove that method.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Oct 14, 2022

Trying to remove it, it turns out that Locked() is a fairly essential part of the current design, via

// EITHER someStore.Lock() or someStore.RLock()
defer someStore.Unlock()
someStore.ReloadIfChanged()
    // …
    someStore.Load()

and someStore.Load() either calls .Locked() to see if it should save, or in the case of containersStore, just calls .Save() ignoring errors, expecting it to bail out if !Locked().

That model could be made to work, by replacing .Locked() with .IsOwnedLockLockedForWriting() (which is undefined if the caller does not hold the lock at all, but safe for held locks), but I just don’t want to add that to the ~stable public API, even for a short while. The callers should know, to make any kind of static reasoning about lock ownership viable.

The work for #1332 will remove ReloadIfChanged from the public interface of the C/I/L stores, having callers instead call startReading/startWriting. At that point the C/I/L stores will be fully in control over whether the lock is held for reading or or writing, allowing us to completely remove all parts of Locked().

Note to self: WIP branch no-Locked, just with the trivial bits and not the real restructuring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant