New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix exclusive lock to work on Linux+NFS #22
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some things I think need to be adjusted before we look at merging this in.
I'd also like to think about how we can test for this functionality on Linux. Is there a test we could write that would assert that the retry actually works / fails as we expect it to? I'd like to hear your thoughts.
When making these changes, please amend the current commit instead of creating new one(s).
flock_unix.go
Outdated
// Since Linux 3.4 (commit 55725513) | ||
// Probably NFSv4 where flock() is emulated by fcntl(). | ||
func (f *Flock) retryError(err error) bool { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove the whitespace after the method header?
flock_unix.go
Outdated
@@ -132,6 +139,8 @@ func (f *Flock) try(locked *bool, flag int) (bool, error) { | |||
} | |||
} | |||
|
|||
retried := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to var retried bool
.
flock_unix.go
Outdated
// mode and try again. This comes from util-linux/sys-utils/flock.c: | ||
// Since Linux 3.4 (commit 55725513) | ||
// Probably NFSv4 where flock() is emulated by fcntl(). | ||
func (f *Flock) retryError(err error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of this method name as it doesn't match what it does. The method itself doesn't retry anything, and instead determines if we should retry and silently changes our file descriptor out from under us.
I'm not sure how I feel about the os.OpenFile
error being silently swallowed by this method, since that would indicate something really weird is going on. In that case, I'd want to bubble the error up to the consumer because the validation you've done already indicates that it should not fail.
How do you feel about something like reopenFDOnError
. It could return a bool
or an error
. If there's an error
, something went wrong and we should tell the caller. If bool is false
, we decided not to reopen the file. If it's true
, we should retry.
6d464cf
to
35c8d77
Compare
The way I've tested it is by setting up two VMs with an NFS4 share between them. Here's a Vagrantfile that you can use to set up two VMs with NFS4 automatically that I've verified the fix on: https://gist.github.com/virtuald/af30557554ea839c6d00fc2386bc0a58 However, seems like a lot of extra work to test this, and I don't really want to take more time to do that. Feel free to do so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor thing, and this is good to go! 👍
flock_unix.go
Outdated
} | ||
} else { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to keep the left side of my functions as the happy path, and the right side of them as the unhappy path. I think it's something I picked this up from the stdlib. For something like that, how would you feel about:
shouldRetry, reopenErr := f.reopenFDOnError(err)
if reopenErr != nil {
return reopenErr
}
if !shouldRetry {
return err
}
return syscall.Flock(int(f.fh.Fd()), flag)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- Comes from flock.c in util-linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution! I'll look to get it merged in shortly.
Fix exclusive lock to work on Linux+NFS Signed-off-by: Tim Heckman <t@heckman.io>
Thank you again. I merged and released this under |
I noticed that trying to obtain an exclusive lock on a file sitting on an NFS system did not work, and returned an error 'invalid file descriptor'. When debugging, I noticed that the
flock
command line tool was able to work without any errors, so I took a look at the source code.https://github.com/karelzak/util-linux/blob/05541825553524e2ac353eb6c62c8b5ad049de24/sys-utils/flock.c#L294
The fix present there is to open the file in read+write mode, then try again.