New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cleanup sub-command that remove local bugs and identities #933
add cleanup sub-command that remove local bugs and identities #933
Conversation
3470196
to
a985d33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too bad really, but need some adjustment.
Thanks for doing this.
7434dcf
to
512e68b
Compare
512e68b
to
b39dff3
Compare
Looking at your PR, I realized that it should be up to the cache layer to update or clear the full-text index (Bleve), meaning that those commands should go through the cache, not directly to the repository. As it turns out, I'm in the process of refactoring that cache (#938), so I'll come back to your PR when I'm done with that. Don't worry it will be merged, it just will require a bit more time. |
Okay, thanks for letting me know. |
b39dff3
to
88afb5c
Compare
PR rebased and adjusted:
Thanks for your help :-) |
hmm, it seems like I unearthed some issue around deleting files on windows. The CI fails with things like:
|
Yeah, I don't think it's specific to Windows. I don't have access to a Windows right now but I managed to reproduce the error on my Mac:
It's not failing every time but it fails often, about half the time. I still don't have much luck identifying the cause but it appears like some sort of race condition that is trying to access the file before it is there. It feels like the file is being created by the logic but it was not synced to the disk yet and unavailable for read at the time it is being accessed. But that's just a theory right now. |
I see what's going on. It's trying to remove I tried patching my function to not use multierr.ErrWaitGroup and go sequentially and return first error. That way the test no longer fails even when executed many times. But if I make that patch we might be hiding a problem with the cache. What do you think? |
So you are suggesting that there is a concurrent access to something somewhere? If so that's indeed a bug, and concurrently calling |
Yes, there is a race condition that is not detectable in the current test for some reason but when I make a new test that adds 2 identities and 2 bugs and tries to concurrently remove them the go tooling is able to detect the race condition. Here's the test:
This test will reliably fail and the race condition is now detectable:
|
@zinderic I looked at it and the race you found is because you don't wait for your goroutine in your test, so you have Point is, I don't think it's the problem that the CI found. It looks more like a race at the filesystem level. @smoyer64 speculated that it might be related to https://cs.opensource.google/go/go/+/refs/tags/go1.19.4:src/testing/testing.go;drc=38cfb3be9d486833456276777155980d1ec0823e;l=1229 |
I checked the link and the comments suggest something specific to Windows. But if it's specific to Windows then why the test fails also on MacOS if run enough times? Running 10 tests like that has about 50-60% error rate on my laptop:
In other words 5 or 6 out of 10 fail when run with -count=10. Running them one at a time seems to produce less failures but it still has significant chance of failing. Those observations were made on Mac Air M2 so I don't think that's specific to just Windows. |
This might be related to #809 - running tests in parallel should work since the test environments are built individually and theoretically isolated (and use different temporary directories). If there's a concurrency issue with the code-under-test, that might explain a bit more. Intermittent failures are still occurring on the Windows build (one happened this morning) and this PR might pass if the check is rerun - I'll try that right now. |
7b8c9bc
to
5e94d06
Compare
Looks like a different test fails now - TestGoGitRepo but it fails in similar way as before. This time I cannot reproduce the failure on MacOS. |
I'll take a look at this after work today - if we track enough of these down, then |
Even the test that I thought I fixed fails intermittently again - I wonder if there's something in the refactoring. At least the more recent failure gives a better reason (a locked file handle). |
The refactor and this PR add more file or folder deletion. I think it's the reason why we see those issues more now. Of course it's possible there is actually an issue in the code, but it's not obvious to me what it would be. It seems more compelling to me that it would be a variation of https://cs.opensource.google/go/go/+/refs/tags/go1.19.4:src/testing/testing.go;drc=38cfb3be9d486833456276777155980d1ec0823e;l=1229 |
Except we're not leaving files behind any more - and the error message is now about being denied access to a file. I guess I'm going to have to boot into my Windows partition and see what's actually happening. |
3907cfe
to
658bc1e
Compare
Still happening:
|
That looks like a theory:
If somehow go-git is racing on listing+removal of git refs, it's possible that Another issue with go-git, is that removing a ref should ideally be idempotent (aka not fail if the ref is gone already), but that doesn't seem to be the case. |
Interesting ... and that sounds a lot like what I was going to go spelunking for in |
I found the issue. Turns out, that was not at all the right place: go-git/go-git#659 |
658bc1e
to
03dcd7e
Compare
go-git/go-git#659 might not be perfect as it triggered the CI there, but does solve our problem. Problem which isn't related to that PR, so I'll merge and we can follow up later with go-git. |
Related to #676
I've added subcommand called "cleanup" which removes local bugs and identities.