[BUG] unexpected lfs object oid #5548

Open
adlternative opened this issue Oct 17, 2023 · 5 comments

@adlternative

Describe the bug
Users of our code platform are seeing OID mismatches when downloading LFS objects. On investigation, we found that the LFS data on the storage server (similar to S3) is corrupted. Since this issue has occurred twice, both times on Windows, we strongly suspect that the corruption is introduced by the Git LFS client on Windows when pushing LFS objects.
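For reference, one way to confirm the mismatch (a rough sketch; "downloaded-object" is a placeholder for a copy fetched back from storage) is to compare the SHA-256 recorded in the repository with the SHA-256 of the stored bytes:

```sh
# Expected OIDs, as recorded in the repository (full SHA-256 per file).
git lfs ls-files --long

# SHA-256 of the object as it comes back from the storage server
# ("downloaded-object" is a placeholder for the fetched file).
sha256sum downloaded-object
```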


Expected behavior
No data corruption.

System environment
git-lfs/2.7.1
git version 2.21.0.windows.1

@bk2204
Member

bk2204 commented Oct 17, 2023

Hey,

Have you checked the disk and memory on the storage server and the machine that pushed the data? Also, are you using any sort of antivirus, firewall, TLS MITM device, or proxy other than the default? These can all be causes of corruption.

Also, have you tried with a newer version of Git and Git LFS? The versions you're using are quite old and they have known security vulnerabilities on Windows. The version you're using also looks like it might have had a problem with resuming failed downloads. For example, I think #3890 fixed several related problems, and #4229 discusses server-related problems with HTTP/2.0 that caused this issue. I'd upgrade to something like Git 2.40 or 2.41, which should have Git LFS 3.3.0 (3.4.0 seems to have a bug that is fixed but not yet released that hits some Windows users, so I might avoid that version at the moment).

We haven't seen any recent reports of corruption, so either you're the first to notice it, we've already fixed it, or there's something relevant about your environment that triggers it.

@adlternative
Author

> Hey,
>
> Have you checked the disk and memory on the storage server and the machine that pushed the data? Also, are you using any sort of antivirus, firewall, TLS MITM device, or proxy other than the default? These can all be causes of corruption.

Yes, I have asked our company's storage team, and it appears that the storage service is functioning normally.

> Also, have you tried with a newer version of Git and Git LFS? The versions you're using are quite old and they have known security vulnerabilities on Windows. The version you're using also looks like it might have had a problem with resuming failed downloads. For example, I think #3890 fixed several related problems, and #4229 discusses server-related problems with HTTP/2.0 that caused this issue. I'd upgrade to something like Git 2.40 or 2.41, which should have Git LFS 3.3.0 (3.4.0 seems to have a bug that is fixed but not yet released that hits some Windows users, so I might avoid that version at the moment).

Yes, a user running git-lfs 3.3.0 encountered the same issue before. I would recommend that they wait for the release of version 3.4.0.

> We haven't seen any recent reports of corruption, so either you're the first to notice it, we've already fixed it, or there's something relevant about your environment that triggers it.

Thanks for the help!

@bk2204
Member

bk2204 commented Oct 18, 2023

It would still be helpful for us to know if you're using a proxy of some sort, including an antivirus or firewall other than the default. Also, are you operating on the repository concurrently with multiple processes (since that's been a source of corruption in the past) or is it always a single user and process? Have the affected objects always been uploaded (not downloaded) by a similar set of users, or users using a certain version of Git LFS?

The reason I persist in asking so many questions and am a little doubtful that we have a bug is that every LFS file upload to GitHub is checked that its hash verifies correctly, either implicitly through an Amazon S3 signed URL or by the storage backend for GitHub Enterprise Server. Thus, if we did have such a bug, it would also be affecting GitHub users, and we'd see a lot more problems with corruption on upload, which would be loudly detected. Again, it's not completely out of the question that we have a subtle bug somewhere such that things are broken in some cases and because we retry, things get automatically fixed, but it seems unlikely.

If you still think there's a bug, we can certainly look into it more, and of course we'd want to fix it, but it's very hard to pin down without more information about what exactly is going on when the problem manifests itself, so any relevant information you can provide about the situation when such a file is being uploaded is helpful.

@adlternative
Author

> It would still be helpful for us to know if you're using a proxy of some sort, including an antivirus or firewall other than the default. Also, are you operating on the repository concurrently with multiple processes (since that's been a source of corruption in the past) or is it always a single user and process? Have the affected objects always been uploaded (not downloaded) by a similar set of users, or users using a certain version of Git LFS?

  1. Our service does have a firewall. However, I do not believe this data corruption is related to the firewall, because the users are usually on the same internal network.
  2. Users should not be deliberately operating on the repository with multiple processes, but I cannot guarantee this.
  3. This error has occurred for different users running different versions of git-lfs.

> The reason I persist in asking so many questions and am a little doubtful that we have a bug is that every LFS file upload to GitHub is checked that its hash verifies correctly, either implicitly through an Amazon S3 signed URL or by the storage backend for GitHub Enterprise Server. Thus, if we did have such a bug, it would also be affecting GitHub users, and we'd see a lot more problems with corruption on upload, which would be loudly detected. Again, it's not completely out of the question that we have a subtle bug somewhere such that things are broken in some cases and because we retry, things get automatically fixed, but it seems unlikely.

  1. Our storage service (similar to S3) does not calculate the sha256sum of the object during upload, although it does calculate a crc32. Therefore, we have no reliable means of verifying the integrity of the data uploaded to the server (see the sketch after this list).
  2. Can you please explain in more detail what bug you are referring to? What do you mean by "retry"?
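As a sketch of the kind of check being discussed (the OID, URL, and file names below are placeholders, not our real setup): re-reading an object after upload and comparing its SHA-256 against the OID it was stored under would catch this corruption.

```sh
# OID the object was uploaded under (placeholder value).
oid="<expected-sha256-oid>"

# Fetch the stored bytes back from the storage service; the URL is a
# placeholder for however the backend exposes the object.
curl -fsS "https://storage.example.com/lfs/$oid" -o stored-object

# The SHA-256 of the stored bytes must equal the OID; any mismatch
# means the copy in storage is corrupt.
echo "$oid  stored-object" | sha256sum -c -
```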

> If you still think there's a bug, we can certainly look into it more, and of course we'd want to fix it, but it's very hard to pin down without more information about what exactly is going on when the problem manifests itself, so any relevant information you can provide about the situation when such a file is being uploaded is helpful.

What I can confirm is that the data on the user's client is intact: both the original files and the LFS objects saved in .git/lfs/xxxx have correct sha256sum values. However, the bytes in the range 0x0001E000 to 0x00048000 are corrupted after the upload to the server, so the sha256sum calculated for the LFS object on the server is incorrect. Therefore, I strongly suspect that something went wrong while the LFS objects were being uploaded.
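For reference, this is roughly how the local copy can be checked and the corrupt range located (the object paths below are placeholders; the file name under .git/lfs/objects is the object's SHA-256):

```sh
# Hashing the local LFS object should reproduce its OID, since the file
# is stored under .git/lfs/objects/<aa>/<bb>/<oid> (path is a placeholder).
sha256sum .git/lfs/objects/aa/bb/aabb...

# Byte-by-byte comparison of the intact local object against the copy
# read back from storage; the first and last differing offsets (printed
# in decimal by cmp) bound the corrupted range, here 0x0001E000 to 0x00048000.
cmp -l .git/lfs/objects/aa/bb/aabb... server-copy | head -n 1
cmp -l .git/lfs/objects/aa/bb/aabb... server-copy | tail -n 1
```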

@bk2204
Member

bk2204 commented Oct 23, 2023

On many Windows systems, non-default antiviruses and firewalls often use TLS interception or otherwise access the raw bytes of the data connection, and some of those also try to tamper with the data or connection. That's also true for many proxies and TLS MITM devices and most monitoring software, and it's well known that such software breaks both Git and Git LFS. If you are using such software, it would be helpful to completely uninstall it (disabling it is often not enough) and reboot, to see if that fixes the problem. That's why I ask: it's one of the top problems people see with Git and Git LFS, and it very often manifests in exactly this way.

Because Git LFS works with large files and sometimes network connections have problems, an upload is retried if it isn't successful the first time. We have no known bugs in this case, but if we did have a bug and the server verified the SHA-256 hash of the object and a re-upload somehow fixed the problem, then it could have in theory gone undetected that the upload failed the first time. However, it would mean that Git LFS would have to attempt to upload the data a second time, which I feel like users would tend to notice, since this would make their pushes slower, which is why I doubt that we have such a bug.

Can you reproduce the problem when pushing the specific object ID with git lfs push --object-id? If so, can you please run that in Git Bash prefixed with GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 and then include the output as a text file attachment? Note that if you're using an older version of Git, you'll need to manually redact the Authorization headers so they don't contain credentials.
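For example, something along these lines (the remote name, object ID, and log file name are placeholders):

```sh
# Push a single object by OID with full client-side tracing enabled.
GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 \
  git lfs push --object-id origin <oid> 2>&1 | tee lfs-trace.log
```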
