[BUG] unexpected lfs object oid #5548

Open
adlternative opened this issue Oct 17, 2023 · 5 comments

@adlternative

Describe the bug
Users of our code platform are seeing OID mismatches when downloading LFS objects. On investigation, we found that the LFS data on the storage server (similar to S3) is corrupted. Since this issue has occurred twice, both times on Windows, we strongly suspect that the corruption is introduced by the Git LFS client on Windows when pushing LFS objects.
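For reference, one way to confirm the mismatch (a rough sketch; "downloaded-object" is a placeholder for a copy fetched back from storage) is to compare the SHA-256 recorded in the repository with the SHA-256 of the stored bytes:

```sh
# Expected OIDs, as recorded in the repository (full SHA-256 per file).
git lfs ls-files --long

# SHA-256 of the object as it comes back from the storage server
# ("downloaded-object" is a placeholder for the fetched file).
sha256sum downloaded-object
```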


Expected behavior
No data corruption.

System environment
git-lfs/2.7.1
git version 2.21.0.windows.1

@bk2204
Member

bk2204 commented Oct 17, 2023

Hey,

Have you checked the disk and memory on the storage server and the machine that pushed the data? Also, are you using any sort of antivirus, firewall, TLS MITM device, or proxy other than the default? These can all be causes of corruption.

Also, have you tried with a newer version of Git and Git LFS? The versions you're using are quite old and they have known security vulnerabilities on Windows. The version you're using also looks like it might have had a problem with resuming failed downloads. For example, I think #3890 fixed several related problems, and #4229 discusses server-related problems with HTTP/2.0 that caused this issue. I'd upgrade to something like Git 2.40 or 2.41, which should have Git LFS 3.3.0 (3.4.0 seems to have a bug that is fixed but not yet released that hits some Windows users, so I might avoid that version at the moment).

We haven't seen any recent reports of corruption, so either you're the first to notice it, we've already fixed it, or there's something relevant about your environment that triggers it.

@adlternative
Author

> Hey,
>
> Have you checked the disk and memory on the storage server and the machine that pushed the data? Also, are you using any sort of antivirus, firewall, TLS MITM device, or proxy other than the default? These can all be causes of corruption.

Yes, I have asked our company's storage team, and it appears that the storage service is functioning normally.

> Also, have you tried with a newer version of Git and Git LFS? The versions you're using are quite old and they have known security vulnerabilities on Windows. The version you're using also looks like it might have had a problem with resuming failed downloads. For example, I think #3890 fixed several related problems, and #4229 discusses server-related problems with HTTP/2.0 that caused this issue. I'd upgrade to something like Git 2.40 or 2.41, which should have Git LFS 3.3.0 (3.4.0 seems to have a bug that is fixed but not yet released that hits some Windows users, so I might avoid that version at the moment).

Yes, a user running git-lfs 3.3.0 encountered the same issue before. I would recommend that they wait for the release of version 3.4.0.

> We haven't seen any recent reports of corruption, so either you're the first to notice it, we've already fixed it, or there's something relevant about your environment that triggers it.

Thanks for the help!

@bk2204
Member

bk2204 commented Oct 18, 2023

It would still be helpful for us to know if you're using a proxy of some sort, including an antivirus or firewall other than the default. Also, are you operating on the repository concurrently with multiple processes (since that's been a source of corruption in the past) or is it always a single user and process? Have the affected objects always been uploaded (not downloaded) by a similar set of users, or users using a certain version of Git LFS?

The reason I persist in asking so many questions and am a little doubtful that we have a bug is that every LFS file upload to GitHub is checked that its hash verifies correctly, either implicitly through an Amazon S3 signed URL or by the storage backend for GitHub Enterprise Server. Thus, if we did have such a bug, it would also be affecting GitHub users, and we'd see a lot more problems with corruption on upload, which would be loudly detected. Again, it's not completely out of the question that we have a subtle bug somewhere such that things are broken in some cases and because we retry, things get automatically fixed, but it seems unlikely.

If you still think there's a bug, we can certainly look into it more, and of course we'd want to fix it, but it's very hard to pin down without more information about what exactly is going on when the problem manifests itself, so any relevant information you can provide about the situation when such a file is being uploaded is helpful.

@adlternative
Author

> It would still be helpful for us to know if you're using a proxy of some sort, including an antivirus or firewall other than the default. Also, are you operating on the repository concurrently with multiple processes (since that's been a source of corruption in the past) or is it always a single user and process? Have the affected objects always been uploaded (not downloaded) by a similar set of users, or users using a certain version of Git LFS?

  1. Our service does have a firewall. However, I do not believe this data corruption is related to the firewall, because the users are usually on the same internal network.
  2. Users should not be deliberately operating on the repository with multiple processes, but I cannot guarantee this.
  3. This error has occurred for different users running different versions of git-lfs.

> The reason I persist in asking so many questions and am a little doubtful that we have a bug is that every LFS file upload to GitHub is checked that its hash verifies correctly, either implicitly through an Amazon S3 signed URL or by the storage backend for GitHub Enterprise Server. Thus, if we did have such a bug, it would also be affecting GitHub users, and we'd see a lot more problems with corruption on upload, which would be loudly detected. Again, it's not completely out of the question that we have a subtle bug somewhere such that things are broken in some cases and because we retry, things get automatically fixed, but it seems unlikely.

  1. Our storage service (similar to S3) does not calculate the sha256sum of the object during upload, although it does calculate a crc32. Therefore, we have no reliable means of verifying the integrity of the data uploaded to the server (see the sketch after this list).
  2. Can you please explain in more detail what bug you are referring to? What do you mean by "retry"?
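As a sketch of the kind of check being discussed (the OID, URL, and file names below are placeholders, not our real setup): re-reading an object after upload and comparing its SHA-256 against the OID it was stored under would catch this corruption.

```sh
# OID the object was uploaded under (placeholder value).
oid="<expected-sha256-oid>"

# Fetch the stored bytes back from the storage service; the URL is a
# placeholder for however the backend exposes the object.
curl -fsS "https://storage.example.com/lfs/$oid" -o stored-object

# The SHA-256 of the stored bytes must equal the OID; any mismatch
# means the copy in storage is corrupt.
echo "$oid  stored-object" | sha256sum -c -
```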

> If you still think there's a bug, we can certainly look into it more, and of course we'd want to fix it, but it's very hard to pin down without more information about what exactly is going on when the problem manifests itself, so any relevant information you can provide about the situation when such a file is being uploaded is helpful.

What I can confirm is that the data on the user's client is intact: both the original files and the LFS objects saved in .git/lfs/xxxx have correct sha256sum values. However, the bytes in the range 0x0001E000 to 0x00048000 are corrupted after the upload to the server, so the sha256sum calculated for the LFS object on the server is incorrect. Therefore, I strongly suspect that something went wrong while the LFS objects were being uploaded.
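For reference, this is roughly how the local copy can be checked and the corrupt range located (the object paths below are placeholders; the file name under .git/lfs/objects is the object's SHA-256):

```sh
# Hashing the local LFS object should reproduce its OID, since the file
# is stored under .git/lfs/objects/<aa>/<bb>/<oid> (path is a placeholder).
sha256sum .git/lfs/objects/aa/bb/aabb...

# Byte-by-byte comparison of the intact local object against the copy
# read back from storage; the first and last differing offsets (printed
# in decimal by cmp) bound the corrupted range, here 0x0001E000 to 0x00048000.
cmp -l .git/lfs/objects/aa/bb/aabb... server-copy | head -n 1
cmp -l .git/lfs/objects/aa/bb/aabb... server-copy | tail -n 1
```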

@bk2204
Member

bk2204 commented Oct 23, 2023

On many Windows systems, non-default antiviruses and firewalls often use TLS interception or otherwise access the raw bytes of the data connection, and some of those also try to tamper with the data or connection. That's also true for many proxies and TLS MITM devices and most monitoring software, and it's well known that such software breaks both Git and Git LFS. If you are using such software, it would be helpful to completely uninstall it (disabling it is often not enough) and reboot, to see if that fixes the problem. That's why I ask: it's one of the top problems people see with Git and Git LFS, and it very often manifests in exactly this way.

Because Git LFS works with large files and sometimes network connections have problems, an upload is retried if it isn't successful the first time. We have no known bugs in this case, but if we did have a bug and the server verified the SHA-256 hash of the object and a re-upload somehow fixed the problem, then it could have in theory gone undetected that the upload failed the first time. However, it would mean that Git LFS would have to attempt to upload the data a second time, which I feel like users would tend to notice, since this would make their pushes slower, which is why I doubt that we have such a bug.

Can you reproduce the problem when pushing the specific object ID with git lfs push --object-id? If so, can you please run that in Git Bash prefixed with GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 and then include the output as a text file attachment? Note that if you're using an older version of Git, you'll need to manually redact the Authorization headers so they don't contain credentials.
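For example, something along these lines (the remote name, object ID, and log file name are placeholders):

```sh
# Push a single object by OID with full client-side tracing enabled.
GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 \
  git lfs push --object-id origin <oid> 2>&1 | tee lfs-trace.log
```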
