Retry git fetch operations when `net.git_fetch_with_cli = true` is set #9870

Veetaha · 2021-09-03T01:37:20Z

This requires reading the stderr output of git matching it's output with a list of regular expressions.
Unfortunately, this is heuristic, but the best we can do. The implementation is heavily inspired by chromium git-retry script
Stackoverflow mostly references python implementation (also in chromium codebase), however, comments in python code say it's deprecated and reference Go implementation, which this PR is based on.

Some rough edges in this PR:

The ProcessBuilder used to spawn git process provides methods with anyhow::Error API which isn't very convenient for local error handling and leaves some margin for bugs. It uses ProcessError under the hood, having access to it would be handy, but I didn't want to break the error type tendency in the API of this struct
According to our experience with cargo failing on git CLI errors, not all patterns are covered by chromium impl (see the tests for examples), so I had to extend that list
Had to add regex crate as a dependency, however, it already is a transitive dependency of cargo, it's included via env_logger, globset, and ignore crates, so this doesn't increase the dependency tree.

This requires reading the `stderr` output of git matching it's output with a list of regular expressions. Unfortunatelly, this is heuristic, but the best we can do. The implementation is heavily inspired by [chromium git-retry script](https://chromium.googlesource.com/infra/infra/+/b9f6e35d2aa1ce9fcb385e414866e0b1635a6c77/go/src/infra/tools/git/retry_regexp.go#17) Stackoverflow mostly references python implementation (also in chromium codebase), however comments in python code say [it's deprecated](https://chromium.googlesource.com/chromium/tools/depot_tools.git/+/a3d1aaf112499b19d1e2aafe5c33dad3266a6226/git_common.py#77) and reference Go implementation, which this PR is based on. Some rough edges in this PR: - The ProcessBuilder used to spawn `git` process provides methods with `anyhow::Error` API which isn't very convenient for local error handling and leaves some margin for bugs - According to our experience with cargo failing on git CLI errors, not all patterns are covered by `chromium` impl (see the tests for examples), so I had to extend that list - Had to add `regex` crate as a dependency, however, it already is a transitive dependency of `cargo`, it's included via `env_logger`, `globset`, and `ignore` crates, so this doesn't increase the dependency tree

rust-highfive · 2021-09-03T01:37:22Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon.

Please see the contribution instructions for more information.

Veetaha · 2021-09-03T01:41:08Z

src/cargo/util/network.rs

+    #[test]
+    fn git_cli_error_is_spurious() {
+        // This test is important because it also verifies that the regular
+        // expressions used to match git CLI transient errors are compilable,
+        // which is checked only at runtime.


Unfortunately, git diff is not smart enough here.
The actual change is that tests were moved under #[cfg(test)] mod tests and this new test was added

Veetaha · 2021-09-03T01:43:39Z

Hey, @alexcrichton, I believe you are also a good candidate for the review here, since you seem to be an author of the code changed in this PR.

Veetaha · 2021-09-03T01:47:22Z

src/cargo/util/network.rs

+        // The test data is taken from real git failures
+        assert_retryable(
+        "fatal: unable to access 'org/repo.git': Failed to connect to github.com port 443: Timed out",
+    );


This looks like a rustfmt bug here

alexcrichton

Thanks for this! Could you also add some tests about errors not being retry-able?

alexcrichton · 2021-09-03T15:29:45Z

src/cargo/util/errors.rs

+    /// The inner error must by constructed from [`ProcessError`] that
+    /// was returned as a result of running `git` CLI operation.
+    /// If it isn't true, the code won't panic, but [`Self::is_retryable()`]
+    /// will always return `false`.


Instead of listing this as an invariant I think it might be better to do the downcast on construction of a GitCliError and have the internals stored as a ProcessError directly. This constructor could then return Result<Self> or continue to return Self and assert that the downcast succeeds.

I am a bit scared to make this an assert!(), because I added only unit tests that don't invoke the code of fetch_with_git_cli(). I guess I should add a mocked binary implementation of git to verify this works more rigorously. Otherwise, I am afraid that this code is invoked very randomly and catching potential panics of cargo in the asserts will be non-deterministic =(

This API was just added here so I think it's fine to have an assert here. It's true, though, that git-fetch-with-cli is not heavily tested in Cargo's CI. If you'd like, though, then you could make the constructor argument here ProcessError and then when this is invoked you can use error.downcast()? to automatically propagate non-ProcessError errors and only pass in the process error here.

Okay, thanks for the suggestions, I'll try to figure this out!

alexcrichton · 2021-09-03T15:30:07Z

src/cargo/sources/git/utils.rs

    Ok(())
 }

+/// Run the given command assuming it's spawning `git` CLI process
+/// retrying all the possible transient errors it may fail with.
+fn run_git_cli_with_retries(config: &Config, cmd: &ProcessBuilder) -> CargoResult<()> {


I don't think we'll have much use for running this in multiple places, so it should be fine to fold this definition into the function above.

alexcrichton · 2021-09-03T15:40:05Z

src/cargo/util/errors.rs

+    pub fn is_retryable(&self) -> bool {
+        const GIT_TRANSIENT_ERRORS: &[&str] = &[
+            // Error patterns are taken from chromium.
+            // Please, keep them 1-to-1 exactly the same as in their source code


I don't think that we need to be so strict here. For example I don't think that we'll be looking into their source code every-so-often to see if these lists match, and it's not like someone should send a PR to chromium before updating the list here.

alexcrichton · 2021-09-03T15:40:51Z

src/cargo/util/errors.rs

+            .as_deref()
+            .map(|stderr| GIT_TRANSIENT_ERRORS_RE.is_match(&String::from_utf8_lossy(stderr)))
+            .unwrap_or_else(|| {
+                log::warn!(


I think this is something we'll want to assert! in the constructor of this error because otherwise no one will ever see this message.

alexcrichton · 2021-09-03T15:41:17Z

src/cargo/util/errors.rs

+
+            let source = &format!("(?i){}", source);
+
+            regex::Regex::new(source).expect("BUG: invalid git transient error regex")


I think this may be a use-case for RegexSet?

Veetaha · 2021-10-04T08:34:49Z

Just a short note: I am not gone, I am going to return back to this a bit later (have a lot of stuff in my current project going on), if anyone wants to take over, feel free to do so, but I'll try to finish this anyway

bors · 2022-01-12T01:05:02Z

☔ The latest upstream changes (presumably #10265) made this pull request unmergeable. Please resolve the merge conflicts.

ehuss · 2022-03-23T00:44:20Z

Closing due to inactivity. We probably don't have capacity to accept a change of this kind at this time.

rust-highfive assigned ehuss Sep 3, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 3, 2021

Veetaha commented Sep 3, 2021

View reviewed changes

alexcrichton reviewed Sep 3, 2021

View reviewed changes

ehuss added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 26, 2021

ehuss closed this Mar 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry git fetch operations when `net.git_fetch_with_cli = true` is set #9870

Retry git fetch operations when `net.git_fetch_with_cli = true` is set #9870

Veetaha commented Sep 3, 2021 •

edited

rust-highfive commented Sep 3, 2021

Veetaha Sep 3, 2021

Veetaha commented Sep 3, 2021 •

edited

Veetaha Sep 3, 2021

alexcrichton left a comment

alexcrichton Sep 3, 2021

Veetaha Sep 3, 2021 •

edited

alexcrichton Sep 3, 2021

Veetaha Sep 3, 2021

alexcrichton Sep 3, 2021

alexcrichton Sep 3, 2021

alexcrichton Sep 3, 2021

alexcrichton Sep 3, 2021

Veetaha commented Oct 4, 2021

bors commented Jan 12, 2022

ehuss commented Mar 23, 2022


		let source = &format!("(?i){}", source);

		regex::Regex::new(source).expect("BUG: invalid git transient error regex")

Retry git fetch operations when net.git_fetch_with_cli = true is set #9870

Retry git fetch operations when net.git_fetch_with_cli = true is set #9870

Conversation

Veetaha commented Sep 3, 2021 • edited

rust-highfive commented Sep 3, 2021

Choose a reason for hiding this comment

Veetaha commented Sep 3, 2021 • edited

Choose a reason for hiding this comment

alexcrichton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veetaha Sep 3, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veetaha commented Oct 4, 2021

bors commented Jan 12, 2022

ehuss commented Mar 23, 2022

Retry git fetch operations when `net.git_fetch_with_cli = true` is set #9870

Retry git fetch operations when `net.git_fetch_with_cli = true` is set #9870

Veetaha commented Sep 3, 2021 •

edited

Veetaha commented Sep 3, 2021 •

edited

Veetaha Sep 3, 2021 •

edited