Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry git fetch operations when net.git_fetch_with_cli = true is set #9870

Closed
wants to merge 1 commit into from

Conversation

Veetaha
Copy link

@Veetaha Veetaha commented Sep 3, 2021

Closes #9820

This requires reading the stderr output of git matching it's output with a list of regular expressions.
Unfortunately, this is heuristic, but the best we can do. The implementation is heavily inspired by chromium git-retry script
Stackoverflow mostly references python implementation (also in chromium codebase), however, comments in python code say it's deprecated and reference Go implementation, which this PR is based on.

Some rough edges in this PR:

  • The ProcessBuilder used to spawn git process provides methods with anyhow::Error API which isn't very convenient for local error handling and leaves some margin for bugs. It uses ProcessError under the hood, having access to it would be handy, but I didn't want to break the error type tendency in the API of this struct
  • According to our experience with cargo failing on git CLI errors, not all patterns are covered by chromium impl (see the tests for examples), so I had to extend that list
  • Had to add regex crate as a dependency, however, it already is a transitive dependency of cargo, it's included via env_logger, globset, and ignore crates, so this doesn't increase the dependency tree.

This requires reading the `stderr` output of git matching it's output with a list of regular expressions.
Unfortunatelly, this is heuristic, but the best we can do. The implementation is heavily inspired by [chromium git-retry script](https://chromium.googlesource.com/infra/infra/+/b9f6e35d2aa1ce9fcb385e414866e0b1635a6c77/go/src/infra/tools/git/retry_regexp.go#17)
Stackoverflow mostly references python implementation (also in chromium codebase), however comments in python code say [it's deprecated](https://chromium.googlesource.com/chromium/tools/depot_tools.git/+/a3d1aaf112499b19d1e2aafe5c33dad3266a6226/git_common.py#77) and reference Go implementation, which this PR is based on.

Some rough edges in this PR:
- The ProcessBuilder used to spawn `git` process provides methods with `anyhow::Error` API which isn't very convenient for local error handling and leaves some margin for bugs
-  According to our experience with cargo failing on git CLI errors, not all patterns are covered by `chromium` impl (see the tests for examples), so I had to extend that list
- Had to add `regex` crate as a dependency, however, it already is a transitive dependency of `cargo`, it's included via `env_logger`, `globset`, and `ignore` crates, so this doesn't increase the dependency tree
@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 3, 2021
Comment on lines +157 to +161
#[test]
fn git_cli_error_is_spurious() {
// This test is important because it also verifies that the regular
// expressions used to match git CLI transient errors are compilable,
// which is checked only at runtime.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, git diff is not smart enough here.
The actual change is that tests were moved under #[cfg(test)] mod tests and this new test was added

@Veetaha
Copy link
Author

Veetaha commented Sep 3, 2021

Hey, @alexcrichton, I believe you are also a good candidate for the review here, since you seem to be an author of the code changed in this PR.

// The test data is taken from real git failures
assert_retryable(
"fatal: unable to access 'org/repo.git': Failed to connect to github.com port 443: Timed out",
);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a rustfmt bug here

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Could you also add some tests about errors not being retry-able?

/// The inner error must by constructed from [`ProcessError`] that
/// was returned as a result of running `git` CLI operation.
/// If it isn't true, the code won't panic, but [`Self::is_retryable()`]
/// will always return `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of listing this as an invariant I think it might be better to do the downcast on construction of a GitCliError and have the internals stored as a ProcessError directly. This constructor could then return Result<Self> or continue to return Self and assert that the downcast succeeds.

Copy link
Author

@Veetaha Veetaha Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit scared to make this an assert!(), because I added only unit tests that don't invoke the code of fetch_with_git_cli(). I guess I should add a mocked binary implementation of git to verify this works more rigorously. Otherwise, I am afraid that this code is invoked very randomly and catching potential panics of cargo in the asserts will be non-deterministic =(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API was just added here so I think it's fine to have an assert here. It's true, though, that git-fetch-with-cli is not heavily tested in Cargo's CI. If you'd like, though, then you could make the constructor argument here ProcessError and then when this is invoked you can use error.downcast()? to automatically propagate non-ProcessError errors and only pass in the process error here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the suggestions, I'll try to figure this out!

Ok(())
}

/// Run the given command assuming it's spawning `git` CLI process
/// retrying all the possible transient errors it may fail with.
fn run_git_cli_with_retries(config: &Config, cmd: &ProcessBuilder) -> CargoResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll have much use for running this in multiple places, so it should be fine to fold this definition into the function above.

pub fn is_retryable(&self) -> bool {
const GIT_TRANSIENT_ERRORS: &[&str] = &[
// Error patterns are taken from chromium.
// Please, keep them 1-to-1 exactly the same as in their source code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we need to be so strict here. For example I don't think that we'll be looking into their source code every-so-often to see if these lists match, and it's not like someone should send a PR to chromium before updating the list here.

.as_deref()
.map(|stderr| GIT_TRANSIENT_ERRORS_RE.is_match(&String::from_utf8_lossy(stderr)))
.unwrap_or_else(|| {
log::warn!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something we'll want to assert! in the constructor of this error because otherwise no one will ever see this message.


let source = &format!("(?i){}", source);

regex::Regex::new(source).expect("BUG: invalid git transient error regex")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be a use-case for RegexSet?

@ehuss ehuss added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 26, 2021
@Veetaha
Copy link
Author

Veetaha commented Oct 4, 2021

Just a short note: I am not gone, I am going to return back to this a bit later (have a lot of stuff in my current project going on), if anyone wants to take over, feel free to do so, but I'll try to finish this anyway

@bors
Copy link
Collaborator

bors commented Jan 12, 2022

☔ The latest upstream changes (presumably #10265) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor

ehuss commented Mar 23, 2022

Closing due to inactivity. We probably don't have capacity to accept a change of this kind at this time.

@ehuss ehuss closed this Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry git fetch operations
5 participants