Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry git fetch operations when net.git_fetch_with_cli = true is set #9870

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Expand Up @@ -49,6 +49,7 @@ memchr = "2.1.3"
num_cpus = "1.0"
opener = "0.5"
percent-encoding = "2.0"
regex = "1.0"
rustfix = "0.6.0"
semver = { version = "1.0.3", features = ["serde"] }
serde = { version = "1.0.123", features = ["derive"] }
Expand Down
23 changes: 18 additions & 5 deletions src/cargo/sources/git/utils.rs
Expand Up @@ -2,7 +2,7 @@
//! authentication/cloning.

use crate::core::GitReference;
use crate::util::errors::CargoResult;
use crate::util::errors::{CargoResult, GitCliError};
use crate::util::{network, Config, IntoUrl, MetricsCounter, Progress};
use anyhow::{anyhow, Context as _};
use cargo_util::{paths, ProcessBuilder};
Expand Down Expand Up @@ -893,13 +893,26 @@ fn fetch_with_cli(
.env_remove("GIT_OBJECT_DIRECTORY")
.env_remove("GIT_ALTERNATE_OBJECT_DIRECTORIES")
.cwd(repo.path());
config
.shell()
.verbose(|s| s.status("Running", &cmd.to_string()))?;
cmd.exec_with_output()?;

run_git_cli_with_retries(config, &cmd)?;

Ok(())
}

/// Run the given command assuming it's spawning `git` CLI process
/// retrying all the possible transient errors it may fail with.
fn run_git_cli_with_retries(config: &Config, cmd: &ProcessBuilder) -> CargoResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll have much use for running this in multiple places, so it should be fine to fold this definition into the function above.

network::with_retry(config, || {
config
.shell()
.verbose(|s| s.status("Running", &cmd.to_string()))?;

cmd.exec_with_output().map_err(GitCliError::new)?;

Ok(())
})
}

/// Cargo has a bunch of long-lived git repositories in its global cache and
/// some, like the index, are updated very frequently. Right now each update
/// creates a new "pack file" inside the git database, and over time this can
Expand Down
136 changes: 136 additions & 0 deletions src/cargo/util/errors.rs
Expand Up @@ -4,6 +4,8 @@ use crate::core::{TargetKind, Workspace};
use crate::ops::CompileOptions;
use anyhow::Error;
use cargo_util::ProcessError;
use itertools::Itertools;
use lazy_static::lazy_static;
use std::fmt;
use std::path::PathBuf;

Expand Down Expand Up @@ -300,6 +302,140 @@ impl From<clap::Error> for CliError {
}
}

/// Error that comes from running `git` CLI. This wrapper exists to detect
/// this kind of errors in [`crate::util::network::with_retry`] and properly
/// retry them.
pub struct GitCliError {
inner: anyhow::Error,
}

impl GitCliError {
/// Creates an insteance of [`GitCliError`].
///
/// # Invariants
///
/// The inner error must by constructed from [`ProcessError`] that
/// was returned as a result of running `git` CLI operation.
/// If it isn't true, the code won't panic, but [`Self::is_retryable()`]
/// will always return `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of listing this as an invariant I think it might be better to do the downcast on construction of a GitCliError and have the internals stored as a ProcessError directly. This constructor could then return Result<Self> or continue to return Self and assert that the downcast succeeds.

Copy link
Author

@Veetaha Veetaha Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit scared to make this an assert!(), because I added only unit tests that don't invoke the code of fetch_with_git_cli(). I guess I should add a mocked binary implementation of git to verify this works more rigorously. Otherwise, I am afraid that this code is invoked very randomly and catching potential panics of cargo in the asserts will be non-deterministic =(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API was just added here so I think it's fine to have an assert here. It's true, though, that git-fetch-with-cli is not heavily tested in Cargo's CI. If you'd like, though, then you could make the constructor argument here ProcessError and then when this is invoked you can use error.downcast()? to automatically propagate non-ProcessError errors and only pass in the process error here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the suggestions, I'll try to figure this out!

pub fn new(inner: anyhow::Error) -> Self {
Self { inner }
}

/// Returns `true` if the git CLI error is transient and may be retried.
///
/// The code here is inspired by `chromium` git retries code:
/// <https://chromium.googlesource.com/infra/infra/+/b9f6e35d2aa1ce9fcb385e414866e0b1635a6c77/go/src/infra/tools/git/retry_regexp.go#17>
pub fn is_retryable(&self) -> bool {
const GIT_TRANSIENT_ERRORS: &[&str] = &[
// Error patterns are taken from chromium.
// Please, keep them 1-to-1 exactly the same as in their source code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we need to be so strict here. For example I don't think that we'll be looking into their source code every-so-often to see if these lists match, and it's not like someone should send a PR to chromium before updating the list here.

//
// If you are updating them, please keep the commit ref in the doc comment
// of this method up-to-date.
r#"!.*\[remote rejected\].*\(error in hook\)"#,
r#"!.*\[remote rejected\].*\(failed to lock\)"#,
r#"!.*\[remote rejected\].*\(error in Gerrit backend\)"#,
r#"remote error: Internal Server Error"#,
r#"fatal: Couldn't find remote ref "#,
r#"git fetch_pack: expected ACK/NAK, got"#,
r#"protocol error: bad pack header"#,
r#"The remote end hung up unexpectedly"#,
r#"The remote end hung up upon initial contact"#,
r#"TLS packet with unexpected length was received"#,
r#"RPC failed; result=\d+, HTTP code = \d+"#,
r#"Connection timed out"#,
r#"repository cannot accept new pushes; contact support"#,
r#"Service Temporarily Unavailable"#,
r#"The service is currently unavailable"#,
r#"Connection refused"#,
r#"The requested URL returned error: 5\d+"#,
r#"Operation too slow"#,
r#"Connection reset by peer"#,
r#"Unable to look up"#,
r#"Couldn't resolve host"#,
r#"Unknown SSL protocol error"#,
r#"Revision .* of patch set \d+ does not match refs/changes"#,
r#"Git repository not found"#,
r#"Couldn't connect to server"#,
r#"transfer closed with outstanding read data remaining"#,
r#"Access denied to"#,
r#"The requested URL returned error: 429"#,
r#"RESOURCE_EXHAUSTED"#,
r#"Resource has been exhausted"#,
r#"check quota"#,
r#"fetch-pack: protocol error: bad band #\d+"#,
r#"The requested URL returned error: 400"#,
r#"fetch-pack: fetch failed"#,
r#"fetch-pack: unable to spawn http-fetch"#,
r#"fetch-pack: expected keep then TAB at start of http-fetch output"#,
r#"fetch-pack: expected hash then LF at end of http-fetch output"#,
r#"fetch-pack: unable to finish http-fetch"#,
r#"fetch-pack: pack downloaded from .* does not match expected hash .*"#,
// Regexes that are not included in chromium source code
// This one was reported in https://github.com/rust-lang/cargo/issues/9820
r#"Failed to connect .* Timed out"#,
];

/// Merge all of the regex into a single regex OR-ed together
fn merge_regex(regexps: &[&str]) -> regex::Regex {
let source = regexps
.iter()
.format_with("|", |regex, f| f(&format_args!("(?:{})", regex)));

let source = &format!("(?i){}", source);

regex::Regex::new(source).expect("BUG: invalid git transient error regex")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be a use-case for RegexSet?

}

lazy_static! {
static ref GIT_TRANSIENT_ERRORS_RE: regex::Regex = merge_regex(GIT_TRANSIENT_ERRORS);
}

let err = match self.inner.downcast_ref::<ProcessError>() {
Some(err) => err,
None => {
log::warn!(
"BUG: `GitCliError` was constructed from non-`ProcessError`, \
can't determine if it's retryable, assuming it's not... Error: {:?}",
self,
);
return false;
}
};

err.stderr
.as_deref()
.map(|stderr| GIT_TRANSIENT_ERRORS_RE.is_match(&String::from_utf8_lossy(stderr)))
.unwrap_or_else(|| {
log::warn!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something we'll want to assert! in the constructor of this error because otherwise no one will ever see this message.

"BUG: git CLI stderr wasn't captured, \
can't determine if it's retryable, assuming it's not... Error: {:?}",
err
);
false
})
}
}

impl std::error::Error for GitCliError {
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
self.inner.source()
}
}

impl fmt::Debug for GitCliError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.inner.fmt(f)
}
}

impl fmt::Display for GitCliError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.inner.fmt(f)
}
}

// =============================================================================
// Construction helpers

Expand Down
138 changes: 92 additions & 46 deletions src/cargo/util/network.rs
@@ -1,6 +1,6 @@
use anyhow::Error;

use crate::util::errors::{CargoResult, HttpNot200};
use crate::util::errors::{CargoResult, GitCliError, HttpNot200};
use crate::util::Config;

pub struct Retry<'a> {
Expand Down Expand Up @@ -43,6 +43,9 @@ fn maybe_spurious(err: &Error) -> bool {
_ => (),
}
}
if let Some(git_cli_err) = err.downcast_ref::<GitCliError>() {
return git_cli_err.is_retryable();
}
if let Some(curl_err) = err.downcast_ref::<curl::Error>() {
if curl_err.is_couldnt_connect()
|| curl_err.is_couldnt_resolve_proxy()
Expand Down Expand Up @@ -94,54 +97,97 @@ where
}
}

#[test]
fn with_retry_repeats_the_call_then_works() {
use crate::core::Shell;
#[cfg(test)]
mod tests {
use super::*;
use cargo_util::ProcessError;

#[test]
fn with_retry_repeats_the_call_then_works() {
use crate::core::Shell;

//Error HTTP codes (5xx) are considered maybe_spurious and will prompt retry
let error1 = HttpNot200 {
code: 501,
url: "Uri".to_string(),
//Error HTTP codes (5xx) are considered maybe_spurious and will prompt retry
let error1 = HttpNot200 {
code: 501,
url: "Uri".to_string(),
}
.into();
let error2 = HttpNot200 {
code: 502,
url: "Uri".to_string(),
}
.into();
let mut results: Vec<CargoResult<()>> = vec![Ok(()), Err(error1), Err(error2)];
let config = Config::default().unwrap();
*config.shell() = Shell::from_write(Box::new(Vec::new()));
let result = with_retry(&config, || results.pop().unwrap());
assert!(result.is_ok())
}
.into();
let error2 = HttpNot200 {
code: 502,
url: "Uri".to_string(),

#[test]
fn with_retry_finds_nested_spurious_errors() {
use crate::core::Shell;

//Error HTTP codes (5xx) are considered maybe_spurious and will prompt retry
//String error messages are not considered spurious
let error1 = anyhow::Error::from(HttpNot200 {
code: 501,
url: "Uri".to_string(),
});
let error1 = anyhow::Error::from(error1.context("A non-spurious wrapping err"));
let error2 = anyhow::Error::from(HttpNot200 {
code: 502,
url: "Uri".to_string(),
});
let error2 = anyhow::Error::from(error2.context("A second chained error"));
let mut results: Vec<CargoResult<()>> = vec![Ok(()), Err(error1), Err(error2)];
let config = Config::default().unwrap();
*config.shell() = Shell::from_write(Box::new(Vec::new()));
let result = with_retry(&config, || results.pop().unwrap());
assert!(result.is_ok())
}
.into();
let mut results: Vec<CargoResult<()>> = vec![Ok(()), Err(error1), Err(error2)];
let config = Config::default().unwrap();
*config.shell() = Shell::from_write(Box::new(Vec::new()));
let result = with_retry(&config, || results.pop().unwrap());
assert!(result.is_ok())
}

#[test]
fn with_retry_finds_nested_spurious_errors() {
use crate::core::Shell;

//Error HTTP codes (5xx) are considered maybe_spurious and will prompt retry
//String error messages are not considered spurious
let error1 = anyhow::Error::from(HttpNot200 {
code: 501,
url: "Uri".to_string(),
});
let error1 = anyhow::Error::from(error1.context("A non-spurious wrapping err"));
let error2 = anyhow::Error::from(HttpNot200 {
code: 502,
url: "Uri".to_string(),
});
let error2 = anyhow::Error::from(error2.context("A second chained error"));
let mut results: Vec<CargoResult<()>> = vec![Ok(()), Err(error1), Err(error2)];
let config = Config::default().unwrap();
*config.shell() = Shell::from_write(Box::new(Vec::new()));
let result = with_retry(&config, || results.pop().unwrap());
assert!(result.is_ok())
}
#[test]
fn curle_http2_stream_is_spurious() {
let code = curl_sys::CURLE_HTTP2_STREAM;
let err = curl::Error::new(code);
assert!(maybe_spurious(&err.into()));
}

#[test]
fn git_cli_error_is_spurious() {
// This test is important because it also verifies that the regular
// expressions used to match git CLI transient errors are compilable,
// which is checked only at runtime.
Comment on lines +157 to +161
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, git diff is not smart enough here.
The actual change is that tests were moved under #[cfg(test)] mod tests and this new test was added


fn assert_retryable(err: &str) {
let err = GitCliError::new(
ProcessError::new_raw(
err,
None,
"status_doesnt_matter",
None,
Some(err.as_bytes()),
)
.into(),
);

let err: Error = err.into();

#[test]
fn curle_http2_stream_is_spurious() {
let code = curl_sys::CURLE_HTTP2_STREAM;
let err = curl::Error::new(code);
assert!(maybe_spurious(&err.into()));
assert!(
maybe_spurious(&err),
"Git CLI wasn't parsed as retryable: {}",
err
);
}

// The test data is taken from real git failures
assert_retryable(
"fatal: unable to access 'org/repo.git': Failed to connect to github.com port 443: Timed out",
);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a rustfmt bug here

assert_retryable(
"error: RPC failed; HTTP 412 curl 22 The requested URL returned error: 412\n\
fatal: the remote end hung up unexpectedly",
);
}
}