Bazel fails the build with error code 1 on remote cache timeout #22356

Open
AlexanderGolovlev opened this issue May 13, 2024 · 0 comments
Labels
help wanted Someone outside the Bazel team could own this P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

@AlexanderGolovlev
Contributor

Description of the bug:

We have a rarely reproducible issue in which Bazel fails to build a target because of a remote cache timeout. This is probably caused by network issues or high load on the remote cache server. The error message in the log looks like this:

ERROR: /Users/buildadmin/a/c/g_DCJ9SF26/r/*****/BUILD.bazel:73:24: Compiling *****/event.cpp failed: unable to finalize action: Download of '/cas/0535e6bae6fd101e71c26d71b14b7f1af7bd0abbb98189318dd1c8b9ddfcb4f9' timed out. Received 0 bytes.

After that, Bazel exits with exit code 1 ("BUILD_FAILURE"). This is unexpected.
We would expect Bazel either to fall back to local execution on remote cache errors or to fail with a remote-specific exit code such as 32 ("REMOTE_ENVIRONMENTAL_ERROR"), 34 ("REMOTE_ERROR"), or 39 ("REMOTE_CACHE_EVICTED"). We could then handle cache errors outside Bazel and restart the build if needed.
With the current behavior, we cannot tell this failure apart from a genuine build failure, so we cannot handle it properly.
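If Bazel reported remote cache failures with one of the exit codes above, a wrapper outside Bazel could restart the build only in those cases. The Java sketch below is hypothetical: the class and method names are ours, it assumes a `bazel` binary on `PATH`, and it simply re-runs the command while the exit code is 32, 34, or 39.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical CI-side wrapper: re-runs a Bazel command while it exits with a
 * remote-related exit code. Names here are illustrative, not part of Bazel.
 */
public class RetryingBazelRunner {
    // 32 = REMOTE_ENVIRONMENTAL_ERROR, 34 = REMOTE_ERROR, 39 = REMOTE_CACHE_EVICTED
    static final Set<Integer> RETRYABLE_EXIT_CODES = Set.of(32, 34, 39);

    /** True if the exit code points at a remote/cache problem worth retrying. */
    static boolean isRetryable(int exitCode) {
        return RETRYABLE_EXIT_CODES.contains(exitCode);
    }

    /**
     * Runs the command up to maxAttempts times (maxAttempts >= 1) and returns
     * the exit code of the last attempt.
     */
    static int runWithRetries(List<String> command, int maxAttempts)
            throws IOException, InterruptedException {
        int exitCode = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // inheritIO() forwards Bazel's stdout/stderr to this process.
            exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (!isRetryable(exitCode) || attempt == maxAttempts) {
                break; // success, a non-remote failure, or out of attempts
            }
            System.err.printf("Attempt %d exited with %d; retrying%n", attempt, exitCode);
        }
        return exitCode;
    }
}
```

For example, `RetryingBazelRunner.runWithRetries(List.of("bazel", "build", "//..."), 3)` would run the build at most three times. With the current behavior, the timeout surfaces as exit code 1, so a wrapper like this cannot distinguish it from a real compile error and would not retry.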

Which category does this issue belong to?

Local Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Linux, macOS

What is the output of bazel info release?

release 7.0.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Log files:
linux.log
macos.log

I guess that BulkTransferException should be handled separately from IOException in

} catch (EnvironmentalExecException | IOException e) {
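The following is a minimal sketch of that suggestion. It uses simplified stand-in exception classes rather than Bazel's real ones, and it assumes (as the stand-in does) that BulkTransferException is a subclass of IOException. Because of that subtyping, the BulkTransferException clause must come first: Java rejects a catch clause for a subclass placed after one that already catches its superclass. The `finalizeAction` helper, the `FinalizeStep` interface, and the choice of 34 (REMOTE_ERROR) as the target exit code are hypothetical.

```java
import java.io.IOException;

/** Illustrative only: stand-ins for Bazel types, not Bazel's actual code. */
public class FinalizeActionSketch {
    static final int SUCCESS = 0;
    static final int BUILD_FAILURE = 1;
    static final int REMOTE_ERROR = 34;

    // Stand-in: assumed, like the real class, to extend IOException.
    static class BulkTransferException extends IOException {
        BulkTransferException(String message) { super(message); }
    }

    // Stand-in for Bazel's EnvironmentalExecException.
    static class EnvironmentalExecException extends Exception {
        EnvironmentalExecException(String message) { super(message); }
    }

    /** Hypothetical step that may fail locally or during a remote transfer. */
    interface FinalizeStep {
        void run() throws EnvironmentalExecException, IOException;
    }

    /** Returns the exit code a failure in the step would map to. */
    static int finalizeAction(FinalizeStep step) {
        try {
            step.run();
            return SUCCESS;
        } catch (BulkTransferException e) {
            // Remote transfer failure (e.g. a download timeout): report it distinctly.
            return REMOTE_ERROR;
        } catch (EnvironmentalExecException | IOException e) {
            // Everything else keeps the existing BUILD_FAILURE behavior.
            return BUILD_FAILURE;
        }
    }
}
```

With this split, a download timeout like the one in the log above would map to a remote-specific exit code instead of BUILD_FAILURE, which a CI script could detect and retry.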

@github-actions bot added the team-Local-Exec label May 13, 2024
@sgowroji added the team-Remote-Exec label and removed the team-Local-Exec label May 14, 2024
@oquenchil added the type: feature request, help wanted, and P2 labels and removed the type: bug and untriaged labels May 14, 2024
Projects
None yet
Development

No branches or pull requests

5 participants