
Explicit cache does not work with AWS ECR due to wrong manifest mediatype #3714

Open
mcheshkov opened this issue Jan 16, 2024 · 4 comments
Labels
type:bug Something isn't working

Comments

@mcheshkov

What went wrong?
When I run earthly with --remote-cache pointing to ECR and with --push, it fails on the very last step, after all blobs have been pushed, with:
Error: error writing manifest blob: failed commit on ref "sha256:823...540": unexpected status from PUT request to https://123.dkr.ecr.eu-central-1.amazonaws.com/v2/build-cache/manifests/explicit-cache-test: 400 Bad Request

To reproduce:

VERSION 0.7

foo:
    FROM docker.io/debian:bookworm-20230904
    RUN echo foo

Log in to ECR, create a private repository, and run something like earthly --remote-cache=123.dkr.ecr.eu-central-1.amazonaws.com/build-cache:foo-explicit-cache --push +foo (use your ECR id, region and repository name).

What should have happened?

Since support for cache manifests is already implemented in ECR (according to aws/containers-roadmap#876; the ticket remains open only for discussion), I expected this to work out of the box.

Looking closely at that ticket, it seems that ECR accepts only application/vnd.oci.image.manifest.v1+json as the manifest media type, while BuildKit (and, transitively, earthly) uses application/vnd.oci.image.index.v1+json by default.

I tried building v0.7.23 from source with this patch to builder/solver.go, and ECR happily accepted the push.
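To check which of the two media types a given manifest actually declares, it can help to look at its top-level mediaType field. The following is a minimal sketch, not part of earthly or BuildKit; the function name classifyManifest and the sample JSON are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The two OCI media types discussed in this issue.
const (
	mediaTypeOCIManifest = "application/vnd.oci.image.manifest.v1+json"
	mediaTypeOCIIndex    = "application/vnd.oci.image.index.v1+json"
)

// classifyManifest (hypothetical helper) reads only the top-level "mediaType"
// field of a manifest document and reports which OCI kind it declares.
func classifyManifest(raw []byte) (string, error) {
	var doc struct {
		MediaType string `json:"mediaType"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("decode manifest: %w", err)
	}
	switch doc.MediaType {
	case mediaTypeOCIManifest:
		return "image manifest", nil
	case mediaTypeOCIIndex:
		return "image index", nil
	default:
		return "", fmt.Errorf("unexpected mediaType %q", doc.MediaType)
	}
}

func main() {
	// Made-up sample document; a real cache manifest has more fields.
	sample := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.oci.image.index.v1+json","manifests":[]}`)
	kind, err := classifyManifest(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kind)
}
```

If the cache manifest declares the image index type, that would be consistent with the rejection described above.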

diff --git a/builder/solver.go b/builder/solver.go
index be22709a..6cba60ec 100644
--- a/builder/solver.go
+++ b/builder/solver.go
@@ -164,6 +164,8 @@ func newCacheImportOpt(ref string) client.CacheOptionsEntry {
 func newCacheExportOpt(ref string, max bool) client.CacheOptionsEntry {
        registryCacheOptAttrs := make(map[string]string)
        registryCacheOptAttrs["ref"] = ref
+       registryCacheOptAttrs["oci-mediatypes"] = "true"
+       registryCacheOptAttrs["image-manifest"] = "true"
        if max {
                registryCacheOptAttrs["mode"] = "max"
        }

I'm not sure this is the proper way to go; other registry implementations might not accept manifests in this form.
I did not find a way to work around this without rebuilding the earthly client executable.

Logs from buildkitd and the command output were not very helpful; they all basically say the same thing, 400 Bad Request, without further explanation. Maybe that's on ECR; I did not check all the way here.
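One way to address the concern about other registries would be to make the two attributes opt-in instead of hard-coding them. The sketch below is an assumption, not earthly's actual code: CacheOptionsEntry is a local stand-in for BuildKit's client.CacheOptionsEntry (so the example compiles on its own), the ociImageManifest parameter is hypothetical, and the "registry" type value is assumed from BuildKit's registry cache exporter.

```go
package main

import "fmt"

// CacheOptionsEntry is a local stand-in mirroring the Type and Attrs fields
// of BuildKit's client.CacheOptionsEntry.
type CacheOptionsEntry struct {
	Type  string
	Attrs map[string]string
}

// newCacheExportOpt builds registry cache export options. The
// ociImageManifest parameter is hypothetical: it adds the two attributes
// from the patch only when the caller asks for them.
func newCacheExportOpt(ref string, max, ociImageManifest bool) CacheOptionsEntry {
	attrs := map[string]string{"ref": ref}
	if ociImageManifest {
		attrs["oci-mediatypes"] = "true"
		attrs["image-manifest"] = "true"
	}
	if max {
		attrs["mode"] = "max"
	}
	return CacheOptionsEntry{Type: "registry", Attrs: attrs}
}

func main() {
	// registry.example.com is a placeholder host.
	opt := newCacheExportOpt("registry.example.com/build-cache:foo", true, true)
	fmt.Println(opt.Type, opt.Attrs["mode"], opt.Attrs["image-manifest"], opt.Attrs["oci-mediatypes"])
}
```

With the flag off, the attribute map is the same as before the patch, so registries that work today would see no change.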

mcheshkov added the type:bug (Something isn't working) label Jan 16, 2024
@mcheshkov
Author

mcheshkov commented Mar 15, 2024

Well, the documentation now says that AWS ECR supports explicit cache:
https://docs.earthly.dev/docs/caching/caching-via-registry

I'm still receiving the same 400 Bad Request when trying to push the cache.

The documentation was changed in #3814, with a reference to https://aws.amazon.com/blogs/containers/announcing-remote-cache-support-in-amazon-ecr-for-buildkit-clients/
But that exact article says that one should set image-manifest=true,oci-mediatypes=true to use the BuildKit remote cache:

BuildKit introduced the ability to export remote caches in 2020. ... However, the format used for storing these caches was not an Open Containers Initiative (OCI) type. Amazon ECR is an OCI-compliant registry, which means that pushing this remote cache format to Amazon ECR resulted in a validation failure.

The new context key introduced in Buildkit 0.12 here is image-manifest. Setting this key’s value to true lets you now store an OCI-compatible version of a remote cache in the registry. We’re also setting oci-mediatypes to true since that’s required in order to use image-manifest.

cc @tontinton @alexcb from documentation PR: did you check that it actually works? And, if so, can you help me to debug this?

@alexcb
Collaborator

alexcb commented Mar 15, 2024

We recently had a PR submitted (and merged) for this: #3869

it hasn't been released yet, but you could give it a try using our pre-release binaries from https://github.com/earthly/earthly-staging/releases/tag/v0.1710273402.818533702

@mcheshkov
Author

Thanks for the quick response!

$ earthly --version
earthly version v0.1710273402.818533702 30c9d546d8fb1f8cbc97ba0a18546916e1570fde linux/amd64; Fedora Linux 39.20240313.0 (Silverblue)

On this version, earthly --remote-cache=123.dkr.ecr.eu-central-1.amazonaws.com/build-cache:mcheshkov-explicit-cache --max-remote-cache --push ... still fails, as expected.
But earthly --remote-cache=123.dkr.ecr.eu-central-1.amazonaws.com/build-cache:mcheshkov-explicit-cache,image-manifest=true,oci-mediatypes=true --max-remote-cache --push ... succeeded!

I'm still checking whether it works fine with complex builds, but at least the media type issue looks resolved. Looking forward to a release with the fixes.
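The working --remote-cache value has the shape ref,key=value,key=value. As an illustration of how such a value could be split into a ref and an attribute map, here is a minimal sketch; splitCacheRef is a hypothetical helper and is not claimed to match how earthly parses the flag internally.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCacheRef (hypothetical helper) separates "ref,key=value,..." into the
// image ref and a map of the trailing key=value attributes.
func splitCacheRef(s string) (string, map[string]string, error) {
	parts := strings.Split(s, ",")
	ref := strings.TrimSpace(parts[0])
	if ref == "" {
		return "", nil, fmt.Errorf("missing cache ref in %q", s)
	}
	attrs := make(map[string]string)
	for _, p := range parts[1:] {
		key, value, ok := strings.Cut(p, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return "", nil, fmt.Errorf("invalid attribute %q, want key=value", p)
		}
		attrs[key] = strings.TrimSpace(value)
	}
	return ref, attrs, nil
}

func main() {
	in := "123.dkr.ecr.eu-central-1.amazonaws.com/build-cache:mcheshkov-explicit-cache,image-manifest=true,oci-mediatypes=true"
	ref, attrs, err := splitCacheRef(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ref:", ref)
	fmt.Println("image-manifest:", attrs["image-manifest"])
	fmt.Println("oci-mediatypes:", attrs["oci-mediatypes"])
}
```

A value without any attributes yields an empty map, which corresponds to the first command above that still fails against ECR.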

@alexcb
Collaborator

alexcb commented Mar 15, 2024

awesome! thanks for confirming that worked once you added in the extra attributes. We'll likely get a new release out next week.
