Unable to use inline cache: invalid layer index #1635

Closed
andyli opened this issue Feb 6, 2022 · 11 comments · Fixed by earthly/buildkit-old-fork#71 or #1678
type:bug Something isn't working

Comments

@andyli

andyli commented Feb 6, 2022

My project just adopted Earthly. Its repo is at https://github.com/hkssprangers/hkssprangers
The GitHub Actions build failed to use the inline cache, complaining about an invalid layer index.

The Earthfile at the time of writing: https://github.com/hkssprangers/hkssprangers/blob/61cdab2ac66969ad42dfb0ed897b4f61ff5d9134/Earthfile

The cache image in use (was tagged as master):
ghcr.io/hkssprangers/hkssprangers_devcontainer:4ec6d5284fb13000a27bc6da9de8f9332a1afa30

Build log:

  earthly +ci-images --GIT_REF_NAME="master" --GIT_SHA="95cff9a9721c911130a449ad08f8e66364cd8889"
  shell: /usr/bin/bash -e {0}
  env:
    TZ: Asia/Hong_Kong
    EARTHLY_USE_INLINE_CACHE: true
    EARTHLY_SAVE_INLINE_CACHE: true
    EARTHLY_STRICT: true
    EARTHLY_PUSH: true
    FORCE_COLOR: 1
           bootstrap | Bootstrapping successful.

 1. Init 🚀
————————————————————————————————————————————————————————————————————————————————

           buildkitd | Starting buildkit daemon as a docker container (earthly-buildkitd)...
           buildkitd | ...Done


 2. Build 🔧
————————————————————————————————————————————————————————————————————————————————

  m/v/d/base:0-focal | --> Load metadata linux/amd64
               cache | --> importing cache manifest from ghcr.io/hkssprangers/hkssprangers_devcontainer:95cff9a9721c911130a449ad08f8e66364cd8889
               cache | WARN: (importing cache manifest from ghcr.io/hkssprangers/hkssprangers_devcontainer:95cff9a9721c911130a449ad08f8e66364cd8889) ghcr.io/hkssprangers/hkssprangers_devcontainer:95cff9a9721c911130a449ad08f8e66364cd8889: not found
               cache | --> importing cache manifest from ghcr.io/hkssprangers/hkssprangers_devcontainer:master
               cache | WARN: (importing cache manifest from ghcr.io/hkssprangers/hkssprangers_devcontainer:master) invalid layer index 29
             context | --> local context .
             context | --> local context .
...
@vladaionescu
Member

vladaionescu commented Feb 7, 2022

Hi @andyli, I don't see any obvious issue at first glance. This could be a buildkit bug.

Is this something that happened that one time only, or does it happen consistently every time?

@andyli
Author

andyli commented Feb 8, 2022

It happens every time.
The same cache warning message can be reproduced locally with --ci.

@nschmeller

nschmeller commented Feb 8, 2022

EDIT: I resolved this problem on my own, and I don't think it was related to this bug. When I used a different Docker Hub org, such as my personal account (instead of the company org), the problem vanished. My guess is that the company org has a setting turned on that my personal org doesn't. I'll keep this info here just in case though!


Debugging info

Hi, I'm also seeing something that could be related to this bug! It seems a little more specific, though.

I'm pulling several images using WITH DOCKER --pull .... Then on the RUN invocation, I'm running docker tag several times. A snippet from my Earthfile:

    WITH DOCKER \
            --pull lacework/datacollector-private:${FULL_VERSION}-amd64 \
            --pull lacework/datacollector-private:${FULL_VERSION}-arm64
        RUN \
            docker tag lacework/datacollector-private:${FULL_VERSION}-amd64 lacework/datacollector-nightly:nightly-amd64 && \
            docker tag lacework/datacollector-private:${FULL_VERSION}-arm64 lacework/datacollector-nightly:nightly-arm64
    ...
    END

Then I get the failure with

               cache | --> importing cache manifest from lacework/datacollector-nightly:nightly-amd64
               cache | WARN: (importing cache manifest from lacework/datacollector-nightly:nightly-amd64) docker.io/lacework/datacollector-nightly:nightly-amd64: not found
               cache | --> importing cache manifest from lacework/datacollector-nightly:nightly-arm64
               cache | WARN: (importing cache manifest from lacework/datacollector-nightly:nightly-arm64) docker.io/lacework/datacollector-nightly:nightly-arm64: not found
...
  +publish-dockerhub | /usr/bin/dockerd
             context | transferred 1 file(s) for context /tmp/earthly-docker-load569855165 (16 MB, 1 file/dir stats)
             context | transferred 1 file(s) for context /tmp/earthly-docker-load248344364 (36 MB, 1 file/dir stats)
...
  +publish-dockerhub | --> WITH DOCKER RUN --privileged docker tag lacework/datacollector-private:${FULL_VERSION}-amd64 ${DOCKER_ORG}/${PUBLISH_REPO}:${PREFIX}-amd64 && docker tag lacework/datacollector-private:${FULL_VERSION}-arm64 ${DOCKER_ORG}/${PUBLISH_REPO}:${PREFIX}-arm64
...
  +publish-dockerhub | Loading images...
  +publish-dockerhub | Loaded image: lacework/datacollector-private:5.3.0.7028-amd64
  +publish-dockerhub | Loaded image: lacework/datacollector-private:5.3.0.7028-arm64
  +publish-dockerhub | ...done
  +publish-dockerhub | no such manifest: docker.io/lacework/datacollector-nightly:nightly-amd64

Some debugging observations: I only get this error on GitHub Actions runners (running Ubuntu 20.04). When I run the same earthly invocation locally (also on Ubuntu 20.04), everything works great. Both my local machine and the GHA runners are on the latest version of Earthly, v0.6.6. I'm invoking earthly on both environments with the --ci flag.

Hope this helps, I'm very interested in figuring out what's going on here!

@vladaionescu vladaionescu added the type:bug Something isn't working label Feb 9, 2022
@cirego
Contributor

cirego commented Feb 9, 2022

Hi @andyli, are you logged into ghcr.io using a personal access token? Does that token have read and write packages permissions? If I recall correctly, this behavior was changed recently and you may need to update your credentials. Maybe that will help?


@andyli
Author

andyli commented Feb 9, 2022

I used secrets.GITHUB_TOKEN to log into ghcr.io. It has write permission: I can tell because all the images in the ghcr.io repo were uploaded by GitHub Actions using that token.

@sesgoe

sesgoe commented Feb 16, 2022

@vladaionescu

https://github.com/sesgoe/earthly-invalid-cache-repro

I've got this one up and it seems to reliably reproduce this. This is a maddening little thing to find, whatever it ends up being.

@vladaionescu vladaionescu self-assigned this Feb 17, 2022
@vladaionescu
Member

I was able to reproduce this on my computer. Thanks for the repro code! I'm attempting to fix this now. Some initial investigation seems to point to the writing side. There seems to be a bug in the inline cache config writer, which causes the reading side to spit out that error.

@vladaionescu
Member

The issue seems to be related to this piece of the image writer code: https://github.com/moby/buildkit/blob/master/exporter/containerimage/writer.go#L479-L481.

It seems that if the image writer finds an empty layer, it removes it from the descriptor list. However, the indexes referenced from the inline cache manifest aren't updated after that, so they sometimes end up out of bounds (and possibly cause other, less obvious cache inefficiencies). An easy fix could be to simply remove that little optimization for now.
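
To make the failure mode concrete, here is a toy Python model of the mismatch described above. It is only a sketch: the function names, record shapes, and digests are invented for illustration and are not buildkit's actual code or data structures. It assumes cache records point at layers by position, and that the empty layer is dropped after those positions were computed.

```python
# Toy model only: names and data shapes are illustrative, not buildkit's.

def drop_empty_layers(layers):
    """Return only the non-empty layers, mimicking the optimization."""
    return [layer for layer in layers if not layer["empty"]]


def resolve(cache_records, layers):
    """Look up each cache record's layer by index, as an importer might."""
    resolved = []
    for record in cache_records:
        index = record["layer_index"]
        if index < 0 or index >= len(layers):
            raise IndexError(f"invalid layer index {index}")
        resolved.append(layers[index]["digest"])
    return resolved


layers = [
    {"digest": "sha256:aaa", "empty": False},  # e.g. the base image layer
    {"digest": "sha256:bbb", "empty": True},   # e.g. RUN ls (no filesystem change)
    {"digest": "sha256:ccc", "empty": False},  # e.g. a COPY step
]

# The cache records are computed against the full layer list...
cache_records = [{"layer_index": 0}, {"layer_index": 2}]

# ...but the image is written with the empty layer removed.
written_layers = drop_empty_layers(layers)

try:
    resolve(cache_records, written_layers)
    error = None
except IndexError as exc:
    error = str(exc)

print(error)
```

In this model, resolving the same records against the unfiltered `layers` list succeeds, which corresponds to the "remove the optimization" fix suggested above.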

@vladaionescu
Member

It seems that the only affected builds were the ones containing RUN commands (maybe others too?) that have no effect on the root file system, e.g. RUN ls, RUN echo ..., etc. These result in an empty layer, which buildkit attempts to optimize away incorrectly.

@sesgoe

sesgoe commented Feb 17, 2022

@andyli @nschmeller

As my fellow "you also got hit by this bug" comrades, I wanted to share some tips to get you unblocked until this fix gets pulled into buildkit.

I saw this comment this morning on Vlad's PR into buildkit: moby/buildkit#2651 (comment)

Which references this buildkit bug: moby/buildkit#2551

So I examined my Earthfile this morning for "empty layer" commands in an attempt to fix my caching issues, and I have successfully un-bugged myself for now. I have determined (minimally for now) that at least these commands will break your cache because they can produce "empty" (optimized-away) layers:

WORKDIR
RUN find ...

@andyli your initial WORKDIR, up near the base of your Earthfile, might be the source of the issue for now:
https://github.com/hkssprangers/hkssprangers/blob/61cdab2ac66969ad42dfb0ed897b4f61ff5d9134/Earthfile#L14
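
For anyone who wants to audit a larger Earthfile the same way, a rough Python sketch is below. It only flags the two command shapes listed above (WORKDIR and RUN find); that list is not exhaustive, a flagged line does not necessarily produce an empty layer, and the helper name and sample Earthfile are made up for illustration.

```python
import re

# Patterns taken from the list above; other commands may be affected too.
SUSPECT_PATTERNS = [
    re.compile(r"^\s*WORKDIR\b"),
    re.compile(r"^\s*RUN\s+find\b"),
]


def find_suspect_lines(earthfile_text):
    """Return (line_number, stripped_line) pairs worth reviewing by hand."""
    hits = []
    for number, line in enumerate(earthfile_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical Earthfile content, used only to exercise the helper.
sample = """\
FROM ubuntu:20.04
WORKDIR /workspace
build:
    COPY . .
    RUN find . -name '*.tmp'
    RUN make
"""

hits = find_suspect_lines(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

Each hit is only a candidate to review, not proof that the line breaks the cache.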

Just wanted to let you both know in case something like this unblocks you. Have a great rest of your week!

@vladaionescu
Member

The fix for this has now been released in v0.6.9
