Garbage collection does not delete data from cache, preventing reupload of deleted image #4269

Open
roblabla opened this issue Jan 31, 2024 · 2 comments


roblabla commented Jan 31, 2024

Description

Currently, GC does not delete data from the redis cache. This can result in a very weird scenario: re-uploading an image that was deleted appears to succeed (the registry thinks it already has the files we're uploading, and so returns 201), but trying to use the image afterwards fails with a MANIFEST_UNKNOWN error because the files are not actually present.

Reproduce

We'll need a registry configured to use a redis cache. I used filesystem storage with a redis cache for simplicity, but any storage backend will do, since the problem stems from the cache.

Here's my configuration:

version: 0.1
log:
  level: debug
  fields:
    service: registry
storage:
  filesystem:
    rootdirectory: data
    maxthreads: 100
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: true
      age: 168h
      interval: 24h
      dryrun: false
  delete:
    enabled: true
  redirect:
    disable: true
redis:
  addr: localhost:6379
  db: 2
  readtimeout: 10s
  writetimeout: 10s
  dialtimeout: 10s
  pool:
    maxidle: 100
    maxactive: 500
    idletimeout: 60s
http:
  secret: test123
  addr: :5100
  relativeurls: false
  debug:
    addr: localhost:5101
validation:
  disabled: true
compatibility:
  schema1:
    enabled: true

Here's the reproducer:

# First, start the registry and put it in the background.
./bin/registry serve config.yaml &

# Next, upload an image. I used skopeo for this
skopeo --insecure-policy copy --dest-tls-verify=false "docker://alpine:3.19.1" "docker://localhost:5100/alpine:3.19.1" --multi-arch all

# We can verify that the push worked by downloading its manifest. So far so good.
curl -H 'Accept:application/vnd.docker.distribution.manifest.v2+json' http://localhost:5100/v2/alpine/manifests/3.19.1

# Next up, let's delete the manifest.
curl -XDELETE http://localhost:5100/v2/alpine/manifests/3.19.1

# And run garbage collection. This will actually delete the data from the blobstore. However, the cache won't be cleaned!
./bin/registry garbage-collect config.yaml -m

# And finally, let's try pushing the same image again
skopeo --insecure-policy copy --dest-tls-verify=false "docker://alpine:3.19.1" "docker://localhost:5100/alpine:3.19.1" --multi-arch all

# And download the manifest again. This will fail with the MANIFEST_UNKNOWN error.
curl -H 'Accept:application/vnd.docker.distribution.manifest.v2+json' http://localhost:5100/v2/alpine/manifests/3.19.1

Expected behavior

I expect pushing the image to actually work. Either the push should look past the cache and check whether the cached entry is still up to date, or garbage collection should remove the keys from the cache.
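
To make the two options concrete, here is a toy model in plain shell. It is not registry code: two temporary directories stand in for the blob store and the redis cache, and every function name (`push`, `push_checked`, `gc`, `gc_evicting`, `pull`) is made up for illustration. It only mirrors the behaviour described in this issue.

```shell
#!/bin/sh
# Toy model, NOT registry code: two temp directories stand in for the
# blob store and the redis cache; an empty file means "this item is known".
STORE=$(mktemp -d)
CACHE=$(mktemp -d)

# Buggy push: a cache hit is trusted without looking at the store.
push() {
  if [ -e "$CACHE/$1" ]; then
    echo "201 (cache hit, upload skipped)"
  else
    : > "$STORE/$1"
    : > "$CACHE/$1"
    echo "201 (stored)"
  fi
}

# Option 1: on a cache hit, confirm the item is still in the store and
# drop the stale cache entry if it is not, then push as usual.
push_checked() {
  if [ -e "$CACHE/$1" ] && [ ! -e "$STORE/$1" ]; then
    rm -f "$CACHE/$1"
  fi
  push "$1"
}

# Buggy GC: deletes from the store only, the cache entry survives.
gc() { rm -f "$STORE/$1"; }

# Option 2: GC evicts the cache entry together with the stored item.
gc_evicting() { rm -f "$STORE/$1" "$CACHE/$1"; }

# Pull only ever looks at the store.
pull() {
  if [ -e "$STORE/$1" ]; then echo "200"; else echo "MANIFEST_UNKNOWN"; fi
}

# Reproduce the bug, then try each option.
push a; gc a; push a; pull a
push_checked a; pull a
push b; gc_evicting b; push b; pull b
```

In this model the first sequence ends with `MANIFEST_UNKNOWN` even though the second push reported 201, while the other two sequences end with `200`.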

registry version

Tested on both 2.8.3 and the main branch (9b3eac8).

./bin/registry github.com/distribution/distribution/v3 v3.0.0-alpha.1.m+unknown

Additional Info

This was discovered via Harbor's retention policy. Harbor uses distribution as the underlying backend for its storage. When it applies its retention policy, it deletes the manifests and automatically runs garbage collection to free up space, triggering this bug.
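
One possible stopgap, which was not tested as part of this report, is to clear the registry's redis database right after garbage collection so that no stale entries survive. The sketch below assumes redis DB 2 on localhost:6379 (as in the configuration above) is used only by the registry, since `FLUSHDB` deletes every key in the selected database. The `gc_and_flush` and `run` helpers and the `DRY_RUN` and `REDIS_*` variables are hypothetical names. Because the second command is destructive, the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical wrapper: run GC, then empty the redis DB the registry caches in.
# ASSUMPTION: this redis DB holds nothing but registry cache data.
REDIS_HOST=${REDIS_HOST:-localhost}
REDIS_PORT=${REDIS_PORT:-6379}
REDIS_DB=${REDIS_DB:-2}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# gc_and_flush CONFIG: garbage-collect, and flush the cache only if GC succeeded.
gc_and_flush() {
  run ./bin/registry garbage-collect "$1" -m &&
    run redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -n "$REDIS_DB" FLUSHDB
}

# Preview what would be executed.
gc_and_flush config.yaml
```

Flushing the whole database is a blunt instrument compared with removing only the keys of deleted blobs, but it does not require knowing the cache's key layout.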

@dreamerkr

I also encountered this problem. How can it be solved?

roblabla added a commit to roblabla/harbor-helm that referenced this issue Feb 15, 2024
There are bugs in the registry with regards to cache invalidation when
deleting an image that can lead to some really problematic issues. See
distribution/distribution#4269.

To work around that problem, add a new field to allow disabling redis in
the registry.

Vad1mo commented Feb 15, 2024

I think this might be related to PR #3323

roblabla added a commit to roblabla/harbor-helm that referenced this issue Feb 22, 2024
There are bugs in the registry with regards to cache invalidation when
deleting an image that can lead to some really problematic issues. See
distribution/distribution#4269.

To work around that problem, add a new field to allow disabling redis in
the registry.

Signed-off-by: roblabla <unfiltered@roblab.la>