Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: readlink /var/lib/containers/storage/overlay: invalid argument after commit + hard reset (unflushed data) #1064

Open
martinetd opened this issue Nov 18, 2021 · 4 comments
Assignees

Comments

@martinetd
Copy link

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I'm aware running podman commit then pulling the power is not supported, but I'm hoping the most obvious errors can be squashed if traced down precisely enough.
If these issues are not enough I can afford some time to make it into PRs, but someone familiar with the code will probably fix it much faster than myself so I'm thinking my time is better spent finding more issues. Please tell me if I'm wrong.

Steps to reproduce the issue:

  1. run some container e.g. podman run -ti --name test --replace localhost/test and change something inside

  2. commit and pull the plug: podman commit test test, echo c > /proc/sysrq-trigger

  3. try to list/run/inspect new image after reboot

Describe the results you received:

armadillo:~# podman image inspect test
Error: readlink /var/lib/containers/storage/overlay: invalid argument

Running with strace, it's because /var/lib/containers/storage/overlay/22796afe2874d426a1c46770f63c639030d993a6df09fb70909e675922f7e8c4/lower has been created with no data (empty file), so /var/lib/containers/storage/overlay/<content of lower file> is not found which normally leads to stat+readlink, except that readlink on a directory does not work when flie is empty.

(By the way, writting something that doesn't exist e.g. "foo" in the file leads podman to as it writes the invalid path to link file then loops reading link file and trying to check overlay/l/<link content> forever -- this probably warrants failing if overlay/l/ does not exist)

Describe the results you expected:

error should state image is invalid and suggest deleting it
(I don't think it should delete anything automatically, but giving the user a clear command on how to move forward from there is probably good)

Output of podman version:

Version:      3.4.2
API Version:  3.4.2
Go Version:   go1.17.3
Git Commit:   d800e4f57e9388a4cd4e5d9321b4594f552bc5d7
Built:        Sun Nov 14 04:26:13 2021
OS/Arch:      linux/arm64
(same behaviour on 3.2.3)

Output of podman info --debug:

host:
  arch: arm64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: 4ab06de98f48513b22602d23d59d63b29e75824b'
  cpus: 4
  distribution:
    distribution: alpine
    version: 3.15.0_rc3
  eventLogger: file
  hostname: armadillo
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.52-00112-gec168b956610-dirty
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1250312192
  memTotal: 1741979648
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/bin/crun
    version: |-
      crun version 1.3
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 19m 37.46s
plugins:
  log:
  - k8s-file
  - none
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 1636831573
  BuiltTime: Sun Nov 14 04:26:13 2021
  GitCommit: d800e4f57e9388a4cd4e5d9321b4594f552bc5d7
  GoVersion: go1.17.3
  OsArch: linux/arm64
  Version: 3.4.2

Package info (e.g. output of rpm -q podman or apt list podman):

# apk info podman
podman-3.4.2-r0 description:
Simple management tool for pods, containers and images

podman-3.4.2-r0 webpage:
https://podman.io/

podman-3.4.2-r0 installed size:
43 MiB

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

No, yes

Additional environment details (AWS, VirtualBox, physical, etc.):

barebone embedded system

@martinetd
Copy link
Author

martinetd commented Nov 18, 2021

By the way, I think it'd make a lot of sense to issue a fssync() at the end of podman commit (or more precisely the part of podman commit that stores to disk and is shared with pull etc -- after messages Storing signatures and printing image id), that'd probably avoid most of the bug reports that came back to me because of similar issues -- what do you think?

@mheon
Copy link
Member

mheon commented Nov 18, 2021

@nalind PTAL - this one seems pretty bad, given that c/storage seems unusable?

I don't think this is Podman itself given that we don't make any changes to c/storage paths directly.

@rhatdan rhatdan transferred this issue from containers/podman Nov 18, 2021
@mtrmac
Copy link
Collaborator

mtrmac commented Dec 5, 2022

After #1351 and #1407 (and some earlier changes), a layer being committed should be, durably (with fdatasync) recorded as incomplete, and thus, a sudden reboot while creating the layer should cause the layer to be automatically removed on next Podman start.

Of course, if the commit succeeds, and durably records to the metadata that it succeeded, but the layer data like the lower file (which do not use fdatasync) is not written to disk by that time, this does not help: the layer will be recorded as fully-formed but the contents are corrupt.

@martinetd
Copy link
Author

Thank you!
I don't think we can do much about the content itself short of issuing a system-wide sync, and that has its own problems so I wouldn't suggest that. I'm fine as long as podman commands keep working.

I'll try to build a podman version with these and confirm, but we can probably close this and assume it's correct -- likely won't have time to test in the coming weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants