
rook-ceph-agent CrashLoopBackOff on newly joined node -- container log shows help usage? #9231

Closed
zestysoft opened this issue Nov 23, 2021 · 2 comments


zestysoft commented Nov 23, 2021

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
rook-ceph-agent's status is CrashLoopBackOff

Expected behavior:
The pod starts successfully, or doesn't run on the node at all if it isn't needed.

How to reproduce it (minimal and precise):
Join a Kubernetes node to the cluster.

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
    Diff of the cluster.yaml file from release-1.7 (pulled today, so 1.7.8?):
211,212c211,212
<     useAllNodes: true
<     useAllDevices: true
---
>     useAllNodes: false
>     useAllDevices: false
234a235,256
>     nodes:
>     # - name: "k8s-node-3"
>     #  devices: # specific devices to use for storage can be specified for each node
>     #  - name: "/dev/disk/by-id/ata-WDC_WD60EFRX-68MYMN1_WD-WX91D65DC7A9"
>     - name: "k8s-master-1"
>       devices:
>       - name: "/dev/disk/by-id/wwn-0x3044564198838280"
>     - name: "k8s-master-2"
>       devices:
>       - name: "/dev/disk/by-id/wwn-0x3044564198838280"
>     - name: "k8s-master-3"
>       devices:
>       - name: "/dev/disk/by-id/wwn-0x3044564198838280"
>     - name: "k8s-node-1"
>       devices:
>       - name: "/dev/disk/by-id/wwn-0x3044564198839350"
>     - name: "k8s-node-2"
>       devices:
>       - name: "/dev/disk/by-id/usb-SSK_SSK_Storage_DD56419883935-0:0"
>     - name: "k8s-node-3"
>       devices:
>       - name: "/dev/disk/by-id/wwn-0x3044564198839350"
  • Operator's logs, if necessary
  • Crashing pod(s) logs, if necessary
    Output of kubectl -n rook-ceph logs rook-ceph-agent-828t7 on k8s-node-4 (if it matters, k8s-node-4 is not one of the nodes listed in the cluster.yaml file above):
Main command for Ceph operator and daemons.

Usage:
  rook ceph [command]

Available Commands:
  clean       Starts the cleanup process on the disks after ceph cluster is deleted
  config-init Generates basic Ceph config
  mgr
  operator    Runs the Ceph operator for orchestrating and managing Ceph storage in a Kubernetes cluster
  osd         Provisions and runs the osd daemon

Flags:
  -h, --help   help for ceph

Global Flags:
      --log-level string         logging level for logging/tracing output (valid values: ERROR,WARNING,INFO,DEBUG) (default "INFO")
      --operator-image string    Override the image url that the operator uses. The default is read from the operator pod.
      --service-account string   Override the service account that the operator uses. The default is read from the operator pod.

Use "rook ceph [command] --help" for more information about a command.
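A quick sanity check on that log (a sketch, reasoning only from the help text above): the usage output lists the subcommands the image supports, and "agent" is not among them, which would explain the crash loop if the pod's entrypoint still invokes `rook ceph agent`.

```shell
# Subcommands listed in the help output above (copied verbatim from this issue):
available='clean config-init mgr operator osd'
# If "agent" is absent, an entrypoint of `rook ceph agent` falls back to
# printing this usage text and exiting, i.e. a crash loop:
echo "$available" | grep -qw agent || echo "agent subcommand missing from image"
```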

Environment:

  • OS (e.g. from /etc/os-release):
    Ubuntu 20.04.3 LTS

  • Kernel (e.g. uname -a):
    Linux k8s-node-4 5.4.0-1046-raspi #50-Ubuntu SMP PREEMPT Thu Oct 28 05:32:10 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

  • Cloud provider or hardware configuration:
    Raspberry Pi 4 8GB

  • Rook version (use rook version inside of a Rook Pod):
    From one of the working agents:
    k -n rook-ceph exec rook-ceph-agent-4f25q -it -- rook version
    rook: v1.5.0-alpha.0.590.g2a19230
    go: go1.15.7

This surprised me, since I thought I had upgraded to Pacific some time ago. The rook-ceph-agent daemonset shows the pod image as rook-ceph-agent:master, but oddly I can't find the YAML file in this GitHub repository that creates this daemonset.

  • Storage backend version (e.g. for ceph do ceph -v):
    From the toolbox:
    ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
    From one of the ubuntu nodes:
    ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
    I assume Ubuntu just has an older apt package -- is it possible to run this from one of the running rook-ceph pods?
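Running `ceph -v` inside the toolbox pod is a way to see the cluster's container version rather than the node's apt package (a sketch; the deployment name is taken from the listing later in this issue):

```shell
# Run `ceph -v` inside the toolbox pod (kubectl exec accepts deploy/<name>):
#   kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -v
#
# Sample output from this issue, with the release name in the 5th field:
sample='ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)'
echo "$sample" | awk '{print $5}'   # -> pacific
```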

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:34:20Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/arm64"}
    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/arm64"}

  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
    k8sadmin

  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

  cluster:
    id:     3d957ef3-6713-4032-9aa3-88958dd2cb5f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum o,p,q (age 4h)
    mgr: a(active, since 2w)
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 18m), 6 in (since 5d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 190.40k objects, 731 GiB
    usage:   2.1 TiB used, 3.4 TiB / 5.5 TiB avail
    pgs:     193 active+clean

  io:
    client:   852 B/s rd, 18 KiB/s wr, 1 op/s rd, 2 op/s wr

Versions of all the container images currently running from deployments:

k -n rook-ceph get deployments --no-headers|cut -d ' ' -f 1|xargs -I {} sh -c 'echo {};kubectl -n rook-ceph describe deployment {}|grep Image:'
csi-cephfsplugin-provisioner
    Image:      k8s.gcr.io/sig-storage/csi-attacher:v3.3.0
    Image:      k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
    Image:      k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
    Image:      k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
csi-rbdplugin-provisioner
    Image:      k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
    Image:      k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
    Image:      k8s.gcr.io/sig-storage/csi-attacher:v3.3.0
    Image:      k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
rook-ceph-crashcollector-k8s-master-1
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-crashcollector-k8s-master-2
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-crashcollector-k8s-master-3
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-crashcollector-k8s-node-1
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-crashcollector-k8s-node-2
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-crashcollector-k8s-node-3
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-mds-filesystem-a
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-mds-filesystem-b
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-mgr-a
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:       quay.io/ceph/ceph:v16.2.6
rook-ceph-mon-o
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-mon-p
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-mon-q
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-operator
    Image:      rook/ceph:v1.7.8
rook-ceph-osd-1
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-osd-2
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-osd-4
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-osd-5
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-osd-6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-osd-7
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
    Image:      quay.io/ceph/ceph:v16.2.6
rook-ceph-tools
    Image:      rook/ceph:v1.7.8
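A more compact alternative to the describe|grep pipeline above (a sketch): jsonpath can print each deployment with its images in a single API call, and the same grep/awk pattern reduces a describe dump to bare image names.

```shell
# One API call instead of one describe per deployment:
#   kubectl -n rook-ceph get deploy -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.template.spec.containers[*].image}{"\n"}{end}'
#
# Locally, reducing a describe dump (sample line from this issue) to the image:
printf '    Image:      rook/ceph:v1.7.8\n' | grep 'Image:' | awk '{print $2}'
```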

The same, but for daemonsets:

k -n rook-ceph get daemonsets.apps --no-headers|cut -d ' ' -f 1|xargs -I {} sh -c 'echo {};kubectl -n rook-ceph describe daemonsets.apps {}|grep Image:'

csi-cephfsplugin
    Image:      k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
csi-rbdplugin
    Image:      k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
    Image:      quay.io/cephcsi/cephcsi:v3.4.0
rook-ceph-agent
    Image:      rook/ceph:master
@zestysoft zestysoft added the bug label Nov 23, 2021

zestysoft commented Nov 23, 2021

I manually edited the daemonset and changed the version from master to v1.7.8 to fix this. I just wish I remembered how I installed it to begin with, since the quickstart doesn't show these pods getting created and I can't find any YAML file in the repo (maybe it was deprecated and just isn't shown for the 1.7.8 release?). Do I even need it? I'm using a FlexVolume for CIFS, so maybe it came from instructions somewhere else.
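The manual edit described above can also be done non-interactively (a sketch; the container name "rook-ceph-agent" is an assumption, so verify it against your DaemonSet first):

```shell
# Verify the container name before patching:
#   kubectl -n rook-ceph get ds rook-ceph-agent -o jsonpath='{.spec.template.spec.containers[*].name}'
# Pin the agent image to the operator's release instead of :master:
#   kubectl -n rook-ceph set image daemonset/rook-ceph-agent rook-ceph-agent=rook/ceph:v1.7.8
#
# Deriving the tag from the operator image string keeps the two in sync:
operator_image='rook/ceph:v1.7.8'
echo "${operator_image##*:}"   # -> v1.7.8
```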


travisn commented Nov 23, 2021

@zestysoft The Rook agent (flex driver) no longer exists in master; it is being removed in Rook v1.8, due out in a couple of weeks. See #8076 for more details. All Rook flex volumes will need to be converted to CSI before upgrading to v1.8. A tool that will help with the migration is just being finished. It's not quite ready yet, but you can track its progress with #9222 and https://github.com/ceph/persistent-volume-migrator
