Operator error after K8s master token changed #9314

Closed
slow-zhang opened this issue Dec 6, 2021 · 5 comments · Fixed by #9347

  • Bug Report

Expected behavior:
No error.

How to reproduce it (minimal and precise):

  • Deploy a cluster with Helm
  • Change the K8s master's token

Log:

root@lasvai-staging-capos-k8s-master01:~/manifests# kubectl logs rook-ceph-operator-7475fbb785-5g42c -n rook-ceph-admin
2021-12-02 08:21:33.574197 I | op-flags: failed to set flag "logtostderr". no such flag -logtostderr
2021-12-02 08:21:33.574400 I | rookcmd: starting Rook v1.7.5 with arguments '/usr/local/bin/rook ceph operator'
2021-12-02 08:21:33.574408 I | rookcmd: flag values: --csi-cephfs-plugin-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin.yaml, --csi-cephfs-provisioner-dep-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin-provisioner-dep.yaml, --csi-rbd-plugin-template-path=/etc/ceph-csi/rbd/csi-rbdplugin.yaml, --csi-rbd-provisioner-dep-template-path=/etc/ceph-csi/rbd/csi-rbdplugin-provisioner-dep.yaml, --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-level=INFO, --operator-image=, --service-account=
2021-12-02 08:21:33.574412 I | cephcmd: starting Rook-Ceph operator
2021-12-02 08:21:33.843956 I | cephcmd: base ceph version inside the rook operator image is "ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)"
2021-12-02 08:21:33.849634 I | op-k8sutil: ROOK_CEPH_COMMANDS_TIMEOUT_SECONDS="15" (env var)
2021-12-02 08:21:33.851843 I | op-k8sutil: ROOK_ENABLE_DISCOVERY_DAEMON="false" (env var)
2021-12-02 08:21:33.854409 I | operator: looking for secret "rook-ceph-admission-controller"
2021-12-02 08:21:33.856280 I | operator: secret "rook-ceph-admission-controller" not found. proceeding without the admission controller
2021-12-02 08:21:33.858798 I | op-k8sutil: ROOK_ENABLE_FLEX_DRIVER="false" (env var)
2021-12-02 08:21:33.858809 I | operator: watching all namespaces for ceph cluster CRs
2021-12-02 08:21:33.859082 I | operator: setting up the controller-runtime manager
I1202 08:21:34.910370       6 request.go:668] Waited for 1.047258479s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/cert-manager.io/v1?timeout=32s
2021-12-02 08:21:39.063150 I | ceph-cluster-controller: successfully started
2021-12-02 08:21:39.063218 I | ceph-cluster-controller: enabling hotplug orchestration
2021-12-02 08:21:39.063237 I | ceph-crashcollector-controller: successfully started
2021-12-02 08:21:39.063301 I | ceph-block-pool-controller: successfully started
2021-12-02 08:21:39.063357 I | ceph-object-store-user-controller: successfully started
2021-12-02 08:21:39.063408 I | ceph-object-realm-controller: successfully started
2021-12-02 08:21:39.063453 I | ceph-object-zonegroup-controller: successfully started
2021-12-02 08:21:39.063497 I | ceph-object-zone-controller: successfully started
2021-12-02 08:21:39.063608 I | ceph-object-controller: successfully started
2021-12-02 08:21:39.063674 I | ceph-file-controller: successfully started
2021-12-02 08:21:39.063727 I | ceph-nfs-controller: successfully started
2021-12-02 08:21:39.063782 I | ceph-rbd-mirror-controller: successfully started
2021-12-02 08:21:39.063836 I | ceph-client-controller: successfully started
2021-12-02 08:21:39.063881 I | ceph-filesystem-mirror-controller: successfully started
2021-12-02 08:21:39.064670 I | operator: starting the controller-runtime manager
2021-12-02 08:21:39.405020 I | clusterdisruption-controller: create event from ceph cluster CR
2021-12-02 08:21:39.706431 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"
2021-12-02 08:21:39.708877 E | ceph-block-pool-controller: failed to reconcile. failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.708903 I | op-k8sutil: ROOK_ENABLE_FLEX_DRIVER="false" (env var)
2021-12-02 08:21:39.709169 E | ceph-file-controller: failed to reconcile failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.709397 E | ceph-object-controller: failed to reconcile CephObjectStore "rook-ceph/ceph-objectstore". failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.709408 I | op-k8sutil: Reporting Event rook-ceph:ceph-objectstore Warning:ReconcileFailed:failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.710289 I | clusterdisruption-controller: deleted all legacy blocking PDBs for osds
2021-12-02 08:21:39.715906 E | ceph-file-controller: failed to reconcile failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.716297 E | ceph-block-pool-controller: failed to reconcile. failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.723559 I | clusterdisruption-controller: deleted all legacy node drain canary pods
2021-12-02 08:21:39.724154 E | ceph-object-controller: failed to reconcile CephObjectStore "rook-ceph/ceph-objectstore". failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.725882 I | ceph-csi: successfully created csi config map "rook-ceph-csi-config"
2021-12-02 08:21:39.804723 E | ceph-file-controller: failed to reconcile failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:39.909389 E | ceph-block-pool-controller: failed to reconcile. failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:40.111492 E | ceph-object-controller: failed to reconcile CephObjectStore "rook-ceph/ceph-objectstore". failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2021-12-02 08:21:40.221997 I | clusterdisruption-controller: Ceph "rook-ceph" cluster not ready, cannot check Ceph status yet.
2021-12-02 08:21:40.309740 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
2021-12-02 08:21:40.321066 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "rook-ceph"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x1894b82]

goroutine 992 [running]:
github.com/rook/rook/pkg/operator/ceph/cluster/osd.NewOSDHealthMonitor(0xc0006b3180, 0x0, 0xc001b88700, 0x0, 0xc001590658, 0x0, 0x0, 0x0, 0xc001590660, 0x0, ...)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/osd/health.go:64 +0xc2
github.com/rook/rook/pkg/operator/ceph/cluster.(*ClusterController).startMonitoringCheck(0xc0001ccbb0, 0xc000b81800, 0x0, 0x21d33d2, 0x3)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/monitoring.go:120 +0x125
github.com/rook/rook/pkg/operator/ceph/cluster.(*ClusterController).configureCephMonitoring(0xc0001ccbb0, 0xc000b81800, 0x0)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/monitoring.go:67 +0x5ae
github.com/rook/rook/pkg/operator/ceph/cluster.(*ClusterController).initializeCluster(0xc0001ccbb0, 0xc000b81800, 0x221a96c, 0x28)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/cluster.go:205 +0x65b
github.com/rook/rook/pkg/operator/ceph/cluster.(*ClusterController).reconcileCephCluster(0xc0001ccbb0, 0xc000a2a900, 0xc00090eea0, 0xc000a2a900, 0x0)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/controller.go:394 +0x2f0
github.com/rook/rook/pkg/operator/ceph/cluster.(*ReconcileCephCluster).reconcile(0xc00089e420, 0xc0008318e0, 0x9, 0xc0008318c0, 0x9, 0xc000780400, 0x0, 0xc0012ecd80, 0x40e0f8, 0x30)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/controller.go:272 +0x3e9
github.com/rook/rook/pkg/operator/ceph/cluster.(*ReconcileCephCluster).Reconcile(0xc00089e420, 0x2587018, 0xc00090ec30, 0xc0008318e0, 0x9, 0xc0008318c0, 0x9, 0xc00090ec30, 0xc00003a000, 0x2029de0, ...)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/cluster/controller.go:227 +0x74
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0007a6b40, 0x2586f70, 0xc000892940, 0x1fc2780, 0xc001710860)
	/home/rook/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0007a6b40, 0x2586f70, 0xc000892940, 0x0)
	/home/rook/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc000796360, 0xc0007a6b40, 0x2586f70, 0xc000892940)
	/home/rook/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/home/rook/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:210 +0x425
slow-zhang added the bug label Dec 6, 2021
travisn (Member) commented Dec 6, 2021

Looks like this is crashing on this line:

logger.Infof("ceph osd status in namespace %q check interval %q", h.clusterInfo.Namespace, checkInterval.Duration.String())

@leseb Do you see how clusterInfo or checkInterval could be nil?
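
For context, this is exactly the failure mode a nil *ClusterInfo would produce on that log line. A minimal, self-contained Go sketch of the pattern, using simplified stand-in types (ClusterInfo and newOSDHealthMonitor here are hypothetical, not Rook's real definitions):

package main

import (
    "fmt"
    "time"
)

// ClusterInfo is a simplified stand-in for Rook's cluster info struct.
type ClusterInfo struct {
    Namespace string
}

// newOSDHealthMonitor mimics the pattern at health.go:64: it logs a field of
// clusterInfo without first checking whether the pointer is nil.
func newOSDHealthMonitor(clusterInfo *ClusterInfo, checkInterval time.Duration) {
    // With clusterInfo == nil, this dereference triggers
    // "panic: runtime error: invalid memory address or nil pointer dereference".
    fmt.Printf("ceph osd status in namespace %q check interval %q\n",
        clusterInfo.Namespace, checkInterval.String())
}

func main() {
    // Passing nil mimics a reconcile that never obtained cluster info
    // ("did not find existing secret") before monitoring was started.
    newOSDHealthMonitor(nil, time.Minute)
}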

leseb added a commit to leseb/rook that referenced this issue Dec 8, 2021
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: rook#9314
Signed-off-by: Sébastien Han <seb@redhat.com>
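
The guard described in that commit message might look roughly like the sketch below. This is only an illustration of the convention under stated assumptions; readMonSecret, loadClusterInfo, and startMonitoringCheck are hypothetical names, not the actual change in #9347:

package main

import (
    "errors"
    "fmt"
)

// ClusterInfo is a simplified stand-in for Rook's cluster info struct.
type ClusterInfo struct {
    Namespace string
}

// readMonSecret is a hypothetical stand-in for reading the mon secret from the
// Kubernetes API; here it always fails, matching the errors in the log above.
func readMonSecret(namespace string) (string, error) {
    return "", errors.New("not expected to create new cluster info and did not find existing secret")
}

// loadClusterInfo follows the convention from the commit message: on any error
// it returns a nil pointer, never a partially populated struct.
func loadClusterInfo(namespace string) (*ClusterInfo, error) {
    if _, err := readMonSecret(namespace); err != nil {
        return nil, fmt.Errorf("failed to populate cluster info: %w", err)
    }
    return &ClusterInfo{Namespace: namespace}, nil
}

// startMonitoringCheck bails out before handing a nil clusterInfo to anything
// like the OSD health monitor, which is where the operator panicked.
func startMonitoringCheck(namespace string) error {
    clusterInfo, err := loadClusterInfo(namespace)
    if err != nil {
        return fmt.Errorf("not starting osd health monitor: %w", err)
    }
    fmt.Println("starting osd health monitor in namespace", clusterInfo.Namespace)
    return nil
}

func main() {
    if err := startMonitoringCheck("rook-ceph"); err != nil {
        fmt.Println(err)
    }
}
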
leseb (Member) commented Dec 8, 2021

I've attempted a patch in #9347, but we need more details, such as the CephCluster CR. @slow-zhang, can you please provide that? Thanks

leseb self-assigned this Dec 8, 2021
slow-zhang (Author) commented Dec 8, 2021

I am not sure what the CR means. Do you mean the Rook Ceph cluster config? The environment is empty now because I redeployed the cluster.

leseb (Member) commented Dec 8, 2021

The CR is the Custom Resource, i.e. the CephCluster you injected, e.g. from https://github.com/rook/rook/blob/master/deploy/examples/cluster.yaml. Can you share it?

slow-zhang (Author) commented

I deployed it with Helm; here is my values file. Most of the parameters are defaults; I only added some taints and a label selector.

operatorNamespace: rook-ceph-admin

# The metadata.name of the CephCluster CR. The default name is the same as the namespace.
# clusterName: rook-ceph

# Ability to override ceph.conf
# configOverride: |
#   [global]
#   mon_allow_pool_delete = true
#   osd_pool_default_size = 3
#   osd_pool_default_min_size = 2

# Installs a debugging toolbox deployment
toolbox:
  enabled: true
  image: rook/ceph:v1.7.5
  tolerations: []
  affinity: {}

monitoring:
  # requires Prometheus to be pre-installed
  # enabling will also create RBAC rules to allow Operator to create ServiceMonitors
  enabled: false # jingzhang: disabled here; a ServiceMonitor is added in the monitoring namespace instead
  rulesNamespaceOverride:

cephFileSystems:
  - name: ceph-filesystem
    # see https://github.com/rook/rook/blob/master/Documentation/ceph-filesystem-crd.md#filesystem-settings for available configuration
    spec:
      metadataPool:
        replicated:
          size: 3
      dataPools:
        - failureDomain: host
          replicated:
            size: 3
      metadataServer:
        activeCount: 1
        activeStandby: true
    storageClass:
      enabled: false


# The imagePullSecrets option allows pulling docker images from a private docker registry. The option is passed to all service accounts.
# imagePullSecrets:
# - name: my-registry-secret

# All values below are taken from the CephCluster CRD
# More information can be found at [Ceph Cluster CRD](/Documentation/ceph-cluster-crd.md)
cephClusterSpec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v14 is nautilus, v15 is octopus, and v16 is pacific.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such as quay.io/ceph/ceph:v15.2.11-20200419
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: quay.io/ceph/ceph:v16.2.6
    # Whether to allow unsupported versions of Ceph. Currently `nautilus` and `octopus` are supported.
    # Future versions such as `pacific` would require this to be set to `true`.
    # Do not set to true in production.
    allowUnsupported: false

  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook

  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false

  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false

  # WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
  # If the timeout is exceeded and the OSD is not ok to stop, the operator skips the upgrade for the current OSD and proceeds with the next one
  # if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, the operator would
  # continue with the upgrade of an OSD even if it is not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
  # The default wait timeout is 10 minutes.
  waitTimeoutForHealthyOSDInMinutes: 10

  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 3
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false

  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 2
    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true

  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443

  # Network configuration, see: https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md#network-configuration-settings
  network:
  #   # enable host networking
    provider: host
  #   # EXPERIMENTAL: enable the Multus network provider
  #   provider: multus
  #   selectors:
  #     # The selector keys are required to be `public` and `cluster`.
  #     # Based on the configuration, the operator will do the following:
  #     #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
  #     #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
  #     #
  #     # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
  #     #
  #     # public: public-conf --> NetworkAttachmentDefinition object name in Multus
  #     # cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
  #   # Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
  #   ipFamily: "IPv6"
  #   # Ceph daemons to listen on both IPv4 and IPv6 networks
  #   dualStack: false

  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    # daysToRetain: 30

  # enable log collector, daemons will log on files and rotate
  # logCollector:
  #   enabled: true
  #   periodicity: 24h # SUFFIX may be 'h' for hours or 'd' for days.

  # automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
  cleanupPolicy:
    # Since cluster cleanup is destructive to data, confirmation is required.
    # To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
    # This value should only be set when the cluster is about to be deleted. After the confirmation is set,
    # Rook will immediately stop configuring the cluster and only wait for the delete command.
    # If the empty string is set, Rook will not destroy any data on hosts during uninstall.
    confirmation: ""
    # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
    sanitizeDisks:
      # method indicates if the entire disk should be sanitized or simply ceph's metadata
      # in both cases, re-install is possible
      # possible choices are 'complete' or 'quick' (default)
      method: quick
      # dataSource indicates where to get the random bytes to write on the disk
      # possible choices are 'zero' (default) or 'random'
      # using random sources will consume entropy from the system and will take much more time than the zero source
      dataSource: zero
      # iteration overwrites N times instead of the default (1)
      # takes an integer value
      iteration: 1
    # allowUninstallWithVolumes defines how the uninstall should be performed
    # If set to true, cephCluster deletion does not wait for the PVs to be deleted.
    allowUninstallWithVolumes: false

  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
  placement:
    all:
      # add label: role=storage-node
      # cmd: kubectl label node <node> role=storage-node
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
      # podAffinity:
      # podAntiAffinity:
      # topologySpreadConstraints:
      # add taint: node-role.kubernetes.io/ceph=true:NoSchedule
      # cmd: kubectl taint nodes <node> node-role.kubernetes.io/ceph=true:NoSchedule
      tolerations:
      - key: node-role.kubernetes.io/ceph
        operator: Exists
  #   # The above placement information can also be specified for mon, osd, and mgr components
  #   mon:
  #   # Monitor deployments may contain an anti-affinity rule for avoiding monitor
  #   # collocation on the same node. This is a required rule when host network is used
  #   # or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
  #   # preferred rule with weight: 50.
  #   osd:
  #   mgr:
  #   cleanup:

  # annotations:
  #   all:
  #   mon:
  #   osd:
  #   cleanup:
  #   prepareosd:
  #   # If no mgr annotations are set, prometheus scrape annotations will be set by default.
  #   mgr:

  # labels:
  #   all:
  #   mon:
  #   osd:
  #   cleanup:
  #   mgr:
  #   prepareosd:
  #   # monitoring is a list of key-value pairs. It is injected into all the monitoring resources created by operator.
  #   # These labels can be passed as LabelSelector to Prometheus
  #   monitoring:

  # resources:
  #   # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
  #   mgr:
  #     limits:
  #       cpu: "500m"
  #       memory: "1024Mi"
  #     requests:
  #       cpu: "500m"
  #       memory: "1024Mi"
  #   # The above example requests/limits can also be added to the other components
  #   mon:
  #   osd:
  #   prepareosd:
  #   mgr-sidecar:
  #   crashcollector:
  #   logcollector:
  #   cleanup:

  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false

  # priority classes to apply to ceph resources
  # priorityClassNames:
  #   all: rook-ceph-default-priority-class
  #   mon: rook-ceph-mon-priority-class
  #   osd: rook-ceph-osd-priority-class
  #   mgr: rook-ceph-mgr-priority-class

  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    # deviceFilter:
    # config:
    #   crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
    #   metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
    #   databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
    #   journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
    #   osdsPerDevice: "1" # this value can be overridden at the node or device level
    #   encryptedDevice: "true" # the default value for this option is "false"
    # # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # # nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    # nodes:
    #   - name: "vai-k8s-ceph-03-worker-02"
    #     deviceFilter: "^sd[bd]"
    #   - name: "vai-k8s-ceph-03-worker-02"
    #     deviceFilter: "^sd[bd]"
    #   - name: "vai-k8s-ceph-03-worker-03"
    #     deviceFilter: "^sd[bd]"

  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: true
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
    # The operator will continue with the next drain if the timeout is exceeded. This only works if `managePodBudgets` is `true`.
    # No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    pgHealthCheckTimeout: 0
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api

  # Configure the healthcheck and liveness probes for ceph pods.
  # Valid values for daemons are 'mon', 'osd', 'status'
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    # Change pod liveness probe, it works for all mon, mgr, and osd pods.
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false

mergify bot pushed a commit that referenced this issue Dec 8, 2021
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: #9314
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fdd243d)
parth-gr pushed a commit to parth-gr/rook that referenced this issue Feb 22, 2022
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: rook#9314
Signed-off-by: Sébastien Han <seb@redhat.com>