
"cluster-logging-operator" pod keeps restarting with "fatal error: concurrent map read and map write" #2345

Open
ShowMeTheGita opened this issue Feb 7, 2024 · 1 comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@ShowMeTheGita

Describe the bug
Hello. We're facing an issue where the cluster-logging-operator pod has restarted 100 times in the past 6 months, always with the same error: "fatal error: concurrent map read and map write". Our openshift-logging stack is configured with a ClusterLogging instance and a ClusterLogForwarder that forwards logs to three Kafka brokers.
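For context, "concurrent map read and map write" is the Go runtime's non-recoverable fault for a plain map being read in one goroutine while another goroutine writes to it without synchronization. Below is a minimal, purely illustrative sketch of that failure mode and the usual mutex guard; the type and field names are hypothetical and are not taken from the operator's code:

```go
package main

import "sync"

// statusCache is a hypothetical example type, not the operator's real code.
// A bare map[string]string touched from several goroutines can trigger
// "fatal error: concurrent map read and map write"; wrapping every access
// in a sync.RWMutex avoids it.
type statusCache struct {
	mu    sync.RWMutex
	nodes map[string]string
}

func (c *statusCache) set(name, state string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes[name] = state
}

func (c *statusCache) get(name string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.nodes[name]
}

func main() {
	cache := &statusCache{nodes: make(map[string]string)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.set("fluentd-2hcrj", "Ready") // concurrent writers are safe under the lock
			_ = cache.get("fluentd-2hcrj")      // concurrent readers are safe under the lock
		}()
	}
	wg.Wait()
}
```

A sync.Map would work as well; the point is only that every reader and writer of the shared map has to go through the same synchronization.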

Environment

  • Versions of OpenShift, Cluster Logging and any other relevant components
Client Version: 4.7.3  
Server Version: 4.10.53  
Kubernetes Version: v1.23.12+8a6bfe4
oc get deployment.apps/cluster-logging-operator -o yaml | grep version
operatorframework.io/properties: '{"properties":[{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogForwarder","version":"v1"}},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogging","version":"v1"}},{"type":"olm.maxOpenShiftVersion","value":4.12},{"type":"olm.package","value":{"packageName":"cluster-logging","version":"5.5.9"}}]}'
  • ClusterLogging instance
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  annotations:
    clusterlogging.openshift.io/logforwardingtechpreview: enabled
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"logging.openshift.io/v1","kind":"ClusterLogging","metadata":{"annotations":{"clusterlogging.openshift.io/logforwardingtechpreview":"enabled"},"name":"instance","namespace":"openshift-logging"},"spec":{"collection":{"logs":{"fluentd":{},"type":"fluentd"}},"managementState":"Unmanaged"}}
  creationTimestamp: "2021-07-27T14:40:14Z"
  generation: 5
  managedFields:
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:clusterlogging.openshift.io/logforwardingtechpreview: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentd: {}
            f:type: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-27T14:40:53Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:collection:
          f:logs:
            f:fluentd:
              f:resources: {}
      f:status:
        .: {}
        f:clusterConditions: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentdStatus:
              .: {}
              f:daemonSet: {}
              f:nodes:
                .: {}
                f:fluentd-2hcrj: {}
                f:fluentd-2kbxm: {}
                f:fluentd-4dg7r: {}
                f:fluentd-4v7qs: {}
                f:fluentd-5jmhk: {}
                f:fluentd-84kkk: {}
                f:fluentd-8dp6m: {}
                f:fluentd-8wncg: {}
                f:fluentd-8wv7k: {}
                f:fluentd-8xrwk: {}
                f:fluentd-47jbr: {}
                f:fluentd-cp8gm: {}
                f:fluentd-f57pt: {}
                f:fluentd-gl8bb: {}
                f:fluentd-gsgm9: {}
                f:fluentd-hmkm9: {}
                f:fluentd-jjjpv: {}
                f:fluentd-lbn4k: {}
                f:fluentd-lkxvh: {}
                f:fluentd-mvq7m: {}
                f:fluentd-n7q9b: {}
                f:fluentd-p7n7x: {}
                f:fluentd-pbjh9: {}
                f:fluentd-rnzn6: {}
                f:fluentd-rrntm: {}
                f:fluentd-s925v: {}
                f:fluentd-t5hsx: {}
                f:fluentd-xg7gq: {}
                f:fluentd-xkhmj: {}
                f:fluentd-xmpht: {}
              f:pods:
                .: {}
                f:failed: {}
                f:notReady: {}
                f:ready: {}
        f:curation: {}
        f:logStore: {}
        f:visualization: {}
    manager: cluster-logging-operator
    operation: Update
    time: "2021-07-27T14:47:12Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:managementState: {}
    manager: Mozilla
    operation: Update
    time: "2021-07-27T14:47:37Z"
  name: instance
  namespace: openshift-logging
  resourceVersion: "12835895"
  uid: d29a1c1d-2c74-4e49-928e-62ba89487d84
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Unmanaged
status:
  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd
        nodes:
          fluentd-2hcrj: ocp-master-1.internal-url.org
          fluentd-2kbxm: ocp-master-5.internal-url.org
          fluentd-4v7qs: ocp-worker-4.internal-url.org
          fluentd-5jmhk: ocp-worker-2.internal-url.org
          fluentd-47jbr: ocp-worker-12.internal-url.org
          fluentd-4dg7r: ocp-worker-21.internal-url.org
          fluentd-84kkk: ocp-worker-11.internal-url.org
          fluentd-8dp6m: ocp-worker-1.internal-url.org
          fluentd-8wncg: ocp-worker-17.internal-url.org
          fluentd-8wv7k: ocp-worker-16.internal-url.org
          fluentd-8xrwk: ocp-worker-8.internal-url.org
          fluentd-cp8gm: ocp-worker-10.internal-url.org
          fluentd-f57pt: ocp-worker-18.internal-url.org
          fluentd-gl8bb: ocp-worker-23.internal-url.org
          fluentd-gsgm9: ocp-master-4.internal-url.org
          fluentd-hmkm9: ocp-worker-15.internal-url.org
          fluentd-jjjpv: ocp-worker-22.internal-url.org
          fluentd-lbn4k: ocp-master-3.internal-url.org
          fluentd-lkxvh: ocp-worker-19.internal-url.org
          fluentd-mvq7m: ocp-worker-5.internal-url.org
          fluentd-n7q9b: ocp-worker-3.internal-url.org
          fluentd-p7n7x: ocp-worker-25.internal-url.org
          fluentd-pbjh9: ocp-worker-13.internal-url.org
          fluentd-rnzn6: ocp-master-2.internal-url.org
          fluentd-rrntm: ocp-worker-6.internal-url.org
          fluentd-s925v: ocp-worker-24.internal-url.org
          fluentd-t5hsx: ocp-worker-7.internal-url.org
          fluentd-xg7gq: ocp-worker-14.internal-url.org
          fluentd-xkhmj: ocp-worker-20.internal-url.org
          fluentd-xmpht: ocp-worker-9.internal-url.org
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-2hcrj
          - fluentd-2kbxm
          - fluentd-47jbr
          - fluentd-4dg7r
          - fluentd-4v7qs
          - fluentd-5jmhk
          - fluentd-84kkk
          - fluentd-8dp6m
          - fluentd-8wncg
          - fluentd-8wv7k
          - fluentd-8xrwk
          - fluentd-cp8gm
          - fluentd-f57pt
          - fluentd-gl8bb
          - fluentd-gsgm9
          - fluentd-hmkm9
          - fluentd-jjjpv
          - fluentd-lbn4k
          - fluentd-lkxvh
          - fluentd-mvq7m
          - fluentd-n7q9b
          - fluentd-p7n7x
          - fluentd-pbjh9
          - fluentd-rnzn6
          - fluentd-rrntm
          - fluentd-s925v
          - fluentd-t5hsx
          - fluentd-xg7gq
          - fluentd-xkhmj
          - fluentd-xmpht
  curation: {}
  logStore: {}
  visualization: {}

Logs
cluster-logging-operator.log

Expected behavior
cluster-logging-operator pod does not crash/restart

Actual behavior
The pod crashes and restarts after an unpredictable amount of time.

To Reproduce
Cannot consistently reproduce; the pod crashes seemingly at random after an arbitrary amount of time.
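One note on reproduction: even when the crash only happens sporadically in a running pod, Go's race detector usually flags the underlying data race deterministically. A minimal sketch of that idea, using hypothetical test code rather than the operator's own:

```go
// race_repro_test.go — run with: go test -race
// Under -race the unsynchronized map access below is reported on every run,
// even though the "fatal error: concurrent map read and map write" fault
// only fires occasionally during normal execution.
package main

import (
	"sync"
	"testing"
)

func TestConcurrentMapAccess(t *testing.T) {
	m := map[string]string{"fluentd-2hcrj": "Ready"}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m["fluentd-2hcrj"] = "NotReady" }() // concurrent writer
	go func() { defer wg.Done(); _ = m["fluentd-2hcrj"] }()          // concurrent reader
	wg.Wait()
}
```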

Additional context
Happy to provide additional info if necessary. Thank you.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label on May 8, 2024