
"cluster-logging-operator" pod keeps restarting with "fatal error: concurrent map read and map write" #2345

Open
ShowMeTheGita opened this issue Feb 7, 2024 · 1 comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@ShowMeTheGita

Describe the bug
Hello. We're facing an issue where the cluster-logging-operator pod has restarted 100 times in the past 6 months, always with the same error: "fatal error: concurrent map read and map write". Our openshift-logging stack is configured with a ClusterLogging instance and a ClusterLogForwarder that forwards logs to three Kafka brokers.
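For context, "concurrent map read and map write" is the Go runtime's non-recoverable fault for a plain map being read in one goroutine while another goroutine writes to it without synchronization. Below is a minimal, purely illustrative sketch of that failure mode and the usual mutex guard; the type and field names are hypothetical and are not taken from the operator's code:

```go
package main

import "sync"

// statusCache is a hypothetical example type, not the operator's real code.
// A bare map[string]string touched from several goroutines can trigger
// "fatal error: concurrent map read and map write"; wrapping every access
// in a sync.RWMutex avoids it.
type statusCache struct {
	mu    sync.RWMutex
	nodes map[string]string
}

func (c *statusCache) set(name, state string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes[name] = state
}

func (c *statusCache) get(name string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.nodes[name]
}

func main() {
	cache := &statusCache{nodes: make(map[string]string)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.set("fluentd-2hcrj", "Ready") // concurrent writers are safe under the lock
			_ = cache.get("fluentd-2hcrj")      // concurrent readers are safe under the lock
		}()
	}
	wg.Wait()
}
```

A sync.Map would work as well; the point is only that every reader and writer of the shared map has to go through the same synchronization.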

Environment

  • Versions of OpenShift, Cluster Logging and any other relevant components
Client Version: 4.7.3  
Server Version: 4.10.53  
Kubernetes Version: v1.23.12+8a6bfe4
oc get deployment.apps/cluster-logging-operator -o yaml | grep version
operatorframework.io/properties: '{"properties":[{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogForwarder","version":"v1"}},{"type":"olm.gvk","value":{"group":"logging.openshift.io","kind":"ClusterLogging","version":"v1"}},{"type":"olm.maxOpenShiftVersion","value":4.12},{"type":"olm.package","value":{"packageName":"cluster-logging","version":"5.5.9"}}]}'
  • ClusterLogging instance
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  annotations:
    clusterlogging.openshift.io/logforwardingtechpreview: enabled
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"logging.openshift.io/v1","kind":"ClusterLogging","metadata":{"annotations":{"clusterlogging.openshift.io/logforwardingtechpreview":"enabled"},"name":"instance","namespace":"openshift-logging"},"spec":{"collection":{"logs":{"fluentd":{},"type":"fluentd"}},"managementState":"Unmanaged"}}
  creationTimestamp: "2021-07-27T14:40:14Z"
  generation: 5
  managedFields:
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:clusterlogging.openshift.io/logforwardingtechpreview: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentd: {}
            f:type: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-27T14:40:53Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:collection:
          f:logs:
            f:fluentd:
              f:resources: {}
      f:status:
        .: {}
        f:clusterConditions: {}
        f:collection:
          .: {}
          f:logs:
            .: {}
            f:fluentdStatus:
              .: {}
              f:daemonSet: {}
              f:nodes:
                .: {}
                f:fluentd-2hcrj: {}
                f:fluentd-2kbxm: {}
                f:fluentd-4dg7r: {}
                f:fluentd-4v7qs: {}
                f:fluentd-5jmhk: {}
                f:fluentd-84kkk: {}
                f:fluentd-8dp6m: {}
                f:fluentd-8wncg: {}
                f:fluentd-8wv7k: {}
                f:fluentd-8xrwk: {}
                f:fluentd-47jbr: {}
                f:fluentd-cp8gm: {}
                f:fluentd-f57pt: {}
                f:fluentd-gl8bb: {}
                f:fluentd-gsgm9: {}
                f:fluentd-hmkm9: {}
                f:fluentd-jjjpv: {}
                f:fluentd-lbn4k: {}
                f:fluentd-lkxvh: {}
                f:fluentd-mvq7m: {}
                f:fluentd-n7q9b: {}
                f:fluentd-p7n7x: {}
                f:fluentd-pbjh9: {}
                f:fluentd-rnzn6: {}
                f:fluentd-rrntm: {}
                f:fluentd-s925v: {}
                f:fluentd-t5hsx: {}
                f:fluentd-xg7gq: {}
                f:fluentd-xkhmj: {}
                f:fluentd-xmpht: {}
              f:pods:
                .: {}
                f:failed: {}
                f:notReady: {}
                f:ready: {}
        f:curation: {}
        f:logStore: {}
        f:visualization: {}
    manager: cluster-logging-operator
    operation: Update
    time: "2021-07-27T14:47:12Z"
  - apiVersion: logging.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:managementState: {}
    manager: Mozilla
    operation: Update
    time: "2021-07-27T14:47:37Z"
  name: instance
  namespace: openshift-logging
  resourceVersion: "12835895"
  uid: d29a1c1d-2c74-4e49-928e-62ba89487d84
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
  managementState: Unmanaged
status:
  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd
        nodes:
          fluentd-2hcrj: ocp-master-1.internal-url.org
          fluentd-2kbxm: ocp-master-5.internal-url.org
          fluentd-4v7qs: ocp-worker-4.internal-url.org
          fluentd-5jmhk: ocp-worker-2.internal-url.org
          fluentd-47jbr: ocp-worker-12.internal-url.org
          fluentd-4dg7r: ocp-worker-21.internal-url.org
          fluentd-84kkk: ocp-worker-11.internal-url.org
          fluentd-8dp6m: ocp-worker-1.internal-url.org
          fluentd-8wncg: ocp-worker-17.internal-url.org
          fluentd-8wv7k: ocp-worker-16.internal-url.org
          fluentd-8xrwk: ocp-worker-8.internal-url.org
          fluentd-cp8gm: ocp-worker-10.internal-url.org
          fluentd-f57pt: ocp-worker-18.internal-url.org
          fluentd-gl8bb: ocp-worker-23.internal-url.org
          fluentd-gsgm9: ocp-master-4.internal-url.org
          fluentd-hmkm9: ocp-worker-15.internal-url.org
          fluentd-jjjpv: ocp-worker-22.internal-url.org
          fluentd-lbn4k: ocp-master-3.internal-url.org
          fluentd-lkxvh: ocp-worker-19.internal-url.org
          fluentd-mvq7m: ocp-worker-5.internal-url.org
          fluentd-n7q9b: ocp-worker-3.internal-url.org
          fluentd-p7n7x: ocp-worker-25.internal-url.org
          fluentd-pbjh9: ocp-worker-13.internal-url.org
          fluentd-rnzn6: ocp-master-2.internal-url.org
          fluentd-rrntm: ocp-worker-6.internal-url.org
          fluentd-s925v: ocp-worker-24.internal-url.org
          fluentd-t5hsx: ocp-worker-7.internal-url.org
          fluentd-xg7gq: ocp-worker-14.internal-url.org
          fluentd-xkhmj: ocp-worker-20.internal-url.org
          fluentd-xmpht: ocp-worker-9.internal-url.org
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-2hcrj
          - fluentd-2kbxm
          - fluentd-47jbr
          - fluentd-4dg7r
          - fluentd-4v7qs
          - fluentd-5jmhk
          - fluentd-84kkk
          - fluentd-8dp6m
          - fluentd-8wncg
          - fluentd-8wv7k
          - fluentd-8xrwk
          - fluentd-cp8gm
          - fluentd-f57pt
          - fluentd-gl8bb
          - fluentd-gsgm9
          - fluentd-hmkm9
          - fluentd-jjjpv
          - fluentd-lbn4k
          - fluentd-lkxvh
          - fluentd-mvq7m
          - fluentd-n7q9b
          - fluentd-p7n7x
          - fluentd-pbjh9
          - fluentd-rnzn6
          - fluentd-rrntm
          - fluentd-s925v
          - fluentd-t5hsx
          - fluentd-xg7gq
          - fluentd-xkhmj
          - fluentd-xmpht
  curation: {}
  logStore: {}
  visualization: {}

Logs
cluster-logging-operator.log

Expected behavior
cluster-logging-operator pod does not crash/restart

Actual behavior
The pod crashes and restarts after an unpredictable amount of time.

To Reproduce
Cannot consistently reproduce; the pod crashes seemingly at random after an arbitrary amount of time.
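One note on reproduction: even when the crash only happens sporadically in a running pod, Go's race detector usually flags the underlying data race deterministically. A minimal sketch of that idea, using hypothetical test code rather than the operator's own:

```go
// race_repro_test.go — run with: go test -race
// Under -race the unsynchronized map access below is reported on every run,
// even though the "fatal error: concurrent map read and map write" fault
// only fires occasionally during normal execution.
package main

import (
	"sync"
	"testing"
)

func TestConcurrentMapAccess(t *testing.T) {
	m := map[string]string{"fluentd-2hcrj": "Ready"}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m["fluentd-2hcrj"] = "NotReady" }() // concurrent writer
	go func() { defer wg.Done(); _ = m["fluentd-2hcrj"] }()          // concurrent reader
	wg.Wait()
}
```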

Additional context
Happy to provide additional info if necessary. Thank you.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label on May 8, 2024