AtlasDatabaseUser - message - unable to list: test because of unknown namespace for the cache #1515

Open
qtranton opened this issue Apr 16, 2024 · 10 comments

@qtranton

I had operator version 1.7.1 running and decided to upgrade to the latest version in the cluster.
Steps to reproduce in a local environment:

  1. Kubernetes via Docker Desktop, v1.25.4
  2. Install operator v1.7.1
  3. Add an AtlasDeployment and an AtlasDatabaseUser
  4. Upgrade to v2.2.0 (helm upgrade the CRDs, then upgrade the operator)
  5. Fix the AtlasDeployment
  6. Check the operator logs, which show an error like this:

{"level":"INFO","time":"2024-04-16T12:12:14.543Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}

What did you expect?
After all these steps, the operator should just work as expected.

What happened instead?
The AtlasDatabaseUser status is always in the False state.

Operator Information

  • 1.7.1 -> 2.2.0

Kubernetes Cluster Information

  • Docker Desktop
  • 1.25.4

Additional context
I am trying to figure out why the AtlasDatabaseUser resource fails.
The operator creates the proper secrets and creates the users in the Atlas UI, but the resource itself always stays in the Ready: False state.

status:
  conditions:
  - lastTransitionTime: "2024-04-16T12:03:17Z"
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ResourceVersionIsValid
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ValidationSucceeded
  - lastTransitionTime: "2024-04-16T12:03:18Z"
    message: 'unable to list: test because of unknown namespace for the cache'
    reason: DatabaseUserStaleConnectionSecrets
    status: "False"
    type: DatabaseUserReady

If possible, please include:

{"level":"DEBUG","time":"2024-04-16T12:17:12.709Z","msg":"Ensured connection Secret up-to-date","atlasdatabaseuser":"test/operator-upgrade-test","secretname":"HIDDEN"}
{"level":"INFO","time":"2024-04-16T12:17:12.709Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test-","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}

@josvazg
Collaborator

josvazg commented Apr 17, 2024

Thanks for reporting this issue, @qtranton!

Could you give us a minimal YAML sample we could use to reproduce the issue?
It does not need to be your original complete setup, just the definitions that reproduce the same failure.

@qtranton
Author

Sure, here is my YAML, cleaned up as far as I can tell:

apiVersion: v1
kind: Secret
metadata:
  labels:
    app: operator-upgrade
    atlas.mongodb.com/type: credentials
    env: dev
  name: operator-upgrade-test
  namespace: test
stringData:
  password: testpassword


---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupPolicy
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  items: 
    - frequencyInterval: 12
      frequencyType: hourly
      retentionUnit: days
      retentionValue: 1
    - frequencyInterval: 1
      frequencyType: daily
      retentionUnit: days
      retentionValue: 7
    - frequencyInterval: 6
      frequencyType: weekly
      retentionUnit: weeks
      retentionValue: 1
    - frequencyInterval: 40
      frequencyType: monthly
      retentionUnit: months
      retentionValue: 1
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupSchedule
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  autoExportEnabled: false
  referenceHourOfDay: 21
  referenceMinuteOfHour: 2
  policy:
    name: operator-upgrade-test
    namespace: test
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: operator-upgrade-test
  labels:
    app: "operator-upgrade"
    env: dev
  #   mongodb.com/atlas-resource-policy: "keep"
spec:
  roles:
  - roleName: readWrite
    databaseName: Application
  scopes:
  - type: CLUSTER
    name: operator-upgrade-test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  username: operator-upgrade-test
  databaseName: admin
  passwordSecretRef:
    name: "operator-upgrade-test"

---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: operator-upgrade-test
  namespace: test
  labels:
    app: "operator-upgrade"
    env: dev
  # annotations:
  #   mongodb.com/atlas-resource-policy: "keep"
spec:
  backupRef:
    name: operator-upgrade-test
    namespace: test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  advancedDeploymentSpec:
    mongoDBMajorVersion: "6.0"
    clusterType: REPLICASET
    backupEnabled: true
    pitEnabled: false
    name: operator-upgrade-test
    replicationSpecs:
      - regionConfigs:
        - electableSpecs:
              instanceSize: M10
              nodeCount: 3
          providerName: GCP
          backingProviderName: GCP
          regionName: "EASTERN_US"
          # Priority description https://www.mongodb.com/docs/atlas/reference/atlas-operator/atlasdeployment-custom-resource/#mongodb-setting-spec.advancedDeploymentSpec.replicationSpecs.regionConfigs.priority
          priority: 7
          autoScaling:
            compute:
              enabled: false

@s-urbaniak
Collaborator

cc @roothorp

@s-urbaniak
Collaborator

s-urbaniak commented Apr 26, 2024

@qtranton can you check whether the WATCH_NAMESPACE environment variable is set for your operator deployment? That is, could you post the output of kubectl -n <operator_namespace> get pod <operator_name> here?

@s-urbaniak
Collaborator

i.e. it looks like the test namespace is not being watched by the operator, because the watched namespaces are overridden by the WATCH_NAMESPACE env variable.
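
A minimal Go sketch of that hypothesis, not the operator's actual code: `parseWatchNamespaces` and `listInNamespace` are hypothetical names, and treating an empty WATCH_NAMESPACE as "watch every namespace" is an assumption. The error text is copied from the log message reported in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWatchNamespaces (hypothetical helper) splits a comma-separated
// WATCH_NAMESPACE value into a set of namespace names.
func parseWatchNamespaces(raw string) map[string]struct{} {
	watched := make(map[string]struct{})
	for _, ns := range strings.Split(raw, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			watched[ns] = struct{}{}
		}
	}
	return watched
}

// listInNamespace (hypothetical helper) mimics a namespace-restricted cache.
// Assumption: an empty set means every namespace is watched.
func listInNamespace(watched map[string]struct{}, namespace string) error {
	if len(watched) == 0 {
		return nil
	}
	if _, ok := watched[namespace]; !ok {
		// Same wording as the message in the reported status condition.
		return fmt.Errorf("unable to list: %s because of unknown namespace for the cache", namespace)
	}
	return nil
}

func main() {
	// Only the operator namespace is watched, so listing in "test" fails.
	fmt.Println(listInNamespace(parseWatchNamespaces("mongodb-operator"), "test"))
	// Nothing configured: in this sketch every namespace is allowed.
	fmt.Println(listInNamespace(parseWatchNamespaces(""), "test"))
}
```

If the variable really restricted the cache this way, the resource in the test namespace would fail exactly as reported while resources in the operator namespace kept working.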

@qtranton
Author

In the Helm chart I see this:

{{- if .Values.watchNamespaces }}
          - name: WATCH_NAMESPACE
            value: "{{ join "," .Values.watchNamespaces }}"
          {{- end }}

So I checked the pod, and the variable is not set:

 Readiness:  http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      OPERATOR_POD_NAME:   mongodb-atlas-operator-5df9ff6978-tqznx (v1:metadata.name)
      OPERATOR_NAMESPACE:  mongodb-operator (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4kgq7 (to)
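
The Helm condition above can be reproduced with Go's text/template to show why the pod has no WATCH_NAMESPACE entry when watchNamespaces is empty. This is only a sketch: `renderEnv` is a hypothetical helper, and the `join` function registered here is a stand-in for the one Helm provides.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// envTemplate repeats the chart fragment quoted above (indentation simplified).
const envTemplate = `{{- if .Values.watchNamespaces }}
- name: WATCH_NAMESPACE
  value: "{{ join "," .Values.watchNamespaces }}"
{{- end }}`

// renderEnv (hypothetical helper) renders the fragment for a list of namespaces.
func renderEnv(namespaces []string) (string, error) {
	funcs := template.FuncMap{
		// Stand-in for Helm's join: separator first, list second.
		"join": func(sep string, items []string) string {
			return strings.Join(items, sep)
		},
	}
	tmpl, err := template.New("env").Funcs(funcs).Parse(envTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]interface{}{
		"Values": map[string]interface{}{"watchNamespaces": namespaces},
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// An empty list is falsy in the template, so nothing is rendered.
	empty, _ := renderEnv(nil)
	fmt.Printf("empty list renders %d bytes\n", len(empty))

	// A non-empty list yields the env entry with a comma-separated value.
	set, _ := renderEnv([]string{"test", "tester"})
	fmt.Println(set)
}
```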

@qtranton
Author

qtranton commented Apr 29, 2024

In the roles I also see some mention of this variable, but since it is empty, no additional roles were created:

mongodb-operator   mongodb-atlas-operator                           
mongodb-operator   mongodb-atlas-operator-leader-election-role      

Plus, it works on the older version, so I guess the older version could read the secrets.

@qtranton
Author

I validated the secrets as well.
When I remove the label

atlas.mongodb.com/type: credentials

I get an error like this:

"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"InternalError","message":"Secret \"operator-upgrade-test\" not found"}}

With the label back in place, I get this error:

"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: tester because of unknown namespace for the cache"}}

@qtranton
Author

qtranton commented May 27, 2024

@josvazg @s-urbaniak hey, I had some time to debug the issue. On my local cluster, for some reason, on version 2.2.2 I do not see the status.name parameter.
I just put a lot of println calls in a local branch :D

    #############################
    operator-upgrade-test
    cleanupStaleSecrets: Failed to list connection Secrets 
    ############################# 

That output comes from the prints I added to this code:

if user.Status.UserName != user.Spec.Username {
		// Note, that we pass the username from the status, not from the spec
		fmt.Println("#############################")
		fmt.Println(user.Status.UserName, user.Spec.Username)
		fmt.Println("cleanupStaleSecrets: Failed to list connection Secrets")
		fmt.Println("#############################")
		return RemoveStaleSecretsByUserName(ctx.Context, k8sClient, projectID, user.Status.UserName, user, ctx.Log)
	}

It is located here:
https://github.com/mongodb/mongodb-atlas-kubernetes/blob/main/pkg/controller/connectionsecret/connectionsecrets.go#L126
Now I am trying to figure out why I get an error related to the secret when the user name is not set in the status.
Meanwhile, the resource looks like this:

status:
    conditions:
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "True"
      type: ResourceVersionIsValid
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "True"
      type: ValidationSucceeded
    - lastTransitionTime: "2024-05-27T11:30:39Z"
      message: 'unable to list: tester because of unknown namespace for the cache'
      reason: DatabaseUserStaleConnectionSecrets
      status: "False"
      type: DatabaseUserReady
    observedGeneration: 1
    passwordVersion: "3017702"
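
The status above has no user name entry, which is enough to send the reconciler into the stale-secret branch quoted earlier. A small sketch of that comparison, using hypothetical trimmed-down types rather than the operator's real AtlasDatabaseUser definitions:

```go
package main

import "fmt"

// Hypothetical stand-ins for the operator's types; only the two fields
// used in the comparison are modelled.
type userSpec struct{ Username string }
type userStatus struct{ UserName string }

type databaseUser struct {
	Spec   userSpec
	Status userStatus
}

// needsStaleSecretCleanup mirrors the condition from the quoted snippet:
// cleanup runs whenever the status user name differs from the spec one.
func needsStaleSecretCleanup(u databaseUser) bool {
	return u.Status.UserName != u.Spec.Username
}

func main() {
	// After the upgrade the status carries no user name, so the
	// comparison is true and cleanup runs with an empty user name.
	upgraded := databaseUser{Spec: userSpec{Username: "operator-upgrade-test"}}
	fmt.Println(needsStaleSecretCleanup(upgraded))

	// With the name recorded in the status, the branch is skipped.
	recorded := databaseUser{
		Spec:   userSpec{Username: "operator-upgrade-test"},
		Status: userStatus{UserName: "operator-upgrade-test"},
	}
	fmt.Println(needsStaleSecretCleanup(recorded))
}
```

This matches the debug output, where only the spec user name is printed because the status one is empty.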

@qtranton
Author

Update: I rechecked on v1.7, and there the name does appear in the status.
