
core: do not watch resourceversion and heartbeat node updates #14186

Merged
merged 1 commit into from May 21, 2024

Conversation

parth-gr
Member

@parth-gr parth-gr commented May 9, 2024

Currently the node predicate watches for resourceVersion and heartbeat changes, which are not critical.
So with this PR we add an early-exit condition to the predicate as an optimization, to avoid considering a CephCluster reconcile when the updates are not crucial.

closes: #14070
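
For illustration only, here is a minimal sketch of the kind of early-exit check described above. This is not the code in the PR diff: the helper name is hypothetical, and managedFields is ignored here as well because kubelet heartbeats also bump the managedFields update time (discussed further down in this thread).

package cluster

import (
    "github.com/google/go-cmp/cmp"
    "github.com/google/go-cmp/cmp/cmpopts"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// onlyNonCriticalNodeChange reports whether two Node objects differ only in
// fields considered non-critical: metadata.resourceVersion, managedFields
// (whose update time is bumped by every kubelet status patch), and the
// conditions' lastHeartbeatTime. If it returns true, the update event can be
// dropped by the predicate without considering a CephCluster reconcile.
func onlyNonCriticalNodeChange(oldNode, newNode *corev1.Node) bool {
    opts := []cmp.Option{
        // resource.Quantity has unexported fields, so compare it semantically.
        cmp.Comparer(func(x, y resource.Quantity) bool { return x.Cmp(y) == 0 }),
        cmpopts.IgnoreFields(metav1.ObjectMeta{}, "ResourceVersion", "ManagedFields"),
        cmpopts.IgnoreFields(corev1.NodeCondition{}, "LastHeartbeatTime"),
    }
    return cmp.Diff(oldNode, newNode, opts...) == ""
}

In the node update predicate, the early exit would then be something like: if onlyNonCriticalNodeChange(oldNode, newNode) { return false }.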

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Member

@BlaineEXE BlaineEXE left a comment


Based on the output sample posted in the issue (#14070 (comment)), I think we might want to consider whether we should ignore other fields as well, or even if we should instead focus on a subset of fields that we do want to reconcile on, and ignore all other fields.

IIRC, @subhamkrai did some work on this, though it could be older. Let's chat in an upcoming huddle about what field changes we definitely want to reconcile on, which we aren't sure about, and which we definitely don't need to reconcile on.

pkg/operator/ceph/cluster/predicate.go (review thread resolved)
Comment on lines 41 to 50
// do not watch node if only resourceversion got changed
resourceQtyComparer := cmpopts.IgnoreFields(v1.ObjectMeta{}, "ResourceVersion")
diff := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)

// do not watch node if only LastHeartbeatTime got changed
resourceQtyComparer = cmpopts.IgnoreFields(corev1.NodeCondition{}, "LastHeartbeatTime")
diff1 := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)

if diff == "" && diff1 == "" {
return diff
}
Member


I'm not certain about this logic. It sets up 2 different comparers separately, instead of using the comparers together to ignore both at once. Even if this is working as intended, I'd like to see a unit test for the function that proves it's working as it should be.

Member Author


It seems to be working; I will also add the unit test.
Testing:

2024-05-10 09:47:09.009607 D | ceph-cluster-controller: only changes in the ResourceVersion or LastHeartbeatTime

  &v1.Node{
    TypeMeta: {},
    ObjectMeta: v1.ObjectMeta{
      ... // 3 identical fields
      SelfLink:          "",
      UID:               "245242a2-381e-4f4c-ae5d-5d055f2e7f34",
-     ResourceVersion:   "75120",
+     ResourceVersion:   "75475",
      Generation:        0,
      CreationTimestamp: {Time: s"2024-05-06 11:53:46 +0000 UTC"},
      ... // 4 identical fields
      OwnerReferences: nil,
      Finalizers:      nil,
      ManagedFields: []v1.ManagedFieldsEntry{
        ... // 3 identical elements
        {Manager: "kube-controller-manager", Operation: "Update", APIVersion: "v1", Time: s"2024-05-08 11:25:59 +0000 UTC", ...},
        {Manager: "kube-controller-manager", Operation: "Update", APIVersion: "v1", Time: s"2024-05-08 11:26:00 +0000 UTC", ...},
        {
          Manager:     "kubelet",
          Operation:   "Update",
          APIVersion:  "v1",
-         Time:        s"2024-05-10 09:42:02 +0000 UTC",
+         Time:        s"2024-05-10 09:47:08 +0000 UTC",
          FieldsType:  "FieldsV1",
          FieldsV1:    &{Raw: `{"f:metadata":{"f:annotations":{"f:csi.volume.kubernetes.io/node`...},
          Subresource: "status",
        },
      },
    },
    Spec: {PodCIDR: "10.244.0.0/24", PodCIDRs: {"10.244.0.0/24"}},
    Status: v1.NodeStatus{
      Capacity:    {s"cpu": {i: {...}, s: "2", Format: "DecimalSI"}, s"ephemeral-storage": {i: {...}, s: "6354932Ki", Format: "BinarySI"}, s"hugepages-2Mi": {s: "0", Format: "DecimalSI"}, s"memory": {i: {...}, Format: "BinarySI"}, ...},
      Allocatable: {s"cpu": {i: {...}, s: "2", Format: "DecimalSI"}, s"ephemeral-storage": {i: {...}, s: "6354932Ki", Format: "BinarySI"}, s"hugepages-2Mi": {s: "0", Format: "DecimalSI"}, s"memory": {i: {...}, Format: "BinarySI"}, ...},
      Phase:       "",
      Conditions: []v1.NodeCondition{
        {
          Type:               "MemoryPressure",
          Status:             "False",
-         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:42:02 +0000 UTC"},
+         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:47:08 +0000 UTC"},
          LastTransitionTime: {Time: s"2024-05-08 11:26:11 +0000 UTC"},
          Reason:             "KubeletHasSufficientMemory",
          Message:            "kubelet has sufficient memory available",
        },
        {
          Type:               "DiskPressure",
          Status:             "False",
-         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:42:02 +0000 UTC"},
+         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:47:08 +0000 UTC"},
          LastTransitionTime: {Time: s"2024-05-08 11:26:11 +0000 UTC"},
          Reason:             "KubeletHasNoDiskPressure",
          Message:            "kubelet has no disk pressure",
        },
        {
          Type:               "PIDPressure",
          Status:             "False",
-         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:42:02 +0000 UTC"},
+         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:47:08 +0000 UTC"},
          LastTransitionTime: {Time: s"2024-05-08 11:26:11 +0000 UTC"},
          Reason:             "KubeletHasSufficientPID",
          Message:            "kubelet has sufficient PID available",
        },
        {
          Type:               "Ready",
          Status:             "True",
-         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:42:02 +0000 UTC"},
+         LastHeartbeatTime:  v1.Time{Time: s"2024-05-10 09:47:08 +0000 UTC"},
          LastTransitionTime: {Time: s"2024-05-08 11:26:11 +0000 UTC"},
          Reason:             "KubeletReady",
          Message:            "kubelet is posting ready status",
        },
      },
      Addresses:       {{Type: "InternalIP", Address: "192.168.39.166"}, {Type: "Hostname", Address: "minikube"}},
      DaemonEndpoints: {KubeletEndpoint: {Port: 10250}},
      ... // 5 identical fields
    },
  }

@parth-gr
Member Author

Based on the output sample posted in the issue (#14070 (comment)), I think we might want to consider whether we should ignore other fields as well, or even if we should instead focus on a subset of fields that we do want to reconcile on, and ignore all other fields.

IIRC, @subhamkrai did some work on this, though it could be older. Let's chat in an upcoming huddle about what field changes we definitely want to reconcile on, which we aren't sure about, and which we definitely don't need to reconcile on.

@subhamkrai do you have any comments?

Comment on lines 42 to 50
resourceQtyComparer := cmpopts.IgnoreFields(v1.ObjectMeta{}, "ResourceVersion")
diff := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)

// do not watch node if only LastHeartbeatTime got changed
resourceQtyComparer = cmpopts.IgnoreFields(corev1.NodeCondition{}, "LastHeartbeatTime")
diff1 := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)

if diff == "" && diff1 == "" {
return diff
}
Contributor


Suggested change
resourceQtyComparer := cmpopts.IgnoreFields(v1.ObjectMeta{}, "ResourceVersion")
diff := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)
// do not watch node if only LastHeartbeatTime got changed
resourceQtyComparer = cmpopts.IgnoreFields(corev1.NodeCondition{}, "LastHeartbeatTime")
diff1 := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)
if diff == "" && diff1 == "" {
return diff
}
// do not watch the node if only 'LastHeartbeatTime' and 'ResourceVersion' are changed
resourceQtyComparer = cmpopts.IgnoreFields(corev1.NodeCondition{}, "LastHeartbeatTime", "ResourceVersion")
return resourceQtyComparer

Member Author

@parth-gr parth-gr May 10, 2024


resourceQtyComparer is just an option that is passed to cmp.Diff, so we can't return it directly.

Contributor


Then we can just add this right at L43:

diff := cmp.Diff(objOld.Spec, objNew.Spec, resourceQtyComparer)
return diff

?

Member Author


One comparer is for the struct v1.ObjectMeta{} and the other is for corev1.NodeCondition{}.
So we need two comparators.

Member Author


We may use NodeType,
but I am not sure IgnoreFields can look into sub-sub-sub fields.

Member Author


I believe that is not possible; only one level of nesting looks feasible.


@subhamkrai
Contributor

Based on the output sample posted in the issue (#14070 (comment)), I think we might want to consider whether we should ignore other fields as well, or even if we should instead focus on a subset of fields that we do want to reconcile on, and ignore all other fields.
IIRC, @subhamkrai did some work on this, though it could be older. Let's chat in an upcoming huddle about what field changes we definitely want to reconcile on, which we aren't sure about, and which we definitely don't need to reconcile on.

@subhamkrai do you have any comments?

I did work on the predicates to watch for node-level taints.

@parth-gr
Member Author

@BlaineEXE let me know if you have anything else in mind that we should stop during reconciling.

Otherwise, let's keep this as a base starting point for the reconciliation, and we can open more issues.

if diff == "" && diff1 == "" {
return diff
}
return "not-equal"
Member


How about returning a bool? The actual diff isn't being returned.

Member Author


Sure, updated

pkg/operator/ceph/cluster/predicate.go (review thread resolved)
@@ -40,3 +42,38 @@ func TestIsHotPlugCM(t *testing.T) {
cm.Labels["app"] = "rook-discover"
assert.True(t, isHotPlugCM(cm))
}

func TestCompareNodes(t *testing.T) {
Member


Please test all cases. I only see one case tested, and it doesn't give me confidence that the comparators are working as we expect them to.

What happens if only the heartbeat time changes?
What happens if only the resource version changes?
What happens if nothing changes?
What happens if a field changes, but not resource version or heartbeat?
What happens if a field changes, and only resource version?
What happens if a field changes, and only heartbeat?
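
For reference, a table-driven test along these lines could cover all of those cases. This is only a sketch: it uses the hypothetical onlyNonCriticalNodeChange helper from the sketch in the description (the PR's actual function name may differ) and assumes the usual testing, time, testify, corev1, and metav1 imports.

func TestOnlyNonCriticalNodeChange(t *testing.T) {
    base := func() *corev1.Node {
        return &corev1.Node{
            ObjectMeta: metav1.ObjectMeta{Name: "node1", ResourceVersion: "100"},
            Spec:       corev1.NodeSpec{Unschedulable: false},
            Status: corev1.NodeStatus{Conditions: []corev1.NodeCondition{{
                Type:              corev1.NodeReady,
                Status:            corev1.ConditionTrue,
                LastHeartbeatTime: metav1.NewTime(time.Unix(1000, 0)),
            }}},
        }
    }

    tests := []struct {
        name   string
        mutate func(n *corev1.Node)
        want   bool // true => the update can be ignored
    }{
        {"nothing changes", func(n *corev1.Node) {}, true},
        {"only resource version changes", func(n *corev1.Node) { n.ResourceVersion = "101" }, true},
        {"only heartbeat changes", func(n *corev1.Node) {
            n.Status.Conditions[0].LastHeartbeatTime = metav1.NewTime(time.Unix(2000, 0))
        }, true},
        {"a field changes, but not resource version or heartbeat", func(n *corev1.Node) {
            n.Spec.Unschedulable = true
        }, false},
        {"a field changes along with the resource version", func(n *corev1.Node) {
            n.Spec.Unschedulable = true
            n.ResourceVersion = "101"
        }, false},
        {"a field changes along with the heartbeat", func(n *corev1.Node) {
            n.Status.Conditions[0].Status = corev1.ConditionFalse
            n.Status.Conditions[0].LastHeartbeatTime = metav1.NewTime(time.Unix(2000, 0))
        }, false},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            oldNode, newNode := base(), base()
            tt.mutate(newNode)
            assert.Equal(t, tt.want, onlyNonCriticalNodeChange(oldNode, newNode))
        })
    }
}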

Member Author


Added all the cases

Comment on lines 63 to 64
objOld := e.ObjectOld.(*corev1.Node)
objNew := e.ObjectNew.(*corev1.Node)
Member


What happens if either of these casting operations fail?

I think we need to check, to do our due diligence:

obj, ok := object.(*typecast)
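
A minimal sketch of what that checked cast could look like inside the update predicate (variable names assumed; whether a failed cast should return false or true is a design choice):

oldNode, okOld := e.ObjectOld.(*corev1.Node)
newNode, okNew := e.ObjectNew.(*corev1.Node)
if !okOld || !okNew {
    // Unexpected object type for a node watch; do not trigger a reconcile from here.
    return false
}
// ... continue comparing oldNode and newNode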

Member Author


Updated

@parth-gr
Member Author

A sample node

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.1.112.133
    csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"420ffbfe-3c06-a48c-9598-911ad67d2200"}'
    k8s.ovn.org/host-cidrs: '["10.1.112.133/23"]'
    k8s.ovn.org/l3-gateway-config: '{"default":{"mode":"shared","interface-id":"br-ex_control-plane-0","mac-address":"00:50:56:8f:44:de","ip-addresses":["10.1.112.133/23"],"ip-address":"10.1.112.133/23","next-hops":["10.1.113.254"],"next-hop":"10.1.113.254","node-port-enable":"true","vlan-id":"0"}}'
    k8s.ovn.org/network-ids: '{"default":"0"}'
    k8s.ovn.org/node-chassis-id: 88cd29e0-1937-4b85-9595-0f731b32a2d3
    k8s.ovn.org/node-gateway-router-lrp-ifaddr: '{"ipv4":"100.64.0.2/16"}'
    k8s.ovn.org/node-id: "2"
    k8s.ovn.org/node-mgmt-port-mac-address: 16:54:7f:ab:fb:93
    k8s.ovn.org/node-primary-ifaddr: '{"ipv4":"10.1.112.133/23"}'
    k8s.ovn.org/node-subnets: '{"default":["10.128.0.0/23"]}'
    k8s.ovn.org/node-transit-switch-port-ifaddr: '{"ipv4":"100.88.0.2/16"}'
    k8s.ovn.org/remote-zone-migrated: control-plane-0
    k8s.ovn.org/zone-name: control-plane-0
    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
    machineconfiguration.openshift.io/currentConfig: rendered-master-66bac8cef3aa44c784f4d13802a40684
    machineconfiguration.openshift.io/desiredConfig: rendered-master-66bac8cef3aa44c784f4d13802a40684
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-master-66bac8cef3aa44c784f4d13802a40684
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-master-66bac8cef3aa44c784f4d13802a40684
    machineconfiguration.openshift.io/lastObservedServerCAAnnotation: "false"
    machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: "40487"
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-05-13T07:00:07Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: vsphere-vm.cpu-4.mem-16gb.os-unknown
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: control-plane-0
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    node-role.kubernetes.io/master: ""
    node.kubernetes.io/instance-type: vsphere-vm.cpu-4.mem-16gb.os-unknown
    node.openshift.io/os_id: rhcos
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:alpha.kubernetes.io/provided-node-ip: {}
          f:volumes.kubernetes.io/controller-managed-attach-detach: {}
        f:labels:
          .: {}
          f:beta.kubernetes.io/arch: {}
          f:beta.kubernetes.io/os: {}
          f:kubernetes.io/arch: {}
          f:kubernetes.io/hostname: {}
          f:kubernetes.io/os: {}
          f:node-role.kubernetes.io/control-plane: {}
          f:node-role.kubernetes.io/master: {}
          f:node.openshift.io/os_id: {}
    manager: kubelet
    operation: Update
    time: "2024-05-13T07:00:07Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:beta.kubernetes.io/instance-type: {}
          f:node.kubernetes.io/instance-type: {}
      f:spec:
        f:providerID: {}
    manager: vsphere-cloud-controller-manager
    operation: Update
    time: "2024-05-13T07:02:16Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:addresses:
          k:{"type":"ExternalIP"}:
            .: {}
            f:address: {}
            f:type: {}
    manager: vsphere-cloud-controller-manager
    operation: Update
    subresource: status
    time: "2024-05-13T07:02:16Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.ovn.org/network-ids: {}
          f:k8s.ovn.org/node-gateway-router-lrp-ifaddr: {}
          f:k8s.ovn.org/node-id: {}
          f:k8s.ovn.org/node-subnets: {}
          f:k8s.ovn.org/node-transit-switch-port-ifaddr: {}
    manager: control-plane-2
    operation: Update
    subresource: status
    time: "2024-05-13T07:03:03Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.ovn.org/host-cidrs: {}
          f:k8s.ovn.org/l3-gateway-config: {}
          f:k8s.ovn.org/node-chassis-id: {}
          f:k8s.ovn.org/node-mgmt-port-mac-address: {}
          f:k8s.ovn.org/node-primary-ifaddr: {}
          f:k8s.ovn.org/remote-zone-migrated: {}
          f:k8s.ovn.org/zone-name: {}
    manager: control-plane-0
    operation: Update
    subresource: status
    time: "2024-05-13T07:03:22Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:machineconfiguration.openshift.io/controlPlaneTopology: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/lastAppliedDrain: {}
      f:spec:
        f:taints: {}
    manager: machine-config-controller
    operation: Update
    time: "2024-05-13T07:34:03Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredDrain: {}
          f:machineconfiguration.openshift.io/lastObservedServerCAAnnotation: {}
          f:machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
    manager: machine-config-daemon
    operation: Update
    time: "2024-05-13T07:34:08Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:csi.volume.kubernetes.io/nodeid: {}
      f:status:
        f:conditions:
          k:{"type":"DiskPressure"}:
            f:lastHeartbeatTime: {}
          k:{"type":"MemoryPressure"}:
            f:lastHeartbeatTime: {}
          k:{"type":"PIDPressure"}:
            f:lastHeartbeatTime: {}
          k:{"type":"Ready"}:
            f:lastHeartbeatTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
        f:images: {}
    manager: kubelet
    operation: Update
    subresource: status
    time: "2024-05-13T16:21:03Z"
  name: control-plane-0
  resourceVersion: "345702"
  uid: 6717bd4b-ce04-4027-97f9-aaec16a52185
spec:
  providerID: vsphere://420ffbfe-3c06-a48c-9598-911ad67d2200
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
status:
  addresses:
  - address: 10.1.112.133
    type: InternalIP
  - address: 10.1.112.133
    type: ExternalIP
  - address: control-plane-0
    type: Hostname
  allocatable:
    cpu: 3500m
    ephemeral-storage: "114345831029"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 15225992Ki
    pods: "250"
  capacity:
    cpu: "4"
    ephemeral-storage: 125238252Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 16376968Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2024-05-13T16:21:03Z"
    lastTransitionTime: "2024-05-13T07:00:07Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-05-13T16:21:03Z"
    lastTransitionTime: "2024-05-13T07:00:07Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-05-13T16:21:03Z"
    lastTransitionTime: "2024-05-13T07:00:07Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-05-13T16:21:03Z"
    lastTransitionTime: "2024-05-13T07:03:31Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec603de2a0d5c00eaf548ed035f33cd62f801771805ee42d7dca770f25562437
    sizeBytes: 2294038402
  - names:
    - registry.redhat.io/redhat/redhat-operator-index@sha256:2d419f95209f7d1f3cfcdaec16674c67370d3d9d1380b8900a3b70e4e58e8040
    - registry.redhat.io/redhat/redhat-operator-index@sha256:fa8330e2cdabaa48d52a74c17edf71b2ba202c7507c2d9650df5c48de610deec
    - registry.redhat.io/redhat/redhat-operator-index:v4.16
    sizeBytes: 1962505373
  - names:
    - registry.redhat.io/redhat/community-operator-index@sha256:11d3def91f16bdbfe10000dd622ad0cbc1f23f0e5a85a630009a6f1ec4dd191c
    - registry.redhat.io/redhat/community-operator-index@sha256:bd7d327c6a25eb64def1900b6390d0a57780eab1ee4d1083435dfcaddae8b2c9
    - registry.redhat.io/redhat/community-operator-index:v4.16
    sizeBytes: 1338000827
  - names:
    - registry.redhat.io/redhat/community-operator-index@sha256:f57ae23e5fb89687d5afb9ab5e59f4bed7d066f36a2e9a6eb8592425ba9c75f0
    sizeBytes: 1337627577
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:022e3cb7f2c5721742dbf229f4ecd9f160ac7f7fa8307e633c1380ee0c7cf16f
    sizeBytes: 1337151561
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e18affd45a3f53dc5b7969d64f97228088e05086fed87e7b21589956847414e5
    sizeBytes: 1210231458
  - names:
    - registry.redhat.io/redhat/certified-operator-index@sha256:b56fc790e7103cd3e2ca50176f4fa1490dedef51733c32c9af4de866dc3ccdcd
    - registry.redhat.io/redhat/certified-operator-index@sha256:dae1de9aa6bfa23e145e0c29d8e68fa6e32ef43ea800dc4a38bd1968fd9631b3
    - registry.redhat.io/redhat/certified-operator-index:v4.16
    sizeBytes: 1071346611
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b42a0ec761e18eb5a092304d5eebb65b3267733c23008484c124db80dfaa724c
    sizeBytes: 1055741778
  - names:
    - registry.redhat.io/redhat/redhat-marketplace-index@sha256:6e061f9df95bcd03dd68263e332c2cbaf6afecd15cd13fc8bf197102f7a266f1
    - registry.redhat.io/redhat/redhat-marketplace-index@sha256:bbf9ff9d5e26d740d3676c83c907c948cb045e26962c0d5b0bf14ee7be215214
    - registry.redhat.io/redhat/redhat-marketplace-index:v4.16
    sizeBytes: 966321575
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0f193fd514a3a51a47a4ef0e9a215530c8dbda1a30b4ba0dff86ba9494e43229
    sizeBytes: 862962998
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d49bc591308a9cabef4f54619308b5eabaf47280d7dbea74f1d18b035f90318d
    sizeBytes: 813894588
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e549958b7194788a13bc59704d495d1d11a87aa606f60182f4e441351b497370
    sizeBytes: 811482580
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6a744492254024e9755f12932e41615509bf11e8ed8a442d7e03030e20c571b
    sizeBytes: 762516281
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9939ba044f40fd1859cd9e7c6b6918c1e1412db4626446a82c2e2ca11355ea5c
    sizeBytes: 673443090
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b8ae901d20adc830ca1fe51c1ee076aa8294546e603e124d12508375b27a300c
    sizeBytes: 666255286
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e436a19fd8734e614d7eb5910751a9c6c474c2a3e7895f7f4ae06c0c66de0054
    sizeBytes: 656872048
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7fc663a9d57ed773243c971829ed5c4b1e1700ceae3c30a2e7411af039e1853c
    sizeBytes: 641382458
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:56fbe5faf35a5c3ed376bfbfb7c9f4d9c40717ede2c6db5a0797653fee854ec9
    sizeBytes: 637253610
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c04ab4922a16af3e85e03f115c82af84a8904e99b22c20fc4d87124f86f0414
    sizeBytes: 584816096
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4fa5e79b7d06fad4910cdddb4c38b8dfbce16cdd05f30d30c018fa0d0dbdb0ea
    sizeBytes: 556362576
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c51a31a687d96346596ac23ec578f35eabd4c43370e097cdde7c34d191c027ef
    sizeBytes: 550211555
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf69bd6463080b8dd95f23b38e46ffa17eee69c155cd98ce701bba474864362b
    sizeBytes: 533626617
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4398dc8021b890dc1b1176fe3becdd3881d737e445ccfee8bff4eb77471caee3
    sizeBytes: 528673685
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:75887eabb55b351f2d56ca521bd86a39f0004f6cdcbdf25715627419022a8afe
    sizeBytes: 495516931
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d1ddf92666c7eb7df24ab2016dea070170bd1d28b0d61cecb855af82e847e689
    sizeBytes: 492338359
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7cd20de1f9ee0dce1ec10d99f507f56d7e512602ada21693a8541efeadf219d3
    sizeBytes: 491384490
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1d8c99a84ae1698b7365035f738eadd4fa1c3d8d1731a0d35344051a144e1e7c
    sizeBytes: 490916435
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:270cd1ec00c0ced6e84f03d430ec86febf15b43bb5a92a9c85fbb3ebec1ade3b
    sizeBytes: 487924149
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:14f4f0923989b58d9f0d490822426df57768f6580362b785a4ee12b16a2f3fe9
    sizeBytes: 487836690
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:707392053e60fc3d59ba6c56f00742e9f4a16ce76e47a5376c0040339f26e8b3
    sizeBytes: 486557444
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb7b35b69ef6b5a84b65de992c2b86f7dad471d36408c6b55d3f69a92268e047
    sizeBytes: 485322590
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:baa1d7c9285d2e9798fa95876432f0651175a0dfa1d1f64b69202a6311d0d622
    sizeBytes: 485282522
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:63edcb49897ead4739771c506fdb9f9beca8e4c6c22cdce7f56cf1ca9f1c9679
    sizeBytes: 480383688
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6014b2c1b3927bc694d019427d4894e31b2146c681061a1b68ab524719825198
    sizeBytes: 476694015
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e9a0b004b04458c107f73dc25dc60bf492dd8fa12b400aeba4fecd1bad4b4241
    sizeBytes: 475476222
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c5863100af97f244bc0af6c727785a7487eb8514ee549a41089bf6b06315e0f
    sizeBytes: 472404808
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:79401dadbf5b9a1c4269c7945d0cd48147250c054269846b4ee99a51a307acd7
    sizeBytes: 470297046
  - names:
    - registry.ci.openshift.org/ocp/release@sha256:0a4c3794aa399948c109ec28e6383cb7912c60268318f17958e0aedcb73361ac
    sizeBytes: 468495250
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:07a3036883a45d9a12497a3f58e1d04f70755529f757157cdc947b5be623e707
    sizeBytes: 456833220
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930a67ebb2d819dc8ba057be981138cae633a10bbe74eb99009f5bdee6d0bb2c
    sizeBytes: 453199502
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5bc3c2908d66a993a802e02868f1508d9ec94a071008d0169b59d904b60dfba7
    sizeBytes: 449804207
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e890029b41d81be8f987d34d3d159c01ef29e13bed352e05a4cbbc1f7110cb67
    sizeBytes: 449182206
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9febea7366ab006c8cae4ca036969ba9bf26a60338491d0f1807e039078cb507
    sizeBytes: 448446016
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d7e3718799708fb27e21fba9638599f5ad6ce9cc976fc26ca8dcd34a94f4c283
    sizeBytes: 448063084
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bf0bb4a07f8f5a2c845f78eff2da9fb01a8aa813987174a6f9a83d4d7b3d38e9
    sizeBytes: 447745067
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c211c19ee52675a66abe9a1a19dc7a3cbb6f429c140a24b5cb2f7ca30b760bf
    sizeBytes: 445262334
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:44001e0c59ad0d95aeef617a5baffe62a5f3f412dbbd1e96646c01a616393324
    sizeBytes: 441468672
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d1bb8dd476e603802cbb44c58d9c2b42ac4f8fa5cea128a75d6c88fbde6aab44
    sizeBytes: 440291405
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2038701bac255640224b030257478e95ec090a7c3ab947d427e3027bf9eab410
    sizeBytes: 435951700
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:217e3f5426f935611c9ae42ce5960d8e4497a4dbcbd2b090540b52f1a06976eb
    sizeBytes: 435004945
  nodeInfo:
    architecture: amd64
    bootID: bad97557-66e3-4e3a-8098-71ede9fc0490
    containerRuntimeVersion: cri-o://1.29.4-6.rhaos4.16.git0e93ae2.el9
    kernelVersion: 5.14.0-427.16.1.el9_4.x86_64
    kubeProxyVersion: v1.29.4+6c10b2d
    kubeletVersion: v1.29.4+6c10b2d
    machineID: 66dda98e8f5b432eb1f2b7ff3674ed9d
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 416.94.202405081407-0
    systemUUID: fefb0f42-063c-8ca4-9598-911ad67d2200


@parth-gr parth-gr requested a review from BlaineEXE May 13, 2024 16:22
@parth-gr
Member Author

parth-gr commented May 13, 2024

Another thing I can see is

managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:alpha.kubernetes.io/provided-node-ip: {}
          f:volumes.kubernetes.io/controller-managed-attach-detach: {}
        f:labels:
          .: {}
          f:beta.kubernetes.io/arch: {}
          f:beta.kubernetes.io/os: {}
          f:kubernetes.io/arch: {}
          f:kubernetes.io/hostname: {}
          f:kubernetes.io/os: {}
          f:node-role.kubernetes.io/control-plane: {}
          f:node-role.kubernetes.io/master: {}
          f:node.openshift.io/os_id: {}
    manager: kubelet
    operation: Update
    time: "2024-05-13T07:00:07Z"

This time may change and doesn't require a reconcile. @travisn @BlaineEXE, any thoughts?
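
If the team decides that field is also non-critical, it could be folded into the same go-cmp ignore list. A sketch, building on the hypothetical helper from the description (ManagedFields is the Go field name on metav1.ObjectMeta):

// ignore the resource version and the managedFields entries, whose Update
// time is bumped on every kubelet status patch
ignoreMeta := cmpopts.IgnoreFields(metav1.ObjectMeta{}, "ResourceVersion", "ManagedFields")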

Member

@BlaineEXE BlaineEXE left a comment


Requesting one more unit test, and a fix for the specific gosec ignore.

@BlaineEXE
Member

BlaineEXE commented May 13, 2024

This time may change and doesn't require a reconcile. @travisn @BlaineEXE, any thoughts?

I have lots of thoughts, and too many questions. I wrote up a long reply to this, but ended up scrapping it because I finally learned something that I feel is the final puzzle piece: the code path that emits this log is part of the existing predicate, which is already being used to filter out unnecessary CephCluster reconciles.

func (c *clientCluster) handleNodeFailure(ctx context.Context, cluster *cephv1.CephCluster, node *corev1.Node) error {
    watchForNodeLoss, err := k8sutil.GetOperatorSetting(ctx, c.context.Clientset, opcontroller.OperatorSettingConfigMapName, "ROOK_WATCH_FOR_NODE_FAILURE", "true")
    if err != nil {
        return pkgerror.Wrapf(err, "failed to get configmap value `ROOK_WATCH_FOR_NODE_FAILURE`.")
    }
    if strings.ToLower(watchForNodeLoss) != "true" {
        logger.Debugf("not watching for node failures since `ROOK_WATCH_FOR_NODE_FAILURE` is set to %q", watchForNodeLoss)
        return nil
    }
    _, err = c.context.ApiExtensionsClient.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, "networkfences.csiaddons.openshift.io", metav1.GetOptions{})
    if err != nil {
        if errors.IsNotFound(err) {
            logger.Debug("networkfences.csiaddons.openshift.io CRD not found, skip creating networkFence")
            return nil
        }
        return pkgerror.Wrapf(err, "failed to get networkfences.csiaddons.openshift.io CRD, skip creating networkFence")
    }

In our discussions in huddle, I have been under the impression (possibly my own fault) that this is a bug where Rook is reconciling too often, but I don't think that's the case, and I think that knowledge changes how we conceptualize these code changes. Because this log line is part of the predicate and not part of the reconcile logic, what this means to me is that we shouldn't immediately interpret the "bug" report to imply that the CephCluster is reconciling when it shouldn't be. We have to take it at face value, which is that the user is seeing the log line repeatedly, and nothing more.

While a user might be upset about seeing this log message often, I have no reason to jump to the conclusion that the CephCluster predicate is faulty. Users can disable that particular log message by setting ROOK_WATCH_FOR_NODE_FAILURE = "false".

IMO, unless the bug reporter can show that the CephCluster is being re-reconciled without need, I think the easiest solution is to tell the user to set ROOK_WATCH_FOR_NODE_FAILURE = "false".

We have the option to "optimize" the predicate and exit early in case we discover that only the resource version and heartbeat have been updated. However, this is a specific case we are optimizing, and we should make sure that we continue to process the rest of the predicate in all other cases to ensure we don't miss other important events. Because this is an optimization, I also think that we can close this PR if we want.

That said, I think the PR is basically complete, so we can continue with it as we like.
[Update]
However, I think we should update the description and commit message to make it more clear that this is not fixing a reconcile bug, but instead adding an early-exit condition for the predicate as an optimization.

@travisn
Member

travisn commented May 13, 2024

@BlaineEXE Good catch, if this issue is really only about the log entry and not reconciles, I have two questions:

  1. This statement is debug level: "networkfences.csiaddons.openshift.io CRD not found, skip creating networkFence". We expect the debug logs to be more verbose, so this itself is not a problem.
  2. Without proof that extra reconciles are being triggered than needed, agreed we can close this PR since the events are already being filtered.

@microyahoo
Member

We have the option to "optimize" the predicate and exit early in case we discover that only the resource version and heartbeat have been updated. [...] I think we should update the description and commit message to make it more clear that this is not fixing a reconcile bug, but instead adding an early-exit condition for the predicate as an optimization.

Yes, it is just an optimization rather than a bug. Although setting ROOK_WATCH_FOR_NODE_FAILURE to false can skip the frequent warning messages, I think if it's just resource version and heartbeat updates, we can exit early without performing the subsequent operations.

@parth-gr
Member Author

parth-gr commented May 14, 2024

The PR adds:

  1. Early exit from the predicate logic.
    i) Helps optimize the logic.
    ii) Because we exit early, the controller-runtime cache will be cleared earlier.
    iii) There are too many API calls, which are very expensive in a distributed system.

  2. ROOK_WATCH_FOR_NODE_FAILURE is kept true by default, but logging the error repeatedly would be a bad experience.

  3. Skip node updates that are not crucial for the further logic.

@BlaineEXE
Member

BlaineEXE commented May 15, 2024

  1. Early exit from the predicate logic.
    i) Helps optimize the logic.
    ii) Because we exit early, the controller-runtime cache will be cleared earlier.
    iii) There are too many API calls, which are very expensive in a distributed system.

I don't think it would be genuine to tout all of these benefits. I would stop at saying that it is an optimization, and say no more in the commit message or PR description.

As I mentioned in huddle, there is nothing we can do in predicate logic that changes what controller-runtime caches. Controller-runtime caches all objects that match the Watch(), regardless of predicate functions.

As for API calls, we can't necessarily say this for certain. If the predicate uses the controller-runtime client.Client, the API Get() calls actually only "get" from the cache. We only reduce API calls if we eliminate Kubernetes clientset usage. If we truly want to reduce the number of k8s API calls in Rook we would do better to focus on removing usage of the k8s clientset and shifting entirely to client.Client.
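
A rough illustration of that distinction (not Rook code; function and variable names are made up, imports elided):

// Served from the controller-runtime informer cache for watched types,
// so it does not hit the API server on every call.
func getNodeCached(ctx context.Context, c client.Client, name string) (*corev1.Node, error) {
    node := &corev1.Node{}
    err := c.Get(ctx, types.NamespacedName{Name: name}, node)
    return node, err
}

// Always a round trip to the kube-apiserver.
func getNodeDirect(ctx context.Context, cs kubernetes.Interface, name string) (*corev1.Node, error) {
    return cs.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
}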

I would also argue that API calls aren't "expensive" for Rook when we consider the true scope. If we were part of the data plane, I agree, but the scale of our API calls when we are only configuring the data system is largely irrelevant, unless we are truly being wasteful. I think the idea that we need to eliminate all API calls because they are bad tends to be a black-and-white viewpoint that often results in developers implementing controller logic that is not properly idempotent, which introduces bugs. I would much rather prioritize correctness over the number of API calls, and I want to keep stressing this point because of how often I see API call optimization contribute to logic bugs.

@parth-gr
Member Author

@BlaineEXE I have updated the description message. Any other thoughts, or does it look good?

@parth-gr parth-gr force-pushed the heartbeat-node branch 3 times, most recently from df510a2 to f7ec1c8 Compare May 20, 2024 16:08
pkg/operator/ceph/cluster/predicate.go (review thread resolved)
currently the node predicate watches for resourceVersion and heartbeat
changes, which are not critical. So with this PR we add an
early-exit condition to the predicate as an optimization and
avoid considering a reconcile of the cephcluster if the updates are
not crucial

closes: rook#14070

Signed-off-by: parth-gr <partharora1010@gmail.com>
@subhamkrai subhamkrai merged commit 161306a into rook:master May 21, 2024
53 checks passed