
OCPBUGS-23221: internalServiceChanged: Fix target port logic #1052

Open
Miciah wants to merge 13 commits into master from OCPBUGS-23221-internalServiceChanged-fix-target-port-logic

Conversation

@Miciah (Contributor) commented Apr 30, 2024

internalServiceChanged: Fix target port logic

Fix internalServiceChanged to copy target ports from the expected service rather than preserving target ports from the current service. Only copy NodePort and other fields that we want to preserve.

Before this PR, the internalServiceChanged function explicitly preserved the current service's ports' target ports when reconciling the service. However, the function was intended to update the target ports if they changed.

In particular, if the current service specifies a numeric target port for the metrics port, then the operator should replace the port number with a port name so that the service uses the correct port even if the pod container's port number changes. Without this change, updating an IngressController's spec.endpointPublishingStrategy.hostNetwork.statsPort API field can break metrics as the service's metrics port's numeric target port may no longer match the pod container's metrics port.

Follow-up to #864.
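
For illustration, here is a minimal sketch of the merge direction this change establishes; the variable names follow the snippets quoted later in this review, and it is a sketch rather than the exact diff:

// Start from the expected (desired) spec so that managed fields,
// including target ports, come from what the operator computes.
updated := current.DeepCopy()
updated.Spec = expected.Spec

// Copy back only the fields the operator deliberately leaves alone;
// for ports, that is NodePort, which the API server assigns.
for i, updatedPort := range updated.Spec.Ports {
    for _, currentPort := range current.Spec.Ports {
        if currentPort.Name == updatedPort.Name {
            updated.Spec.Ports[i].NodePort = currentPort.NodePort
        }
    }
}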

Test_*Changed: Verify that some update was made

This is a follow-up to the previous change for consistency.

crdChanged, nodePortServiceChanged: Fix preserve logic

Fix the update logic in crdChanged and nodePortServiceChanged to preserve fields that are ignored when checking whether an update is necessary.

Follow-up to #567, #890, and #927.

Set SessionAffinity to avoid confusing diffs

If a service does not specify spec.sessionAffinity, the API sets a default value of "None". This means that when creating or updating a service, the empty value and "None" are equivalent, and so when reconciling a service, cluster-ingress-operator treats these values as equal. This results in the correct update behavior: The operator ignores the API defaulting, but if someone sets any value other than the empty value or "None", the operator reverts the update; and if the operator leaves spec.sessionAffinity empty when reverting the update, the API again defaults it to "None".

However, even though the updating behavior is correct, it can cause confusing log output when the operator logs the diff with the changes: If the operator has the empty value in the updated object but the current value is actually "None", this shows up in the diff, even though the actual update does not change the field value when taking defaulting into account.

This change adds an explicit spec.sessionAffinity: None setting to the services that the operator manages to avoid having the API defaulting show up in these diffs.

Follow-up to #407.
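
A minimal sketch of the kind of change involved, assuming the usual desired-service builder (the exact location in the operator's code is not shown here):

// Set sessionAffinity explicitly so that API defaulting ("" -> "None")
// no longer shows up as a spurious field change in logged diffs.
desired.Spec.SessionAffinity = corev1.ServiceAffinityNone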

Test_desiredLoadBalancerService: Add assertions

Add assertions for service type, externalTrafficPolicy, internalTrafficPolicy, and ports, for consistency with Test_desiredInternalIngressControllerService.

Test_desiredInternalIngressControllerService: Fix godoc

Fix capitalization of the test function name in the godoc.

TestHostNetworkPortBinding: Use t.Log, t.Cleanup

Update TestHostNetworkPortBinding to comply with current conventions. Convert some comments into t.Log statements, replace defer with t.Cleanup, and tweak some error messages.
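
A generic illustration of these conventions; newHostNetworkIngressController is a hypothetical setup helper, not a function from the test suite:

func TestHostNetworkPortBinding(t *testing.T) {
    ic := newHostNetworkIngressController(t) // hypothetical setup helper
    if err := kclient.Create(context.TODO(), ic); err != nil {
        t.Fatalf("failed to create ingresscontroller: %v", err)
    }
    // Register cleanup with t.Cleanup rather than defer, per current conventions.
    t.Cleanup(func() {
        if err := kclient.Delete(context.TODO(), ic); err != nil {
            t.Errorf("failed to delete ingresscontroller %s: %v", ic.Name, err)
        }
    })
    t.Log("waiting for the ingresscontroller deployment to become ready")
    // ... assertions on the deployment and the internal service follow ...
}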

TestHostNetworkPortBinding: Check service ports

Update TestHostNetworkPortBinding to verify that the internal service has the expected ports. In particular, the metrics port should have its target port set to the name "metrics" rather than a port number.

Test_deploymentConfigChanged: Use k8s.io/utils/ptr

Replace use of the deprecated "k8s.io/utils/pointer" package by using "k8s.io/utils/ptr" instead.
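
The change is mechanical; roughly:

// Before, with the deprecated package:
//     port := pointer.Int32(1936)
// After, with k8s.io/utils/ptr:
port := ptr.To[int32](1936)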

Set "service-ca-bundle" volume's default mode

Set an explicit value of 420 decimal (0644 octal) for the router deployment's "service-ca-bundle" volume's default mode.

Previously, the router deployment yaml manifest had defaultMode indented incorrectly, and so the operator did not actually set the intended value for the volume's default mode. Because the intended value is equal to the default value, the volume ultimately ends up with the intended default mode anyway. However, the fact that the operator did not specify the default mode made log output show a bogus diff for the volume when the operator updated the deployment.

After this change, the yaml is correctly indented, and the log output for a deployment update has a less noisy diff.

Follow-up to #399.

Specify default mode on configmap/secret volumes

Set an explicit value of 420 decimal (0644 octal) for the default mode of all configmap or secret volumes in the router deployment that are defined in Go code (as opposed to being defined in the yaml manifest): client-ca, default-certificate, error-pages, metrics-certs, rsyslog-config, service-ca-bundle, and stats-auth.

After this change, the log output for a deployment update has a less noisy diff.
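
A hedged sketch of what the explicit default mode looks like for one of the Go-defined volumes; the names and the secret reference are illustrative, not copied from the operator:

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/utils/ptr"
)

// metricsCertsVolume illustrates setting the default mode explicitly.
func metricsCertsVolume() corev1.Volume {
    return corev1.Volume{
        Name: "metrics-certs",
        VolumeSource: corev1.VolumeSource{
            Secret: &corev1.SecretVolumeSource{
                SecretName: "router-metrics-certs-default", // illustrative name
                // 420 decimal is 0644 octal; this matches the API default but
                // is set explicitly so deployment diffs stay quiet.
                DefaultMode: ptr.To[int32](420),
            },
        },
    }
}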

Specify parameter values for router probes

Set explicit values for the liveness, readiness, and startup probes in the router deployment's yaml manifest.

Previously, the router deployment yaml manifest omitted these values when the intended values were equal to the default values. This meant that the API server would set the default values for these parameters, and so the probes ultimately ended up with the intended parameter values. However, omitting these values in the yaml manifest also had the unintended side-effect of making the operator log bogus diffs for these parameters whenever it updated the deployment.

After this change, the log output for a deployment update has a less noisy diff.

copyProbe: Refactor to group copying of defaults

Group together the assignments for the case that copyDefaultValues is true.
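
One plausible shape of the refactored function, assuming the standard probe defaults (failureThreshold 3, periodSeconds 10, successThreshold 1, timeoutSeconds 1); the operator's actual copyProbe may differ in which fields it handles:

func copyProbe(a, b *corev1.Probe, copyDefaultValues bool) {
    // Fields that are always copied (Port is required; Host defaults to "").
    if a.ProbeHandler.HTTPGet != nil {
        b.ProbeHandler.HTTPGet = &corev1.HTTPGetAction{
            Path: a.ProbeHandler.HTTPGet.Path,
            Port: a.ProbeHandler.HTTPGet.Port,
            Host: a.ProbeHandler.HTTPGet.Host,
        }
    }
    if copyDefaultValues {
        // Fields that are copied even when they equal the defaults, so the
        // object that gets diffed for logging is complete.
        b.FailureThreshold = a.FailureThreshold
        b.PeriodSeconds = a.PeriodSeconds
        b.SuccessThreshold = a.SuccessThreshold
        b.TimeoutSeconds = a.TimeoutSeconds
    } else {
        // For hashing, copy only non-default values so that empty and
        // default values produce the same hash.
        if a.FailureThreshold != 3 {
            b.FailureThreshold = a.FailureThreshold
        }
        if a.PeriodSeconds != 10 {
            b.PeriodSeconds = a.PeriodSeconds
        }
        if a.SuccessThreshold != 1 {
            b.SuccessThreshold = a.SuccessThreshold
        }
        if a.TimeoutSeconds != 1 {
            b.TimeoutSeconds = a.TimeoutSeconds
        }
    }
}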

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Apr 30, 2024
@openshift-ci-robot (Contributor):

@Miciah: This pull request references Jira Issue OCPBUGS-23221, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

internalServiceChanged: Fix target port logic

Fix internalServiceChanged to copy target ports from the expected service rather than preserving target ports from the current service.

Before this PR, the internalServiceChanged function explicitly preserved the current service's ports' target ports when reconciling the service. However, the function was intended to update the target ports if they changed.

In particular, if the current service specifies a numeric target port for the metrics port, then the operator should replace the port number with a port name so that the service uses the correct port even if the pod container's port number changes. Without this change, updating an IngressController's spec.endpointPublishingStrategy.hostNetwork.statsPort API field can break metrics as the service's metrics port's numeric target port may no longer match the pod container's metrics port.

Follow-up to #864.

Test_*Changed: Verify that some update was made

This is a follow-up to the previous change for consistency.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Apr 30, 2024
Miciah force-pushed the OCPBUGS-23221-internalServiceChanged-fix-target-port-logic branch from 30433aa to 53baea6 on May 1, 2024 at 00:53
@Miciah (Contributor Author) commented May 1, 2024

ci/prow/e2e-aws-ovn-single-node failed with the following error:

 Updating install-config.yaml to a single m5d.2xlarge control plane node and 0 workers
/bin/bash: line 19: pip3: command not found 

I filed OCPBUGS-33162 for this error.

@Miciah (Contributor Author) commented May 1, 2024

ci/prow/e2e-aws-operator and ci/prow/e2e-azure-operator both failed because TestRouteAdmissionPolicy failed, which is a known issue that is being tracked as part of OCPBUGS-26498.

@candita (Contributor) commented May 1, 2024

/assign

@candita (Contributor) commented May 1, 2024

@Miciah will this also need to be backported to 4.12, like #864 (comment) ?

Miciah force-pushed the OCPBUGS-23221-internalServiceChanged-fix-target-port-logic branch from 53baea6 to 584ef42 on May 1, 2024 at 23:14
@openshift-ci-robot (Contributor):

@Miciah: This pull request references Jira Issue OCPBUGS-23221, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

internalServiceChanged: Fix target port logic

Fix internalServiceChanged to copy target ports from the expected service rather than preserving target ports from the current service. Only copy NodePort and other fields that we want to preserve.

Before this PR, the internalServiceChanged function explicitly preserved the current service's ports' target ports when reconciling the service. However, the function was intended to update the target ports if they changed.

In particular, if the current service specifies a numeric target port for the metrics port, then the operator should replace the port number with a port name so that the service uses the correct port even if the pod container's port number changes. Without this change, updating an IngressController's spec.endpointPublishingStrategy.hostNetwork.statsPort API field can break metrics as the service's metrics port's numeric target port may no longer match the pod container's metrics port.

Follow-up to #864.

Test_*Changed: Verify that some update was made

This is a follow-up to the previous change for consistency.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Miciah (Contributor Author) commented May 1, 2024

https://github.com/openshift/cluster-ingress-operator/compare/53baea6f04f80b40128edd4d55cae58e7dd13dc3..584ef427468eac9d8a5a1948e11ae26f673d3114 updates the logic to preserve the same fields that we ignore when checking whether an update is needed, and also adds an E2E test.

@Miciah (Contributor Author) commented May 1, 2024

@Miciah will this also need to be backported to 4.12, like #864 (comment) ?

Yeah, if anyone requests a backport and PM agrees, it makes sense to backport this PR.

Miciah force-pushed the OCPBUGS-23221-internalServiceChanged-fix-target-port-logic branch from 584ef42 to 600ed28 on May 1, 2024 at 23:34
@Miciah (Contributor Author) commented May 1, 2024

https://github.com/openshift/cluster-ingress-operator/compare/584ef427468eac9d8a5a1948e11ae26f673d3114..600ed28e6f713ec4fe0d2e94b37061f96bf61852 adds comments about keeping cmp options and updated.Spec.Foo = current.Spec.Foo assignments consistent and adds some missing assignments.

@Miciah (Contributor Author) commented May 2, 2024

ci/prow/e2e-azure-operator failed because the installer timed out waiting for the API server to respond after bootstrap.

ci/prow/e2e-aws-operator failed because TestRouteAdmissionPolicy failed.

	for i, updatedPort := range updated.Spec.Ports {
		for _, currentPort := range current.Spec.Ports {
			if currentPort.Name == updatedPort.Name {
				updated.Spec.Ports[i].TargetPort = currentPort.TargetPort
				updated.Spec.Ports[i].NodePort = currentPort.NodePort
Contributor:

Doesn't this mean that we are allowing the spec.ports.nodePorts to be changed from what we expect, to what is set in current?

Contributor:

If something/someone changed the nodePort, and only the nodePort, then L147-148 would return "unchanged" because we ignore it. However, if something changed the nodePort, and also changed something we don't ignore, and we get to here, then we change it. Am I missing something?

Contributor Author:

Doesn't this mean that we are allowing the spec.ports.nodePorts to be changed from what we expect, to what is set in current?

It does. This is explained in the comments:

// Ignore fields that the API, other controllers, or user may
// have modified. Note: This list must be kept consistent with
// the updated.Spec.Foo = current.Spec.Foo assignments below!
// Preserve fields that the API, other controllers, or user may have
// modified. Note: This list must be kept consistent with
// serviceCmpOpts above!

In particular, NodePort is set by the API, and so NodePort must be ignored in the comparison and preserved during the update so that the operator does not try to revert the value that the API set.

If something/someone changed the nodePort, and only the nodePort, then L147-148 would return "unchanged" because we ignore it. However, if something changed the nodePort, and also changed something we don't ignore, and we get to here, then we change it. Am I missing something?

updated.Spec.Ports[i].NodePort = currentPort.NodePort copies the value that was observed (currentPort.NodePort) to the final version of the service that we will use for the update (updated). That is, the assignment preserves the current NodePort value for updates.

Note that we initialize updated by copying the spec field from expected:

updated := current.DeepCopy()
updated.Spec = expected.Spec

Then if we want to preserve the current value for some subfield, we need to copy the value from current:

updated.Spec.ClusterIP = current.Spec.ClusterIP
updated.Spec.ClusterIPs = current.Spec.ClusterIPs
updated.Spec.ExternalIPs = current.Spec.ExternalIPs
updated.Spec.HealthCheckNodePort = current.Spec.HealthCheckNodePort
updated.Spec.IPFamilies = current.Spec.IPFamilies
updated.Spec.IPFamilyPolicy = current.Spec.IPFamilyPolicy
	for i, updatedPort := range updated.Spec.Ports {
		for _, currentPort := range current.Spec.Ports {
			if currentPort.Name == updatedPort.Name {
				updated.Spec.Ports[i].NodePort = currentPort.NodePort

At the end of all this, updated is the result of merging current and expected, copying the field values that the operator should update from expected and copying the field values that the operator should preserve (that is, leave alone) from current.

Contributor:

Still,

If something/someone changed the nodePort, and only the nodePort, then L147-148 would return "unchanged" because we ignore it.

Does that mean the change to nodePort won't get saved?

Contributor Author:

The return false, nil statement on line 147/148 means that updateInternalService won't do an update. In that case, there's no need to preserve NodePort's value—we're not doing an update anyway.

Contributor:

Okay. If we never expect current to hold any values that are different for the ignored fields, then why do we have to preserve them later?

@Miciah (Contributor Author), May 3, 2024:

The operator preserves the ignored fields so that it doesn't inadvertently update fields that we intend for it to ignore.

The operator ignores these fields because it isn't supposed to be managing them. In particular, the API sets NodePort, so the operator should never try to update NodePort, even if expected.Spec.Ports[i].NodePort differs from current.Spec.Ports[i].NodePort.

Maybe the word "expected" is misleading. expected has some fields where the operator manages the field and actually expects the field to have a certain value, as well as some fields where the operator is intended to ignore changes to the field and preserve the current value, whatever that value may be.

Ultimately, the operator should update the service if a managed field (for example, spec.internalTrafficPolicy) does not have the expected value, and the operator should update that particular field. However, the operator should not update the service if only ignored fields (for example, nodePort) changed, and the operator should not update an unmanaged field if the operator has to update the service because some managed field value needs an update.
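
For readers less familiar with the pattern, a minimal sketch of how the comparison options and the preserve assignments mirror each other, using github.com/google/go-cmp/cmp and cmp/cmpopts; the operator's actual serviceCmpOpts may list different fields:

// Fields owned by the API or by other controllers are ignored when
// deciding whether an update is needed...
serviceCmpOpts := []cmp.Option{
    cmpopts.IgnoreFields(corev1.ServicePort{}, "NodePort"),
    cmpopts.IgnoreFields(corev1.ServiceSpec{}, "ClusterIP", "ClusterIPs", "IPFamilies", "IPFamilyPolicy"),
}
changed := !cmp.Equal(current.Spec, expected.Spec, serviceCmpOpts...)

// ...and the same fields are copied from current back into updated before
// the Update call, so an update triggered by a managed field never reverts
// them. The two lists must stay in sync, which is what the quoted comments
// above call out.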

@Miciah (Contributor Author) commented May 3, 2024

e2e-gcp-operator succeeded, and the operator logs have the expected output:

% curl -so ingress-operator.log https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1052/pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-operator/1785815245374099456/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-ingress-operator_ingress-operator-58469b5f8b-6dqzj_ingress-operator.log
% grep -e 'created internal ingresscontroller service' -c -- ingress-operator.log                                                                                                                                       
83
% grep -e 'updated internal service' -- ingress-operator.log                                                                                      
2024-05-02T00:36:39.000Z        INFO    operator.ingress_controller     ingress/internal_service.go:44  updated internal service        {"namespace": "openshift-ingress", "name": "router-internal-reconcile-internal-service", "diff": "  &v1.Service{\n  \tTypeMeta:   {},\n  \tObjectMeta: {Name: \"router-internal-reconcile-internal-service\", Namespace: \"openshift-ingress\", UID: \"5ba7f8c8-9af0-4da4-82dd-22dbf88c5f94\", ResourceVersion: \"48575\", ...},\n  \tSpec: v1.ServiceSpec{\n  \t\tPorts: []v1.ServicePort{\n  \t\t\t{Name: \"http\", Protocol: \"TCP\", Port: 80, TargetPort: {Type: 1, StrVal: \"http\"}, ...},\n  \t\t\t{Name: \"https\", Protocol: \"TCP\", Port: 443, TargetPort: {Type: 1, StrVal: \"https\"}, ...},\n  \t\t\t{\n  \t\t\t\t... // 2 identical fields\n  \t\t\t\tAppProtocol: nil,\n  \t\t\t\tPort:        1936,\n  \t\t\t\tTargetPort: intstr.IntOrString{\n- \t\t\t\t\tType:   0,\n+ \t\t\t\t\tType:   1,\n- \t\t\t\t\tIntVal: 12345,\n+ \t\t\t\t\tIntVal: 0,\n- \t\t\t\t\tStrVal: \"\",\n+ \t\t\t\t\tStrVal: \"metrics\",\n  \t\t\t\t},\n  \t\t\t\tNodePort: 0,\n  \t\t\t},\n  \t\t},\n  \t\tSelector:                 {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"reconcile-internal-service\"},\n  \t\tClusterIP:                \"172.30.106.3\",\n  \t\tClusterIPs:               {\"172.30.106.3\"},\n  \t\tType:                     \"ClusterIP\",\n  \t\tExternalIPs:              nil,\n- \t\tSessionAffinity:          \"None\",\n+ \t\tSessionAffinity:          \"\",\n  \t\tLoadBalancerIP:           \"\",\n  \t\tLoadBalancerSourceRanges: nil,\n  \t\t... // 10 identical fields\n  \t},\n  \tStatus: {},\n  }\n"}
% grep -e 'updated internal service' -- ingress-operator.log | sed -e 's/^[^{]*//' | jq -r .diff
  &v1.Service{
        TypeMeta:   {},
        ObjectMeta: {Name: "router-internal-reconcile-internal-service", Namespace: "openshift-ingress", UID: "5ba7f8c8-9af0-4da4-82dd-22dbf88c5f94", ResourceVersion: "48575", ...},
        Spec: v1.ServiceSpec{
                Ports: []v1.ServicePort{
                        {Name: "http", Protocol: "TCP", Port: 80, TargetPort: {Type: 1, StrVal: "http"}, ...},
                        {Name: "https", Protocol: "TCP", Port: 443, TargetPort: {Type: 1, StrVal: "https"}, ...},
                        {
                                ... // 2 identical fields
                                AppProtocol: nil,
                                Port:        1936,
                                TargetPort: intstr.IntOrString{
-                                       Type:   0,
+                                       Type:   1,
-                                       IntVal: 12345,
+                                       IntVal: 0,
-                                       StrVal: "",
+                                       StrVal: "metrics",
                                },
                                NodePort: 0,
                        },
                },
                Selector:                 {"ingresscontroller.operator.openshift.io/deployment-ingresscontroller": "reconcile-internal-service"},
                ClusterIP:                "172.30.106.3",
                ClusterIPs:               {"172.30.106.3"},
                Type:                     "ClusterIP",
                ExternalIPs:              nil,
-               SessionAffinity:          "None",
+               SessionAffinity:          "",
                LoadBalancerIP:           "",
                LoadBalancerSourceRanges: nil,
                ... // 10 identical fields
        },
        Status: {},
  }

% 

SessionAffinity is an unusual case; "None" and empty are equivalent, and the API defaults the value to "None", so it shows up in the diff. I realize the update logic is confusing, and seeing SessionAffinity in the diff may add to the confusion, so I'll push a commit to prevent it.

@Miciah (Contributor Author) commented May 3, 2024

Multiple CI jobs failed because the "Clone the correct source code into an image and tag it as src" step failed with "0/26 nodes are available". This is an issue with the CI cluster; the tests never actually ran.
/retest-required

t.Fatal(err)
}

findPort := func(name string, service *corev1.Service) *corev1.ServicePort {
Contributor:

Shouldn't this e2e test find/update the port on the IngressController, and check to make sure it is updated on the Service?

Contributor Author:

TestHostNetworkPortBinding verifies that updating IngressController.spec.endpointPublishingStrategy.hostNetwork.statsPort updates the deployment. TestReconcileInternalService verifies that the "metrics" port exists, and that the operator is managing it. 🤔... Would it make sense to add a check to TestHostNetworkPortBinding to verify that the service uses "metrics" (and not a number) as the target port?

Contributor Author:

3a56b8d adds an assertion on the expected ports in TestHostNetworkPortBinding.

@candita (Contributor), May 3, 2024:

TestHostNetworkPortBinding would have prevented the original bug if it was testing the same thing as the bug report. It is testing assertContainerHasPort(t, routerContainer, "metrics", 9936). Doesn't it need to assert that the service now has the port?

Contributor Author:

TestHostNetworkPortBinding would have prevented the original bug if it was testing the same thing as the bug report. It is testing assertContainerHasPort(t, routerContainer, "metrics", 9936). Doesn't it need to assert that the service now has the port?

Is the assertion that I added what you expected?

Or, maybe the new assertion for expected ports should have the new port numbers too? https://github.com/openshift/cluster-ingress-operator/pull/1052/files/cfba4d9ad778d01a6ab874682ad0c2d95f81eab0#diff-cf4b4e5424070a666a364d1d1b04011478d888d6d3d65c943ca7783333b7a6e4R1018-R1034

The port changes on the pod, but the port doesn't need to change on the service if the service just references the target port by name instead of by number. That's the idea behind #864. The fault is that the update logic didn't actually work properly, and so the service would still have port 1936 instead of port "metrics".
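
Concretely, the idea behind #864 is that the service references the container port by name, so only the container port number has to change; an illustrative snippet using k8s.io/apimachinery/pkg/util/intstr (not the operator's exact manifest code):

metricsPort := corev1.ServicePort{
    Name:     "metrics",
    Protocol: corev1.ProtocolTCP,
    Port:     1936,
    // A by-name target port resolves to whatever number the container's
    // "metrics" port currently uses, so changing statsPort does not break
    // the service.
    TargetPort: intstr.FromString("metrics"),
}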

t.Fatal("serviceMonitorChanged does not behave as a fixed-point function")
} else {
if updatedChanged, _ := serviceMonitorChanged(sm1, sm3); !updatedChanged {
t.Error("serviceMonitorChangedChanged reported changes but did not make any update")
Contributor:

nit:

Suggested change:
- t.Error("serviceMonitorChangedChanged reported changes but did not make any update")
+ t.Error("serviceMonitorChanged reported changes but did not make any update")

Miciah force-pushed the OCPBUGS-23221-internalServiceChanged-fix-target-port-logic branch from 3a56b8d to 3d06500 on May 3, 2024 at 22:01
Contributor:

nit: you could add a test case for NodePort changes.

Miciah added 3 commits May 3, 2024 18:13
Fix internalServiceChanged to copy target ports from the expected service
rather than preserving target ports from the current service.  Only copy
NodePort and other fields that we want to preserve.

Before this commit, the internalServiceChanged function explicitly
preserved the current service's ports' target ports when reconciling the
service.  However, the function was intended to update the target ports if
they changed.

In particular, if the current service specifies a numeric target port for
the metrics port, then the operator should replace the port number with a
port name so that the service uses the correct port even if the pod
container's port number changes.  Without this change, updating an
IngressController's spec.endpointPublishingStrategy.hostNetwork.statsPort
API field can break metrics as the service's metrics port's numeric target
port may no longer match the pod container's metrics port.

Follow-up to commit 44a60f5
and commit 9c6e368.

This commit fixes OCPBUGS-23221

https://issues.redhat.com/browse/OCPBUGS-23221

* pkg/operator/controller/ingress/internal_service.go
(internalServiceChanged): Fix logic that preserves the current service's
fields not to preserve the ports' target ports and instead to preserve the
same fields that we ignore with serviceCmpOpts.
* pkg/operator/controller/ingress/internal_service_test.go
(Test_internalServiceChanged): Verify that internalServiceChanged actually
updates the service when it reports that the service has changed.  Add a
test case to verify that updates to NodePort are ignored.
* test/e2e/operator_test.go (TestReconcileInternalService): New test to
verify that the operator reconciles the internal service's metrics port's
target port by setting a bogus target port and checking that the operator
reverts the test's update.
* test/e2e/all_test.go (TestAll): Add TestReconcileInternalService to
parallel tests.
This is a follow-up to the previous commit for consistency.

* pkg/operator/controller/canary/daemonset_test.go
(Test_canaryDaemonsetChanged):
* pkg/operator/controller/canary/namespace_test.go
(Test_canaryNamespaceChanged):
* pkg/operator/controller/canary/route_test.go
(Test_canaryRouteChanged):
* pkg/operator/controller/ingress/deployment_test.go
(Test_deploymentConfigChanged):
* pkg/operator/controller/ingress/load_balancer_service_test.go
(Test_loadBalancerServiceChanged)
(Test_loadBalancerServiceAnnotationsChanged):
* pkg/operator/controller/ingress/monitoring_test.go
(Test_serviceMonitorChanged):
* pkg/operator/controller/ingress/namespace_test.go
(Test_routerNamespaceChanged):
* pkg/operator/controller/ingress/nodeport_service_test.go
(Test_nodePortServiceChanged):
* pkg/operator/controller/ingress/poddisruptionbudget_test.go
(Test_podDisruptionBudgetChange):
* pkg/operator/controller/ingressclass/ingressclass_test.go
(Test_ingressClassChanged): Verify that the function made some update to
the object if it reported that the object changed.
Fix the update logic in crdChanged and nodePortServiceChanged to preserve
fields that are ignored when checking whether an update is necessary.

Follow-up to commit 4a306cb,
commit dd962af,
commit 5579aa1, and
commit d7ffd6e.

* pkg/operator/controller/gatewayapi/crds.go (crdChanged): Preserve
spec.Conversion.
* pkg/operator/controller/ingress/nodeport_service.go
(nodePortServiceChanged): Preserve spec.clusterIPs, spec.ipFamilies, and
spec.ipFamilyPolicy.
Set explicit values for the liveness, readiness, and startup probes in the
router deployment's yaml manifest.

Previously, the router deployment yaml manifest omitted these values when
the intended values were equal to the default values.  This meant that the
API server would set the default values for these parameters, and so the
probes ultimately ended up with the intended parameter values.  However,
omitting these values in the yaml manifest also had the unintended
side-effect of making the operator log bogus diffs for these parameters
whenever it updated the deployment.

After this commit, the log output for a deployment update has a less noisy
diff.

* pkg/manifests/assets/router/deployment.yaml: Specify explicit values for
the failureThreshold, httpGet.scheme, periodSeconds, successThreshold, and
timeoutSeconds parameters for the liveness, readiness, and startup probes.
* pkg/operator/controller/ingress/deployment.go (copyProbe): Add a
copyDefaultValues parameter.  If this parameter is true, copyProbe will
copy the value for a field even if it is equal to the default value.
(hashableProbe): Call copyProbe with copyDefaultValues set to false so that
empty and default values result in the same hash.
(deploymentConfigChanged): Call copyProbe with copyDefaultValues set to
true so that default values get copied to the updated object that gets
diffed for logging.
* pkg/operator/controller/ingress/deployment_test.go (checkProbes): New
helper function to verify that the given container has the expected probes
and probe parameters.
(TestDesiredRouterDeploymentSpecAndNetwork):
(TestDesiredRouterDeploymentVariety): Use checkProbes.
Miciah force-pushed the OCPBUGS-23221-internalServiceChanged-fix-target-port-logic branch from 4d0f520 to 2f0e787 on May 17, 2024 at 10:03
@Miciah (Contributor Author) commented May 17, 2024

I realized that I was still missing defaultMode for some configmap and secret volumes, and also that the unit tests were rather lax (and in one case misleading, checking for a nonexistent "error-pages" volume). https://github.com/openshift/cluster-ingress-operator/compare/4d0f52090379819b1b50af60d060d167a1629ba1..2f0e7872151af6cd62bc88ad26403d166f1259c9 addresses defaultMode for some additional volumes and improves test coverage.

@Miciah (Contributor Author) commented May 17, 2024

/test e2e-aws-operator
/test e2e-azure-operator
/test e2e-gcp-operatotr

@Miciah (Contributor Author) commented May 17, 2024

/test e2e-gcp-operator

1 similar comment
@Miciah (Contributor Author) commented May 17, 2024

/test e2e-gcp-operator

@candita (Contributor) commented May 17, 2024

For deployments, the diffs have some noise for the volume DefaultModes parameter and for various probe parameters. I can push a commit like 1badfc6 to fix those if you like.

As this PR is blocked on #1054 anyway, I went ahead and pushed commits to fix these bogus diffs. I'll check the diffs in the ingress-operator logs when we get back the results from a new CI run.

I noticed your changes. I think this is causing an issue in a 4.12 bug that I'm looking into. Posted in slack.

Update: we may need to backport to 4.12.

@candita (Contributor) commented May 17, 2024

Internal service and NodePort diffs I am seeing in 4.12:

- ClusterIPs:               []string{"10.20.134.56"},
+ ClusterIPs:               nil,

- SessionAffinity:          "None",
+ SessionAffinity:          "",

- IPFamilies:                    []v1.IPFamily{"IPv4"},
+ IPFamilies:                    nil,

- IPFamilyPolicy:                &"SingleStack",
+ IPFamilyPolicy:                nil,

- InternalTrafficPolicy:         &"Cluster",
+ InternalTrafficPolicy:         nil,

@candita (Contributor) commented May 17, 2024

Looks like nodes never became available:

level=error msg=ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).

/test e2e-gcp-ovn

@candita (Contributor) commented May 17, 2024

Regarding this failure in e2e-hypershift:

{"level":"error","ts":"2024-05-17T10:20:44Z","msg":"failed to setup shared OIDC provider","error":"failed to create OIDC provider: LimitExceeded: Cannot exceed quota for OpenIdConnectProvidersPerAccount: 100\n\tstatus code: 409, request id: 9f217247-4e4b-456b-9ab8-a8484f6ccb08","stacktrace":"github.com/openshift/hypershift/test/e2e.main\n\t/go/src/github.com/openshift/hypershift/test/e2e/e2e_test.go:170\ngithub.com/openshift/hypershift/test/e2e.TestMain\n\t/go/src/github.com/openshift/hypershift/test/e2e/e2e_test.go:136\nmain.main\n\t_testmain.go:69\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:267"}

Slack thread discussing quota related issues for Hypershift: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1715956957922319

@Miciah (Contributor Author) commented May 18, 2024

For deployments, the diffs have some noise for the volume DefaultModes parameter and for various probe parameters. I can push a commit like 1badfc6 to fix those if you like.

As this PR is blocked on #1054 anyway, I went ahead and pushed commits to fix these bogus diffs. I'll check the diffs in the ingress-operator logs when we get back the results from a new CI run.

I noticed your changes. I think this is causing an issue in a 4.12 bug that I'm looking into. Posted in slack.

Update: we may need to backport to 4.12.

Right, it makes sense to backport at least add17d7 to 4.12 as #864 was backported to 4.12.

Internal service and NodePort diffs I am seeing in 4.12:

- ClusterIPs:               []string{"10.20.134.56"},
+ ClusterIPs:               nil,

- SessionAffinity:          "None",
+ SessionAffinity:          "",

- IPFamilies:                    []v1.IPFamily{"IPv4"},
+ IPFamilies:                    nil,

- IPFamilyPolicy:                &"SingleStack",
+ IPFamilyPolicy:                nil,

Right, these should be fixed by d21abc8 and 1badfc6, so we can backport those too.

- InternalTrafficPolicy:         &"Cluster",
+ InternalTrafficPolicy:         nil,

This should have been fixed by 5579aa1, which is proposed for backport to 4.13. If you're seeing OCPBUGS-13190 on 4.12, please add 4.12 to OCPBUGS-13190's "Affects Versions", and we can propose backporting #927 to 4.12 as well.


The diffs in the ingress-operator logs from the latest e2e-gcp-operator run look really clean now. They are too large to paste in a comment, but you can review them using the following commands:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1052/pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-operator/1791470779821461504/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-ingress-operator_ingress-operator-57f7fb8f95-8p27t_ingress-operator.log' -o ingress-operator.log
% sed < ingress-operator.log -n -e $'/diff/ {s/\\\\n/\\\n/g;s/\\\\t/\t/g;p;}'

However, three anomalies jumped out at me. First, the LoadBalancer-type service for TestAllowedSourceRanges has an inexplicable, delayed second update:

2024-05-17T15:07:39.258Z        INFO    operator.ingress_controller     ingress/load_balancer_service.go:321    updated load balancer service   {"namespace": "openshift-ingress", "name": "router-sourcerange", "diff": "  &v1.Service{
        TypeMeta:   {},
        ObjectMeta: {Name: \"router-sourcerange\", Namespace: \"openshift-ingress\", UID: \"bf35e2d3-bb10-498e-b771-0659315fc65e\", ResourceVersion: \"44908\", ...},
        Spec: v1.ServiceSpec{
                ... // 6 identical fields
                SessionAffinity:          \"None\",
                LoadBalancerIP:           \"\",
-               LoadBalancerSourceRanges: []string{\"127.0.0.0/8\"},
+               LoadBalancerSourceRanges: []string{\"10.0.0.0/8\"},
                ExternalName:             \"\",
                ExternalTrafficPolicy:    \"Local\",
                ... // 8 identical fields
        },
        Status: {LoadBalancer: {Ingress: {{IP: \"34.29.9.29\"}}}},
  }
"}
2024-05-17T15:25:51.134Z        INFO    operator.ingress_controller     ingress/load_balancer_service.go:321    updated load balancer service   {"namespace": "openshift-ingress", "name": "router-sourcerange", "diff": "  &v1.Service{
        TypeMeta:   {},
        ObjectMeta: {Name: \"router-sourcerange\", Namespace: \"openshift-ingress\", UID: \"021a0f6b-e316-4d0a-9aaa-15189dded4f6\", ResourceVersion: \"58305\", ...},
        Spec: v1.ServiceSpec{
                ... // 6 identical fields
                SessionAffinity:          \"None\",
                LoadBalancerIP:           \"\",
-               LoadBalancerSourceRanges: []string{\"127.0.0.0/8\"},
+               LoadBalancerSourceRanges: []string{\"10.0.0.0/8\"},
                ExternalName:             \"\",
                ExternalTrafficPolicy:    \"Local\",
                ... // 8 identical fields
        },
        Status: {LoadBalancer: {Ingress: {{IP: \"34.16.7.141\"}}}},
  }
"}

The second update happens about 18 minutes after the first update. It also happens after the serial tests appear to have finished. I suspect it has something to do with the fact that TestAllowedSourceRanges uses t.Cleanup to delete its IngressController; maybe the cleanup is delayed for that reason, though that doesn't fully explain why the service gets updated a second time (it should just get deleted when the cleanup happens).

The second anomaly that I noticed was this update from the TestRouteHardStopAfterEnableOnIngressConfig test:

                                                        {
                                                                Name:      \"ROUTER_DISABLE_HTTP2\",
-                                                               Value:     \"false\",
+                                                               Value:     \"true\",
                                                                ValueFrom: nil,
                                                        },
                                                        {Name: \"ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK\", Value: \"false\"},
                                                        {Name: \"ROUTER_DOMAIN\", Value: \"apps.ci-op-tg7xmq00-76b3b.origin-ci-int-gce.dev.rhcloud.com\"},
+                                                       {Name: \"ROUTER_HARD_STOP_AFTER\", Value: \"3h20m0s\"},

The "3h20m" value has to be from this update:

if err := hardStopAfterTestIngressConfig(t, kclient, ingressConfig, routerDeployment, (200 * time.Minute).String()); err != nil {

However, TestRouteHardStopAfterEnableOnIngressConfig is a serial test, and so are TestRouteHTTP2EnableAndDisableIngressController and TestRouteHTTP2EnableAndDisableIngressConfig, so it isn't clear to me why a single update would modify both ROUTER_DISABLE_HTTP2 and ROUTER_HARD_STOP_AFTER. @frobware, any thoughts on this?

The third anomaly I noticed was this:

% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -m 1 -- ingress-operator.log       
2024-05-17T14:46:01.434Z	INFO	operator.ingress_controller	ingress/controller.go:326	successfully updated Infra CR with Ingress Load Balancer IPs
% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -c -- ingress-operator.log 
142

#1016 has a logic error, which causes the operator to log this message even when it didn't do an update:

	// If the lbService exists for the "default" IngressController, then update Infra CR's PlatformStatus with the Ingress LB IPs.
	if haveLB && ci.Name == manifests.DefaultIngressControllerName {
		if updated, err := computeUpdatedInfraFromService(lbService, infraConfig); err != nil {
			errs = append(errs, fmt.Errorf("failed to update Infrastructure PlatformStatus: %w", err))
		} else if updated {
			if err := r.client.Status().Update(context.TODO(), infraConfig); err != nil {
				errs = append(errs, fmt.Errorf("failed to update Infrastructure CR after updating Ingress LB IPs: %w", err))
			}
		}
		log.Info("successfully updated Infra CR with Ingress Load Balancer IPs")
	}

(I've actually noticed this before, but I never got around to filing a bug report...). #1016 was backported to 4.15, so it would be nice to fix it and backport the fix to 4.15. It's just log noise, though, so it might not be justifiable to backport on its own. It is rather noisy, though, and it's trivial to fix, so if you like, I can bundle a fix for this log message into this PR.
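
For reference, the trivial fix described here is just moving the log statement inside the branch that actually performs the update; roughly (a sketch, not a committed change):

if haveLB && ci.Name == manifests.DefaultIngressControllerName {
    if updated, err := computeUpdatedInfraFromService(lbService, infraConfig); err != nil {
        errs = append(errs, fmt.Errorf("failed to update Infrastructure PlatformStatus: %w", err))
    } else if updated {
        if err := r.client.Status().Update(context.TODO(), infraConfig); err != nil {
            errs = append(errs, fmt.Errorf("failed to update Infrastructure CR after updating Ingress LB IPs: %w", err))
        } else {
            // Log success only when an update was actually made.
            log.Info("successfully updated Infra CR with Ingress Load Balancer IPs")
        }
    }
}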

@ShudiLi (Member) commented May 20, 2024

/label qe-approved

Tested it on 4.16.0-0.ci.test-2024-05-20-061242-ci-ln-qyfl0pt-latest.

1.
% oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.ci.test-2024-05-20-061242-ci-ln-qyfl0pt-latest   True        False         10m     Cluster version is 4.16.0-0.ci.test-2024-05-20-061242-ci-ln-qyfl0pt-latest

2.
% oc create -f ingress-hostnetwork.yaml 
ingresscontroller.operator.openshift.io/int1 created
% cat ingress-hostnetwork.yaml                                                         
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: int1
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    hostNetwork:
      httpPort: 23080
      httpsPort: 23443
      statsPort: 23936
    type: HostNetwork
  domain: int1.ci-ln-qyfl0pt-c1627.vmc-ci.devcluster.openshift.com

3.
% oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-74f5cbcf6-qgtqx   1/1     Running   0          38m
router-default-74f5cbcf6-t742h   1/1     Running   0          38m
router-int1-c56c7dd96-4b72b      1/1     Running   0          11m
router-int1-c56c7dd96-4nvb5      1/1     Running   0          11m

4.
% oc -n openshift-ingress get svc router-internal-int1 -o=jsonpath="{.spec.ports}"
[{"name":"http","port":80,"protocol":"TCP","targetPort":"http"},{"name":"https","port":443,"protocol":"TCP","targetPort":"https"},{"name":"metrics","port":1936,"protocol":"TCP","targetPort":"metrics"}]% 

5.
 % oc -n openshift-ingress get ep router-internal-int1 -o=jsonpath="{.subsets[0].ports}"
[{"name":"metrics","port":23936,"protocol":"TCP"},{"name":"https","port":23443,"protocol":"TCP"},{"name":"http","port":23080,"protocol":"TCP"}]% 

6.
% oc -n openshift-ingress get deployment router-int1 -oyaml | grep -E -A9 "ivenessProbe:|readinessProbe:|startupProbe:"
        livenessProbe:
          failureThreshold: 3
          httpGet:
            host: localhost
            path: /healthz
            port: 23936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          terminationGracePeriodSeconds: 10
--
        readinessProbe:
          failureThreshold: 3
          httpGet:
            host: localhost
            path: /healthz/ready
            port: 23936
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
--
        startupProbe:
          failureThreshold: 120
          httpGet:
            host: localhost
            path: /healthz/ready
            port: 23936
            scheme: HTTP
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1

7.
% oc -n openshift-ingress get deployment router-int1 -oyaml | grep -A6 " - configMap:"
      - configMap:
          defaultMode: 420
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: service-ca-bundle
          optional: false

8.
% oc -n openshift-ingress get svc router-internal-int1 -o=jsonpath="{.spec.sessionAffinity}" 
None%                    

9.  create a load balancer ingress controller and check its service
% print $(oc -n openshift-ingress get svc router-int2 -o=jsonpath="{.spec.type}\n{.spec.sessionAffinity}")
LoadBalancer
None

@candita (Contributor) commented May 20, 2024

/test e2e-hypershift

@candita (Contributor) commented May 20, 2024

@Miciah thanks for the thorough analysis and for the bug numbers we should plan to backport to fix all the spurious updates seen in 4.12.40. At the moment I think we can safely ignore the logs from CI testing. I have a summary of all the spurious updates seen in 4.12.40 and will coordinate the backports of this bug and any other new ones needed, including one to address #1016.

Contributor:

Thanks! That is a lot of overlooked deployment updates.

@@ -1514,7 +1520,7 @@ func hashableProbe(probe *corev1.Probe) *corev1.Probe {

 	var hashableProbe corev1.Probe

-	copyProbe(probe, &hashableProbe)
+	copyProbe(probe, &hashableProbe, false)
Contributor:

This change will make a big difference.

@@ -1645,7 +1652,7 @@ func copyProbe(a, b *corev1.Probe) {
			Port: a.ProbeHandler.HTTPGet.Port,
			Host: a.ProbeHandler.HTTPGet.Host,
@candita (Contributor), May 20, 2024:

Should the entire function be gated by copyDefaultValues? I see here it still allows changes to HTTPGet and TerminationGracePeriod. In the future, how will developers know which values to allow changes to, and which to prevent changes to?

Contributor Author:

Port is required, and the default value for Host is the empty string, so I believe we can copy them unconditionally.

Contributor:

In that case, if you wouldn't mind, can you add a comment like "Values that are always updated" and group those assignments, and another comment "Values that are only updated when requested" and group those under an if copyDefaultValues { block? I feel like this may help/force future developers to make a decision about what to do with values in copyProbe.

Contributor Author:

Done in 3faf391.

@candita (Contributor) commented May 20, 2024

/test e2e-aws-ovn-single-node

@Miciah (Contributor Author) commented May 21, 2024

we should plan to backport to fix all the spurious updates seen in 4.12.40.

There have been a number of fixes for spurious updates since 4.12 (though not all of these have been confirmed to affect 4.12 generally or 4.12.40 specifically):

At the moment I think we can safely ignore the logs from CI testing.

I don't understand what you mean by that. I think we should always be looking at the logs from CI testing, especially when we are modifying update logic, and most especially when we are specifically trying to address spurious updates. Do you mean that we should ignore the three issues that I mentioned in #1052 (comment) for now and not try to fix the "successfully updated Infra CR with Ingress Load Balancer IPs" log message or diagnose the other anomalies?

I have a summary of all the spurious updates seen in 4.12.40 and will coordinate the backports of this bug and any other new one needed, including one to address #1016

All right, I will hold off on fixing the "successfully updated Infra CR with Ingress Load Balancer IPs" issue until a separate bug report is filed.

In my opinion, we should backport in chronological order (#879 first, then #927, then #947, and so on) in order to keep the history consistent among branches and avoid conflicts as much as possible, but we will encounter merge conflicts along the way in any case with some of these PRs (certainly with #1052).

@candita (Contributor) commented May 21, 2024

I don't understand what you mean by that. I think we should always be looking at the logs from CI testing, especially when we are modifying update logic, and most especially when we are specifically trying to address spurious updates. Do you mean that we should ignore the three issues that I mentioned in #1052 (comment) for now and not try to fix the "successfully updated Infra CR with Ingress Load Balancer IPs" log message or diagnose the other anomalies?

Yes, that is what I meant.

@candita (Contributor) commented May 23, 2024

* pkg/operator/controller/ingress/deployment.go (copyProbe): Group together
the assignments for the case that copyDefaultValues is true.
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 24, 2024
@openshift-ci-robot (Contributor):

@Miciah: This pull request references Jira Issue OCPBUGS-23221, which is invalid:

  • expected the bug to target either version "4.17." or "openshift-4.17.", but it targets "4.16.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

internalServiceChanged: Fix target port logic

Fix internalServiceChanged to copy target ports from the expected service rather than preserving target ports from the current service. Only copy NodePort and other fields that we want to preserve.

Before this PR, the internalServiceChanged function explicitly preserved the current service's ports' target ports when reconciling the service. However, the function was intended to update the target ports if they changed.

In particular, if the current service specifies a numeric target port for the metrics port, then the operator should replace the port number with a port name so that the service uses the correct port even if the pod container's port number changes. Without this change, updating an IngressController's spec.endpointPublishingStrategy.hostNetwork.statsPort API field can break metrics as the service's metrics port's numeric target port may no longer match the pod container's metrics port.

Follow-up to #864.

Test_*Changed: Verify that some update was made

This is a follow-up to the previous change for consistency.

crdChanged, nodePortServiceChanged: Fix preserve logic

Fix the update logic in crdChanged and nodePortServiceChanged to preserve fields that are ignored when checking whether an update is necessary.

Follow-up to #567, #890, and #927.

Set SessionAffinity to avoid confusing diffs

If a service does not specify spec.sessionAffinity, the API sets a default value of "None". This means that when creating or updating a service, the empty value and "None" are equivalent, and so when reconciling a service, cluster-ingress-operator treats these values as equal. This results in the correct update behavior: The operator ignores the API defaulting, but if someone sets any value other than the empty value or "None", the operator reverts the update; and if the operator leaves spec.sessionAffinity empty when reverting the update, the API again defaults it to "None".

However, even though the updating behavior is correct, it can cause confusing log output when the operator logs the diff with the changes: If the operator has the empty value in the updated object but the current value is actually "None", this shows up in the diff, even though the actual update does not change the field value when taking defaulting into account.

This change adds an explicit spec.sessionAffinity: None setting to the services that the operator manages to avoid having the API defaulting show up in these diffs.

Follow-up to #407.

Test_desiredLoadBalancerService: Add assertions

Add assertions for service type, externalTrafficPolicy, internalTrafficPolicy, and ports, for consistency with Test_desiredInternalIngressControllerService.

Test_desiredInternalIngressControllerService: Fix godoc

Fix capitalization of the test function name in the godoc.

TestHostNetworkPortBinding: Use t.Log, t.Cleanup

Update TestHostNetworkPortBinding to comply with current conventions. Convert some comments into t.Log statements, replace defer with t.Cleanup, and tweak some error messages.

TestHostNetworkPortBinding: Check service ports

Update TestHostNetworkPortBinding to verify that the internal service has the expected ports. In particular, the metrics port should have its target port set to the name "metrics" rather than a port number.

Test_deploymentConfigChanged: Use k8s.io/utils/ptr

Replace use of the deprecated "k8s.io/utils/pointer" package by using "k8s.io/utils/ptr" instead.

Set "service-ca-bundle" volume's default mode

Set an explicit value of 420 decimal (0644 octal) for the router deployment's "service-ca-bundle" volume's default mode.

Previously, the router deployment yaml manifest had defaultMode indented incorrectly, and so the operator did not actually set the intended value for the volume's default mode. Because the intended value is equal to the default value, the volume ultimately ends up with the intended default mode anyway. However, the fact that the operator did not specify the default mode made log output show a bogus diff for the volume when the operator updated the deployment.

After this change, the yaml is correctly indented, and the log output for a deployment update has a less noisy diff.

Follow-up to #399.

Specify default mode on configmap/secret volumes

Set an explicit value of 420 decimal (0644 octal) for the default mode of all configmap or secret volumes in the router deployment that are defined in Go code (as opposed to being defined in the yaml manifest): client-ca, default-certificate, error-pages, metrics-certs, rsyslog-config, service-ca-bundle, and stats-auth.

After this change, the log output for a deployment update has a less noisy diff.

Specify parameter values for router probes

Set explicit values for the liveness, readiness, and startup probes in the router deployment's yaml manifest.

Previously, the router deployment yaml manifest omitted these values when the intended values were equal to the default values. This meant that the API server would set the default values for these parameters, and so the probes ultimately ended up with the intended parameter values. However, omitting these values in the yaml manifest also had the unintended side-effect of making the operator log bogus diffs for these parameters whenever it updated the deployment.

After this change, the log output for a deployment update has a less noisy diff.

copyProbe: Refactor to group copying of defaults

Group together the assignments for the case that copyDefaultValues is true.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Miciah (Contributor Author) commented May 24, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 24, 2024
@openshift-ci-robot (Contributor):

@Miciah: This pull request references Jira Issue OCPBUGS-23221, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 24, 2024
openshift-ci bot (Contributor) commented May 24, 2024

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                        Commit   Details  Required  Rerun command
ci/prow/e2e-azure-ovn            3faf391  link     false     /test e2e-azure-ovn
ci/prow/e2e-aws-gatewayapi       3faf391  link     false     /test e2e-aws-gatewayapi
ci/prow/e2e-gcp-ovn              3faf391  link     false     /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-single-node  3faf391  link     false     /test e2e-aws-ovn-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • qe-approved: Signifies that QE has signed off on this PR.