Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGMT-17700: Assign none platform node-ips based on connected addresses and etcd restrictions #6257

Merged
merged 1 commit into from May 27, 2024

Conversation

ori-amizur
Copy link
Contributor

@ori-amizur ori-amizur commented May 2, 2024

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming two networks - net1 and net2.
master node 1 has 1 address that belongs to net1.
master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip.
Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip.
Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation.
In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya
/cc @paul-maidment

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 2, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 2, 2024

@ori-amizur: This pull request references MGMT-17700 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming to networks - net1 and net2.
master node 1 has 1 address that belongs to net1.
master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip.
Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip.
Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation.
In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya
/cc @paul-maidment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 2, 2024
Copy link

openshift-ci bot commented May 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ori-amizur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2024
Copy link

codecov bot commented May 2, 2024

Codecov Report

Attention: Patch coverage is 78.12500% with 42 lines in your changes are missing coverage. Please review.

Project coverage is 68.43%. Comparing base (4f4c464) to head (c2e4687).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6257      +/-   ##
==========================================
+ Coverage   68.25%   68.43%   +0.17%     
==========================================
  Files         244      245       +1     
  Lines       35869    36060     +191     
==========================================
+ Hits        24483    24676     +193     
+ Misses       9223     9205      -18     
- Partials     2163     2179      +16     
Files Coverage Δ
internal/common/common.go 23.90% <0.00%> (-0.20%) ⬇️
internal/ignition/installmanifests.go 55.67% <66.66%> (+0.27%) ⬆️
...ernal/network/none_platform_node_ips_allocation.go 80.72% <80.72%> (ø)

... and 3 files with indirect coverage changes

@ori-amizur
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws

@openshift-ci-robot
Copy link

openshift-ci-robot commented May 8, 2024

@ori-amizur: This pull request references MGMT-17700 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming two networks - net1 and net2.
master node 1 has 1 address that belongs to net1.
master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip.
Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip.
Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation.
In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya
/cc @paul-maidment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 8, 2024
@ori-amizur ori-amizur requested a review from tsorya May 8, 2024 14:23
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 8, 2024
@ori-amizur
Copy link
Contributor Author

/test?

Copy link

openshift-ci bot commented May 8, 2024

@ori-amizur: The following commands are available to trigger required jobs:

  • /test e2e-agent-compact-ipv4
  • /test edge-assisted-operator-catalog-publish-verify
  • /test edge-ci-index
  • /test edge-e2e-ai-operator-ztp
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers-late-binding
  • /test edge-e2e-metal-assisted
  • /test edge-e2e-metal-assisted-4-11
  • /test edge-e2e-metal-assisted-4-12
  • /test edge-e2e-metal-assisted-cnv
  • /test edge-e2e-metal-assisted-lvm
  • /test edge-e2e-metal-assisted-odf
  • /test edge-images
  • /test edge-lint
  • /test edge-subsystem-aws
  • /test edge-subsystem-kubeapi-aws
  • /test edge-unit-test
  • /test edge-verify-generated-code
  • /test images
  • /test mce-images

The following commands are available to trigger optional jobs:

  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv6
  • /test edge-e2e-ai-operator-disconnected-capi
  • /test edge-e2e-ai-operator-ztp-3masters
  • /test edge-e2e-ai-operator-ztp-capi
  • /test edge-e2e-ai-operator-ztp-compact-day2-masters
  • /test edge-e2e-ai-operator-ztp-compact-day2-workers
  • /test edge-e2e-ai-operator-ztp-disconnected
  • /test edge-e2e-ai-operator-ztp-hypershift-zero-nodes
  • /test edge-e2e-ai-operator-ztp-multiarch-3masters-ocp
  • /test edge-e2e-ai-operator-ztp-multiarch-sno-ocp
  • /test edge-e2e-ai-operator-ztp-node-labels
  • /test edge-e2e-ai-operator-ztp-sno-day2-masters
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers-ignitionoverride
  • /test edge-e2e-metal-assisted-4-13
  • /test edge-e2e-metal-assisted-4-14
  • /test edge-e2e-metal-assisted-4-15
  • /test edge-e2e-metal-assisted-bond
  • /test edge-e2e-metal-assisted-bond-4-14
  • /test edge-e2e-metal-assisted-day2
  • /test edge-e2e-metal-assisted-day2-arm-workers
  • /test edge-e2e-metal-assisted-day2-single-node
  • /test edge-e2e-metal-assisted-external
  • /test edge-e2e-metal-assisted-external-4-14
  • /test edge-e2e-metal-assisted-ipv4v6
  • /test edge-e2e-metal-assisted-ipv6
  • /test edge-e2e-metal-assisted-kube-api-late-binding-single-node
  • /test edge-e2e-metal-assisted-kube-api-late-unbinding-ipv4-single-node
  • /test edge-e2e-metal-assisted-kube-api-net-suite
  • /test edge-e2e-metal-assisted-mce-4-11
  • /test edge-e2e-metal-assisted-mce-4-12
  • /test edge-e2e-metal-assisted-mce-4-13
  • /test edge-e2e-metal-assisted-mce-4-14
  • /test edge-e2e-metal-assisted-mce-4-15
  • /test edge-e2e-metal-assisted-mce-sno
  • /test edge-e2e-metal-assisted-metallb
  • /test edge-e2e-metal-assisted-none
  • /test edge-e2e-metal-assisted-onprem
  • /test edge-e2e-metal-assisted-single-node
  • /test edge-e2e-metal-assisted-static-ip-suite
  • /test edge-e2e-metal-assisted-static-ip-suite-4-14
  • /test edge-e2e-metal-assisted-tang
  • /test edge-e2e-metal-assisted-tpmv2
  • /test edge-e2e-metal-assisted-upgrade-agent
  • /test edge-e2e-nutanix-assisted
  • /test edge-e2e-nutanix-assisted-2workers
  • /test edge-e2e-nutanix-assisted-4-14
  • /test edge-e2e-oci-assisted
  • /test edge-e2e-oci-assisted-4-14
  • /test edge-e2e-oci-assisted-iscsi
  • /test edge-e2e-vsphere-assisted
  • /test edge-e2e-vsphere-assisted-4-12
  • /test edge-e2e-vsphere-assisted-4-13
  • /test edge-e2e-vsphere-assisted-4-14
  • /test edge-e2e-vsphere-assisted-umn
  • /test okd-scos-images
  • /test push-pr-image

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-assisted-service-master-e2e-agent-compact-ipv4
  • pull-ci-openshift-assisted-service-master-edge-ci-index
  • pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-external
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-none
  • pull-ci-openshift-assisted-service-master-edge-e2e-nutanix-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-nutanix-assisted-4-14
  • pull-ci-openshift-assisted-service-master-edge-e2e-oci-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-oci-assisted-4-14
  • pull-ci-openshift-assisted-service-master-edge-e2e-vsphere-assisted
  • pull-ci-openshift-assisted-service-master-edge-images
  • pull-ci-openshift-assisted-service-master-edge-lint
  • pull-ci-openshift-assisted-service-master-edge-subsystem-aws
  • pull-ci-openshift-assisted-service-master-edge-subsystem-kubeapi-aws
  • pull-ci-openshift-assisted-service-master-edge-unit-test
  • pull-ci-openshift-assisted-service-master-edge-verify-generated-code
  • pull-ci-openshift-assisted-service-master-images
  • pull-ci-openshift-assisted-service-master-mce-images

In response to this:

/test?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ori-amizur
Copy link
Contributor Author

/test edge-e2e-metal-assisted-none

@ori-amizur
Copy link
Contributor Author

/test edge-e2e-metal-assisted

internal/common/common.go Show resolved Hide resolved
internal/ignition/ignition.go Outdated Show resolved Hide resolved
internal/ignition/ignition.go Outdated Show resolved Hide resolved
@ori-amizur ori-amizur force-pushed the MGMT-17700 branch 2 times, most recently from b75c07c to 366ca49 Compare May 20, 2024 15:34
@ori-amizur ori-amizur requested a review from eranco74 May 20, 2024 15:44
@ori-amizur
Copy link
Contributor Author

/retest-required

@ori-amizur
Copy link
Contributor Author

/retest-required

@tsorya
Copy link
Contributor

tsorya commented May 24, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 24, 2024
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 4f4c464 and 2 for PR HEAD 7425516 in total

@ori-amizur
Copy link
Contributor Author

/retest-required

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 24, 2024
…s and etcd restrictions

When none platform is in use, if there is ambiguity in node-ip assignment,
then incorrect assignment might lead to installation failure. This happens
when etcd detects that the socket address from an etcd node does not match the
expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming two networks - net1 and net2.
master node 1 has 1 address that belongs to net1.
master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2
master node 3 has 1 address that belongs to net 1.
If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection
with any other master node, the socket address will be the address that belongs to net 1.
Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict.
Node ips assignment should be done through ignition.
To correctly set bootstrap ip, the machine-network for the cluster
must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster.  Node ip calculation
is actually calculation of the node-ip, hint, and cidr.
Node-ip is the ip address that exists on the host that was selected as
the node-ip.
Hint is actually an IP address that does not exist on the host, but must
belong to the subnet of the node-ip.
Cidr is the subnet in cidr notation that the node-ip and hint belong to.
Node-ip calculation is either done for all cluster hosts or none of
them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified.  A file called
/etc/default/nodeip-configuration contains the hint as was set by
node-ip generation.
In addition the bootstrap-ip is set in ignition as was set in node-ip
generation.  This is set as environment variable called
"OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address.
Therefore during install-config generation the machine-network may be
replaced by the cidr from the node-ip generation.
@ori-amizur
Copy link
Contributor Author

/test edge-e2e-ai-operator-ztp

Copy link

openshift-ci bot commented May 25, 2024

@ori-amizur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-e2e-nutanix-assisted c2e4687 link false /test edge-e2e-nutanix-assisted
ci/prow/edge-e2e-nutanix-assisted-4-14 c2e4687 link false /test edge-e2e-nutanix-assisted-4-14
ci/prow/edge-e2e-oci-assisted c2e4687 link false /test edge-e2e-oci-assisted
ci/prow/edge-e2e-vsphere-assisted c2e4687 link false /test edge-e2e-vsphere-assisted

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tsorya
Copy link
Contributor

tsorya commented May 27, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 8a63485 into openshift:master May 27, 2024
17 of 21 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-agent-installer-api-server-container-v4.17.0-202405271211.p0.g8a63485.assembly.stream.el9 for distgit ose-agent-installer-api-server.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants