Add the ability to force the location of the Seed to be in the same region as Kyma cluster [EPIC] #18182

Open
2 of 17 tasks
TorstenD-SAP opened this issue Sep 15, 2023 · 10 comments
Labels: area/security, Epic, size/M

Comments

@TorstenD-SAP

TorstenD-SAP commented Sep 15, 2023

Description

The user who creates a Kyma cluster in the BTP cockpit should be able to require the Control Plane to be located in the same region as the Hyperscaler account in which the cluster's Worker Nodes are deployed. If the Control Plane cannot be placed in the same region, the user should see an error message that allows them to proceed without this requirement. In all cases, it must be transparent to the customer in which region the Control Plane is hosted.

Reasons

The region of the Control Plane is chosen automatically by Gardener (https://gardener.cloud/docs/gardener/concepts/scheduler/). As a result, the Control Plane can end up in a different region than the Worker Nodes, partly because Gardener does not have Seed clusters in every region where Kyma can be deployed. This can lead to a violation of the law, because the Control Plane could be in a different legal jurisdiction than the Worker Nodes while the customer stores personal data (e.g. names, email addresses) on the Control Plane. Some of our customers are also very sensitive about the regions where their sensitive data is stored.

AC (Added by PK)

  • Phase 1: Investigate in full which areas need implementation effort (Gophers estimation: Size/M)
  • Phase 2: Feature delivery
    • 2.1) KEB implementation + feature flag for dev, stage, prod
    • 2.2) Provisioner implementation
    • 2.3) Set KEB feature flag on DEV to true and write new SKR e2e integration test
    • 2.4) Register new schema on CIS - DEV
    • 2.5) Register new schema on CIS - STAGE
    • 2.6) Register new schema on CIS - PROD
    • 2.7) Update the SAP Help Portal and the docs
    • 2.8) Release notes (RN)
    • 2.9) Synchronise timeouts between Provisioner and KEB
@kyma-bot
Contributor

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot added the lifecycle/stale label on Nov 14, 2023
@varbanv removed the lifecycle/stale label on Nov 15, 2023
@tobiscr added the area/security label on Dec 5, 2023
@TorstenD-SAP
Author

A label seed.gardener.cloud/region was added to each Gardener seed. This label can be used to restrict the seeds allowed for a shoot cluster by using the spec.seedSelector in the shoot spec.
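To illustrate how such a selector narrows the set of candidate seeds, here is a minimal Python sketch assuming plain equality-based `matchLabels` semantics. The seed names and region values are hypothetical, and the real Gardener scheduler applies further criteria (provider type, taints, zones) that are not modeled here.

```python
from dataclasses import dataclass, field

# Label key from the comment above; seed names and regions below are hypothetical.
REGION_LABEL = "seed.gardener.cloud/region"


@dataclass
class Seed:
    name: str
    labels: dict = field(default_factory=dict)


def seed_selector(region: str) -> dict:
    """Fragment of a Shoot manifest restricting scheduling to seeds in `region`."""
    return {"spec": {"seedSelector": {"matchLabels": {REGION_LABEL: region}}}}


def matches(match_labels: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present on the seed."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def eligible_seeds(seeds: list, region: str) -> list:
    """Names of the seeds that the selector for `region` would allow."""
    match_labels = seed_selector(region)["spec"]["seedSelector"]["matchLabels"]
    return [s.name for s in seeds if matches(match_labels, s.labels)]


seeds = [
    Seed("aws-eu1", {REGION_LABEL: "eu-central-1"}),
    Seed("aws-eu2", {REGION_LABEL: "eu-central-1"}),
    Seed("aws-us1", {REGION_LABEL: "us-east-1"}),
]
print(eligible_seeds(seeds, "eu-central-1"))    # ['aws-eu1', 'aws-eu2']
print(eligible_seeds(seeds, "ap-northeast-1"))  # [] -> the shoot cannot be scheduled
```

An empty result corresponds to a region without any seed, which is the situation the team later tested with ap-northeast-1.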


This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions (bot) added the lifecycle/stale label on Mar 13, 2024
@TorstenD-SAP removed the lifecycle/stale label on Mar 13, 2024
@tobiscr
Contributor

tobiscr commented Mar 25, 2024

We agreed with @kyma-project/gopher to offer this feature under the following constraints:

  • Not all regions have their own seed yet, so the UI will already show a link to documentation stating that this feature is not supported in all regions and can lead to failed Kyma clusters (because Gardener rejects the cluster creation).
  • KIM and KEB will not check up front whether a seed exists in the requested region and will follow a "trial and error" approach: if Gardener can create the cluster, all is fine; otherwise, the customer receives an error.
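The "trial and error" approach can be sketched as follows: the Shoot is submitted as-is, with no seed lookup beforehand, and any non-successful outcome is passed back to the customer as an error. This is a minimal Python sketch; `create_cluster`, `ProvisioningError`, and the `fake_gardener` test double are hypothetical and do not represent the actual KIM or KEB code. It also simplifies by reading the status once, whereas a real client would poll until a final state or a timeout.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for "create the Shoot in Gardener and read back its
# last operation": returns (state, message).
Submit = Callable[[Dict], Tuple[str, str]]


class ProvisioningError(Exception):
    """Carries Gardener's rejection message back to the customer."""


def create_cluster(shoot: Dict, submit: Submit) -> str:
    # Trial and error: no up-front check whether a matching seed exists.
    # Simplification: one status read instead of polling until a final state.
    state, message = submit(shoot)
    if state != "Succeeded":
        raise ProvisioningError(message)
    return state


def fake_gardener(shoot: Dict) -> Tuple[str, str]:
    """Hypothetical Gardener double: no seeds exist in ap-northeast-1."""
    selector = shoot["spec"].get("seedSelector", {}).get("matchLabels", {})
    if selector.get("seed.gardener.cloud/region") == "ap-northeast-1":
        return "Pending", "Failed to schedule Shoot: no matching seed"
    return "Succeeded", ""


pinned = {"spec": {"seedSelector": {"matchLabels": {"seed.gardener.cloud/region": "ap-northeast-1"}}}}
try:
    create_cluster(pinned, fake_gardener)
except ProvisioningError as err:
    print(f"error returned to customer: {err}")
```

Keeping the Gardener call injectable (`submit`) makes it easy to exercise the error path without a real landscape.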

@tobiscr added the Epic label on Apr 9, 2024
@PK85 changed the title from "Add the ability to force the location of the Control Plane to be in the same region than the Nodes" to "Add the ability to force the location of the Seed to be in the same region as Kyma cluster" on Apr 24, 2024
@PK85 added the size/M label on Apr 24, 2024
@ralikio self-assigned this on May 16, 2024
@IwonaLanger self-assigned this on May 21, 2024
@ralikio
Member

ralikio commented May 21, 2024

I have tested the seedSelector field mentioned in #18182 (comment). @PK85 suggested that for our test scenario we select a shoot region that does not contain any seeds: ap-northeast-1. A shoot created with the default configuration was assigned to aws-ha-us2. Creating another shoot with the seed selector set to ap-northeast-1 resulted in the following status:

Status: Create Pending
Last Message: Failed to schedule Shoot: none out of the ... seeds has the matching labels required by seed selector of 'Shoot' (selector: 'seed.gardener.cloud/region=ap-northeast-1')

The Create Pending status seems counterintuitive to @kyma-project/gopher and @kyma-project/framefrog, and it will be discussed with the Gardener team.
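Because "Create Pending" by itself does not signal a failure, a caller would have to recognize the scheduling failure from the last-operation message. A minimal Python sketch, assuming the message wording quoted above stays stable (matching on message text is brittle); the helper names are hypothetical:

```python
import re
from typing import Optional

# Prefix of the "Last Message" quoted above.
SCHEDULING_FAILURE_PREFIX = "Failed to schedule Shoot:"
# Captures the label selector quoted as (selector: '...').
SELECTOR_RE = re.compile(r"\(selector: '([^']*)'\)")


def is_scheduling_failure(op_type: str, state: str, message: str) -> bool:
    """True when a 'Create Pending' operation reports that no seed was eligible."""
    return (
        op_type == "Create"
        and state == "Pending"
        and message.startswith(SCHEDULING_FAILURE_PREFIX)
    )


def rejected_selector(message: str) -> Optional[str]:
    """Selector named in the message, or None if the message quotes none."""
    match = SELECTOR_RE.search(message)
    return match.group(1) if match else None
```

Applied to the message above, `rejected_selector` yields `seed.gardener.cloud/region=ap-northeast-1`, which could be shown to the customer as the region without a seed.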

@ralikio
Member

ralikio commented May 24, 2024

Proposed request to the Provisioner's GraphQL API with the new field shootAndSeedSameRegion:

{
  runtimeInput: {
    ...
  },
  clusterConfig: {
    gardenerConfig: {
      ...
      shootAndSeedSameRegion: false  # default false; true requires a seed in the shoot's region
    },
    ...
  },
}
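One way the new flag could be translated into the Shoot's seed selector on the Provisioner side is sketched below in Python. This assumes that `gardenerConfig` carries a `region` key and that the flag maps to the `seed.gardener.cloud/region` label mentioned earlier in this thread; `GardenerConfig`, `from_input`, and `seed_selector` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

REGION_LABEL = "seed.gardener.cloud/region"


@dataclass
class GardenerConfig:
    region: str
    shoot_and_seed_same_region: bool = False  # proposed field; default false


def from_input(gardener_config: dict) -> GardenerConfig:
    """Read the relevant keys of a gardenerConfig input; flag absent -> False."""
    return GardenerConfig(
        region=gardener_config["region"],
        shoot_and_seed_same_region=gardener_config.get("shootAndSeedSameRegion", False),
    )


def seed_selector(config: GardenerConfig) -> Optional[dict]:
    """Seed selector for the Shoot spec, or None to let Gardener choose freely."""
    if not config.shoot_and_seed_same_region:
        return None
    return {"matchLabels": {REGION_LABEL: config.region}}
```

Defaulting to `False` keeps existing requests unchanged: without the flag, no selector is set and the Gardener scheduler picks the seed as before.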

@ralikio
Member

ralikio commented May 24, 2024

@tobiscr
Contributor

tobiscr commented May 27, 2024

JFYI: a draft PR was added to Gardener that extracts the seed-determination logic into a separate struct, making it reusable by other apps via its API:

gardener/gardener#9843

@ralikio
Member

ralikio commented May 27, 2024

Additional test cases were conducted on the interaction between Gardener's spec.controlPlane.highAvailability.failureTolerance.type: zone and seedSelector. The Gardener documentation (https://gardener.cloud/docs/gardener/high-availability/) states:

Regarding the seed cluster selection, the only constraint is that shoot clusters with failure tolerance type zone are only allowed to run on seed clusters with at least three zones. All other shoot clusters (non-HA or those with failure tolerance type node) can run on seed clusters with any number of zones.
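The documented zone constraint can be written as a small predicate. This is a minimal Python sketch with a hypothetical function name; it covers only the zone-count rule quoted above and not taints, which are what Cases II and III actually reported.

```python
from typing import Optional


def zone_rule_allows(failure_tolerance: Optional[str], seed_zones: int) -> bool:
    """Documented rule only: failure tolerance type 'zone' needs a seed with at
    least three zones; non-HA shoots (None) and type 'node' fit any seed."""
    if failure_tolerance == "zone":
        return seed_zones >= 3
    return True
```

For example, `zone_rule_allows("zone", 2)` is `False`, while `zone_rule_allows(None, 1)` is `True`.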

Case I - Creating a non-HA shoot in a region that contains only HA seeds (seed names contain HA)

Provider: aws
Seed Selector: eu-north-1 - a region with two HA seeds
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone not set
Result: the shoot is created successfully.

Case II - Creating an HA shoot in a region that contains only non-HA seeds (seed names do not contain HA)

Provider: gcp
Seed Selector: europe-west-3 - a region with one non-HA seed
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone enabled
Result:

Create Pending - Failed to schedule Shoot: 0/1 seed cluster candidate(s) are eligible for scheduling: {*** => shoot does not tolerate the seed's taints}

Case III - Creating an HA shoot in a region that contains one HA seed (seed name contains HA)

Provider: gcp
Seed Selector: me-central2 - a region with one HA seed
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone enabled
Result:

Create Pending - Failed to schedule Shoot: 0/1 seed cluster candidate(s) are eligible for scheduling: {*** => shoot does not tolerate the seed's taints}

@ralikio
Member

ralikio commented May 28, 2024

Rendering of schema changes: (two images)

Development

No branches or pull requests

7 participants