ETCD X KWOK EPIC #4

Closed
2 of 13 tasks
Sharpz7 opened this issue Dec 6, 2023 · 2 comments

Sharpz7 commented Dec 6, 2023

This Epic focuses on the minimum requirements needed to achieve the main objective.

Main Objective

  1. Get a PR merged that improves the ETCD max DB limit.

Sub Objectives

These are things that may or may not need to be achieved before the Main Objective, or happen as a consequence of the Main Objective:

  1. Create Documentation on the new limits for ETCD
  2. Create Analysis that gets merged into the repo
  3. Create an AWS Terraform Suite for testing
  4. Merge a Performance Framework into the etcd repo

Step By Step Guide

Based on this repo: https://github.com/Sharpz7/kwok-etcd

  • Configure KWOK to allow etcd's database limit to be increased (see "Support increasing the etcd size limit", kubernetes-sigs/kwok#864)
  • Ensure all k8s or kwok imposed QPS limits are disabled
  • Create a 100kB Pod Spec for testing
  • Try to break KWOK when run from its own binary, KinD, and the Docker image
  • Decide if extra pod states are needed to get to defrag state
  • Use and Benchmark https://github.com/ahrtr/etcd-defrag against current GR standard (From email)
  • Decide if Hardware Framework is needed
  • Implement Hardware Framework between TIG and AWS
  • Design good analysis program to benchmark results
  • Try and find problems in the codebase
  • Implement Fixes to Codebase
  • Test Against created performance benchmark
  • Create Ticket w/ Results
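
For the "100kB Pod Spec" step above, one possible approach is to pad a minimal Pod manifest with a large annotation until its JSON encoding reaches the target size. The sketch below is an assumption, not the method used in the kwok-etcd repo: the annotation key `example.com/padding`, the image name, and the helper names are all hypothetical. The size stored in etcd will also differ from the client-side JSON size, since the API server adds defaulted fields and typically stores a different encoding.

```python
import json


def encode(pod: dict) -> bytes:
    """Compact JSON encoding of the manifest, as UTF-8 bytes."""
    return json.dumps(pod, separators=(",", ":")).encode("utf-8")


def build_padded_pod(name: str, target_bytes: int = 100_000) -> dict:
    """Return a Pod manifest whose compact JSON encoding is target_bytes long.

    The padding lives in a single annotation (hypothetical key). Kubernetes
    caps the total size of annotations on an object, so very large targets
    would need a different padding strategy.
    """
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"example.com/padding": ""},
        },
        # Placeholder container; KWOK simulates pods without running them.
        "spec": {"containers": [{"name": "app", "image": "fake-image"}]},
    }
    base = len(encode(pod))
    if target_bytes < base:
        raise ValueError(f"target must be at least {base} bytes")
    # Each ASCII 'x' adds exactly one byte to the JSON encoding.
    pod["metadata"]["annotations"]["example.com/padding"] = "x" * (target_bytes - base)
    return pod


def objects_to_fill(quota_bytes: int, object_bytes: int) -> int:
    """Rough count of objects needed to reach a quota, ignoring etcd overhead."""
    return quota_bytes // object_bytes


if __name__ == "__main__":
    pod = build_padded_pod("big-pod-0")
    print(len(encode(pod)), "bytes")
    print(objects_to_fill(2 * 1024**3, 100_000), "objects to reach 2 GiB")
```

The output of `encode` can be written to a file and applied with `kubectl apply -f`. The object count is only a lower-bound estimate: it ignores revision history and storage overhead, both of which make the database grow faster in practice.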

General Notes

The notes below are general. Where relevant, each one is tagged with the objective it relates to.

Layman's-terms blog post on what the ETCD DB issue is: https://aws.amazon.com/blogs/containers/managing-etcd-database-size-on-amazon-eks-clusters (SUB OBJECTIVE 1)

etcd-io/etcd#15354 - recent notes suggesting that etcd can almost certainly handle higher limits. The lack of an ETCD-only Performance Framework stops the maintainers from publishing this officially.

etcd-io/etcd#12690 - Comment about how increased limits need to be verified on your own, and how good hardware is important.
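
As a starting point for verifying a raised limit yourself, the sketch below summarizes database size against a chosen quota using the JSON printed by `etcdctl endpoint status -w json`. This is an illustration under assumptions: the sample payload is made up, the quota value is arbitrary, and the field names (`Endpoint`, `Status`, `dbSize`, `dbSizeInUse`) reflect the output shape as I understand it for recent etcd releases, so check them against your own version.

```python
import json

# Made-up sample in the shape of `etcdctl endpoint status -w json` output.
SAMPLE_STATUS = (
    '[{"Endpoint":"127.0.0.1:2379",'
    '"Status":{"version":"3.5.10","dbSize":2000000000,"dbSizeInUse":500000000}}]'
)


def summarize(status_json: str, quota_bytes: int) -> list[dict]:
    """Report quota usage and defrag-reclaimable space per endpoint."""
    rows = []
    for entry in json.loads(status_json):
        status = entry["Status"]
        db_size = int(status["dbSize"])
        # Older versions may not report dbSizeInUse; fall back to dbSize.
        in_use = int(status.get("dbSizeInUse", db_size))
        rows.append(
            {
                "endpoint": entry["Endpoint"],
                "quota_used": db_size / quota_bytes,
                # Space that a defragmentation could hand back to the filesystem.
                "reclaimable_bytes": db_size - in_use,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE_STATUS, quota_bytes=8 * 1024**3):
        print(row)
```

Recording these numbers before and after a defragmentation run gives a simple per-endpoint data point for comparing tools or hardware configurations.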

etcd-io/etcd#9771 - More detailed look at ETCD performance, coming to the same conclusions as above.

IMPORTANT: The issues above suggest there is a chance that no code improvements are needed at all, just better documentation on hardware configurations and better, well-published analysis.

Sharpz7 self-assigned this Dec 6, 2023

Sharpz7 commented Dec 12, 2023

Some interesting results from testing:

Sharpz7 closed this as completed Apr 11, 2024