Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd backend sharding support #63

Open
wilsonwang371 opened this issue Feb 4, 2021 · 4 comments
Open

etcd backend sharding support #63

wilsonwang371 opened this issue Feb 4, 2021 · 4 comments

Comments

@wilsonwang371
Copy link

As the number of nodes in our k8s cluster increases significantly, we saw etcd gradually becoming our cluster performance bottleneck. apiserver already supports different objects using different etcd cluster. But in our case, we saw a large number of pods objects and we investigate if it is possible to use multiple shards within one resource type.

Some companies already do the sharding based on the key, similar to what TiKV did. However, I want to discuss the possibilities to use etcd shards based on the key hash.

For example, I have key k1 and key k2, after getting their md5 and mod ops, I will put k1 to etcd shard1 and k2 to etcd shard2. This will balance the loads among different etcd clusters and get higher throughput.

AFAIK, apiserver uses only a limited number of operations supported by etcd. These are Range/Txn/Watch operations. Also, regarding Txn operation, they are only simple transactions doing single Create, Update or Delete operation.

If we use the proposed sharding, for single Create/Update/Delete operations, it seems simple. But for Watch and Range request, apiserver needs to maintain a connection to each one of the etcd shard. Could there be any issue with this regarding the Range/Watch performance?

When apiserver holds multiple etcd shard connections, apiserver also need to remember each etcd shard's latest revision. Regarding this, I am thinking make changes to the APIObjectVersioner object so that a resource version vector is supported. It will be something similar to "{Shard1:Rev1,Shard2:Rev2}". So that the revision positions in each of the shard connection is kept.

Do you guys see any issue with this general design? I want to get some feedback from you guys so that if it is do-able, we can make this change and contribute this back to the opensource community.

@xww
Copy link

xww commented Apr 22, 2022

why close?i think it's a good idea

@AlexanderYastrebov
Copy link

Seems to be a duplicate of kubernetes/kubernetes#98814

@anveshreddy18
Copy link

anveshreddy18 commented Mar 29, 2024

Can we reopen this @wilsonwang371 ? the duplicate was closed as they wanted to put it as a KEP but it didn't go through. Having both the issues closed regarding an important issue doesn't seem ideal. I guess apiserver is the right repo to track this.

Or if the issue or proposal is being continued at a different place, can you pls refer it here to help others. Thanks

@wilsonwang371 wilsonwang371 reopened this Apr 8, 2024
@wilsonwang371
Copy link
Author

wilsonwang371 commented Apr 8, 2024

I was thinking about this before, but finally, I made some contributions to etcd 3.5 and the TXN performance got significantly improved. So that's why I did not work on this for a long time.

It seems like many people are interested in this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants