Configurable concurrency per replica setting #12

Open
samos123 opened this issue Nov 4, 2023 · 4 comments · May be fixed by #52
Labels
good first issue Good for newcomers

Comments

@samos123
Contributor

samos123 commented Nov 4, 2023

Currently, Lingo seems to create additional replicas quickly as requests come in, even while the existing pod isn't ready to serve yet. The number of requests a single pod can handle concurrently should be configurable.

This could be done by using the following annotation in the deployment:

lingo.substratus.ai/concurrency: "100"

In this case, Lingo should only scale up when a single pod is handling more than 100 HTTP requests in parallel. I think 100 is a good default value, which is also what Knative uses: https://knative.dev/docs/serving/autoscaling/concurrency/#soft-versus-hard-concurrency-limits
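
To make the intended scaling rule concrete, here is a minimal sketch of how a replica count could be derived from the per-pod concurrency setting. The function name `desiredReplicas` is hypothetical and this is not Lingo's actual autoscaling code; it only illustrates "scale up once a pod would exceed N parallel requests" as a ceiling division.

```go
package main

// desiredReplicas is a hypothetical helper (not part of Lingo). It returns the
// number of replicas needed so that no replica handles more than `concurrency`
// in-flight requests. It is a ceiling division of inFlight by concurrency.
func desiredReplicas(inFlight, concurrency int) int {
	if concurrency <= 0 {
		// Assumption: treat an invalid setting as the most conservative limit.
		concurrency = 1
	}
	if inFlight <= 0 {
		// No traffic: the caller decides whether a minimum replica count applies.
		return 0
	}
	return (inFlight + concurrency - 1) / concurrency
}
```

With a concurrency of 100, 100 in-flight requests still fit on one replica, while 101 would call for a second one.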

@nstogner
Contributor

nstogner commented Nov 4, 2023

The setting is here:

lingo/main.go, line 78 in 4350d67:

fifo := NewFIFOQueueManager(1, 1000)

In this case, 1 is the concurrency setting.

@samos123
Contributor Author

samos123 commented Nov 5, 2023

I have already changed the default from 1 to 100. What's left is making this configurable through the annotation and reconciling on it as needed.

samos123 added this to the 0.1 release milestone Nov 5, 2023
@alpe
Contributor

alpe commented Dec 19, 2023

Do I understand this correctly: you are suggesting a new annotation on the model deployment, so that the value can be customized at the model level instead of being a global value in Lingo's main? This sounds very reasonable to me.

The deployment manager receives updates on Reconcile and could trigger a queue resize on the instance.
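
As a rough illustration of what "a queue resize on the instance" could mean, the sketch below shows a concurrency limiter whose limit can be changed while requests are in flight. The type `resizableLimiter` and its methods are hypothetical; this is not Lingo's FIFO queue manager, and unlike a FIFO queue it does not guarantee that waiters are admitted in arrival order.

```go
package main

import "sync"

// resizableLimiter is a hypothetical type (not from Lingo) that caps the
// number of concurrently active requests and allows the cap to be changed.
type resizableLimiter struct {
	mu     sync.Mutex
	cond   *sync.Cond
	limit  int
	active int
}

func newResizableLimiter(limit int) *resizableLimiter {
	l := &resizableLimiter{limit: limit}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until a slot is free under the current limit.
func (l *resizableLimiter) Acquire() {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.active >= l.limit {
		l.cond.Wait()
	}
	l.active++
}

// TryAcquire takes a slot if one is free and reports whether it succeeded.
func (l *resizableLimiter) TryAcquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active >= l.limit {
		return false
	}
	l.active++
	return true
}

// Release frees a slot and wakes one waiter, which rechecks the limit.
func (l *resizableLimiter) Release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active > 0 {
		l.active--
	}
	l.cond.Signal()
}

// Resize changes the limit. Requests already admitted keep running; after a
// shrink, new requests wait until enough active ones have been released.
func (l *resizableLimiter) Resize(limit int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.limit = limit
	l.cond.Broadcast()
}
```

In this sketch, a reconcile handler that sees a changed annotation value would simply call `Resize` with the new number.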

@samos123
Contributor Author

@alpe Yes that's correct. There should be a default global value. In addition, each deployment should be able to override the default global value by setting an annotation.
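
A minimal sketch of that override logic, assuming the annotation key proposed earlier in this issue and a global default passed in by the caller. The function `resolveConcurrency` is hypothetical, and falling back to the default on an invalid value is one possible design choice, not a decision made in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key as proposed in this issue.
const concurrencyAnnotation = "lingo.substratus.ai/concurrency"

// resolveConcurrency is a hypothetical helper (not part of Lingo). It returns
// the per-deployment concurrency from the annotations if present and valid,
// and otherwise the global default. On an invalid value it returns the
// default together with an error so the caller can log or surface it.
func resolveConcurrency(annotations map[string]string, globalDefault int) (int, error) {
	raw, ok := annotations[concurrencyAnnotation]
	if !ok {
		return globalDefault, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return globalDefault, fmt.Errorf("annotation %s: %w", concurrencyAnnotation, err)
	}
	if n < 1 {
		return globalDefault, fmt.Errorf("annotation %s: value %d must be at least 1", concurrencyAnnotation, n)
	}
	return n, nil
}
```

Kubernetes annotation values are always strings, which is why the value is parsed with `strconv.Atoi` here.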

nstogner removed this from the 0.1 release milestone Jan 16, 2024
samos123 added the "good first issue" label Mar 29, 2024