Configurable concurrency per replica setting #12
The setting is here: Line 78 in 4350d67
I have already changed the default from 1 to 100; what's left is making this configurable through the annotation and reconciling on it as needed.
Do I understand this correctly: you suggest a new annotation on the model deployment, so that instead of a global value in Lingo's main this can be customized per model? That sounds very reasonable to me. The deployment manager receives updates on …
@alpe Yes, that's correct. There should be a global default value, and in addition each deployment should be able to override that default by setting an annotation.
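The annotation-with-global-fallback lookup described above could be sketched roughly like this in Go (the annotation key `lingo.substratus.ai/max-concurrency` and the function name are hypothetical, not taken from the Lingo codebase):

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical annotation key; the actual key chosen for Lingo may differ.
const concurrencyAnnotation = "lingo.substratus.ai/max-concurrency"

// concurrencyFor returns the per-deployment concurrency when the annotation
// is present and valid, otherwise the global default.
func concurrencyFor(annotations map[string]string, globalDefault int) int {
	if raw, ok := annotations[concurrencyAnnotation]; ok {
		if v, err := strconv.Atoi(raw); err == nil && v > 0 {
			return v
		}
	}
	return globalDefault
}

func main() {
	// No annotation set: fall back to the global default.
	fmt.Println(concurrencyFor(nil, 100))
	// Annotation set on the deployment: override the default.
	fmt.Println(concurrencyFor(map[string]string{concurrencyAnnotation: "42"}, 100))
}
```

The deployment manager would re-run this lookup whenever it receives a deployment update, so annotation changes are reconciled without a restart.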
Currently it seems Lingo quickly creates more replicas as requests come in while the pod isn't ready to serve yet. It should be configurable how many requests a single pod can handle concurrently.
This could be done by using the following annotation in the deployment:
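The annotation snippet itself did not survive in this copy of the issue; a sketch of what it could look like follows (the key name is hypothetical, modeled on Knative's `autoscaling.knative.dev/target` annotation):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-model
  annotations:
    # Hypothetical key: target number of concurrent requests per replica.
    lingo.substratus.ai/max-concurrency: "100"
```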
In this case Lingo should only scale up when a single pod is handling more than 100 HTTP requests in parallel. I think 100 is a good default value, which is also what Knative uses: https://knative.dev/docs/serving/autoscaling/concurrency/#soft-versus-hard-concurrency-limits