
Guarantee execution time per the invocation request like AWS Lambda #961

Open
huasiy opened this issue Mar 19, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@huasiy

huasiy commented Mar 19, 2024

Describe the enhancement
In Lambda, the maximum execution time for each request is 15 minutes, and the instance executing the request will not be terminated prematurely. However, based on experiments and existing issues in Knative, requests in Knative may be terminated prematurely. To prevent pods from being prematurely deleted, I currently have to increase the stable window, but this will lower resource utilization. Does anyone have a better solution?

@huasiy huasiy added the enhancement New feature or request label Mar 19, 2024
@leokondrashov
Contributor

Hi, this is not entirely true. We can increase the grace period so that the ongoing invocation has a chance to finish execution. With correct termination handling, the instance would exist only for the duration of the execution and be destroyed as soon as the request is done, so it won't consume extra resources while terminating.
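For reference, the request timeout is set per revision in the Service spec, and Knative derives the pod's termination grace period from it. A minimal sketch; the service name and image are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: long-running-fn        # hypothetical function name
spec:
  template:
    spec:
      # Allow each request up to 15 minutes, Lambda-style.
      # Knative uses this value to derive the pod's termination
      # grace period, so in-flight requests can finish before SIGKILL.
      timeoutSeconds: 900
      containers:
        - image: example.com/long-running-fn:latest   # placeholder image
```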

@huasiy
Author

huasiy commented Apr 12, 2024

Thank you for your response. The method you mentioned indeed ensures that requests are not prematurely terminated. Now I am wondering whether Knative can provide a Lambda-like warm-up mechanism, where instances are kept warm as long as requests arrive at a certain frequency.

@leokondrashov
Contributor

By default, instances are controlled by the autoscaler, which decides when to scale up and down. It does this based on the observed concurrent request count over a window. So, it holds the instances for some time, although the logic is a bit more complicated compared to keep-alive policies that are commonly used.

If we are talking about keeping a single instance warm, you can trigger an execution once per window period (60s by default). Keeping two instances warm is much trickier; I won't even try to reason about how to make that happen. Also, increasing the autoscaling window keeps warm instances around for longer.
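The autoscaling window mentioned above is configurable per revision via an annotation. A sketch, with an illustrative value and a hypothetical service name:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: warm-fn                # hypothetical function name
spec:
  template:
    metadata:
      annotations:
        # Stretch the stable window from the 60s default to 10 minutes,
        # so idle instances stay warm longer before scaling to zero.
        autoscaling.knative.dev/window: "600s"
    spec:
      containers:
        - image: example.com/warm-fn:latest   # placeholder image
```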

@huasiy
Author

huasiy commented May 12, 2024

Unfortunately, I need to maintain a pool of instances with an arbitrary number (greater than 1), does this mean I can't use Knative to achieve this? Are there any other software solutions that can provide a "warm" instance mechanism similar to AWS Lambda?

@leokondrashov
Contributor

Sorry, I didn't state the obvious solution. My previous answer focused on a trick to keep the instance warm, but there is a proper way to retain a specific scale: you can set the minimum scale (docs) of the function. That works like provisioned concurrency in AWS Lambda: you will always have at least that number of instances for the function, with the ability to scale up automatically.
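A minimal sketch of setting the minimum scale via the revision annotation; the value and names are illustrative:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: pooled-fn              # hypothetical function name
spec:
  template:
    metadata:
      annotations:
        # Keep at least 3 instances warm at all times,
        # similar to provisioned concurrency in AWS Lambda.
        autoscaling.knative.dev/min-scale: "3"
    spec:
      containers:
        - image: example.com/pooled-fn:latest   # placeholder image
```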

@huasiy
Author

huasiy commented May 12, 2024

Apologies, my previous messages were not clear. I don't want to pre-allocate instances. As discussed previously, I am wondering whether Knative can provide a Lambda-like warm-up mechanism, where instances are kept warm as long as requests arrive at a certain frequency. As you've pointed out, the Knative autoscaler scales up and down based on the observed concurrent request count over a window. However, when Knative scales down, it doesn't know which instances are processing requests, so it may kill instances that are mid-request. I want to know whether Knative can track the state of each instance and kill only idle instances. If Knative can't, what other software can?

@leokondrashov
Contributor

First, yes, it is currently not possible in Knative to terminate a specific pod, so we may terminate a pod with a running request. This has been a feature request in Knative and Kubernetes for five years (you added the link in the original question).

Second, I don't think this is a problem. Pod termination is not instant: on the transition to the terminating state, the pod receives SIGTERM and is given a grace period (in Knative's case, set to the function timeout) to finish its current request(s) and terminate gracefully. Only after the grace period is it forcefully terminated with SIGKILL. So the autoscaler works the same way as before; resource utilization is somewhat higher (we need to retain terminating instances until they finish their requests), but I don't think that's a big issue.
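The SIGTERM-then-grace-period flow described above can be handled inside the function container. A minimal Python sketch, assuming one request in flight when termination starts; the "request" here is a stand-in computation, not a real server:

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # Mark the instance as draining: stop accepting new requests,
    # but let the in-flight request run to completion.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the terminating state: SIGTERM arrives mid-request.
os.kill(os.getpid(), signal.SIGTERM)

# The in-flight "request" (a stand-in computation) still finishes
# within the grace period before the process exits.
result = sum(range(100))
print(f"draining={shutting_down} result={result}")
```

Only after the grace period elapses would the kubelet follow up with SIGKILL; a real handler would also close the listening socket so the activator stops routing new requests to the pod.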

Third, I'm still confused about the warm-up mechanism you are talking about. If we are discussing sending additional requests to keep the instance warm, I'd say that is not a mechanism but a trick that exploits AWS's keep-alive policy. The proper way to ensure the presence of at least a specific number of instances is to request it directly from the provider. I also don't see the connection to the original question: an instance with a long-running request won't receive additional requests until it finishes the current one (assuming the instance handles one request at a time). So firing extra requests to keep it warm won't affect that instance's lifetime, since they would be routed to another instance.

If something is still unclear, please provide an example of what you need to do/want to see, so we can discuss it.

@huasiy
Author

huasiy commented May 28, 2024

Sorry for the late response. What I actually want is an independent idle timer for each instance: when a request is routed to an instance, the timer resets, and the instance is released only if no request has been routed to it for a predefined period (like Knative's default 90 seconds).
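The desired per-instance behaviour can be illustrated with a small sketch: an idle timer that each routed request resets, with the instance released only after the timer expires. The class and method names are hypothetical; the 90-second default matches the value mentioned above:

```python
import time

IDLE_TIMEOUT = 90.0  # seconds, matching the default period mentioned above

class WarmInstance:
    """Tracks the last time a request was routed to this instance."""

    def __init__(self, now=None):
        self.last_request = now if now is not None else time.monotonic()

    def on_request(self, now=None):
        # Each incoming request resets the idle timer.
        self.last_request = now if now is not None else time.monotonic()

    def should_release(self, now=None, timeout=IDLE_TIMEOUT):
        # Release only if no request arrived within the timeout.
        now = now if now is not None else time.monotonic()
        return now - self.last_request > timeout

# Simulated timeline: instance created at t=0, request at t=60.
inst = WarmInstance(now=0.0)
inst.on_request(now=60.0)                # resets the timer
print(inst.should_release(now=120.0))    # 120 - 60 = 60s idle, under 90s
print(inst.should_release(now=200.0))    # 200 - 60 = 140s idle, over 90s
```

This differs from Knative's window-based autoscaler, which looks at aggregate concurrency rather than per-instance idleness, which is why the thread concludes Knative cannot express this directly today.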
