Guarantee execution time per the invocation request like AWS Lambda #961
Comments
Hi, this is not entirely true. We can increase the grace period so that the ongoing invocation has a chance to finish execution. With correct termination handling, the instance exists only for the duration of the execution and is destroyed as soon as the request is done, so terminating pods do not consume extra resources.
Thank you for your response. The method you mentioned indeed ensures that requests are not prematurely terminated. Now I am wondering whether Knative can provide a Lambda-like warm-up mechanism, where instances are kept warm as long as requests arrive at a certain frequency?
By default, instances are controlled by the autoscaler, which decides when to scale up and down based on the observed concurrent request count over a window. So it holds instances for some time, although the logic is more complicated than the keep-alive policies that are commonly used elsewhere. If we are talking about keeping a single instance warm, you can trigger an execution once per window period (60s by default). Keeping two instances warm is much trickier; I won't even try to reason about how to make that happen. Also, increasing the autoscaling window keeps warm instances around for longer.
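For reference, the autoscaling window mentioned above can be widened per-service with the Knative Pod Autoscaler annotation. A minimal sketch (the service name and the 600s value are arbitrary examples, not recommendations):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function   # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Widen the stable window so idle instances are retained longer
        # (default is 60s).
        autoscaling.knative.dev/window: "600s"
```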
Unfortunately, I need to maintain a pool of instances of arbitrary size (greater than 1). Does this mean I can't use Knative to achieve this? Is there any other software that can provide a "warm" instance mechanism similar to AWS Lambda's?
Sorry, I didn't state the obvious solution. My previous answer focused on a trick to keep an instance warm, but there is a proper way to retain a specific scale: you can set the minimum scale (docs) of the function. That works like provisioned concurrency in AWS Lambda: you will always have at least that number of instances for the function, with the ability to scale up automatically.
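A minimal sketch of the minimum-scale setting via its annotation (the service name and the value 3 are placeholders):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function   # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Always keep at least 3 instances, similar to provisioned
        # concurrency in AWS Lambda; scaling above 3 still works.
        autoscaling.knative.dev/min-scale: "3"
```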
Apologies, my previous messages were not clear. I don't want to pre-allocate instances. As discussed previously, I am wondering whether Knative can provide a Lambda-like warm-up mechanism, where instances are kept warm as long as requests arrive at a certain frequency. As you've pointed out, the Knative autoscaler scales up and down based on the observed concurrent request count over a window. However, when Knative scales down, it doesn't know which instances are processing requests, so it may kill instances that are mid-request. I want to know whether Knative can track the state of each instance and only kill idle instances. If Knative can't, what other software can?
First, yes, it is not possible right now in Knative to terminate a specific pod, so we may terminate a pod with a running request. This has been a feature request in Knative and k8s for 5 years (you added the link in the original question).

Second, I don't think that this is a problem. Pod termination is not instant: on the transition to the terminating state, the pod receives SIGTERM and is given a grace period (in the case of Knative, set to the function timeout) to finish its current request(s) and terminate gracefully. Only after the grace period is it forcefully terminated with SIGKILL. So the autoscaler works the same way as before; resource utilization is somewhat higher (we need to retain instances in the terminating state until they finish their requests), but I don't think that's a big issue.

Third, I'm still confused about the warm-up mechanism you are talking about. If we are discussing firing additional requests to keep an instance warm, I'd say that is not a mechanism but a trick that exploits AWS's keep-alive policy. The proper way to ensure the presence of at least a specific number of instances is to request it directly from the provider. I also don't see the connection to the original question: an instance with a long-running request won't receive additional requests until it finishes the existing one (if the instance handles only one request at a time), so keep-warm requests would be routed to another instance and wouldn't affect its lifetime. If something is still unclear, please provide an example of what you need to do or want to see, so we can discuss it.
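A minimal Python sketch (not Knative code) of the graceful-termination pattern described above: the SIGTERM handler only marks the process as draining, and the in-flight request runs to completion before any shutdown. The request handler here is a simulated placeholder.

```python
import os
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Mark the instance as draining; do NOT abort in-flight work here.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def handle_request():
    # Simulated long-running request: a SIGTERM arriving mid-request
    # does not kill the process, it only flips the draining flag.
    for _ in range(3):
        time.sleep(0.01)
    return "done"

# Simulate Kubernetes sending SIGTERM while a request is in flight.
os.kill(os.getpid(), signal.SIGTERM)
result = handle_request()
print(result, shutting_down)
```

After draining, a real server would stop accepting new connections and exit before the grace period runs out, avoiding the SIGKILL.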
Sorry for the late response. Actually, I want an independent timer for each instance, such that when a request is routed to an instance, it resets that timer. The instance should only be released if no requests are routed to it within a predefined period (like Knative's default 90 seconds); each incoming request effectively resets the timer.
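A minimal Python sketch of that per-instance idle timer (the class name, methods, and threshold are hypothetical illustrations, not a Knative API): each routed request calls `touch()`, and the instance only becomes a release candidate once `expired()` returns true.

```python
import time

class IdleTimer:
    """Per-instance idle timer: each routed request resets the deadline."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self.last_request = time.monotonic()

    def touch(self):
        # Called whenever a request is routed to this instance.
        self.last_request = time.monotonic()

    def expired(self):
        # True only if no request arrived within the idle window.
        return time.monotonic() - self.last_request > self.idle_seconds

timer = IdleTimer(idle_seconds=0.05)   # 90s in the real scenario
assert not timer.expired()             # fresh instance, keep it
time.sleep(0.06)
assert timer.expired()                 # no traffic: may be released
timer.touch()                          # a request arrives
assert not timer.expired()             # the timer was reset
```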
Describe the enhancement
In Lambda, the maximum execution time for each request is 15 minutes, and the instance executing a request is not terminated prematurely. However, based on experiments and existing issues, requests in Knative may be terminated prematurely. To prevent pods from being deleted prematurely, I currently have to increase the stable window, but this lowers resource utilization. Does anyone have a better solution?