I used ApacheBench (ab) to test how many API calls OpenLLM can handle at the same time. I set the concurrency to 4 users and the total number of requests to 40 (10 requests per user).
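Roughly, the run was equivalent to the following Python sketch (the endpoint, port, and request body here are assumptions for illustration, not my exact values):

```python
# Reproduces the load pattern described above: 4 concurrent users,
# 40 requests total. The ab invocation was along the lines of
#   ab -n 40 -c 4 -p payload.json -T application/json <url>
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:3000/v1/generate"  # assumed OpenLLM HTTP endpoint
PAYLOAD = {"prompt": "Hello", "llm_config": {"max_new_tokens": 64}}  # hypothetical body

def call(_):
    r = requests.post(URL, json=PAYLOAD, timeout=120)
    return r.status_code

with ThreadPoolExecutor(max_workers=4) as pool:  # 4 concurrent users
    codes = list(pool.map(call, range(40)))      # 40 requests total

print({c: codes.count(c) for c in set(codes)})   # e.g. {200: 35, 500: 5}
```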
Of the 40 requests, 35 returned 200 (success) and the remaining 5 returned runtime errors. This is the error log:
File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 165, in from_runner
structured = orjson.loads(data)
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 116, in generate_iterator
generated = GenerationOutput.from_runner(out).with_options(prompt=prompt)
File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 167, in from_runner
raise ValueError(f'Failed to parse JSON from SSE message: {data!r}') from e
ValueError: Failed to parse JSON from SSE message: 'Service Busy'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/usr/local/lib/python3.8/dist-packages/openllm/_service.py", line 23, in generate_v1
return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 55, in generate
async for result in self.generate_iterator(
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 125, in generate_iterator
raise RuntimeError(f'Exception caught during generation: {err}') from err
RuntimeError: Exception caught during generation: Failed to parse JSON from SSE message: 'Service Busy'
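The immediate parse failure is easy to reproduce in isolation: the runner answered with a plain-text message (which looks like BentoML's "Service Busy" overload response) instead of an SSE JSON payload, and the parser chokes on it:

```python
import orjson

# Feeding the literal runner response to orjson reproduces the first
# exception in the traceback above:
orjson.loads(b"Service Busy")
# -> orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)
```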
Do I have to put a load balancer such as nginx or gobetween in front to solve this, or can it be solved within OpenLLM itself?
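For now, a client-side retry with backoff masks the transient failures, though it doesn't add any real capacity (a sketch; the endpoint and payload are the same assumptions as above):

```python
import time
import requests

URL = "http://localhost:3000/v1/generate"  # assumed endpoint, as above

def generate_with_retry(payload, attempts=5, backoff=0.5):
    """Retry transient 'Service Busy' failures with exponential backoff."""
    last = None
    for i in range(attempts):
        last = requests.post(URL, json=payload, timeout=120)
        if last.status_code == 200:
            return last.json()
        time.sleep(backoff * 2 ** i)  # wait 0.5s, 1s, 2s, ... between tries
    last.raise_for_status()  # give up after the final attempt
```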