If you ask a question with very long content, the server never replies. The server logs show that the request failed, but nothing is displayed to the user.
ValueError: Requested tokens (4237) exceed context window of 2048
INFO: 10.88.0.1:37666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (4237) exceed context window of 2048
Traceback (most recent call last):
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/server/app.py", line 462, in create_chat_completion
] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
File "/usr/local/lib/python3.9/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 1657, in create_chat_completion
return handler(
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama_chat_format.py", line 599, in chat_completion_handler
completion_or_chunks = llama.create_completion(
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 1493, in create_completion
completion: Completion = next(completion_or_chunks) # type: ignore
File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 972, in _create_completion
raise ValueError(
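One way to avoid hitting this `ValueError` is to trim the conversation history client-side before sending the request, so the estimated prompt size stays under the model's context window. Below is a minimal sketch; `estimate_tokens` and `trim_messages` are hypothetical helpers (not part of llama-cpp-python), and the 4-characters-per-token heuristic is a rough assumption — a real client should count tokens with the model's actual tokenizer.

```python
# Sketch: drop the oldest messages until the estimated prompt token count
# fits within the context window, leaving room for the model's reply.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Replace with the model's
    # tokenizer for an exact count.
    return max(1, len(text) // 4)

def trim_messages(messages, context_window=2048, reserve_for_reply=512):
    budget = context_window - reserve_for_reply
    kept, total = [], 0
    # Walk from the newest message backwards, keeping as many as fit.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

With a 2048-token window this would silently drop an 8000-character message while keeping the newest short one, instead of letting the server reject the whole request with a 400.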