
Silent failure when text is too long when using the playground #1021

Open
lstocchi opened this issue Apr 30, 2024 · 0 comments

Comments

@lstocchi (Contributor)

If you ask a question with very long content, the playground never replies. The server logs show that the request failed, but nothing is displayed to the user:

ValueError: Requested tokens (4237) exceed context window of 2048
INFO:     10.88.0.1:37666 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (4237) exceed context window of 2048
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/server/app.py", line 462, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 1657, in create_chat_completion
    return handler(
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama_chat_format.py", line 599, in chat_completion_handler
    completion_or_chunks = llama.create_completion(
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 1493, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "/usr/local/lib64/python3.9/site-packages/llama_cpp/llama.py", line 972, in _create_completion
    raise ValueError(
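The server rejects the request with a 400 Bad Request, so one way the playground could avoid swallowing it is to check the HTTP status of the completions call and surface the server's message to the user. Below is a minimal sketch in Python of that idea; it is not the playground's actual client code, and the JSON error field names are assumptions about the llama_cpp server's OpenAI-style error body.

```python
import requests

def ask(base_url: str, prompt: str) -> str:
    # Send the chat request to the llama_cpp server's OpenAI-compatible endpoint.
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    if not resp.ok:
        # e.g. 400 Bad Request: "Requested tokens (4237) exceed context window of 2048"
        # The "error"/"message" keys are an assumption about the error body shape.
        try:
            detail = resp.json().get("error", {}).get("message", resp.text)
        except ValueError:
            detail = resp.text
        # Raising (or displaying) the message avoids the silent failure described above.
        raise RuntimeError(f"Model server error ({resp.status_code}): {detail}")
    return resp.json()["choices"][0]["message"]["content"]
```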