Specific Demand

Kalosm-language should support batched generation for faster local inference. This can be very useful when generating many unrelated streams of text.

Implement Suggestion

We can change the Model trait to optionally support batched generation, then implement batched generation for each of the kalosm models.
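As a rough illustration of the shape this could take, here is a minimal sketch of an opt-in batched method with a sequential fallback. The `Model` trait and method names below are placeholders for this sketch, not kalosm's actual API:

```rust
/// Stand-in for kalosm's Model trait (names are hypothetical).
trait Model {
    /// Existing single-prompt entry point.
    fn generate(&self, prompt: &str) -> String;

    /// Optional batched entry point. The default implementation falls
    /// back to generating each prompt independently, so existing models
    /// keep working unchanged; backends that can batch override this.
    fn generate_batched(&self, prompts: &[&str]) -> Vec<String> {
        prompts.iter().map(|p| self.generate(p)).collect()
    }
}

/// Toy model demonstrating an override.
struct EchoModel;

impl Model for EchoModel {
    fn generate(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }

    fn generate_batched(&self, prompts: &[&str]) -> Vec<String> {
        // A real backend would run all prompts through the model in a
        // single forward pass here instead of looping.
        prompts.iter().map(|p| self.generate(p)).collect()
    }
}

fn main() {
    let model = EchoModel;
    for output in model.generate_batched(&["hello", "world"]) {
        println!("{output}");
    }
}
```

Using a defaulted method would keep the change backward compatible: models could gain batching support incrementally, and callers could always use the batched entry point.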
This doesn't appear to be much faster in kalosm-llama, at least on the CPU. We can revisit this once GPU support is added.
Accelerator support is now enabled. This may provide more performance benefits on CUDA/Metal.