
Batched generation support #118

Open
ealmloff opened this issue Dec 23, 2023 · 2 comments
Labels
Kalosm (Related to the Kalosm library), performance

Comments

@ealmloff
Collaborator

Specific Demand

Kalosm-language should support batched generation for faster local inference. This can be very useful when generating many unrelated streams of text.

Implement Suggestion

We can change the Model trait to optionally support batched generation, then implement batched generation for each of the Kalosm models.
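The suggestion above could be sketched as a trait extension with a default sequential fallback, so existing model implementations keep working while backends that support batching override the new method. This is a hypothetical illustration; the names `Model`, `generate`, `generate_batched`, and `EchoModel` are assumptions for the sketch, not the actual Kalosm API.

```rust
// Hypothetical sketch: extending a Model-like trait with an optional
// batched-generation method. Not the real Kalosm `Model` trait.
trait Model {
    /// Generate a completion for a single prompt.
    fn generate(&self, prompt: &str) -> String;

    /// Generate completions for many unrelated prompts.
    /// Default implementation falls back to sequential generation,
    /// so existing models need no changes; backends with real
    /// batching (e.g. on an accelerator) would override this.
    fn generate_batched(&self, prompts: &[&str]) -> Vec<String> {
        prompts.iter().map(|p| self.generate(p)).collect()
    }
}

/// Toy model used only to demonstrate the trait shape.
struct EchoModel;

impl Model for EchoModel {
    fn generate(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

fn main() {
    let model = EchoModel;
    // The default fallback produces one output per input prompt.
    let outputs = model.generate_batched(&["a", "b"]);
    println!("{outputs:?}");
}
```

The default-method approach keeps the change backwards compatible: adding `generate_batched` with a fallback body is non-breaking for downstream implementors of the trait.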

@ealmloff added the Kalosm (Related to the Kalosm library) and performance labels Dec 23, 2023
@ealmloff
Collaborator Author

ealmloff commented Feb 8, 2024

This doesn't appear to be much faster in kalosm-llama, at least on the CPU. We can revisit this once GPU support is added.

@ealmloff ealmloff closed this as completed Feb 8, 2024
@ealmloff
Collaborator Author

Accelerator support is now enabled. This may provide more performance benefits on CUDA/Metal.

@ealmloff ealmloff reopened this Apr 26, 2024
Projects
Status: Done
Development

No branches or pull requests

1 participant