
Batched generation support #118

Open
ealmloff opened this issue Dec 23, 2023 · 2 comments
Labels
Kalosm (Related to the Kalosm library), performance

Comments

@ealmloff
Collaborator

Specific Demand

Kalosm-language should support batched generation for faster local inference. This can be very useful when generating many unrelated streams of text.

Implement Suggestion

We can change the Model trait to optionally support batched generation, then implement batched generation for each of the Kalosm models.
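The suggestion above could be sketched as a trait extension with a default sequential fallback, so existing model implementations keep working while backends that support batching override the new method. This is a hypothetical illustration; the names `Model`, `generate`, `generate_batched`, and `EchoModel` are assumptions for the sketch, not the actual Kalosm API.

```rust
// Hypothetical sketch: extending a Model-like trait with an optional
// batched-generation method. Not the real Kalosm `Model` trait.
trait Model {
    /// Generate a completion for a single prompt.
    fn generate(&self, prompt: &str) -> String;

    /// Generate completions for many unrelated prompts.
    /// Default implementation falls back to sequential generation,
    /// so existing models need no changes; backends with real
    /// batching (e.g. on an accelerator) would override this.
    fn generate_batched(&self, prompts: &[&str]) -> Vec<String> {
        prompts.iter().map(|p| self.generate(p)).collect()
    }
}

/// Toy model used only to demonstrate the trait shape.
struct EchoModel;

impl Model for EchoModel {
    fn generate(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

fn main() {
    let model = EchoModel;
    // The default fallback produces one output per input prompt.
    let outputs = model.generate_batched(&["a", "b"]);
    println!("{outputs:?}");
}
```

The default-method approach keeps the change backwards compatible: adding `generate_batched` with a fallback body is non-breaking for downstream implementors of the trait.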

@ealmloff added the Kalosm (Related to the Kalosm library) and performance labels Dec 23, 2023
@ealmloff
Collaborator Author

ealmloff commented Feb 8, 2024

This doesn't appear to be much faster in kalosm-llama, at least on the CPU. We can revisit this once GPU support is added.

@ealmloff ealmloff closed this as completed Feb 8, 2024
@ealmloff
Collaborator Author

Accelerator support is now enabled. This may provide more performance benefits on CUDA/Metal.

@ealmloff ealmloff reopened this Apr 26, 2024
Projects
Status: Done
Development

No branches or pull requests

1 participant