Support forced spreading for multi GPU #4266

dhiltgen · 2024-05-08T21:34:47Z

Our default behavior today is to try to fit the model into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs, even if the model can fit into one. This change exposes that behavior as a tunable.

Fixes #4198
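
To illustrate the two placement policies described above, here is a minimal sketch. It is not the code from this PR: `Gpu`, `pick_gpus`, and the `force_spread` flag are hypothetical names chosen for the example, and the real scheduler accounts for more than free memory.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    free_bytes: int


def pick_gpus(gpus, required_bytes, force_spread=False):
    """Return the GPUs a model would be placed on (hypothetical helper)."""
    if not gpus:
        return []
    if not force_spread:
        # Default policy: prefer a single GPU that can hold the whole model.
        fitting = [g for g in gpus if g.free_bytes >= required_bytes]
        if fitting:
            return [max(fitting, key=lambda g: g.free_bytes)]
    # Forced spreading, or no single GPU is large enough: use every GPU.
    # Whether the combined memory suffices is left to the caller.
    return list(gpus)


GIB = 2**30
gpus = [Gpu("gpu0", 24 * GIB), Gpu("gpu1", 24 * GIB)]

print([g.name for g in pick_gpus(gpus, 8 * GIB)])
print([g.name for g in pick_gpus(gpus, 8 * GIB, force_spread=True)])
```

With the default policy the 8 GiB model lands on one GPU; with `force_spread=True` it is placed across both, which is the older behavior some users asked for.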


kungfu-eric · 2024-05-17T23:03:00Z

This should be useful, because without actually running inference on a long sequence it isn't clear whether that sequence will fit in memory. The model plus some default overhead (2k?) might fit, but raising the num_ctx parameter may lead to OOM in the single-GPU configuration.
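
A rough back-of-the-envelope sketch of why a longer num_ctx can push a model that fits on one GPU into OOM. It uses the standard key/value cache size formula for a transformer; the model shape (32 layers, 8 KV heads, head dimension 128, fp16 cache), the 5 GiB weight size, and the 8 GiB GPU are assumed values for illustration, and this is not how the project itself estimates memory.

```python
GIB = 2**30


def kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the key/value cache for num_ctx tokens.

    The leading factor 2 covers one key tensor and one value tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * num_ctx


# Assumed, illustrative model shape and sizes (not taken from the PR).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)
weights_bytes = 5 * GIB
gpu_free_bytes = 8 * GIB

for num_ctx in (2048, 32768):
    total = weights_bytes + kv_cache_bytes(num_ctx, **shape)
    fits = total <= gpu_free_bytes
    print(f"num_ctx={num_ctx}: {total / GIB:.2f} GiB needed, fits on one GPU: {fits}")
```

Under these assumptions the cache grows from 0.25 GiB at 2k context to 4 GiB at 32k, so the same model that fits on a single 8 GiB GPU by default no longer does once the context is raised.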

Support forced spreading for multi GPU

96ec776


dhiltgen mentioned this pull request May 18, 2024

Enhanced GPU discovery and multi-gpu support with concurrency #4517

Draft
