Pull requests: vllm-project/vllm
[WIP] [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support)
#4837
opened May 15, 2024 by
afeldman-nm
Draft
[Bugfix] fix rope error when load models with different dtypes
#4835
opened May 15, 2024 by
jinzhen-lin
[Bugfix][Model] Add base class for vision-language models
#4809
opened May 14, 2024 by
DarkLight1337
[Speculative decoding] Enable TP>1 speculative decoding
#4808
opened May 14, 2024 by
cadedaniel
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model
#4799
opened May 14, 2024 by
linxihui
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests
#4797
opened May 13, 2024 by
Alexei-V-Ivanov-AMD
[Kernel] add bfloat16 support for gptq marlin kernel
#4788
opened May 13, 2024 by
jinzhen-lin