vllm-project / vllm Public

Notifications
Fork 2.6k
Star 19.5k

Code
Issues 819
Pull requests 227
Discussions
Actions
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Discussions
Actions
Security
Insights

Issues: vllm-project/vllm

[Roadmap] vLLM Roadmap Q2 2024

#3861 opened Apr 4, 2024 by simon-mo

Open 23

Virtual Office Hours: May 15 2pm ET

#4538 opened May 1, 2024 by robertgshaw2-neuralmagic

Open 1

Labels 41 Milestones 0

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

819 Open 1,902 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

[Bug]: llava, output is truncated, not fully displayed bug

Something isn't working

#4822 opened May 15, 2024 by xiaoyudxy

[Bug]: Llama 3 - Out of memory - RTX 4060 TI bug

Something isn't working

#4821 opened May 15, 2024 by savi8sant8s

Remove EOS token before passing the tokenized input to model misc

#4814 opened May 14, 2024 by VallabhMahajan1

[Usage]: convert llava-v1.5-7b to liuhaotian/llava-v1.5-7b-hf format usage

How to use vllm

#4811 opened May 14, 2024 by xiaoyudxy

Qwen1.5-14B-Chat-GPTQ-Int4: quantization is not fully optimized yet. The speed can be slower than non-quantized models. bug

Something isn't working

#4810 opened May 14, 2024 by lostsollar

[Bug]: ModelRegistry.load_model_cls() circular import error on llama-llava bug

Something isn't working

#4807 opened May 14, 2024 by datta-nimmaturi

[Performance]: Qwen 7b chat model, under 128 concurrency, the CPU utilization rate is 100%, and the GPU SM utilization rate is only about 60%-75%. Is it a CPU bottleneck? performance

Performance-related issues

#4806 opened May 14, 2024 by markluofd

[Usage]: Seems nn.module definition may affect the output tokens. Don't know the reason. usage

How to use vllm

#4805 opened May 14, 2024 by Zhenzhong1

[Bug]: logprobs is not compatible with the OpenAI spec bug

Something isn't working

good first issue

Good for newcomers

help wanted

Extra attention is needed

#4795 opened May 13, 2024 by GabrielBianconi

[Bug]: Async engine hangs with 0.4.* releases bug

Something isn't working

#4789 opened May 13, 2024 by glos-nv

[Bug]: RAM OOM Error Loading 480GB MoE Model Despite Fix in PR #1395 bug

Something isn't working

#4786 opened May 13, 2024 by hxer7963

[Bug]: multi-gpu for baichuan2-13B-Chat benchmark_serving bug

Something isn't working

#4785 opened May 13, 2024 by shudct

[Bug]: deploy Phi-3-mini-128k-instruct AssertionError bug

Something isn't working

#4784 opened May 13, 2024 by hxujal

[Usage]: How to change the batch size when testing the throughput of VLLM by running benchmark_throughput usage

How to use vllm

#4783 opened May 13, 2024 by Ourspolaire1

[Doc]: Doc for using tensorizer_uri with LLM is incorrect documentation

Improvements or additions to documentation

#4782 opened May 13, 2024 by GRcharles

[Feature]: Support the OpenAI Batch Chat Completions file format feature request

#4777 opened May 13, 2024 by wuisawesome

[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt bug

Something isn't working

#4772 opened May 12, 2024 by leejamesss

[Feature]: Host CPU Docker image on Docker Hub feature request

#4771 opened May 12, 2024 by VMinB12

[Feature]: CI: Test on NVLink-enabled machine feature request

#4770 opened May 12, 2024 by youkaichao

[Feature]: could paged_attention_v1 support parameter 'attn_bias' feature request

#4766 opened May 11, 2024 by cillinzhang

[Feature]: Support W4A8KV4 Quantization(QServe/QoQ) feature request

#4763 opened May 11, 2024 by bratao

[Performance]: Why the avg. througput generation is low? performance

Performance-related issues

#4760 opened May 11, 2024 by rvsh2

[Bug]: CUDA error when running mistral-7b + lora with tensor_para=8 bug

Something isn't working

#4756 opened May 11, 2024 by sfc-gh-zhwang

Regression in support of customized "role" in OpenAI compatible API (v.0.4.2) good first issue

Good for newcomers

#4755 opened May 10, 2024 by simon-mo

[Usage]: prompt_logprompt from endpoint usage

How to use vllm

#4747 opened May 10, 2024 by basma-b

Previous 1 2 3 4 5 … 32 33 Next

Previous Next

ProTip! Mix and match filters to narrow down what you’re looking for.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly