
[Bug]: Llama 3 - Out of memory - RTX 4060 TI #4821

Closed
savi8sant8s opened this issue May 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@savi8sant8s

Your current environment

vllm 0.4.2

🐛 Describe the bug

Hello. Would anyone have an example of how I could run Llama 3 on an NVIDIA RTX 4060 TI 16GB? I tried to run inference with this model https://huggingface.co/rhaymison/Llama-3-portuguese-Tom-cat-8b-instruct plus a LoRA adapter, but it always runs out of memory. I enabled enforce_eager, raised gpu_memory_utilization to 1.0, and even reduced the swap space, but it still overflows GPU memory.
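
For context, a minimal sketch of the kind of setup described above (the exact script was not posted; argument names follow the vLLM Python API, and the LoRA adapter path is a placeholder):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Roughly the configuration described in the report: eager mode,
# all GPU memory given to vLLM, LoRA enabled.
llm = LLM(
    model="rhaymison/Llama-3-portuguese-Tom-cat-8b-instruct",
    enforce_eager=True,
    gpu_memory_utilization=1.0,
    enable_lora=True,
)

outputs = llm.generate(
    ["Olá, tudo bem?"],
    SamplingParams(max_tokens=64),
    # Placeholder path for the LoRA adapter mentioned above.
    lora_request=LoRARequest("tom-cat-lora", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```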

@savi8sant8s added the bug label May 15, 2024
@robertgshaw2-neuralmagic
Collaborator

Llama 3 8B at fp16 needs more than 16GB of GPU memory: the weights alone are about 16GB (8B parameters × 2 bytes), before the KV cache.

Try using a quantized model.
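
For example, a 4-bit AWQ or GPTQ checkpoint keeps the 8B weights well under 16GB, leaving room for the KV cache. A minimal sketch of loading one with vLLM follows; the model name is illustrative (substitute whatever quantized build of the model you need), and LoRA support on quantized models may vary by vLLM version:

```python
from vllm import LLM, SamplingParams

# Example AWQ-quantized Llama 3 8B Instruct checkpoint (illustrative name).
llm = LLM(
    model="casperhansen/llama-3-8b-instruct-awq",
    quantization="awq",
    enforce_eager=True,
    gpu_memory_utilization=0.90,
    max_model_len=4096,  # capping context length also shrinks the KV cache
)

outputs = llm.generate(["Olá, tudo bem?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```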
