
[Bug]: Llama 3 - Out of memory - RTX 4060 TI #4821

Closed
savi8sant8s opened this issue May 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@savi8sant8s

Your current environment

vllm 0.4.2

🐛 Describe the bug

Hello. Would anyone have an example of how I could run Llama 3 on an NVIDIA RTX 4060 TI 16GB? I tried to run inference with this model https://huggingface.co/rhaymison/Llama-3-portuguese-Tom-cat-8b-instruct plus a LoRA adapter, but it always runs out of memory. I enabled enforce_eager, raised gpu_memory_utilization to 1.0, and even reduced the swap space, but it still overflows GPU memory.
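
For context, a minimal sketch of the kind of setup described above (the exact script was not posted; argument names follow the vLLM Python API, and the LoRA adapter path is a placeholder):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Roughly the configuration described in the report: eager mode,
# all GPU memory given to vLLM, LoRA enabled.
llm = LLM(
    model="rhaymison/Llama-3-portuguese-Tom-cat-8b-instruct",
    enforce_eager=True,
    gpu_memory_utilization=1.0,
    enable_lora=True,
)

outputs = llm.generate(
    ["Olá, tudo bem?"],
    SamplingParams(max_tokens=64),
    # Placeholder path for the LoRA adapter mentioned above.
    lora_request=LoRARequest("tom-cat-lora", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```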

@savi8sant8s added the bug label May 15, 2024
@robertgshaw2-neuralmagic
Collaborator

Llama 3 8B at fp16 needs more than 16GB of GPU memory: the weights alone are about 16GB (8B parameters × 2 bytes), before the KV cache.

Try using a quantized model.
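
For example, a 4-bit AWQ or GPTQ checkpoint keeps the 8B weights well under 16GB, leaving room for the KV cache. A minimal sketch of loading one with vLLM follows; the model name is illustrative (substitute whatever quantized build of the model you need), and LoRA support on quantized models may vary by vLLM version:

```python
from vllm import LLM, SamplingParams

# Example AWQ-quantized Llama 3 8B Instruct checkpoint (illustrative name).
llm = LLM(
    model="casperhansen/llama-3-8b-instruct-awq",
    quantization="awq",
    enforce_eager=True,
    gpu_memory_utilization=0.90,
    max_model_len=4096,  # capping context length also shrinks the KV cache
)

outputs = llm.generate(["Olá, tudo bem?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```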
