vLLM now supports the LLaVA model. LLaVA checkpoints use various chat templates, such as the Vicuna template and the Llama 2 template. I have two questions:
1. How do I specify a chat template?
2. When I run LLaVA (with the Llama language model) for prediction, the output is usually truncated and not fully displayed; the truncated output is only about 14 characters long. Why does this happen, and how can I avoid it?
from PIL import Image
from transformers import CLIPImageProcessor
from vllm import LLM, SamplingParams
from vllm.sequence import MultiModalData

# Load LLaVA-1.5-7B with pixel-value image inputs.
llm = LLM(
    model="/media/star/8T/model/gpt/llava/llava-hf/llava-1.5-7b-hf",
    image_input_type="pixel_values",
    image_token_id=32000,
    image_input_shape="1,3,336,336",
    image_feature_size=576,
    gpu_memory_utilization=0.3,
    swap_space=8,
)

# Preprocess the image with the CLIP vision tower's image processor.
image_processor = CLIPImageProcessor.from_pretrained(
    "/media/star/8T/model/clip/openai_clip/clip-vit-large-patch14-336")
image_path = "/media/star/8T/tmp/gpt4v/1/1.png"
image_data = Image.open(image_path).convert("RGB")
image_tensor = image_processor.preprocess(
    image_data, return_tensors="pt")["pixel_values"].half().to("cuda")

# Vicuna-style prompt: 576 <image> placeholder tokens, then the question.
question = "Describe the image in detail."
prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(
    prompt,
    sampling_params,
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))
print(outputs[0].outputs[0].text)
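On question 1: as far as I can tell, LLM.generate() takes a raw prompt string and does not apply any chat template itself (vLLM's chat-template handling lives in the OpenAI-compatible server), so whichever template you want has to be baked into the prompt. Below is a minimal sketch, assuming the local llava-1.5-7b-hf checkpoint ships a chat_template in its tokenizer config; if it does not, fall back to the hand-written Vicuna-style prompt used above.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/media/star/8T/model/gpt/llava/llava-hf/llava-1.5-7b-hf")

question = "Describe the image in detail."
messages = [{"role": "user", "content": "<image>" * 576 + "\n" + question}]

if tokenizer.chat_template is not None:
    # Render the template shipped with the checkpoint (assumption: it exists).
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
else:
    # Hand-written Vicuna-style template used by LLaVA-1.5.
    prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"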
I have found the solution. Adding max_tokens to the sampling_params gives a complete inference result.
sampling_params = SamplingParams(temperature=1, top_p=0.01,max_tokens=1024)
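For reference, SamplingParams in vLLM defaults max_tokens to 16, which explains the truncated replies. A minimal sketch of wiring the larger limit through to generate(), reusing the llm, prompt, and image_tensor objects built above:

# Assumption: llm, prompt, and image_tensor are the objects built earlier.
sampling_params = SamplingParams(temperature=1, top_p=0.01, max_tokens=1024)
outputs = llm.generate(
    prompt,
    sampling_params,  # omit this and the default max_tokens=16 truncates output
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))
print(outputs[0].outputs[0].text)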