
[Bug]: llava, output is truncated, not fully displayed #4822

Closed
AmazDeng opened this issue May 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@AmazDeng

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

vLLM now supports the LLaVA model. We know that LLaVA models use various chat templates, such as the Vicuna template and the Llama 2 template. I have two questions:

1. How do I specify a chat template? (A prompt-formatting sketch follows this list.)
2. When I run prediction with the LLaVA model (whose language model is Llama), the output is usually truncated and not fully displayed; the truncated output is only about 14 characters long. What is going on, and how can I avoid it?
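For the first question, llava-1.5 expects the Vicuna-style conversation format, so one option is simply to build the prompt string by hand, as the reproduction below also does. Here is a minimal sketch; the build_llava_prompt helper name and the 576 placeholder count are assumptions for this illustration. If the tokenizer at the model path ships a chat_template, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) from transformers can produce an equivalent string.

# Illustrative helper (assumed name); 576 matches the image_feature_size used below.
def build_llava_prompt(question: str, image_feature_size: int = 576) -> str:
    # llava-1.5 uses the Vicuna format: image placeholders, then USER/ASSISTANT turns.
    image_tokens = "<image>" * image_feature_size
    return f"{image_tokens}\nUSER: {question}\nASSISTANT:"

# Example:
# prompt = build_llava_prompt("desc the image in detail")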

from PIL import Image

from transformers import CLIPImageProcessor

from vllm import LLM, SamplingParams
from vllm.sequence import MultiModalData

# Load llava-1.5-7b-hf with the image-input settings this vLLM version expects.
llm = LLM(
    model="/media/star/8T/model/gpt/llava/llava-hf/llava-1.5-7b-hf",
    image_input_type="pixel_values",
    image_token_id=32000,
    image_input_shape="1,3,336,336",
    image_feature_size=576,
    gpu_memory_utilization=0.3,
    swap_space=8,
)

# Preprocess the image with the CLIP vision tower's image processor.
image_processor = CLIPImageProcessor.from_pretrained(
    "/media/star/8T/model/clip/openai_clip/clip-vit-large-patch14-336")

image_path = "/media/star/8T/tmp/gpt4v/1/1.png"
image_data = Image.open(image_path).convert("RGB")
image_tensor = image_processor.preprocess(
    image_data, return_tensors="pt")["pixel_values"].half().to("cuda")

# Vicuna-style prompt with one <image> placeholder per image feature (576 total).
question = "desc the image in detail"
prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"

# NOTE: sampling_params is created here but not passed to llm.generate below,
# so generation falls back to the default SamplingParams.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(
    prompt,
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))
print(outputs[0].outputs[0].text)


@AmazDeng added the bug (Something isn't working) label on May 15, 2024
@AmazDeng
Author

I have found the solution. Adding max_tokens to the sampling_params, and passing sampling_params to llm.generate, makes the inference result complete (SamplingParams defaults max_tokens to 16, so without an explicit value the generation stops after only 16 tokens):
sampling_params = SamplingParams(temperature=1, top_p=0.01, max_tokens=1024)

outputs = llm.generate(
    prompt,
    sampling_params,
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))

So this issue can be closed.
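A quick way to confirm whether an output was cut off by the token limit is to check finish_reason on the completion. This is a minimal sketch, assuming the outputs variable from the snippet above:

completion = outputs[0].outputs[0]
if completion.finish_reason == "length":
    # Generation stopped because max_tokens was reached, so the text is truncated.
    print("Output truncated; raise max_tokens in SamplingParams.")
print(completion.text)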
