vLLM now supports the LLaVA model. LLaVA checkpoints use various chat templates, such as the Vicuna template and the Llama 2 template. I have two questions:
1. How do I specify a chat template?
2. When I run LLaVA (with the Llama language model) for prediction, the output is usually truncated and not fully displayed; the truncated output is only about 14 characters long. Why does this happen, and how can I avoid it?
from PIL import Image
from transformers import CLIPImageProcessor
from vllm import LLM, SamplingParams
from vllm.sequence import MultiModalData

# Load LLaVA-1.5-7B with pixel-value image inputs.
llm = LLM(
    model="/media/star/8T/model/gpt/llava/llava-hf/llava-1.5-7b-hf",
    image_input_type="pixel_values",
    image_token_id=32000,
    image_input_shape="1,3,336,336",
    image_feature_size=576,
    gpu_memory_utilization=0.3,
    swap_space=8,
)

# Preprocess the image with the CLIP vision tower's image processor.
image_processor = CLIPImageProcessor.from_pretrained(
    "/media/star/8T/model/clip/openai_clip/clip-vit-large-patch14-336")
image_path = "/media/star/8T/tmp/gpt4v/1/1.png"
image_data = Image.open(image_path).convert("RGB")
image_tensor = image_processor.preprocess(
    image_data, return_tensors="pt")["pixel_values"].half().to("cuda")

# Vicuna-style prompt: 576 <image> placeholder tokens, then the question.
question = "Describe the image in detail."
prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(
    prompt,
    sampling_params,
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))
print(outputs[0].outputs[0].text)
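On question 1: as far as I can tell, LLM.generate() takes a raw prompt string and does not apply any chat template itself (vLLM's chat-template handling lives in the OpenAI-compatible server), so whichever template you want has to be baked into the prompt. Below is a minimal sketch, assuming the local llava-1.5-7b-hf checkpoint ships a chat_template in its tokenizer config; if it does not, fall back to the hand-written Vicuna-style prompt used above.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/media/star/8T/model/gpt/llava/llava-hf/llava-1.5-7b-hf")

question = "Describe the image in detail."
messages = [{"role": "user", "content": "<image>" * 576 + "\n" + question}]

if tokenizer.chat_template is not None:
    # Render the template shipped with the checkpoint (assumption: it exists).
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
else:
    # Hand-written Vicuna-style template used by LLaVA-1.5.
    prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"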
I have found the solution. Adding max_tokens to the sampling_params gives a complete inference result.
sampling_params = SamplingParams(temperature=1, top_p=0.01,max_tokens=1024)
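For reference, SamplingParams in vLLM defaults max_tokens to 16, which explains the truncated replies. A minimal sketch of wiring the larger limit through to generate(), reusing the llm, prompt, and image_tensor objects built above:

# Assumption: llm, prompt, and image_tensor are the objects built earlier.
sampling_params = SamplingParams(temperature=1, top_p=0.01, max_tokens=1024)
outputs = llm.generate(
    prompt,
    sampling_params,  # omit this and the default max_tokens=16 truncates output
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=image_tensor))
print(outputs[0].outputs[0].text)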