Fix: Allow null in Finish reason streaming response in vLLM #3684

gavrishp · 2024-05-13T14:35:05Z

What this PR does / why we need it:
The OpenAI schema now allows a null value for finish_reason, so this PR updates the vLLM streaming response to match.
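For reviewers, here is a minimal sketch of the intended behavior, not the actual KServe code. `StreamChoice` and `to_json_chunks` are hypothetical names used only for illustration. The sketch assumes that intermediate streamed chunks carry `finish_reason: null` and that only the last chunk carries a real value such as `"stop"`.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class StreamChoice:
    """Hypothetical stand-in for one choice in a streamed completion chunk."""

    index: int
    text: str
    # None while generation is still in progress; serialized as JSON null.
    finish_reason: Optional[str] = None


def to_json_chunks(tokens: List[str], final_reason: str = "stop") -> List[str]:
    """Render one JSON payload per token; only the last one sets finish_reason."""
    chunks = []
    last = len(tokens) - 1
    for i, token in enumerate(tokens):
        choice = StreamChoice(
            index=0,
            text=token,
            finish_reason=final_reason if i == last else None,
        )
        chunks.append(json.dumps({"choices": [asdict(choice)]}))
    return chunks


if __name__ == "__main__":
    for chunk in to_json_chunks(["Hel", "lo"]):
        print(chunk)
```

Typing the field as `Optional[str]` with a `None` default is what lets the intermediate chunks serialize cleanly. A required string field would force a placeholder value or fail validation on every chunk before the last.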

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Special notes for your reviewer:

None

Checklist:

Have you added unit/e2e tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Release note:

Re-running failed tests

/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

yuzisun · 2024-05-15T04:07:57Z

/lgtm
/approve

oss-prow-bot · 2024-05-15T04:08:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [yuzisun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* upgrade vllm/transformers version (#3671)
  upgrade vllm version
  Signed-off-by: Johnu George <johnugeorge109@gmail.com>
* Add openai models endpoint (#3666)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)
  * feat: Support customizable deployment strategy for RawDeployment mode
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * regen
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * lint
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Correctly apply rollingupdate
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * address comments
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add validation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Enable dtype support for huggingface server (#3613)
  * Enable dtype for huggingface server
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Set float16 as default. Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Add small comment to make the changes understandable
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Adapt to new huggingfacemodel
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup merge :)
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Explicitly mention the behaviour of dtype flag on auto.
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Default to FP32 for encoder models
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Update poetry
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Use torch.float32 for tests explicitly
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  ---------
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
* Add method for checking model health/readiness (#3673)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* fix for extract zip from gcs (#3510)
  * fix for extract zip from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * initial commit for gcs model download unittests
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * unittests for model download from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * black format fix
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * code verification
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  ---------
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* Update Dockerfile and Readme (#3676)
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
* Update huggingface readme (#3678)
  * update wording for huggingface README
    small update to make readme easier to understand
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * Update README.md
    Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  * Update python/huggingfaceserver/README.md
    Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * update vllm
    Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  * Update README.md
  ---------
  Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
  Co-authored-by: Dan Sun <dsun20@bloomberg.net>
* fix: HPA equality check should include annotations (#3650)
  * fix: HPA equality check should include annotations
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Only watch related autoscalerclass annotation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * simplify
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add missing delete action
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * fix logic
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Fix: huggingface runtime in helm chart (#3679)
  fix huggingface runtime in chart
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
* Fix: model id and model dir check order (#3680)
  * fix huggingface runtime in chart
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Allow model_dir to be specified on template
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Default model_dir to /mnt/models for HF
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Lint format
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  ---------
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
* Fix:vLLM Model Supported check throwing circular dependency (#3688)
  * Fix:vLLM Model Supported check throwing circular dependency
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix return case
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix to check all arch in model config for vllm support
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fixlint
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  ---------
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
* Fix: Allow null in Finish reason streaming response in vLLM (#3684)
  Fix: allow null in Finish reason
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Curtis Maddalozzo <cmaddalozzo@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Datta Nimmaturi <39181234+Datta0@users.noreply.github.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Alexa Griffith <agriffith50@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

Fix: allow null in Finish reason

a502b4d

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

oss-prow-bot bot requested review from israel-hdez and lizzzcai May 13, 2024 14:35

gavrishp changed the title ~~Fix: Allow null in Finish reason streaming response~~ Fix: Allow null in Finish reason streaming response in vLLM May 13, 2024

oss-prow-bot bot assigned yuzisun May 15, 2024

oss-prow-bot bot added the lgtm label May 15, 2024

oss-prow-bot bot added the approved label May 15, 2024

yuzisun merged commit 8771c3d into kserve:master May 15, 2024
57 of 58 checks passed
