Releases: kserve/kserve
v0.13.0-rc1
What's Changed
- upgrade vllm/transformers version by @johnugeorge in #3671
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- fix for extract zip from gcs by @andyi2it in #3510
- Update Dockerfile and Readme by @gavrishp in #3676
- Update huggingface readme by @alexagriffith in #3678
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix: vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
New Contributors
Full Changelog: v0.13.0-rc0...v0.13.0-rc1
v0.13.0-rc0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
🐛 What's Fixed
- Fix: Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install does not clean up Istio installer by @sivanantha321 in #3660
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
CVE patches
- CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
- golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
- Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
- Security fix - CVE 2024 24786 by @andyi2it in #3585
📝 Documentation Update
- qpext: fix a typo in qpext doc by @daixiang0 in #3491
- Update KServe project description by @yuzisun in #3524
- Update kserve cake diagram by @yuzisun in #3530
- Remove white background for the kserve diagram by @yuzisun in #3531
- fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
- Fix typo in README.md by @terrytangyuan in #3575
New Contributors
- @leyao-daily made their first contribution in #3490
- @peterj made their first contribution in #3493
- @timothyjlaurent made their first contribution in #3374
- @shauryagoel made their first contribution in #3496
- @mkumatag made their first contribution in #3523
- @marek-veber made their first contribution in #3544
- @trojaond made their first contribution in #3481
- @grandbora made their first contribution in #3590
- @saileshd1402 made their first contribution in #3657
Full Changelog: v0.12.1...v0.13.0-rc0
v0.12.1
What's Changed
- [release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in #3609
- [release-0.12] Pydantic 2 support by @cmaddalozzo in #3614
- [release-0.12] Make the modelcar injection idempotent by @sivanantha321 in #3612
- Prepare for release 0.12.1 by @sivanantha321 in #3610
- release-0.12 pin back ray to 2.10 by @yuzisun in #3616
- [release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in #3627
Full Changelog: v0.12.0...v0.12.1
v0.12.0
🌈 What's New?
Core Inference & Serving Runtimes
- Implement HuggingFace model server by @yuzisun in #3334
- feat: Add HuggingFace runtime out-of-the-box support by @terrytangyuan in #3395
- Implement support for vllm as alternative backend by @gavrishp in #3415
- Torchserve grpc v2 by @andyi2it in #3247
- feat: CA bundle mount options for storage initializer by @Jooho in #3250
- Add support for modelcars by @rhuss in #3110
- Add compatibility for Istio CNI plugin by @israel-hdez in #3316
- feat: Allow to disable ingress creation for raw deployment mode by @terrytangyuan in #3436
Advanced Inference
- RawDeployment support for Inference Graph by @bmopuri in #3199, @bmopuri in #3194
- Added custom request timeout for inferencegraph. by @andyi2it in #3173
- Add regex support for propagating IG headers by @sivanantha321 in #3178
KServe Python SDK, Storage
- Unpack archive files for hdfs by @sivanantha321 in #3093
- feat: Support S3 transfer acceleration by @terrytangyuan in #3305
⚠️ What's Changed
- Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
- Add model arguments to API and update BERT inference example by @yuzisun in #3332
--model_name, --predictor_host, --predictor_use_ssl, and --predictor_request_timeout_seconds are added to the kserve model server and no longer need to be defined in the custom predictor or transformer. --protocol is deprecated and superseded by --predictor_protocol. More details can be found in the API reference doc.
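As a rough illustration of the flag change described above, the following sketch accepts the listed arguments and maps the deprecated --protocol onto --predictor_protocol. It is a standalone argparse example, not the kserve model server's own parser: the helper names build_parser and parse_args, the argument types, and every default value are assumptions made for this sketch.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser. Flag names come from the release note;
    # types and defaults are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="illustrative model server flags")
    parser.add_argument("--model_name", default="model")
    parser.add_argument("--predictor_host", default=None)
    parser.add_argument("--predictor_use_ssl", action="store_true")
    parser.add_argument("--predictor_request_timeout_seconds", type=int, default=600)
    parser.add_argument("--predictor_protocol", default=None)
    # Deprecated spelling, still accepted so existing launch commands keep working.
    parser.add_argument("--protocol", default=None)
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    if args.protocol is not None:
        warnings.warn(
            "--protocol is deprecated; use --predictor_protocol",
            DeprecationWarning,
        )
        # The new flag wins when both are given.
        if args.predictor_protocol is None:
            args.predictor_protocol = args.protocol
    if args.predictor_protocol is None:
        args.predictor_protocol = "v1"  # assumed default for this sketch
    return args
```

With this sketch, a launch command that still passes --protocol v2 ends up with predictor_protocol set to v2 and emits a deprecation warning, while a command that passes both flags keeps the value of --predictor_protocol.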
🐛 What's Fixed
- Removing update op from pod-mutator webhook by @rachitchauhan43 in #3163
- Fix quick install script by @dtrifiro in #3164
- Fix self-signed-ca installation by @sivanantha321 in #3165
- Add S3_VERIFY_SSL to storage.py for S3 by @Jooho in #3172
- Fix runtime not found for triton due to wrong default protocolVersion by @sivanantha321 in #3177
- Make ModelServer to stop correctly when using more than 1 worker by @andyi2it in #3174
- Fix serving runtime webhook cert namespace for kubeflow installation by @sivanantha321 in #3188
- Fix knative config-defaults values overridden by kserve by @sivanantha321 in #3130
- Fix qpext metrics port by @yuzisun in #3209
- Added async with postprocess method. by @andyi2it in #3204
- Fix lightgbm model input conversion when input is list of lists by @sivanantha321 in #3226
- Validation added for ensuring same model format has same priority for runtime by @andyi2it in #3181
- Fix: Unexpected Panic in Inference graph when it fails to create http request by @HAO2167 in #3079
- Support verify variable with storage-config json style (fix-3263) by @Jooho in #3267
- s3 storage initializer: only set environment variables if variables are set in storage secret json by @dtrifiro in #3259
- Fix tensorflow e2e test fails due to OOM error by @sivanantha321 in #3293
- fix: Properly handle the creation and closure of success file in DownloadModel() by @terrytangyuan in #3295
- fix: Surface errors when writing graphHandler response by @terrytangyuan in #3308
- Fix qpext hangs during shutdown by @sivanantha321 in #3268
- fix: Check if HPA has the same scaleTargetRef and behavior by @terrytangyuan in #3294
- Updated quick_install script to temporarily fix 0.11.2 release install by @andyi2it in #3311
- image_patch_dev.sh: set pipefail by @dtrifiro in #3274
- Move pmml worker validation to runtime by @sivanantha321 in #3182
- Introduce retry on resource conflict by @sivanantha321 in #3240
- Fix inference request fails when sending with less number of features than the total model features on lightgbm by @sivanantha321 in #3313
- Fix raw deployment service points to predictor container port instead of transformer container port in transformer collocation by @sivanantha321 in #3318
- Restrict storage uri to predictor only in collocation of transformer and predictor by @sivanantha321 in #3280
- feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
- fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
- Handles s3 download for object name starts with folder name. by @andyi2it in #3205
- chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
- Pass missing infer parameters during conversion by @sivanantha321 in #3368
- Add exception handler for model server and Add ability to specify custom handler by @sivanantha321 in #3405
- fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
- fix: Add 'model_version' to InferResponse in python library by @ajstewart in #3466
- Fix v2 model ready url in kserve client by @sivanantha321 in #3403
- Fix parameters value type conversion by pydantic by @sivanantha321 in #3430
- Fix Raw Logger E2E by @israel-hdez in #3434
- Expose qpext aggregate metrics port on container by @sivanantha321 in #3291
- Fix dup metrics aggr port by @yuzisun in #3447
- fix: HuggingFace predictor should not be recognized as multi-model server by @terrytangyuan in #3449
- Fix: bugs for huggingface runtime template by @yuzisun in #3448
- Fix: Add padding and truncation in huggingface tokenizer by @kevinmingtarja in #3450
- Fix: vllm backend does not work with model_dir for huggingface runtime by @yuzisun in #3456
- Fix azure workload identity federation by excluding azure client secret by @robbertvdg in #3390
- Change certificate to ca_bundle in json style of s3 storageSecret by @Jooho in #3463
⬆️ Version Upgrade
- Upgrade istio Api and migrate to v1beta1 Api version by @sivanantha321 in #3150
- Bump torchserve version to 0.9.0 by @gavrishp in #3217
- Allow ray >=2.7,<3 by @ddelange in #3075
- Bump istio version to 1.19.4 by @sivanantha321 in #3258
- Updated ray to 2.8.0 and removed detached flag to avoid deprecation error in future by @andyi2it in #3272
- chore: Upgrade to XGBoost v2.0.2. Fixes #3310 by @terrytangyuan in #3309
- chore: Upgrade Go to v1.21 by @terrytangyuan in #3296
- Added 3.11 support for paddle in workflow. by @andyi2it in #3246
- Upgraded poetry version to 1.7.1 by @andyi2it in #3271
- Upgrade cloudevent to v2 by @homily707 in #3255
- Update knative-serving by @spolti in #3362
- Update google-cloud-storage dependency to >=2.3.0,<3.0.0 and ray dependency to >=2.8.1, <3.0.0 by @sivanantha321 in #3389
🔨 Project SDLC
- chore: Add design doc template links to feature request template by @ckadner in #3155
- Make storage initializer image configurable by @yuzisun in #3145
- Increase pytest workers for kourier e2e test by @sivanantha321 in #3151
- Restrict workflow concurrency by @vignesh-murugani2i in #3167
- Generate client-go for StorageContainer CR by @sivanantha321 in #3152
- Refactor v1 vs. v2 endpoint unit tests in kserve/test/test_server.py… by @guohaoyu110 in #3158
- Verify codegen in CI by @sivanantha321 in ...
v0.12.0-rc1
What's Changed
- docs: Corrections and edits on release process document by @terrytangyuan in #3326
- build: Switch to use kustomize in kubectl to simplify build process. Fixes #3314 by @terrytangyuan in #3315
- feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
- fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
- Add model arguments to API and update BERT inference example by @yuzisun in #3332
- chore: Update generated APIs and check generated manifests by @terrytangyuan in #3335
- Update python model serving runtime API docstring by @yuzisun in #3338
- Handles s3 download for object name starts with folder name. by @andyi2it in #3205
- chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
- ci: Automate release process by @terrytangyuan in #3345
- fixes critical vulnerabilities on ray by @spolti in #3285
- chore: Bump versions to prepare v0.12.0-rc1 release by @terrytangyuan in #3352
- Change version for helm charts in README by @gawsoftpl in #3353
- Fixes CVE-2023-48795 by @spolti in #3354
- Fix Stack-based Buffer Overflow on protobuf by @spolti in #3358
- Update knative-serving by @spolti in #3362
- Fixes vulnerabilities on the otelhttp dependency by @spolti in #3361
- Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
- feat: Automatically generate Helm Chart docs. Fixes #3356 by @terrytangyuan in #3363
- Modified script for include all kserve poetry projects. by @andyi2it in #3350
- RawDeployment support for Inference Graph by @bmopuri in #3199
- Add compatibility for Istio CNI plugin by @israel-hdez in #3316
- Pass missing infer parameters during conversion by @sivanantha321 in #3368
- feat: Support S3 transfer acceleration by @terrytangyuan in #3305
- Implement HuggingFace model server by @yuzisun in #3334
- fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
- align cloudevents/sdk-go dependency by @spolti in #3387
New Contributors
- @gawsoftpl made their first contribution in #3353
Full Changelog: v0.12.0-rc0...v0.12.0-rc1
v0.12.0-rc0
What's Changed
- Make storage initializer image configurable by @yuzisun in #3145
- chore: Add design doc template links to feature request template by @ckadner in #3155
- Increase pytest workers for kourier e2e test by @sivanantha321 in #3151
- Upgrade istio Api and migrate to v1beta1 Api version by @sivanantha321 in #3150
- Unpack archive files for hdfs by @sivanantha321 in #3093
- Removing update op from pod-mutator webhook by @rachitchauhan43 in #3163
- Fix quick install script by @dtrifiro in #3164
- Fix self-signed-ca installation by @sivanantha321 in #3165
- Generate client-go for StorageContainer CR by @sivanantha321 in #3152
- Add S3_VERIFY_SSL to storage.py for S3 by @Jooho in #3172
- Allow disabling creation of the HPA in raw deployment mode by @andyi2it in #3086
- Restrict workflow concurrency by @vignesh-murugani2i in #3167
- Refactor v1 vs. v2 endpoint unit tests in kserve/test/test_server.py… by @guohaoyu110 in #3158
- Fix runtime not found for triton due to wrong default protocolVersion by @sivanantha321 in #3177
- Make ModelServer to stop correctly when using more than 1 worker by @andyi2it in #3174
- Added custom request timeout for inferencegraph. by @andyi2it in #3173
- Fix serving runtime webhook cert namespace for kubeflow installation by @sivanantha321 in #3188
- Add go security scan for PRs and set it up to run on a regular schedule by @sivanantha321 in #3170
- Verify codegen in CI by @sivanantha321 in #3189
- Fix knative config-defaults values overridden by kserve by @sivanantha321 in #3130
- Fix qpext metrics port by @yuzisun in #3209
- docs: fix some typos by @daixiang0 in #3214
- chore: Add new PR reviewers and approvers by @ckadner in #3213
- Added async with postprocess method. by @andyi2it in #3204
- Remove the redundant python lint check in CI environment by @nilakshi104 in #3184
- Move pmml worker validation to runtime by @sivanantha321 in #3182
- Bump torchserve version to 0.9.0 by @gavrishp in #3217
- CVE-2023-44487 - qpext by @spolti in #3203
- Allow ray >=2.7,<3 by @ddelange in #3075
- Fix lightgbm model input conversion when input is list of lists by @sivanantha321 in #3226
- CVE-2023-44487 by @spolti in #3202
- Sanitize a command line argument in agent by @israel-hdez in #3245
- Validation added for ensuring same model format has same priority for runtime by @andyi2it in #3181
- Fix: Unexpected Panic in Inference graph when it fails to create http request by @HAO2167 in #3079
- Add default clusterstoragecontainer cr into resources by @homily707 in #3219
- Support verify variable with storage-config json style (fix-3263) by @Jooho in #3267
- Update qpext docs on image patch by @sivanantha321 in #3266
- Added 3.11 support for paddle in workflow. by @andyi2it in #3246
- Torchserve grpc v2 by @andyi2it in #3247
- Bump istio version to 1.19.4 by @sivanantha321 in #3258
- image_patch_dev.sh: set pipefail by @dtrifiro in #3274
- s3 storage initializer: only set environment variables if variables are set in storage secret json by @dtrifiro in #3259
- feat: CA bundle mount options for storage initializer by @Jooho in #3250
- Fix tensorflow e2e test fails due to OOM error by @sivanantha321 in #3293
- Update Istio-Dex docs by @sivanantha321 in #3260
- chore: Upgrade Go to v1.21 by @terrytangyuan in #3296
- fix: Properly handle the creation and closure of success file in DownloadModel() by @terrytangyuan in #3295
- Updated ray to 2.8.0 and removed detached flag to avoid deprecation error in future by @andyi2it in #3272
- fix: Surface errors when writing graphHandler response by @terrytangyuan in #3308
- Fix qpext hangs during shutdown by @sivanantha321 in #3268
- chore: Upgrade to XGBoost v2.0.2. Fixes #3310 by @terrytangyuan in #3309
- fix: Check if HPA has the same scaleTargetRef and behavior by @terrytangyuan in #3294
- Updated quick_install script to temporarily fix 0.11.2 release install by @andyi2it in #3311
- Remove deprecated protobuf packages by @sivanantha321 in #3328
- Add health check for controller manager by @sivanantha321 in #3289
- Introduce retry on resource conflict by @sivanantha321 in #3240
- Updated Kserve version file path in pyproject.toml. by @andyi2it in #3225
- docs: Add link to OpenShift Container Platform instructions by @terrytangyuan in #3322
- Fix inference request fails when sending with less number of features than the total model features on lightgbm by @sivanantha321 in #3313
- Add a CI_USE_ISVC_HOST for testing with the ISVC hostname by @israel-hdez in #3324
- Upgraded poetry version to 1.7.1 by @andyi2it in #3271
- ci: publish helm chart to ghcr by @davidspek in #3319
- Fix raw deployment service points to predictor container port instead of transformer container port in transformer collocation by @sivanantha321 in #3318
- Upgrade cloudevent to v2 by @homily707 in #3255
- Restrict storage uri to predictor only in collocation of transformer and predictor by @sivanantha321 in #3280
- Add support for modelcars by @rhuss in #3110
- Add regex support for propagating IG headers by @sivanantha321 in #3178
- chore: Prepare v0.12.0-rc0 release by @terrytangyuan in #3325
New Contributors
- @dtrifiro made their first contribution in #3164
- @Jooho made their first contribution in #3172
- @vignesh-murugani2i made their first contribution in #3167
- @guohaoyu110 made their first contribution in #3158
- @bmopuri made their first contribution in #3194
- @daixiang0 made their first contribution in #3214
- @nilakshi104 made their first contribution in #3184
- @gavrishp made their first contribution in #3217
- @spolti made their first contribution in #3203
- @HAO2167 made their first contribution in #3079
- @homily707 made their first contribution in #3219
- @rhuss made their first contribution in #3110
Full Changelog: v0.11.1...v0.12.0-rc0
v0.11.2
What's Changed
- Fix serving runtime webhook cert namespace for kubeflow installation by @sivanantha321 in #3190
- [release-0.11] Fix qpext metrics port (#3209) by @houshengbo in #3210
- [release-0.11] Fix lightgbm model input conversion when input is list of lists by @sivanantha321 in #3229
- [release-0.11]Fix runtime not found for triton due to wrong default protocolVersion by @sivanantha321 in #3232
- [release-0.11]Fix mlserver runtime priority for sklearn by @sivanantha321 in #3233
- [release-0.11] Cherry-picks related to CVE-2023-44487 by @israel-hdez in #3242
- Version bump to 0.11.2 by @israel-hdez in #3244
New Contributors
- @houshengbo made their first contribution in #3210
Full Changelog: v0.11.1...v0.11.2
v0.11.1
What's Changed
- [docker] reduction in the number of layers for controller image by @alekseyolg in #3070
- document status conditions for RoutesReady and LatestDeploymentReady by @tessapham in #3069
- Update indirect dependency golang.org/x/net/html by @israel-hdez in #3072
- Fix kubeflow overlay kustomization by @sivanantha321 in #3083
- Check sys platform before using SIGQUIT - fix windows development by @andyi2it in #3089
- Use knative operator for installing knative in e2e tests by @sivanantha321 in #2984
- Upgrade k8s to 1.27, istio 1.8 in test environment by @sivanantha321 in #3077
- Introduce Storage container CRD by @greenmoon55 in #3060
- List Models v2 REST API by @jvujjini in #2963
- Introduce Priority field in ServingRuntime by @sivanantha321 in #3031
- Storage initializer fix so that it downloads only specific file when provided uri is not a folder by @andyi2it in #3088
- Inference Graph error response handling by @rachitchauhan43 in #3039
- Add doc.go for v1alpha1 API version by @sivanantha321 in #3118
- Fixed torchserve e2e test. by @andyi2it in #3106
- Fix: error response handling for splitter and switch nodes by @rachitchauhan43 in #3116
- Fix validation for custom storageUri by @greenmoon55 in #3134
- Bumping version for 0.11.1 by @rachitchauhan43 in #3141
New Contributors
- @alekseyolg made their first contribution in #3070
- @israel-hdez made their first contribution in #3072
- @jvujjini made their first contribution in #2963
Full Changelog: v0.11.0...v0.11.1
v0.11.0
🌈 What's New?
Core Inference & Serving Runtimes
- Feature enable ingress for path based routing by @kandrio in #2357
- Allow multiple containers in ServingRuntime by @markwinter in #2321
- Add disable ingress configuration for raw deployment by @andyi2it in #2773
- Add support for collocation of transformer and predictor by @sivanantha321 in #2873
- Support setting labels and annotations on the component level by @lizzzcai in #2925
- Add RoutesReady and LastDeploymentReady status conditions by @tessapham in #3008
- Triton FasterTransformer LLM by @cmaddalozzo in #2836
- Implement v2/open inference endpoints for kserve python runtimes by @Suresh-Nakkeran in #2655
- Support mixed input type for kserve python runtimes by @Suresh-Nakkeran in #2789
- Upgrade mlserver version to 1.3.2 by @sivanantha321 in #2910
- TorchServe 0.8.0 for LLM support by @sivanantha321 in #2980
- Bump triton server version to 23.05-py3 by @sivanantha321 in #2992
Advanced Inference
- Allowing setting minReplicas for Inference Graph router. by @rachitchauhan43 in #2679
- Adding pod affinity and resource requirements to IG Spec by @rachitchauhan43 in #2711
- add json mimetype header for IG response by @krazik-intuit in #2877
Storage Provider
- Storage Initializer: Support virtual path style in S3 by @lizzzcai in #2887
- Support Direct VolumeMount for PVC by @lizzzcai in #2738
KServe Python SDK
kserve 0.11.0 now uses poetry for dependency management. Cloud storage dependencies are now optional; run pip install kserve[storage] to install them.
- Make storage dependency as an optional dependency by @andyi2it in #2700
- Support parameters in InferInput and InferOutput by @Suresh-Nakkeran in #2699
- Dependency resolver using poetry by @andyi2it in #2602
- Allow to override the UvicornServer's default log config by @elukey in #2782
- Move cloud event decode logic from preprocess to decode method by @sivanantha321 in #2881
- Allow using SSL between transformer and predictor/explainer by @greenmoon55 in #2911
- Make postprocess interface consistent with V2 protocol by @cmaddalozzo in #2876
- Add a KServe module level logger by @cmaddalozzo in #2884
- Log exception stack for python runtimes by @yuzisun in #2939
- Checked model readiness before attempting inference. by @andyi2it in #2917
- Load Kubeconfig from python dict by @ShreehariVaasishta in #2924
- Added response id based on request id. by @andyi2it in #3020
⚠️ What's Changed
- Remove "default" suffix from generated component name - updated by @Suresh-Nakkeran in #2508
- Remove(AIX explainer): Remove AIX explainer API and SDK by @Tomcli in #2826
- Fix raw deployment status.address.url displays wrong url by @sivanantha321 in #2830
- Make status.address.url consistent across installations by @sivanantha321 in #2875
- Added support to accept the request body/payload in any format (not just json) by @SatishBethi in #2524
⚠️ You are now required to set the content-type header to application/json for the server to recognize and decode a JSON payload.
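For clients affected by the content-type requirement above, a minimal sketch of building a JSON request with the header set explicitly might look like the following. It uses only the Python standard library and does not send anything. The helper name build_json_request, the host, the model name, and the payload are placeholders assumed for illustration.

```python
import json
import urllib.request


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    # Serialize the payload and declare it as JSON so the server decodes it as such.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and payload, assumed for this sketch only.
req = build_json_request(
    "http://localhost:8080/v1/models/example:predict",
    {"instances": [[1.0, 2.0]]},
)
```

Passing req to urllib.request.urlopen would send it. Clients built on other HTTP libraries need the same header on any request whose body is JSON.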
🐛 What's Fixed
- Loosen protobuf and numpy dependency by @andyi2it in #2673
- Fix dependency issue and remove pinned pip version in xgbserver by @dependabot in #2689
- Fixing the dockerfile to run as non-root user by @chirag-orbittec in #2687
- Add check for adding gpu tag suffix when image field is specified by @yuzisun in #2709
- Use fork for multiprocessing mode by @yuzisun in #2718
- Adjust order of types to default to float,int by @pascalwhoop in #2754
- Fix trained model ready status by @andyi2it in #2774
- Fix missing ingress config options in helm chart by @andyi2it in #2772
- model_server.py: fix documentation for enable_latency_logging by @elukey in #2777
- Handle file exist scenario for local storage by @sivanantha321 in #2794
- Loosen tritonclient and azure storage blob dependencies by @sivanantha321 in #2815
- Add RBAC for inferencegraphs/finalizers by @ReToCode in #2839
- Do not create GRPCServer when grpc is not enabled by @greenmoon55 in #2878
- Make Pod mutator idempotent to support fluid by @hclchimumu in #2896
- Fix incorrect log message in transformer reconciliation by @leecs0503 in #2968
- Reconcile when rollout duration is changed by @henrysecond1 in #2916
- Fix status.address.url for networking layer other than istio by @sivanantha321 in #2908
- Fix IngressReady condition by @tessapham in #2977
- Use standard socket for single process by @yuzisun in #3000
- Remove distutils from Python SDK by @xfu83 in #3010
- Fix raw deployment service port by @sivanantha321 in #2967
- Allow passthrough of InferRequest id by @markgeejw in #2945
- Fix: if there is a symbol '?' in the path, force url. query as the mi… by @Wercurial in #3014
⬆️ Version Upgrade
- Bump torch from 1.13.0 to 1.13.1 in /python/aixexplainer by @dependabot in #2802
- Add support for k8s 1.26 by @sivanantha321 in #2835
- Go 1.20 upgrade by @sivanantha321 in #2914
- Bump the Go version to 1.20 for the builder image by @skonto in #2860
- Update the OpenShift guide to version 4.12 and use OpenShift Serverless by @ReToCode in #2855
- Added support for python 3.10 and removed python 3.7 references by @andyi2it in #2832
- Poetry plugin should update pyproject.toml by @andyi2it in #2899
- Python 3.11 support by @andyi2it in #2933
- Add python 3.11 support for alibi explainer by @sivanantha321 in #3006
🔨 Project SDLC
- Fix running out of disk space in e2e by @andyi2it in #2765
- Updating Knative Serving and Istio to their latest version by @matzew in #2697
- Fix formatting and controller tests by @yuzisun in #2783
- Parametrized docker builds by @peterableda in #2666
- Fix minimum k8s version in quick install by @sivanantha321 in #2791
- Make deployment scheduling behavior configurable by @ddelange in #2627
- Upgrade kustomization.yaml to support kustomize 5.0 by @sivanantha321 in #2841
- Extract modelmesh part in helm chart by @hhk7734 in #2704
- chown python virtual env path to runtime user by @ReToCode in #2845
- Use buildkit for building docker images by @sivanantha321 in #2848
- Implement dynamic versioning using poetry plugin by @andyi2it in #2869
- Break down the Serving and net-istio artifact downloads into their own release trains by @matzew in #2890
- Add poetry lockfile consistency check to CI environment by @sivanantha321 in #2905
- Upgrade controller gen version to 0.12.0 by @sivanantha321 in #2912
- Fix kourier e2e test by @sivanantha321 in #2991
- chore: Free up disk space on E2E test GHA runner node by @ckadner in #2972
- Fix e2e workflow syntax typo that broke e2e tests by @andyi2it in #2998
- Enable Triton e2e test by @sivanantha321 in #3004
- Added manual workflow trigger for release branch. by @andyi2it in https://gith...
v0.11.0-rc1
What's Changed
- Replace unmaintained satori/go.uuid package by @sivanantha321 in #2932
- Log exception stack for python runtimes by @yuzisun in #2939
- Upgrade controller gen version to 0.12.0 by @sivanantha321 in #2912
- Checked model readiness before attempting inference. by @andyi2it in #2917
- moviesentiment storageUri updated. by @andyi2it in #2952
- Python 3.11 support by @andyi2it in #2933
- Update alibi explainer storage uri by @sivanantha321 in #2966
- Updated storageUri for paddle example and e2e test. by @andyi2it in #2956
- Decode avro cloud event by @sivanantha321 in #2929
- Fix incorrect log message in transformer reconciliation by @leecs0503 in #2968
- Reconcile when rollout duration is changed by @henrysecond1 in #2916
- Support setting labels and annotations on the component level by @lizzzcai in #2925
- Load Kubeconfig from python dict by @ShreehariVaasishta in #2924
- Fix status.address.url for networking layer other than istio by @sivanantha321 in #2908
- Fix kourier e2e test by @sivanantha321 in #2991
- fix IngressReady condition by @tessapham in #2977
- chore: Free up disk space on E2E test GHA runner node by @ckadner in #2972
- Fix e2e workflow syntax typo that broke e2e tests by @andyi2it in #2998
- Use standard socket for single process by @yuzisun in #3000
- Bump torchserve version to 0.8.0 by @sivanantha321 in #2980
- Fix params while loading kube config from dict by @ShreehariVaasishta in #3005
- Add python 3.11 support for alibi explainer by @sivanantha321 in #3006
- Enable Triton e2e test by @sivanantha321 in #3004
- Bump triton server version to 23.05-py3 by @sivanantha321 in #2992
- Update ModelMesh version to v0.11.0-rc0 by @rafvasq in #2969
- Fix a typo docs/samples/metrics-and-monitoring/README.md by @fj-ochiai in #3017
- Remove distutils from Python SDK by @xfu83 in #3010
- Fix raw deployment service port by @sivanantha321 in #2967
- Allow passthrough of InferRequest id by @markgeejw in #2945
- Add configmap docs by @sivanantha321 in #3016
- add ServiceReady status conditions by @tessapham in #3008
- Updated onnx-example. by @andyi2it in #3001
- Fixes for Integrating KServe with Openshift by @skonto in #2853
- Added manual workflow trigger for release branch. by @andyi2it in #3002
- fix: if there is a symbol '?' in the path, force url. query as the mi… by @Wercurial in #3014
- Added response id based on request id. by @andyi2it in #3020
- put back PredictorReady in living condition set by @tessapham in #3026
- add documentation about storage-initializer userid for istio-cni by @ReToCode in #3028
- publish RC1 version for v0.11.0 by @tessapham in #3027
New Contributors
- @leecs0503 made their first contribution in #2968
- @henrysecond1 made their first contribution in #2916
- @ShreehariVaasishta made their first contribution in #2924
- @tessapham made their first contribution in #2977
- @fj-ochiai made their first contribution in #3017
- @xfu83 made their first contribution in #3010
- @markgeejw made their first contribution in #2945
- @Wercurial made their first contribution in #3014
Full Changelog: v0.11.0-rc0...v0.11.0-rc1