Issue with deploying via Hex-LLM, a TPU serving solution built with XLA, developed by Google Cloud #2768
Comments
@KCFindstr: can you please take a look at this issue? Thanks.
From the original notebook, the correct machine type is …
I had originally tried with …
Hi @kathyyu-google, would you please take a look at this hex-llm deployment failure?
Based on the endpoint ID from the logs (projects/81995035742/locations/us-central1/endpoints/6941658909824253952), this endpoint was created in the …
Expected Behavior
Model Deployed Successfully
Actual Behavior
I am getting this error:
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/81995035742/locations/us-central1/endpoints/6941658909824253952/operations/4744238776585289728
Using model from: gs://19865_finetuned_models/gemma-keras-lora-train_20240308_200536
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/81995035742/locations/us-central1/endpoints/6941658909824253952
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/81995035742/locations/us-central1/endpoints/6941658909824253952')
INFO:google.cloud.aiplatform.models:Creating Model
INFO:google.cloud.aiplatform.models:Create Model backing LRO: projects/81995035742/locations/us-central1/models/6359646723312189440/operations/7818789947195785216
INFO:google.cloud.aiplatform.models:Model created. Resource name: projects/81995035742/locations/us-central1/models/6359646723312189440@1
INFO:google.cloud.aiplatform.models:To use this Model in another session:
INFO:google.cloud.aiplatform.models:model = aiplatform.Model('projects/81995035742/locations/us-central1/models/6359646723312189440@1')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/81995035742/locations/us-central1/endpoints/6941658909824253952
InactiveRpcError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
71 try:
---> 72 return callable(*args, **kwargs)
73 except grpc.RpcError as exc:
11 frames
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Machine type "ct4p-hightpu-4t" is not supported."
debug_error_string = "UNKNOWN:Error received from peer ipv4:173.194.196.95:443 {created_time:"2024-03-08T21:11:42.821027279+00:00", grpc_status:3, grpc_message:"Machine type "ct4p-hightpu-4t" is not supported."}"
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
72 return callable_(*args, **kwargs)
73 except grpc.RpcError as exc:
---> 74 raise exceptions.from_grpc_error(exc) from exc
75
76 return error_remapped_callable
InvalidArgument: 400 Machine type "ct4p-hightpu-4t" is not supported.
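The `INVALID_ARGUMENT` above is the server rejecting the machine-type string at deploy time. A minimal client-side guard could catch the typo before the RPC is ever made; this is a hypothetical helper (not part of the Vertex AI SDK), and the set below is an assumption drawn only from the machine types the notebook itself mentions, not an exhaustive list of what Vertex AI supports:

```python
# Hypothetical guard: validate the machine type against the TPU v5e machine
# types referenced in the notebook, so a typo like "ct4p-hightpu-4t" fails
# fast locally instead of failing in the deploy RPC.
KNOWN_TPU_MACHINE_TYPES = {"ct5lp-hightpu-1t", "ct5lp-hightpu-4t"}


def check_machine_type(machine_type: str) -> str:
    """Return machine_type unchanged if known, else raise ValueError."""
    if machine_type not in KNOWN_TPU_MACHINE_TYPES:
        raise ValueError(f'Machine type "{machine_type}" is not supported.')
    return machine_type
```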
Steps to Reproduce the Problem
Run this code:

```python
# @title Deploy
# @markdown This section uploads the model to Model Registry and deploys it on the Endpoint. It takes 15 minutes to 1 hour to finish.
# @markdown Hex-LLM is a High-Efficiency Large Language Model (LLM) TPU serving solution built with XLA, which is being developed by Google Cloud. This notebook uses TPU v5e machines. Click "Show code" to see more details.

if LOAD_MODEL_FROM != "Kaggle":
    print("Skipped: Expect to load model from Kaggle, got", LOAD_MODEL_FROM)
else:
    if "2b" in KAGGLE_MODEL_ID:
        # Sets ct5lp-hightpu-1t (1 TPU chip) to deploy Gemma 2B models.
        machine_type = "ct5lp-hightpu-1t"
    else:
        # Sets ct5lp-hightpu-4t (4 TPU chips) to deploy Gemma 7B models.
        # Note: the value assigned below is "ct4p-hightpu-4t", which does not
        # match the comment above and is what triggers the INVALID_ARGUMENT error.
        machine_type = "ct4p-hightpu-4t"
```
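The failure is consistent with a typo in the 7B branch of the snippet above: its comment names `ct5lp-hightpu-4t`, but the assignment uses `ct4p-hightpu-4t`, which the service rejects. A minimal sketch of the corrected selection logic follows; the helper name and example model IDs are mine, not from the notebook:

```python
def select_machine_type(kaggle_model_id: str) -> str:
    """Pick the TPU v5e machine type for a Gemma model, per the notebook's comments."""
    if "2b" in kaggle_model_id:
        # One TPU v5e chip is enough for Gemma 2B.
        return "ct5lp-hightpu-1t"
    # Four TPU v5e chips for Gemma 7B ("ct5lp-...", not "ct4p-...").
    return "ct5lp-hightpu-4t"
```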
Specifications