
HfAPI().create_inference_endpoint errors and does not follow documentation #2277

Open
SimKennedy opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments


SimKennedy commented May 10, 2024

Describe the bug

Calling hf_api.create_inference_endpoint with the configuration shown in the documentation raises an error.

https://huggingface.co/docs/huggingface_hub/en/package_reference/hf_api#huggingface_hub.HfApi.create_inference_endpoint.task

The instance types returned by the provider API also differ from those in the pricing documentation:
https://api.endpoints.huggingface.cloud/v2/provider
https://huggingface.co/docs/inference-endpoints/en/pricing

Terminology in the vendor list does not match the API, e.g. instanceSize "small" != "x1".
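
One way to check which names the API actually accepts is to fetch the provider list above and inspect it directly. A minimal sketch (assumes only the requests library; the response schema isn't documented here, so it just pretty-prints the payload):

import json
import requests

# Fetch the vendor/instance catalogue from the Endpoints API and
# pretty-print it so the accepted instanceType/instanceSize names can
# be compared against the pricing page.
resp = requests.get("https://api.endpoints.huggingface.cloud/v2/provider")
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))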

Reproduction

from huggingface_hub import HfApi
api = HfApi()
endpoint = api.create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="medium",
    instance_type="c6i",
)

Bad request:
400: Instance compute 'Cpu' - 'c6i' - 'medium' in 'aws' - 'us-east-1' not found

Logs

Traceback (most recent call last):
  File "/home/sim/.cache/pypoetry/virtualenvs/env-Nnk0OfKl-py3.10/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: cX-WLr)

Bad request:
400: Instance compute 'Cpu' - 'c6i' - 'medium' in 'aws' - 'us-east-1' not found

System info

- huggingface_hub version: 0.23.0
- Platform: Linux-6.8.0-76060800daily20240311-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/sim/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: simkennedy
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.2.1
- Jinja2: 3.1.3
- Graphviz: N/A
- keras: N/A
- Pydot: 2.0.0
- Pillow: 10.2.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.23.5
- pydantic: 2.6.4
- aiohttp: 3.9.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/sim/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/sim/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/sim/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
SimKennedy added the bug label on May 10, 2024
SimKennedy changed the title from "HfAPI().create_inference_endpoint errors and does not fit documentation" to "HfAPI().create_inference_endpoint errors and does not follow documentation" on May 10, 2024
juliensimon (Contributor) commented May 16, 2024

I see this too with 0.23.0. Copying and pasting the example in https://huggingface.co/blog/tgi-messages-api doesn't work.

Bad request:
400: Instance compute 'Gpu' - 'p4de' - '2xlarge' in 'aws' - 'us-east-1' not found

I tried GCP too, same result. I used the parameters from the curl call on the Inference Endpoints page.

curl https://api.endpoints.huggingface.cloud/v2/endpoint/juliensimon \
-X POST \
-d '{"compute":{"accelerator":"gpu","instanceSize":"x4","instanceType":"nvidia-l4","scaling":{"maxReplica":1,"minReplica":1}},"model":{"framework":"pytorch","image":{"custom":{"health_route":"/health","env":{"MAX_BATCH_PREFILL_TOKENS":"2048","MAX_INPUT_LENGTH":"1024","MAX_TOTAL_TOKENS":"1512","MODEL_ID":"/repository"},"url":"ghcr.io/huggingface/text-generation-inference:2.0.2"}},"repository":"meta-llama/Meta-Llama-3-8B-Instruct","task":"text-generation"},"name":"meta-llama-3-8b-instruct-plc","provider":{"region":"us-east4","vendor":"gcp"},"type":"protected"}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer XXXXX"

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    name="llama3-8b-julien-demo",
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="google",
    region="us-east4",
    type="protected",
    instance_type="nvidia-l4",
    instance_size="x4",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "1024",
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:2.0.2", # use this build or newer
    },
)

Bad request:
400: Instance compute 'Gpu' - 'l4' - 'x4' in 'google' - 'us-east4' not found

@philschmid any idea?

philschmid (Member) commented:

The naming was adjusted. Pinging @co42 here.

philschmid (Member) commented:

The naming here should be correct: https://huggingface.co/docs/inference-endpoints/pricing

Can you try intel-icl and x4?
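
For reference, an untested sketch of the repro from the issue description with just those two values swapped in:

from huggingface_hub import HfApi

api = HfApi()
# Same parameters as the original repro, with the instance names
# replaced by the ones listed on the pricing page.
endpoint = api.create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x4",
    instance_type="intel-icl",
)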

juliensimon (Contributor) commented:

The Google example works. My mistake was the vendor name: "gcp", not "google" :)
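
For completeness, an untested sketch of the GCP snippet above with the one vendor-name fix applied (everything else unchanged):

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    name="llama3-8b-julien-demo",
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="gcp",  # was "google"; the API expects "gcp"
    region="us-east4",
    type="protected",
    instance_type="nvidia-l4",
    instance_size="x4",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "1024",
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:2.0.2",
    },
)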

The blog post example works when changed to:

    instance_type="nvidia-a100",
    instance_size="x2",

juliensimon (Contributor) commented:

huggingface/blog#2073

Wauplin (Contributor) commented May 22, 2024

Thanks everyone for reporting/fixing this! Just to be sure, is there still something to fix on huggingface_hub's docs side, or is all good now?
