
HfAPI().create_inference_endpoint errors and does not follow documentation #2277

Open
SimKennedy opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments


SimKennedy commented May 10, 2024

Describe the bug

Calling hf_api.create_inference_endpoint with the configuration shown in the documentation raises an error.

https://huggingface.co/docs/huggingface_hub/en/package_reference/hf_api#huggingface_hub.HfApi.create_inference_endpoint.task

The instance types returned by the provider API also differ from those in the pricing documentation:
https://api.endpoints.huggingface.cloud/v2/provider
https://huggingface.co/docs/inference-endpoints/en/pricing

Terminology in the vendor list does not match the API, e.g. instanceSize "small" != "x1".
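
One way to check which names the API actually accepts is to fetch the provider list above and inspect it directly. A minimal sketch (assumes only the requests library; the response schema isn't documented here, so it just pretty-prints the payload):

import json
import requests

# Fetch the vendor/instance catalogue from the Endpoints API and
# pretty-print it so the accepted instanceType/instanceSize names can
# be compared against the pricing page.
resp = requests.get("https://api.endpoints.huggingface.cloud/v2/provider")
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))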

Reproduction

from huggingface_hub import HfApi
api = HfApi()
endpoint = api.create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="medium",
    instance_type="c6i",
)

Bad request:
400: Instance compute 'Cpu' - 'c6i' - 'medium' in 'aws' - 'us-east-1' not found

Logs

Traceback (most recent call last):
  File "/home/sim/.cache/pypoetry/virtualenvs/env-Nnk0OfKl-py3.10/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: cX-WLr)

Bad request:
400: Instance compute 'Cpu' - 'c6i' - 'medium' in 'aws' - 'us-east-1' not found

System info

- huggingface_hub version: 0.23.0
- Platform: Linux-6.8.0-76060800daily20240311-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/sim/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: simkennedy
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.2.1
- Jinja2: 3.1.3
- Graphviz: N/A
- keras: N/A
- Pydot: 2.0.0
- Pillow: 10.2.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.23.5
- pydantic: 2.6.4
- aiohttp: 3.9.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/sim/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/sim/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/sim/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
SimKennedy added the bug label on May 10, 2024
SimKennedy changed the title from "HfAPI().create_inference_endpoint errors and does not fit documentation" to "HfAPI().create_inference_endpoint errors and does not follow documentation" on May 10, 2024
juliensimon (Contributor) commented May 16, 2024

I see this too with 0.23.0. Copying and pasting the example in https://huggingface.co/blog/tgi-messages-api doesn't work.

Bad request:
400: Instance compute 'Gpu' - 'p4de' - '2xlarge' in 'aws' - 'us-east-1' not found

I tried GCP too, same result. I used the parameters from the curl call on the Inference Endpoints page.

curl https://api.endpoints.huggingface.cloud/v2/endpoint/juliensimon \
-X POST \
-d '{"compute":{"accelerator":"gpu","instanceSize":"x4","instanceType":"nvidia-l4","scaling":{"maxReplica":1,"minReplica":1}},"model":{"framework":"pytorch","image":{"custom":{"health_route":"/health","env":{"MAX_BATCH_PREFILL_TOKENS":"2048","MAX_INPUT_LENGTH":"1024","MAX_TOTAL_TOKENS":"1512","MODEL_ID":"/repository"},"url":"ghcr.io/huggingface/text-generation-inference:2.0.2"}},"repository":"meta-llama/Meta-Llama-3-8B-Instruct","task":"text-generation"},"name":"meta-llama-3-8b-instruct-plc","provider":{"region":"us-east4","vendor":"gcp"},"type":"protected"}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer XXXXX"

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    name="llama3-8b-julien-demo",
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="google",
    region="us-east4",
    type="protected",
    instance_type="nvidia-l4",
    instance_size="x4",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "1024",
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:2.0.2", # use this build or newer
    },
)

Bad request:
400: Instance compute 'Gpu' - 'l4' - 'x4' in 'google' - 'us-east4' not found

@philschmid any idea?

philschmid (Member) commented:

The naming was adjusted. Pinging @co42 here.

philschmid (Member) commented:

The naming here should be correct: https://huggingface.co/docs/inference-endpoints/pricing

Can you try intel-icl and x4?
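
For reference, an untested sketch of the repro from the issue description with just those two values swapped in:

from huggingface_hub import HfApi

api = HfApi()
# Same parameters as the original repro, with the instance names
# replaced by the ones listed on the pricing page.
endpoint = api.create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x4",
    instance_type="intel-icl",
)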

juliensimon (Contributor) commented:

The Google example works. My mistake was the vendor name: "gcp", not "google" :)
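
For completeness, an untested sketch of the GCP snippet above with the one vendor-name fix applied (everything else unchanged):

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    name="llama3-8b-julien-demo",
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="gcp",  # was "google"; the API expects "gcp"
    region="us-east4",
    type="protected",
    instance_type="nvidia-l4",
    instance_size="x4",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "1024",
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:2.0.2",
    },
)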

The blog post example works when changed to:

    instance_type="nvidia-a100",
    instance_size="x2",

juliensimon (Contributor) commented:

huggingface/blog#2073

Wauplin (Contributor) commented May 22, 2024

Thanks everyone for reporting/fixing this! Just to be sure, is there still something to fix on huggingface_hub's docs side, or is all good now?
