
🦺 Validator util for hf/inference #385

Draft · coyotte508 wants to merge 8 commits into main
Conversation

@coyotte508 (Member) commented Nov 30, 2023

cc @julien-c, also see the discussion in #384

Standardize output validation with a util (also fix the typing in some error messages, and the typing of featureExtraction).

Inspired by zod, but much more lightweight since it only covers our use case.

Later it can be moved to ../shared (or exported) to be reused in @huggingface/widgets if we want to validate inputs/outputs. And we can output the types in whatever format we want and dispatch them where needed, cc @Wauplin (like in huggingface/api-inference-community#355)
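For illustration only (none of these names come from the PR, and the actual implementation may differ), a minimal sketch of what a lightweight, zod-inspired output validator could look like:

```ts
// Hypothetical, minimal zod-like validator: a "schema" is just a type guard
// plus a human-readable description used to build error messages.
type Schema<T> = { check: (x: unknown) => x is T; description: string };

const z = {
	string: (): Schema<string> => ({
		check: (x): x is string => typeof x === "string",
		description: "a string",
	}),
	number: (): Schema<number> => ({
		check: (x): x is number => typeof x === "number",
		description: "a number",
	}),
	array: <T>(item: Schema<T>): Schema<T[]> => ({
		check: (x): x is T[] => Array.isArray(x) && x.every(item.check),
		description: `an array of ${item.description}`,
	}),
	object: <T extends Record<string, Schema<unknown>>>(shape: T) => ({
		check: (x: unknown): x is { [K in keyof T]: T[K] extends Schema<infer U> ? U : never } =>
			typeof x === "object" &&
			x !== null &&
			Object.entries(shape).every(([key, schema]) => schema.check((x as Record<string, unknown>)[key])),
		description: `an object with keys ${Object.keys(shape).join(", ")}`,
	}),
};

// Throws when the API response does not match the expected shape.
function validateOutput<T>(output: unknown, schema: Schema<T>): T {
	if (!schema.check(output)) {
		throw new TypeError(`Invalid inference output: expected ${schema.description}`);
	}
	return output;
}

// Example: validating a text-classification-style output.
const classificationOutput = z.array(z.object({ label: z.string(), score: z.number() }));
const result = validateOutput(JSON.parse('[{"label":"POSITIVE","score":0.99}]'), classificationOutput);
console.log(result[0].label); // typed as string
```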

@Wauplin (Contributor) commented Nov 30, 2023

> And we can output the types in whatever format we want and dispatch them where needed, cc @Wauplin (like in huggingface/api-inference-community#355)

Yes, pretty interested in having a unified way to do this! At the moment only TGI output values (i.e. the text-generation task) are validated by the Python client.

@coyotte508 marked this pull request as ready for review November 30, 2023 13:51
@vvmnnnkv (Collaborator) commented Dec 1, 2023

@coyotte508 @Wauplin Just wondering: have you considered having a single schema for the inference API, and then code-generating types/validation from it for consumption in Python/JS/etc.? E.g. OpenAPI seems to fit.

@coyotte508 (Member, Author) commented Dec 1, 2023

> @coyotte508 @Wauplin Just wondering: have you considered having a single schema for the inference API, and then code-generating types/validation from it for consumption in Python/JS/etc.? E.g. OpenAPI seems to fit.

It depends on where the reference for the types lives and who maintains it. I can easily convert the validated types to something like ajv schemas to export (or whatever @Wauplin would prefer), but I certainly don't want to write the ajv schemas myself as the source of truth: a syntax like zod's is much more user-friendly.
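To make that point concrete, here is a rough sketch (hypothetical names, not the PR's actual internals) of how a zod-like schema representation could be walked once to emit an ajv-compatible JSON Schema, so the friendlier syntax remains the source of truth and the JSON Schema is only an export format:

```ts
// Hypothetical internal representation of a zod-like schema node.
type Node =
	| { kind: "string" }
	| { kind: "number" }
	| { kind: "array"; item: Node }
	| { kind: "object"; props: Record<string, Node> };

// Convert a schema node into a plain JSON Schema object that ajv
// (or any other JSON Schema validator) could consume.
function toJsonSchema(node: Node): Record<string, unknown> {
	switch (node.kind) {
		case "string":
			return { type: "string" };
		case "number":
			return { type: "number" };
		case "array":
			return { type: "array", items: toJsonSchema(node.item) };
		case "object":
			return {
				type: "object",
				properties: Object.fromEntries(Object.entries(node.props).map(([k, v]) => [k, toJsonSchema(v)])),
				required: Object.keys(node.props),
			};
	}
}

// Example: a feature-extraction-style output, "array of arrays of numbers".
const featureExtractionOutput: Node = { kind: "array", item: { kind: "array", item: { kind: "number" } } };
console.log(JSON.stringify(toJsonSchema(featureExtractionOutput), null, 2));
```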

@Wauplin (Contributor) commented Dec 1, 2023

Just wondering - have you considered having a single schema for inference API, and then code-gen types/validation from there for consumption in python/js/etc? E.g. OpenAPI seems to fit

Unfortunately I think the main bottleneck will be the maintenance of this source of truth schema. InferenceAPI / InferenceEndpoints / TEI / TGI / etc. do not share the same backend so having a "client to rule them all" also means often breaking the schema to adapt to different cases. Thinking twice about it, I think we should aim for a more unified backend and only then have a generic "source of truth schema" maintained with it.

@vvmnnnkv (Collaborator) commented Dec 1, 2023

> It depends on where the reference for the types lives and who maintains it.

The input/output types schema and the API spec seem better kept and maintained together with the server API code. It's even possible to generate the schema from the server API code itself, although it might be more beneficial to keep it language-neutral.

@osanseviero (Member) commented
Hey all! It might be a good idea to sync on this topic, as we're discussing with @julien-c, @gary149, and @SBrandeis the unification of input/output expectations per library/task pair, as well as exposing this through a language-agnostic format that would allow us to do things such as validation (in widgets or libraries), among other exciting things.

@julien-c (Member) commented Dec 1, 2023

> I think we should aim for a more unified backend first, and only then have a generic "source of truth" schema maintained alongside it.

Yes, I agree with this.

I was suggesting JSON Schema and @SBrandeis was suggesting OpenAPI, but I haven't dived too deeply into each one's pros and cons.

And IMO it should live in @huggingface/tasks (in each task's folder, as it needs to be defined task by task) or in a dedicated repo.

@vvmnnnkv (Collaborator) commented Dec 1, 2023

> I was suggesting JSON Schema and @SBrandeis was suggesting OpenAPI, but I haven't dived too deeply into each one's pros and cons.
>
> And IMO it should live in @huggingface/tasks (in each task's folder, as it needs to be defined task by task) or in a dedicated repo.

OpenAPI allows specifying data types using JSON Schema, and more: URLs, headers, error codes, etc.

The nice thing about OpenAPI is that the spec can be auto-generated by some frameworks (e.g. FastAPI), making the API code itself the single source of truth. It's also possible to generate client code from an OpenAPI spec.
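As an illustration of that client-side generation (assuming a hypothetical per-task spec file and the third-party openapi-typescript package, neither of which is part of this PR):

```ts
// Assumption: a hypothetical per-task OpenAPI spec exists at tasks/feature-extraction/spec.yaml.
// With a tool such as openapi-typescript, types can be generated from it, e.g.:
//
//   npx openapi-typescript tasks/feature-extraction/spec.yaml -o feature-extraction.d.ts
//
// The generated output is roughly of this shape (illustrative, not actual tool output):
interface components {
	schemas: {
		/** Feature-extraction output: one embedding vector per input */
		FeatureExtractionOutput: number[][];
	};
}

type FeatureExtractionOutput = components["schemas"]["FeatureExtractionOutput"];

// Client code can then type the parsed response without hand-written types.
async function featureExtraction(url: string, inputs: string): Promise<FeatureExtractionOutput> {
	const res = await fetch(url, { method: "POST", body: JSON.stringify({ inputs }) });
	return (await res.json()) as FeatureExtractionOutput;
}
```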

@mishig25 (Collaborator) commented Dec 5, 2023

So how does this plan sound? (compiling all the suggestions above, plus some research)

  1. Create an OpenAPI spec for every task in its respective @huggingface/tasks folder (the schema can be in either yml or json). For this use case, I think OpenAPI is preferable over JSON Schema. (As @vvmnnnkv mentioned above, OpenAPI is designed for RESTful APIs (it supports URLs, headers, error codes, query params), and our use case is "typing" the HF inference API. JSON Schema, by contrast, is more generic: it validates any JSON object.) A rough sketch of what such a per-task schema could look like is shown below.
  2. Use this PR's validation function to check whether an object fits the schema, instead of adding a new dependency like ajv.
  3. Create a util that publishes the @huggingface/tasks schemas on PyPI so that they can be used by huggingface_hub as well.
  4. In api-inference (which is a private repo), create a util that submits PRs to the @huggingface/tasks schemas (to make sure the API server & schemas are always in sync).

wdyt?
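As a rough, hypothetical illustration of point 1 (expressed as a TypeScript const rather than yml/json, and not an agreed-upon spec), a per-task schema in @huggingface/tasks could loosely follow OpenAPI's requestBody/responses layout:

```ts
// Hypothetical content of e.g. tasks/src/text-classification/spec.ts;
// field names below follow OpenAPI conventions but are purely illustrative.
export const textClassificationSpec = {
	requestBody: {
		content: {
			"application/json": {
				schema: {
					type: "object",
					required: ["inputs"],
					properties: {
						inputs: { type: "string", description: "The text to classify" },
						parameters: {
							type: "object",
							properties: { top_k: { type: "integer", description: "Number of labels to return" } },
						},
					},
				},
			},
		},
	},
	responses: {
		"200": {
			content: {
				"application/json": {
					schema: {
						type: "array",
						items: {
							type: "object",
							required: ["label", "score"],
							properties: {
								label: { type: "string" },
								score: { type: "number" },
							},
						},
					},
				},
			},
		},
	},
} as const;
```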

@SBrandeis (Contributor) commented Dec 5, 2023

> 1. Create an OpenAPI spec for every task in its respective @huggingface/tasks folder (the schema can be in either yml or json).

json is probably a better format (link).

> For this use case, I think OpenAPI is preferable over JSON Schema.

Can OpenAPI be used to spec non-HTTP APIs as well?
I think one of our goals here is to standardize all inference APIs, including from code (aka pipelines)

> 4. In api-inference (which is a private repo), create a util that submits PRs to the @huggingface/tasks schemas (to make sure the API server & schemas are always in sync).

No big deal, but I think it is saner to ensure @huggingface/tasks is the only source of truth and to have the dependency "flow" the other way, meaning the task schema must be updated first, and then the Inference API (and other consumers) adapt their code to conform to it.

Automated PRs from the Inference API to the tasks package would defeat this flow, in my opinion.

@Wauplin (Contributor) commented Dec 5, 2023

I like the idea of making tasks the source of truth and then adapting both server and client APIs. I'm just a bit worried about whether we will succeed in unifying things. As a new example I learned this morning (cc @datavistics) for the feature-extraction task:

  • text-embedding-inference supports {"inputs": text, "truncate": true, ...} as payload
  • while the InferenceAPI powered by transformers/sentence-transformers supports {"inputs": text, "parameters": {"truncation": "only_first"}}
  • and I don't know whether the Inference Endpoints backend is able to truncate input on demand
    (related to huggingface_hub#1885, "1877 add normalize and truncate").

Having a standard is nice for users and clients, but it comes at a higher maintenance cost and less flexibility to implement new things server-side.

@julien-c (Member) commented Dec 5, 2023

> Create a util that publishes the @huggingface/tasks schemas on PyPI so that they can be used by huggingface_hub as well.

IMO we don't need this intermediary step, i.e. a GH Action can run on @huggingface/tasks updates and open a PR to generate code in huggingface_hub (the GH Action can live in either repo, no strong preference a priori).

@mishig25 (Collaborator) commented Dec 5, 2023

> I like the idea of making tasks the source of truth and then adapting both server and client APIs. I'm just a bit worried about whether we will succeed in unifying things. As a new example I learned this morning (cc @datavistics) for the feature-extraction task:

Maybe having a unified source of truth will help avoid this kind of problem (i.e. {"inputs": text, "truncate": true, ...} vs {"inputs": text, "parameters": {"truncation": "only_first"}}).

@julien-c (Member) commented Dec 5, 2023

Yes, exactly: that's the purpose of having a schema, to enforce unification going forward. With @gary149, @osanseviero, and Simon, we'll share a doc about API unification later this week.

@mishig25 (Collaborator) commented Dec 5, 2023

> we'll share a doc about API unification later this week

I can work on putting the schemas in @huggingface/tasks. Let me know when I should do it

@julien-c (Member) commented Dec 5, 2023

> I can work on putting the schemas in @huggingface/tasks. Let me know when I should do it

Let's synchronize with @SBrandeis and @Wauplin (no rush, it's quite an important change, so let's take enough time to do it).

@coyotte508 (Member, Author) commented
Sounds exciting ^^

Marking this PR as a draft in the meantime.

@coyotte508 marked this pull request as draft December 5, 2023 13:28
@vvmnnnkv (Collaborator) commented Dec 5, 2023

> Can OpenAPI be used to spec non-HTTP APIs as well?
> I think one of our goals here is to standardize all inference APIs, including from code (aka pipelines)

@SBrandeis Could you provide examples of other APIs? OpenAPI is mostly for RESTful HTTP APIs; e.g. it can't spec a WebSocket API. However, the JSON Schema part that describes data types can probably be put in separate JSON Schema files and then reused in other specifications, such as AsyncAPI, which does support WebSockets.
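A rough illustration of that reuse (hypothetical file names and URLs, written as TypeScript consts for brevity): the data types live in one standalone JSON Schema document, and both an OpenAPI spec and an AsyncAPI spec point at it via $ref.

```ts
// Hypothetical shared schema file, e.g. schemas/feature-extraction-output.json
export const featureExtractionOutputSchema = {
	$id: "https://example.org/schemas/feature-extraction-output.json",
	type: "array",
	items: { type: "array", items: { type: "number" } },
} as const;

// An OpenAPI (HTTP) response can reference it...
export const openApiResponse = {
	content: {
		"application/json": {
			schema: { $ref: "https://example.org/schemas/feature-extraction-output.json" },
		},
	},
} as const;

// ...and an AsyncAPI (e.g. WebSocket) message payload can reference the same file,
// so the data types are defined exactly once.
export const asyncApiMessage = {
	payload: { $ref: "https://example.org/schemas/feature-extraction-output.json" },
} as const;
```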

@julien-c (Member) commented
Status update: @SBrandeis and @Wauplin are currently spec'ing a set of input/output schemas for all HF tasks, leveraging https://jsontypedef.com/

We'll implement those schemas in huggingface_hub (Python) first, but then it should be reasonably easy to support them here too.
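For reference, JSON Type Definition (RFC 8927, the format behind https://jsontypedef.com/) describes types with a small set of forms such as properties, elements, and type. A hypothetical example of what one task output schema could look like in that format, written as a TypeScript const (this is not one of the schemas actually being spec'ed):

```ts
// Hypothetical JTD schema for a text-classification-style output:
// an array of { label: string, score: number } objects.
export const textClassificationOutputJtd = {
	elements: {
		properties: {
			label: { type: "string" },
			score: { type: "float64" },
		},
	},
} as const;
```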
