Add embedding instruction prompt support to hf-embedder #30967

Open
jobergum opened this issue Apr 18, 2024 · 1 comment

@jobergum
Member

It's becoming the norm to have prompt prefixes for text embedding models. I think we should add this to the hf-embedder.

<component id="snow" type="hugging-face-embedder">
    <transformer-model url="https://huggingface.co/Snowflake/snowflake-arctic-embed-l/resolve/main/onnx/model_int8.onnx"/>
    <tokenizer-model url="https://huggingface.co/Snowflake/snowflake-arctic-embed-l/raw/main/tokenizer.json"/>
    <normalize>true</normalize>
    <pooling-strategy>cls</pooling-strategy>
    <instruction-prompt>
        <query>Represent this sentence for searching relevant passages:</query>
        <document>passage:</document>
    </instruction-prompt>
</component>

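The embedder would then prepend the matching instruction to the input, depending on the context: the query instruction when embedding queries, and the document instruction when embedding documents during indexing.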
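To make the proposed behavior concrete, here is a minimal Python sketch of the prepend step. It is not the hf-embedder implementation: the function name `prepend_instruction`, the `INSTRUCTION_PROMPTS` mapping, and the single-space separator between instruction and input are all assumptions for illustration.

```python
# Hypothetical values mirroring the <instruction-prompt> element proposed above.
INSTRUCTION_PROMPTS = {
    "query": "Represent this sentence for searching relevant passages:",
    "document": "passage:",
}


def prepend_instruction(text: str, context: str, prompts: dict = INSTRUCTION_PROMPTS) -> str:
    """Return text with the instruction configured for the given context prepended.

    If no instruction is configured for the context, the text is returned unchanged.
    """
    instruction = prompts.get(context, "")
    if not instruction:
        return text
    # Assumption: a single space separates the instruction from the input.
    return f"{instruction} {text}"


print(prepend_instruction("what is vespa?", "query"))
print(prepend_instruction("Vespa is a search engine.", "document"))
```

Falling back to the unmodified input when a context has no instruction would keep existing configurations working without changes, though the actual default behavior is a design decision for the implementation.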

@jobergum jobergum self-assigned this Apr 24, 2024
@jobergum jobergum added this to the soon milestone Apr 24, 2024
@jobergum
Member Author

Alternatives

Alternative 1

<instruction-prompts>
    <query>Represent this sentence for searching relevant passages:</query>
    <document>passage:</document>
</instruction-prompts>

Alternative 2

<prefixes>
    <query>Represent this sentence for searching relevant passages:</query>
    <document>passage:</document>
</prefixes>

Alternative 3

<prepend query="Represent this sentence for searching relevant passages:" document="passage:"/>

Alternative 4

<prepend>
    <query>Represent this sentence for searching relevant passages:</query>
    <document>passage:</document>
</prepend>

Since normalize is a verb, I think that prepend is a good alternative, so I'm voting for alternative 4.
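As a rough illustration of how the alternative 4 syntax could be read, the Python sketch below parses a `<prepend>` element into a mapping from context to prefix using the standard library's `xml.etree.ElementTree`. The function name `parse_prepend` and the choice to strip surrounding whitespace from the element text are assumptions, not part of any existing config parser.

```python
import xml.etree.ElementTree as ET

# The <prepend> element from alternative 4, used here as sample input.
PREPEND_XML = """
<prepend>
    <query>Represent this sentence for searching relevant passages:</query>
    <document>passage:</document>
</prepend>
"""


def parse_prepend(xml_text: str) -> dict:
    """Parse a <prepend> element into a {context: prefix} mapping.

    Child elements that are missing or empty are left out of the result.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for name in ("query", "document"):
        child = root.find(name)
        if child is not None and child.text and child.text.strip():
            # Assumption: whitespace around the configured prefix is not significant.
            result[name] = child.text.strip()
    return result


prompts = parse_prepend(PREPEND_XML)
print(prompts)
```

Both children are treated as optional here, so a model that only needs a query prefix could omit `<document>` entirely.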
