User text vector search #8

Open
3 tasks
Spartee opened this issue Aug 8, 2022 · 0 comments
Labels
enhancement (New feature or request)

Comments

@Spartee
Contributor

Spartee commented Aug 8, 2022

Description

In addition to the current full-text search capability, we should offer a natural-language vector search that users can use to find products.

This is already largely implemented, but it is turned off because performance is not as good as it should be.

Related Code

Backend -> routes.py API route

@r.post("/vectorsearch/text/user",
       response_model=t.List[Product],
       name="product:find_similar_by_user_text",
       operation_id="compute_user_text_similarity")
async def find_products_by_user_text(similarity_request: UserTextSimilarityRequest) -> t.List[Product]:
    q = create_query(similarity_request.search_type,
                    similarity_request.number_of_results,
                    vector_field_name="text_vector",
                    gender=similarity_request.gender,
                    category=similarity_request.category)

    redis_client = await Redis(host=config.REDIS_HOST, port=config.REDIS_PORT, db=0)

    # obtain vector from the text model defined in the top-level __init__.py
    vector = TEXT_MODEL.encode(similarity_request.user_text)
    # obtain results of the query
    results = await redis_client.ft().search(q, query_params={"vec_param": vector.tobytes()})

    # Get Product records of those results
    similar_product_pks = [p.product_pk for p in results.docs]
    similar_products = [await Product.get(pk) for pk in similar_product_pks]
    return similar_products
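
The list comprehension at the end of the route awaits each `Product.get` call in turn. One possible improvement, sketched here as an assumption rather than as code from the repository, is to issue the lookups concurrently with `asyncio.gather`. `fetch_product` is a hypothetical stand-in for `Product.get`; whether this helps in practice depends on how the Redis client handles connections.

```python
import asyncio
from typing import Dict, List


# Hypothetical stand-in for Product.get(pk); the sleep simulates a network round trip.
async def fetch_product(pk: str) -> Dict[str, str]:
    await asyncio.sleep(0.01)
    return {"pk": pk}


async def fetch_products(pks: List[str]) -> List[Dict[str, str]]:
    # gather() runs the lookups concurrently and returns results in input order,
    # so the ranking produced by the vector search is preserved.
    return list(await asyncio.gather(*(fetch_product(pk) for pk in pks)))


if __name__ == "__main__":
    print(asyncio.run(fetch_products(["a", "b", "c"])))
```

In the route, the equivalent change would be to replace the comprehension with a single `await asyncio.gather(...)` over `Product.get(pk)` calls.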

Backend -> Pydantic Schema for API route

class UserTextSimilarityRequest(BaseModel):
    user_text: str
    number_of_results: int = 15
    search_type: str = "KNN"
    gender: str = ""
    category: str = ""

The Hugging Face model is held as a global variable in the top-level __init__.py. This could probably be improved.
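
One way the global could be replaced, shown as a minimal sketch and not as the project's actual design, is a cached accessor that builds the model on first use. `lazy_singleton` and `FakeTextModel` are hypothetical names introduced for illustration; the real factory would construct whatever model `TEXT_MODEL` currently holds.

```python
from functools import lru_cache
from typing import Callable, List, TypeVar

T = TypeVar("T")


def lazy_singleton(factory: Callable[[], T]) -> Callable[[], T]:
    """Wrap factory so it runs once, on first call, and the result is reused."""

    @lru_cache(maxsize=1)
    def get() -> T:
        return factory()

    return get


# Hypothetical stand-in for the real text model; it only counts constructions.
class FakeTextModel:
    instances = 0

    def __init__(self) -> None:
        FakeTextModel.instances += 1

    def encode(self, text: str) -> List[float]:
        return [float(len(text))]


# Nothing is built at import time; the model is created on the first call.
get_text_model = lazy_singleton(FakeTextModel)
```

With this in place, the route would call `get_text_model().encode(...)` instead of `TEXT_MODEL.encode(...)`, so importing the package no longer loads the model, and tests can substitute a lightweight factory.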

Frontend -> Header.tsx (currently commented out)

<Button
  onClick={() => queryProductsByUserText()}
  variant="outline-success"
  disabled={searchText.length < 1}>
  Vector Search
</Button>

Frontend JS to call backend -> api.ts

export const getSemanticallySimilarProductsbyText = async (text: string,
                                                           gender="",
                                                           category="",
                                                           search='KNN',
                                                           limit=15,
                                                           skip=0) => {
    let body = {
        user_text: text,
        search_type: search,
        number_of_results: limit,
        gender: gender,
        category: category
    }

    const url = MASTER_URL + "vectorsearch/text/user";
    return fetchFromBackend(url, 'POST', body);
};

TODO

  • Investigate performance of the user text vector search.
    - This will largely be in the data prep stage. We currently use the product description as the source text for the vector. It is cleaned manually with regex, which is probably not optimal.
  • Once performance is acceptable, enable the vector search in addition to the text search.
  • Make sure the buttons look right on the front end.
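
As a starting point for the data prep investigation, here is a minimal sketch of a cleaning step that could be compared against the current regex cleanup. It is an assumption, not the existing pipeline: the function name, the `name` field, the 512-character cap, and the idea of prepending the product name are all hypothetical choices to experiment with.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def build_embedding_text(name: str, description: str, max_chars: int = 512) -> str:
    # Decode entities such as &amp; first, then drop anything that looks like an HTML tag.
    text = _TAG_RE.sub(" ", html.unescape(description))
    # Put the product name first so it survives truncation.
    combined = f"{name}. {text}"
    # Collapse runs of whitespace, trim the ends, and cap the length.
    return _WS_RE.sub(" ", combined).strip()[:max_chars]


if __name__ == "__main__":
    print(build_embedding_text("Blue Shirt", "<p>Soft &amp; light\n cotton</p>"))
```

Re-encoding a sample of products with text prepared this way and comparing the search results for a fixed set of queries would show whether the cleanup is what limits quality.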
Spartee added the enhancement label Aug 8, 2022