Using reticulate for a shiny app for semantic search with BERT from SentenceTransformers: Error in py_call_impl(callable, dots$args, dots$keywords) : NameError: name 'faiss' is not defined #1469

Open
alicesaunders opened this issue Aug 31, 2023 · 1 comment


alicesaunders commented Aug 31, 2023

When running the code below, I repeatedly get the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
NameError: name 'faiss' is not defined

I have two scripts, app.R and pythonSemanticSearch.py. Some lines are commented out because they are alternative functions I tried in order to fix the error, but the error has remained the same. My original code reads in an index that was generated and saved by another script. I have kept that code commented out and, for reproducibility, replaced it with a different dataset and an index generated within this code. That dataset replaces the variable df in the example code below.

Here is the code from app.R:

library(shiny)
 library(reticulate)
 
 use_python("my_env/Scripts/python.exe")
 
 sentence_transformers <- reticulate::import("sentence_transformers")
 SentenceTransformer <- sentence_transformers$SentenceTransformer
 
 for (i in 1:10) {
     gc(full = TRUE)
     system("nvidia-smi | grep MiB | grep Default")
     #model <- SentenceTransformer("trained_model_ALL_300")
     model <- SentenceTransformer('multi-qa-distilbert-cos-v1')
 }
 
 faiss <- reticulate::import("faiss")
 datasets <- reticulate::import(datasets) 
 load_dataset <- datasets$load_dataset
 ds = load_dataset('crime_and_punish', split='train[:100]')
 ds_with_embeddings = ds.map(lambda example: {'embeddings': ctx_encoder(**ctx_tokenizer(example["line"], return_tensors="pt"))[0][0].numpy()})
 ds_with_embeddings.add_faiss_index(column='embeddings')
 
 #faiss <- reticulate::import("faiss")
 #read_index <- faiss$read_index
 #index_path <- "trained_model_index_ALL_300.index"
 #index = read_index(index_path)

 #py_run_string("from sentence_transformers import SentenceTransformer")
 python <- import("pythonSemanticSearch") #import python script
 #python <- py_run_file("pythonSemanticSearch.py", local = TRUE)
 python$import_libraries() #import libraries from python function 
 
 
 #load df (from which the index was generated and the resulting dataframe needs to be based on)
 df = load_dataset('crime_and_punish')
 
 # Define UI for application 
 ui <- fluidPage(
 
     # Application title
     titlePanel("Semantic Search App"),
 
     # Sidebar with a text input for the query and a search button
     sidebarLayout(
         sidebarPanel(
             textInput(input = "query", "Enter your query:", ""),
             actionButton(input = "search", "Search")
         ),
 
         # Show the table of search results and a download button
         mainPanel(
            tableOutput("results"),
            downloadButton(input = "downloadCSV", "Download CSV")
         )
     )
 )

 # Define server logic 
 server <- function(input, output) {
     
     # import the python module within the server logic 
     python <- import("pythonSemanticSearch")
     
     # import model 
     sentence_transformers <- reticulate::import("sentence_transformers")
     SentenceTransformer <- sentence_transformers$SentenceTransformer
     
     faiss <- reticulate::import("faiss")
     index_path <- "trained_model_index_ALL_300.index"
     index = faiss$read_index(index_path)
 
     for (i in 1:10) {
         gc(full = TRUE)
         system("nvidia-smi | grep MiB | grep Default")
         model <- SentenceTransformer("trained_model_ALL_300")
     }
     # attach to cpu 
     #model <- python$load_model(model)
     
     # import index 
     index <- python$load_index('trained_model_ALL_300.index')
     
     # check query not null and encode using BERT
     query_vector <- reactive({
         query <- input$query
         if (!is.null(query) && nchar(query)>0) {
            python$encode_query(query, model) # .py fcn 
         }
     })
     
     # generate search results 
     results <- eventReactive(input$search, {
         if (!is.null(query_vector())) {
             query_embedding <- query_vector()
             python$vector_search(input$query, query_embedding, model, index, df, num_results=10)
         }
     })
     
     # output table 
     output$results <- renderTable({
         results()
     })
     
     # download csv 
     output$downloadCSV <- downloadHandler(
         filename = function() {
             "semantic_search_results.csv"
         },
         content = function(file) {
             write.csv(results(), file)
         }
     )
 }
 
 # Run the app
 shinyApp(ui, server)

and here is the code from pythonSemanticSearch.py:

# python script for the app - functions to read in the model

def import_libraries():
    """
    Import the required libraries.
    """
    import pandas as pd
    import glob
    import numpy as np
    import torch
    import faiss
    from pathlib import Path
    import csv
    from sentence_transformers import SentenceTransformer
    from sentence_transformers import InputExample, losses, datasets
    from tqdm import tqdm


def load_model(model_name):
    """
    Load a SentenceTransformer model.

    Args:
        model_name (str): Name of the SentenceTransformer model.

    Returns:
        model: Loaded SentenceTransformer model.
    """
    #model = SentenceTransformer(model_name)

    # Check if GPU/CPU is available and use it
    if torch.cuda.is_available():
        model = model.to(torch.device("cuda"))
    print(model.device)

    return model

def load_index(index_path):
    """
    Load a FAISS index.

    Args:
        index_path (str): Path to the FAISS index file.

    Returns:
        index: Loaded FAISS index.
    """
    index = faiss.read_index(index_path)
    return index

def encode_query(query, model):
    """
    Encode a query using a SentenceTransformer model.

    Args:
        query (str): User query that should be more than a sentence long.
        model: Sentence-transformers model.

    Returns:
        vector (numpy.array): Encoded vector of the query.
    """
    vector = model.encode([query])
    return vector


def vector_search(query, vector, model, index, df, num_results=10):
    """
    Find the vectors most similar to an already-encoded query using FAISS.
    Create a pandas DataFrame with the search results.

    Args:
        query (str): User query that should be more than a sentence long.
        vector (numpy.array): Encoded vector of the query.
        model: Sentence-transformers model (not used inside this function).
        index: Loaded FAISS index.
        df: DataFrame containing report information.
        num_results (int): Number of results to return.

    Returns:
        results_df: Pandas DataFrame containing the results.
    """
    
    D, I = index.search(np.array(vector).astype("float32"), k=num_results)
    
    def id2details(df, I, column):
        return [list(df[df.UniqueID == idx][column]) for idx in I[0]]
    
    title = id2details(df, I, 'docname')
    text = id2details(df, I, 'paratext')
    
    data = {
        'Title': [item[0] for item in title], 
        'Text': [item[0] for item in text],
        'Search query': query
    }
    
    results_df = pd.DataFrame(data)
    return results_df 

Here is the output from reticulate::py_config():

python: C:/Users/Alice Saunders/Documents/Semantic Search Reticulate/my_env/Scripts/python.exe
libpython: C:/Users/Alice Saunders/AppData/Local/Programs/Python/Python311/python311.dll
pythonhome: C:/Users/Alice Saunders/Documents/Semantic Search Reticulate/my_env
virtualenv: C:/Users/Alice Saunders/Documents/Semantic Search Reticulate/my_env/Scripts/activate_this.py
version: 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/Users/Alice Saunders/Documents/Semantic Search Reticulate/my_env/Lib/site-packages/numpy
numpy_version: 1.25.2
sentence_transformers:C:\Users\ALICES1\DOCUME1\SEMANT~1\my_env\Lib\site-packages\sentence_transformers

NOTE: Python version was forced by RETICULATE_PYTHON

Here is the output from utils::sessionInfo():

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reticulate_1.24 shiny_1.6.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 rstudioapi_0.13 magrittr_2.0.1 rappdirs_0.3.3 xtable_1.8-4
[6] lattice_0.20-41 R6_2.5.0 rlang_0.4.10 fastmap_1.1.0 tools_4.0.4
[11] grid_4.0.4 png_0.1-7 jquerylib_0.1.3 withr_2.4.1 htmltools_0.5.1.1
[16] ellipsis_0.3.1 digest_0.6.27 lifecycle_1.0.0 crayon_1.4.1 Matrix_1.2-18
[21] later_1.1.0.1 sass_0.4.1 promises_1.2.0.1 cachem_1.0.4 mime_0.10
[26] compiler_4.0.4 bslib_0.2.4 jsonlite_1.7.2 httpuv_1.5.5

The model and index used are files that I generated in a previous script. A base model from SentenceTransformers can be used instead, e.g. model <- SentenceTransformer('multi-qa-distilbert-cos-v1').

@t-kalinowski
Member

Hi @alicesaunders,

Can you please try to make your example smaller and something I can run locally to reproduce the error?

At a quick glance, it looks like there is unmodified Python code in the R script (e.g., the use of . and { in ds.map). Also, the Python function import_libraries() seems to come from a misunderstanding of the difference in scoping rules between Python and R: an import inside a Python function does not make the package symbols globally available the way library() does in R.
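
To illustrate the scoping point, here is a minimal sketch. It assumes the standard-library module statistics as a stand-in for faiss so that it runs without extra packages, and the helper name load_index_like is hypothetical; it is not the code from the issue, only the same pattern in miniature.

```python
# Stand-in for pythonSemanticSearch.py. `statistics` (stdlib) plays the role
# of `faiss`; `load_index_like` is an illustrative name, not from the issue.

def import_libraries():
    # This import binds `statistics` only in this function's local scope.
    import statistics


def load_index_like():
    # `statistics` is looked up in the module's global scope at call time.
    return statistics.mean([1, 2, 3])


import_libraries()  # runs fine, but leaves no `statistics` name at module level

try:
    load_index_like()
    error_message = None
except NameError as exc:
    # Same kind of failure as the reported "name 'faiss' is not defined".
    error_message = str(exc)

print(error_message)

# A module-level import makes the name visible to every function in the file.
import statistics

result = load_index_like()
print(result)
```

Under that assumption, the likely remedy for the reported error would be to move the imports in pythonSemanticSearch.py to the top of the file instead of wrapping them in import_libraries().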

(This issue thread is more for reporting bugs than for support).
