I've been exploring the use of custom models with the Anonymize and Sensitive scanners within the llm_guard library, as mentioned in the changelog for the latest release. Specifically, I'm interested in integrating the SpaCy model beki/en_spacy_pii_distilbert for PII detection tasks.
Objective
My goal is to leverage the beki/en_spacy_pii_distilbert model, which is not a traditional Hugging Face Transformer model but rather a SpaCy model, for enhanced PII detection accuracy and reduced latency as highlighted in your changelog.
Issue
I encountered difficulties when attempting to load and use this SpaCy model with the Anonymize scanner. Typically, the process for integrating models relies on specifying a model path or configuration that is compatible with Hugging Face's Transformer models. However, given that beki/en_spacy_pii_distilbert is a SpaCy model, the standard approach doesn't seem to apply.
Attempts
Here's an outline of my approach so far, based on the available documentation and examples:
Model Specification: Attempted to specify beki/en_spacy_pii_distilbert directly as a model path or through a configuration dictionary.
Custom Recognizer: Explored creating a custom recognizer to wrap the SpaCy model loading and analysis logic.
Adapter Pattern: Considered using an adapter to bridge the gap between the expected input/output formats of the llm_guard scanners and the SpaCy model.
The last approach is partly working, but I would like to know the best practice for using this model inside llm_guard:
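from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

# CustomSpacyRecognizer and CustomRecognizerAdapter are my own classes (definitions omitted here).
custom_recognizer = CustomSpacyRecognizer()
adapter = CustomRecognizerAdapter(custom_recognizer=custom_recognizer)
vault = Vault()
scanner = Anonymize(
    vault=vault,
    language="en",
    use_faker=True,
    custom_recognizer=adapter,  # Passing the adapter as the custom recognizer
)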
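For context, the adapter idea can be sketched roughly as follows. This is a minimal illustration, not llm_guard or Presidio API: `SpanResult`, `SpacyEntityAdapter`, and the stand-in `fake_nlp` are hypothetical names, and the only spaCy behaviour assumed is that a processed document exposes `ents` whose items carry `label_`, `start_char`, and `end_char`. The stand-in pipeline lets the sketch run without spaCy installed; with spaCy, a loaded pipeline would be passed in its place.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, List


@dataclass(frozen=True)
class SpanResult:
    """Hypothetical result record: entity label plus character offsets."""
    entity_type: str
    start: int
    end: int
    score: float


class SpacyEntityAdapter:
    """Hypothetical adapter: converts spaCy-style entities into SpanResult records."""

    def __init__(self, nlp: Callable[[str], Any], default_score: float = 0.85):
        # `nlp` is any callable returning an object with an `ents` attribute;
        # a loaded spaCy pipeline would satisfy this.
        self.nlp = nlp
        # Assumption: the entities carry no confidence score, so a fixed one is used.
        self.default_score = default_score

    def analyze(self, text: str) -> List[SpanResult]:
        doc = self.nlp(text)
        return [
            SpanResult(ent.label_, ent.start_char, ent.end_char, self.default_score)
            for ent in doc.ents
        ]


def fake_nlp(text: str) -> SimpleNamespace:
    """Stand-in for a loaded pipeline: tags one hard-coded name as PERSON."""
    name = "Rakend"
    start = text.find(name)
    ents = []
    if start != -1:
        ents.append(
            SimpleNamespace(label_="PERSON", start_char=start, end_char=start + len(name))
        )
    return SimpleNamespace(ents=ents)


adapter = SpacyEntityAdapter(fake_nlp)
results = adapter.analyze("My name is Rakend.")
for r in results:
    print(r.entity_type, r.start, r.end, r.score)
```

Keeping the conversion in one small class means the scanner side only ever sees label-plus-offset records, whichever model produced them.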
Could you provide guidance or examples on how to correctly integrate a SpaCy model like beki/en_spacy_pii_distilbert with the Anonymize and Sensitive scanners?
Thank you for developing llm_guard and for your support in enhancing its capabilities. I look forward to your advice on integrating SpaCy models for improved PII detection.
Best regards,
Rakend
Hey @rakendd, thanks for reaching out. We used to have this model but then realized that it blocked updates to the latest transformers because of its dependency on "spacy-transformers>=1.1.8,<1.2.0".
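To make the pin concrete, here is a small sketch of which release numbers fall inside ">=1.1.8,<1.2.0". The helper names `parse_version` and `satisfies_pin` are hypothetical, and the parser only handles plain numeric versions; a real project would more likely rely on the `packaging` library's specifier handling.

```python
import re
from typing import Tuple


def parse_version(version: str, width: int = 4) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '1.1.9' (no pre-release tags)."""
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unsupported version string: {version!r}")
    parts = tuple(int(p) for p in version.split("."))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    return parts + (0,) * max(0, width - len(parts))


def satisfies_pin(version: str, lower: str = "1.1.8", upper: str = "1.2.0") -> bool:
    """Return True when lower <= version < upper, i.e. '>=1.1.8,<1.2.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


for candidate in ("1.1.7", "1.1.8", "1.1.9", "1.2.0"):
    print(candidate, satisfies_pin(candidate))
```

Only the 1.1.x releases from 1.1.8 onward pass, so any environment that needs a newer spacy-transformers line cannot satisfy this constraint.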