Support Language Detection #72

Open
insightindustry opened this issue Jan 9, 2021 · 1 comment

@insightindustry
Owner

Given the localization validators outlined in #69 and #71, it may be helpful to extend the library with language-detection capabilities: namely, a validator and a checker that detect the language used in a given string, along these lines:

  • validators.in_language(value, ..., standard = None) where:
    • value is the string whose contents should be checked to identify the language
    • standard indicates which language-code standard the returned value should follow; if None, the human-readable language name is returned (e.g. "American English")
  • checkers.is_in_language(value, languages) which returns True if value is detected to be in one of the languages contained in languages
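To make the proposed behaviour concrete, here is a minimal sketch of how the two functions could relate to each other. Everything in it is an assumption for illustration: the stopword-counting detector is a toy stand-in for whatever backend is eventually chosen, the `"iso639-1"` value for `standard` is hypothetical, and the elided parameters (`...`) in the proposed signature are left out rather than guessed.

```python
# Toy stand-in for a real detection backend: counts stopword hits per language.
# A real implementation would delegate to a proper language-detection library.
_STOPWORDS = {
    "English": {"the", "and", "is", "of", "to"},
    "French": {"le", "et", "est", "les", "une"},
    "Spanish": {"el", "los", "y", "es", "una"},
}

# Hypothetical mapping from a `standard` name to language codes.
_CODES = {
    "iso639-1": {"English": "en", "French": "fr", "Spanish": "es"},
}


def in_language(value, standard=None):
    """Return the detected language of ``value``.

    With ``standard=None`` the human-readable name is returned; otherwise
    the code defined by the named standard is returned.
    """
    if not isinstance(value, str) or not value.strip():
        raise ValueError("value must be a non-empty string")

    words = set(value.lower().split())
    scores = {name: len(words & stop) for name, stop in _STOPWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise ValueError("unable to detect the language of value")

    if standard is None:
        return best
    if standard not in _CODES:
        raise ValueError("unsupported standard: %r" % (standard,))
    return _CODES[standard][best]


def is_in_language(value, languages):
    """Return True if ``value`` is detected to be in one of ``languages``."""
    try:
        return in_language(value) in languages
    except ValueError:
        return False
```

In this sketch the checker simply wraps the validator and turns a failure into `False`; `languages` is assumed to contain human-readable names.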

IMPORTANT: Language detection is non-trivial, and numerous third-party libraries already attempt it. The key considerations are performance and accuracy, and different libraries perform differently depending on the length and complexity of value (the text content).

@insightindustry insightindustry added the enhancement New feature or request label Jan 9, 2021
@insightindustry insightindustry added this to the 1.6.0 milestone Jan 9, 2021
@insightindustry insightindustry self-assigned this Jan 9, 2021
@insightindustry
Owner Author

There are several important questions that need to be answered for this feature:

  1. Should language detection be built into the Validator Collection, or should it leverage an outside library?
  2. If leveraging an outside library, should that dependency be a hard requirement of the Validator Collection (listed in requirements.txt), or should it be treated as a conditional (optional) dependency?
  3. Should there be an "import selection tree" which tries to optimize for the language detection library that is best for a given value length AND that is available in the runtime environment?
