
Feature request: terminology matcher with normalisation #62

Open
bdura opened this issue Apr 25, 2022 · 1 comment
Labels
discussion (Discussion about architecture choices), enhancement (New feature or request)

Comments

@bdura
Contributor

bdura commented Apr 25, 2022

Feature type

A matcher pipeline to handle the single-label, multiple-subconcepts use case.

Description

As discussed in #58, we would certainly benefit from having EDS-NLP handle the nitty-gritty details of matching a terminology with automatic concept normalisation.

For now, it is reasonably easy to match a terminology in which the label is itself the normalisation. However, we could use the kb_id_ attribute (see the spaCy documentation) to encode a more hierarchical structure, with a coarse label and a finer-grained concept identifier.

For instance, paracetamol and its brand name tylenol should probably both get the label drug and a kb_id_ such as ATC=N02BE01.
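A minimal sketch of that distinction in plain spaCy (assuming spaCy v3 and a blank English pipeline, not EDS-NLP itself): the span carries the coarse label, while the kb_id argument of Span stores the concept identifier, which is then exposed as kb_id_. The entity is created by hand here; in practice a matcher would produce it.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Patient was given tylenol.")

# Token 3 is "tylenol": the coarse label is "drug", while the
# fine-grained concept (its ATC code) is stored in kb_id.
ent = Span(doc, 3, 4, label="drug", kb_id="ATC=N02BE01")
doc.ents = [ent]

for e in doc.ents:
    print(e.text, e.label_, e.kb_id_)  # tylenol drug ATC=N02BE01
```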

Proposition

We could modify the eds.matcher component to handle this case natively, or create a new component.
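The "new component" route could look roughly like the sketch below. It assumes spaCy v3; the factory name terminology_matcher_sketch, the TerminologyMatcher class and its label/terms parameters are hypothetical and are not part of EDS-NLP or spaCy. Each synonym list is registered under its concept identifier, so every match yields a span with the shared label and the matching kb_id_.

```python
from typing import Dict, List

import spacy
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span
from spacy.util import filter_spans


class TerminologyMatcher:
    """Hypothetical component: one coarse label, sub-concepts stored in kb_id_."""

    def __init__(self, nlp: Language, label: str, terms: Dict[str, List[str]]):
        self.label = label
        # Case-insensitive matching on the LOWER token attribute.
        self.matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
        for concept_id, synonyms in terms.items():
            # The match key is the concept identifier, so each match
            # carries its own normalisation.
            self.matcher.add(concept_id, [nlp.make_doc(s) for s in synonyms])

    def __call__(self, doc: Doc) -> Doc:
        spans = [
            Span(doc, start, end, label=self.label, kb_id=doc.vocab.strings[match_id])
            for match_id, start, end in self.matcher(doc)
        ]
        # Merge with existing entities; filter_spans prefers longer spans on overlap.
        doc.ents = filter_spans(list(doc.ents) + spans)
        return doc


@Language.factory("terminology_matcher_sketch", default_config={"label": "drug", "terms": {}})
def create_terminology_matcher(nlp: Language, name: str, label: str, terms: Dict[str, List[str]]):
    return TerminologyMatcher(nlp, label, terms)


# Usage: one label ("drug"), one sub-concept with two synonyms.
nlp = spacy.blank("en")
nlp.add_pipe(
    "terminology_matcher_sketch",
    config={"label": "drug", "terms": {"ATC=N02BE01": ["paracetamol", "tylenol"]}},
)
doc = nlp("Tylenol (paracetamol) given.")
print([(e.text, e.label_, e.kb_id_) for e in doc.ents])
```

Using the concept identifier as the PhraseMatcher key is what keeps the normalisation attached to each match, so the label and the sub-concept never have to be reconciled after the fact.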

bdura added the enhancement and discussion labels on Apr 26, 2022
@gozat
Contributor

gozat commented Apr 26, 2022

In the spirit of spaCy, I wonder whether such information should be stored in custom attributes or handled by the EntityLinker, which links a Span to a KnowledgeBase (as far as I understand; tell me if I missed something).

Take paracétamol (an ingredient in the ROMEDI nomenclature): it has several ATC codes, see for instance https://www.romedi.fr/romedi/IN7310nlprjlh2sb3t0apdjfvtk6u0ifp3. Whether a precise ATC code can be resolved may depend on the EntityLinker using information from the rest of the Doc. Moreover, resolving this entity may or may not be of interest to the user: they might only care about the ingredient and prefer mapping drugs to their ingredients.

In short, instead of thinking in terms of terminologies, perhaps we could think of entities in terms of a graph, and try to understand to what extent graph properties can be imported into the spaCy machinery.
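To make the graph idea concrete, here is a toy sketch in plain Python (no spaCy involved), assuming a hypothetical hierarchy in which edges point from a more specific concept to a broader one (brand, then ingredient, then ATC code). The user picks the level they care about, and a concept may resolve to several nodes at that level, which is why a set is returned. The EDGES and NODE_TYPE tables and the normalise function are illustrative only, not a ROMEDI or EDS-NLP API.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical toy graph: edges go from a specific concept to a broader one.
EDGES: Dict[str, Set[str]] = {
    "tylenol": {"paracetamol"},      # brand -> ingredient
    "paracetamol": {"ATC=N02BE01"},  # ingredient -> ATC code(s)
}
NODE_TYPE: Dict[str, str] = {
    "tylenol": "brand",
    "paracetamol": "ingredient",
    "ATC=N02BE01": "atc",
}


def normalise(concept: str, target_type: str) -> Set[str]:
    """Return every ancestor of `concept` (or itself) whose type is `target_type`."""
    found: Set[str] = set()
    seen = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk up the hierarchy
        node = queue.popleft()
        if NODE_TYPE.get(node) == target_type:
            found.add(node)
            continue  # stop climbing once the requested level is reached
        for parent in EDGES.get(node, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


print(normalise("tylenol", "ingredient"))  # {'paracetamol'}
print(normalise("tylenol", "atc"))         # {'ATC=N02BE01'}
```

With this shape, a matcher only needs to link a mention to its most specific node; choosing between the ingredient and an ATC code becomes a query-time decision rather than something baked into the terminology.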
