Feature request: Score #64

Open
Camco3 opened this issue Apr 26, 2022 · 1 comment
Labels
enhancement New feature or request

Comments


Camco3 commented Apr 26, 2022

Score

Description

We analysed the performance of the eds.charlson pipeline on 100 documents extracted from the Bordeaux CHU medical data warehouse, comparing the Charlson score extracted by the edsnlp pipeline with the score extracted by hand. Among the hundred documents we found 5 diverging cases. They bring out several issues that might be useful in the more general context of integer score detection.

Proposition

Here are a few points that could help improve score detection:

  1. Include Roman numerals (e.g. 'Charlson score is about II')
  2. Ranges in score (e.g. 'Charlson score lies between 2 and 3')
  3. Fuzziness for misspellings of the score name (e.g. 'Charltson score of 3')
  4. Ordering (e.g. 'Charlson score > 7')
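
To make points 1, 2 and 4 concrete, here is a minimal sketch using only Python's standard `re` module. It is not the edsnlp implementation: the names `ROMAN`, `PATTERN` and `parse_score` are hypothetical, the trigger phrases are taken from the examples above, and only Roman numerals up to X are assumed to be relevant.

```python
import re

# Hypothetical lookup table (assumption: scores above X do not occur).
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5,
         "VI": 6, "VII": 7, "VIII": 8, "IX": 9, "X": 10}

# A value is either digits or a run of Roman letters, ending on a word
# boundary so that the "i" of a word like "increases" is not captured.
VALUE = r"(?:\d+|[IVX]+)\b"

PATTERN = re.compile(
    r"charlson\s+score\s*(?:is\s+about|lies\s+between|of|is|:)?\s*"
    rf"(?P<op>[<>]=?)?\s*(?P<low>{VALUE})"
    rf"(?:\s*(?:and|-|to)\s*(?P<high>{VALUE}))?",
    flags=re.IGNORECASE,
)


def to_int(value):
    """Convert a digit string or a Roman numeral to int (None if unknown)."""
    return int(value) if value.isdigit() else ROMAN.get(value.upper())


def parse_score(text):
    """Return the comparison operator and bounds of the first score found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    low = to_int(match["low"])
    high = to_int(match["high"]) if match["high"] else None
    if low is None or (match["high"] and high is None):
        return None
    return {"op": match["op"], "low": low, "high": high}
```

Keeping the operator and both bounds in the result lets downstream code decide how to turn a range or an inequality into a single value, instead of silently picking one.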
Camco3 changed the title from "Feature request: [feature]" to "Feature request: Score" Apr 26, 2022
@bdura bdura added the enhancement New feature or request label Apr 26, 2022
Contributor

bdura commented Apr 26, 2022

Thanks for the heads up! A few thoughts on this, for future reference:

  1. spaCy's like_num token attribute could be helpful there
  2. We could draw inspiration from the eds.measures pipeline to capture these cases; I figure this is related to the concept of composite measures
  3. I admit I'm a bit concerned about optimality there... As discussed, perhaps we should include these typos directly? I reckon the precision shouldn't suffer, what do you think?
  4. I don't have much to add on this, it should definitely be handled
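
The trade-off raised in point 3 can be sketched with the standard library alone. This is an illustration, not edsnlp code: `KNOWN_VARIANTS`, `matches_explicit` and `matches_fuzzy` are hypothetical names, the variant list contains only the typo quoted in the issue, and the 0.85 cutoff is an arbitrary assumption that would need tuning on real documents.

```python
import difflib

CANONICAL = "charlson"

# Option A: enumerate observed typos directly (cheap lookup, precision
# stays high, but every new misspelling must be added by hand).
KNOWN_VARIANTS = {CANONICAL, "charltson"}


def matches_explicit(token):
    """True if the token is the score name or a listed misspelling."""
    return token.lower() in KNOWN_VARIANTS


# Option B: fuzzy comparison (covers unseen typos, costs one similarity
# computation per candidate token, and may accept unrelated words).
def matches_fuzzy(token, cutoff=0.85):
    """True if the token is similar enough to the canonical score name."""
    ratio = difflib.SequenceMatcher(None, token.lower(), CANONICAL).ratio()
    return ratio >= cutoff
```

With this cutoff, 'Charltson' is accepted by both options while a common word such as 'Charles' is rejected, which suggests the explicit list is a reasonable starting point and the fuzzy check a possible fallback.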
