This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Developer Guide

The clinical trial parser library contains tools for translating clinical trial eligibility criteria into a structured, machine-readable form. For example, it has scripts for downloading data and for running the context-free grammar (CFG) and information extraction (IE) parsers. Apart from 20 clinical trials that illustrate the functionality of its modules, the library does not bundle the publicly available clinical trial data.

CFG Parser

Installation steps:

  • Install Go from https://golang.org/dl/
  • Set GOPATH so that the cloned project is in $GOPATH/src/github.com/facebookresearch/Clinical-Trial-Parser
  • Run ./script/cfg_parse.sh in the project root directory. The script will write the parsed relations to cfg_parsed_clinical_trials.tsv.
  • The program parameters can be changed either by editing the command-line arguments in cfg_parse.sh or by editing the config parameters in cfg.conf.
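The GOPATH requirement in the second step is easy to get wrong. The Go program below is a minimal sketch, not part of the library, that checks whether the current working directory is the project root under some GOPATH entry. It reads go/build's default GOPATH, which falls back to $HOME/go when the variable is unset. The helper name inGOPATH is hypothetical.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// projectPath is the location under $GOPATH/src named in the installation steps.
const projectPath = "github.com/facebookresearch/Clinical-Trial-Parser"

// inGOPATH (hypothetical helper) reports whether dir is the project root under
// any entry of gopath. Like GOPATH itself, gopath may list several directories
// separated by the OS list separator (":" on Unix, ";" on Windows).
func inGOPATH(gopath, dir string) bool {
	want := filepath.Clean(dir)
	for _, root := range filepath.SplitList(gopath) {
		if root == "" {
			continue
		}
		// filepath.Join also cleans the joined path.
		if filepath.Join(root, "src", filepath.FromSlash(projectPath)) == want {
			return true
		}
	}
	return false
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read working directory:", err)
		os.Exit(1)
	}
	// build.Default.GOPATH falls back to $HOME/go when GOPATH is unset.
	gopath := build.Default.GOPATH
	if inGOPATH(gopath, wd) {
		fmt.Println("OK: project root is inside GOPATH")
	} else {
		fmt.Printf("warning: %s is not $GOPATH/src/%s (GOPATH=%q)\n", wd, projectPath, gopath)
	}
}
```

Run it with go run from the cloned directory. It only prints a warning and does not change any settings.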

cfg_parse.sh only demonstrates how the CFG parser can be used; applications should write their own driver module.

Quality improvements:

The CFG parser does not parse all ordinal and numerical criteria, and it may parse some criteria incorrectly. Errors may be fixed and new capabilities added by:

IE Parser

Installation steps:

The library includes a pre-trained NER binary. Drivers and config files are provided for illustrative purposes in src/cmd and src/resources/config. Applications may write their own driver modules.

Quality improvements:

  • The named entity recognition (NER) model can be improved by adding new training samples
  • The named entity linking (NEL) module can be improved by
    • Better processing of the extracted NER terms
    • Incorporating a vocabulary that has a high match rate with the eligibility criteria terms
    • Adding synonyms to existing concepts or new synonyms to the custom MeSH files
    • Implementing term clustering to increase NEL recall
  • Relation extraction (RE) with negation extraction can be implemented
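The vocabulary and synonym items above come down to how extracted terms are matched against concepts. The following is a minimal exact-match linker, a simplified stand-in rather than the library's NEL module: terms are normalized (lowercased, whitespace collapsed) and looked up in a synonym table. The type Vocabulary, its methods, and the concept IDs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lowercases a term and collapses runs of whitespace, a simple
// stand-in for more careful processing of extracted NER terms.
func normalize(term string) string {
	return strings.Join(strings.Fields(strings.ToLower(term)), " ")
}

// Vocabulary (hypothetical) maps normalized synonyms to concept identifiers.
type Vocabulary struct {
	synonyms map[string]string
}

func NewVocabulary() *Vocabulary {
	return &Vocabulary{synonyms: make(map[string]string)}
}

// AddConcept registers a concept under its preferred name and any synonyms.
func (v *Vocabulary) AddConcept(id, name string, synonyms ...string) {
	v.synonyms[normalize(name)] = id
	for _, s := range synonyms {
		v.synonyms[normalize(s)] = id
	}
}

// Link returns the concept ID for an extracted term, if one is known.
func (v *Vocabulary) Link(term string) (string, bool) {
	id, ok := v.synonyms[normalize(term)]
	return id, ok
}

func main() {
	vocab := NewVocabulary()
	// Hypothetical concept IDs; a real vocabulary would use e.g. MeSH identifiers.
	vocab.AddConcept("CONCEPT-1", "diabetes mellitus", "diabetes")
	vocab.AddConcept("CONCEPT-2", "myocardial infarction", "heart attack")

	for _, term := range []string{"Heart  Attack", "Diabetes", "asthma"} {
		if id, ok := vocab.Link(term); ok {
			fmt.Printf("%q -> %s\n", term, id)
		} else {
			fmt.Printf("%q -> (unlinked)\n", term)
		}
	}
}
```

Each synonym added to the table lets one more surface form link, which is why adding synonyms raises recall. Term clustering or fuzzy matching would go beyond this exact-match lookup.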

Data

The library includes the example scripts aact.sh and ingest.sh for downloading and ingesting clinical trials. The scripts are provided for convenience; applications will most likely need to modify them or use other means to download and ingest the data. For example, ingest.sh samples only a few trials, and an obvious place to start is to change its 'where' clauses. Note that these scripts use a PostgreSQL database.
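Changing the 'where' clauses typically means narrowing or widening the trial selection. The sketch below builds a parameterized PostgreSQL query from a small filter struct. It is not the query used by ingest.sh: the table and column names (studies, eligibilities, nct_id, overall_status, study_type, criteria) are assumptions based on the public AACT schema and should be checked against the database that aact.sh sets up, and SampleFilter and BuildQuery are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// SampleFilter (hypothetical) holds criteria for selecting trials.
type SampleFilter struct {
	Statuses  []string // e.g. "Recruiting"; empty means any status
	StudyType string   // e.g. "Interventional"; empty means any type
	Limit     int      // 0 means no limit
}

// BuildQuery returns a PostgreSQL query with $n placeholders and its arguments.
// Table and column names are assumed from the public AACT schema.
func BuildQuery(f SampleFilter) (string, []interface{}) {
	var conds []string
	var args []interface{}
	if len(f.Statuses) > 0 {
		ph := make([]string, len(f.Statuses))
		for i, s := range f.Statuses {
			args = append(args, s)
			ph[i] = fmt.Sprintf("$%d", len(args))
		}
		conds = append(conds, "s.overall_status IN ("+strings.Join(ph, ", ")+")")
	}
	if f.StudyType != "" {
		args = append(args, f.StudyType)
		conds = append(conds, fmt.Sprintf("s.study_type = $%d", len(args)))
	}
	q := "SELECT s.nct_id, e.criteria FROM studies s JOIN eligibilities e ON e.nct_id = s.nct_id"
	if len(conds) > 0 {
		q += " WHERE " + strings.Join(conds, " AND ")
	}
	if f.Limit > 0 {
		args = append(args, f.Limit)
		q += fmt.Sprintf(" LIMIT $%d", len(args))
	}
	return q, args
}

func main() {
	q, args := BuildQuery(SampleFilter{
		Statuses:  []string{"Recruiting"},
		StudyType: "Interventional",
		Limit:     20,
	})
	fmt.Println(q)
	fmt.Println(args)
}
```

The returned query and arguments could be passed to db.Query(q, args...) from database/sql together with a PostgreSQL driver. Using $n placeholders instead of concatenating filter values keeps them out of the SQL text.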