📝 IncPar: Fully Incremental Neural Dependency and Constituency Parsing

A Python package for reproducing the results of the fully incremental dependency and constituency parsers described in the publications listed in the Citation section below.

Note: Our implementation was built by forking yzhangcs' SuPar v1.1.4 repository. The vector-quantization module was extracted from lucidrains' vector-quantize-pytorch repository, and the sequence-labeling encodings from Polifack's CoDeLin repository.
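For reference, here is a minimal sketch of how a vector-quantization bottleneck discretizes encoder states, following the vector-quantize-pytorch interface; the hidden size and codebook size below are illustrative assumptions, not the values used in our experiments.

import torch
from vector_quantize_pytorch import VectorQuantize

# A VQ layer snaps each continuous state to its nearest codebook vector.
vq = VectorQuantize(
    dim=768,             # encoder hidden size (illustrative)
    codebook_size=512,   # number of discrete codes (illustrative)
    commitment_weight=1.0,
)

states = torch.randn(1, 10, 768)              # (batch, seq_len, dim) encoder output
quantized, indices, commit_loss = vq(states)  # quantized states, code indices, VQ loss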

Incremental Parsers

Usage

In order to reproduce our experiments, follow the installation and deployment steps of the SuPar, vector-quantize-pytorch and CoDeLin repositories. The supported functionalities are training, evaluation and prediction from CoNLL-U or PTB bracketed files. We highly suggest running our parsers from terminal commands to train models and generate prediction files. In the future 🙌 we'll make SuPar methods available to easily test our parsers' performance from the Python prompt.

Training

Dependency Parsing:

  • Sequence labeling Dependency Parser (SLDependencyParser): inherits all arguments of the main class Parser and adds the flag --codes to select the encoding used to linearize the trees (abs, rel, pos, 1p, 2p); the sketch below illustrates the two simplest encodings.
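As a toy illustration (not the package's own code): abs labels each word with the absolute position of its head, while rel labels it with the head's offset from the word itself; pos, 1p and 2p are analogous PoS-relative and bracket-based schemes.

# "She eats apples": heads are 1-based positions, 0 denotes the root.
heads = [2, 0, 2]                                        # head of word i+1

abs_labels = list(heads)                                 # absolute head position
rel_labels = [h - (i + 1) for i, h in enumerate(heads)]  # offset to the head

print(abs_labels)  # [2, 0, 2]
print(rel_labels)  # [1, -2, -1] (root handling simplified here)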

Experiment: Train an absolute-encoding parser with mGPT as the encoder and an LSTM layer as the decoder to predict the labels.

python3 -u -m supar.cmds.dep.sl train -b -c configs/config-mgpt.ini \
    -p ../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt \
    --codes abs --decoder lstm \
    --train ../treebanks/english-ewt/train.conllu \
    --dev ../treebanks/english-ewt/dev.conllu \
    --test ../treebanks/english-ewt/test.conllu

The model configuration (number and size of layers, optimization parameters, encoder selection) is specified via configuration files (see the configs/ folder). We provide the main configurations used in our experiments.

Experiment: Train an Arc-Eager parser using BLOOM-560M as the encoder and an MLP-based decoder to predict transitions, with delay $k=1$ (--delay) and vector quantization (--use_vq); the delay mechanism is sketched after the file list below.

python3 -u -m supar.cmds.dep.eager train -b -c configs/config-bloom560.ini \
    -p ../results/models-dep/english-ewt/eager-bloom560-mlp/parser.pt \
    --decoder=mlp --delay=1 --use_vq \
    --train ../treebanks/english-ewt/train.conllu \
    --dev ../treebanks/english-ewt/dev.conllu \
    --test ../treebanks/english-ewt/test.conllu

This will save the following files in the folder ../results/models-dep/english-ewt/eager-bloom560-mlp:

  1. parser.pt: the trained PyTorch model.
  2. metrics.pickle: a pickled Python object with the evaluation on the test set.
  3. pred.conllu: the parser's prediction for the CoNLL-U test file.
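To clarify the --delay flag: with delay $k$, the decision for word $w_t$ is emitted only after reading $w_{t+k}$, so the decoder works on the encoder state $k$ tokens ahead while the parser remains incremental. A minimal sketch of this idea, assuming a tensor of per-token encoder states (the function name and padding choice are ours, not the package's):

import torch

def delay_states(states: torch.Tensor, k: int) -> torch.Tensor:
    # states: (batch, seq_len, dim) per-token representations.
    # After shifting, slot t holds the state of token t+k; the last k
    # slots, which have no lookahead left, reuse the final state.
    if k == 0:
        return states
    shifted = states[:, k:, :]                 # state of token t+k at slot t
    pad = states[:, -1:, :].expand(-1, k, -1)  # repeat the last state k times
    return torch.cat([shifted, pad], dim=1)

states = torch.randn(1, 5, 8)
delayed = delay_states(states, k=1)  # the decision for w_t now sees w_{t+1}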

Constituency Parsing: the sequence labeling (sl) and attach-juxtapose (aj) constituency parsers are trained analogously from PTB bracketed files:

python3 -u -m supar.cmds.const.sl train -b -c configs/config-mgpt.ini \
    -p ../results/models-con/ptb/abs-mgpt-lstm/parser.pt \
    --codes abs --decoder lstm \
    --train ../treebanks/ptb-gold/train.trees \
    --dev ../treebanks/ptb-gold/dev.trees \
    --test ../treebanks/ptb-gold/test.trees

python3 -u -m supar.cmds.const.aj train -b -c configs/config-bloom560.ini \
    -p ../results/models-con/ptb/aj-bloom560-mlp/parser.pt \
    --delay=2 --use_vq \
    --train ../treebanks/ptb-gold/train.trees \
    --dev ../treebanks/ptb-gold/dev.trees \
    --test ../treebanks/ptb-gold/test.trees

Evaluation

Our code provides two ways to evaluate a trained parser from a .pt PyTorch file:

  1. Via the Python prompt, loading the model with the .load() method and evaluating it with .evaluate():
>>> from supar import Parser
>>> Parser.load('../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt').evaluate('../data/english-ewt/test.conllu')
  2. Via terminal commands:
python -u -m supar.cmds.dep.sl evaluate -p ../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt \
    --data ../data/english-ewt/test.conllu

Prediction

The prediction step can also be executed from the Python prompt or from terminal commands to generate a CoNLL-U file:

  1. Via the Python prompt with the .predict() method:
>>> from supar import Parser
>>> parser = Parser.load('../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt')
>>> parser.predict(data='../data/english-ewt/test.conllu',
...                pred='../results/models-dep/english-ewt/abs-mgpt-lstm/pred.conllu')
  2. Via terminal commands:
python -u -m supar.cmds.dep.sl predict -p ../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt \
    --data ../data/english-ewt/test.conllu \
    --pred ../results/models-dep/english-ewt/abs-mgpt-lstm/pred.conllu
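As a quick sanity check, the predicted file can be inspected with a few lines of plain Python (CoNLL-U is a tab-separated 10-column format whose seventh and eighth columns hold the head and the dependency relation):

# Minimal sketch for inspecting a predicted CoNLL-U file.
path = '../results/models-dep/english-ewt/abs-mgpt-lstm/pred.conllu'
with open(path, encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip sentence boundaries and metadata comments
        idx, form, head, deprel = (line.split('\t')[i] for i in (0, 1, 6, 7))
        print(f'{idx}\t{form}\thead={head}\tdeprel={deprel}')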

Acknowledgments

This work has been funded by the European Research Council (ERC), under the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615), ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), Cátedra CICAS (Sngular, University of A Coruña), and Centro de Investigación de Galicia “CITIC”.

Citation

@thesis{ezquerro-2023-syntactic,
  title     = {{Análisis sintáctico totalmente incremental basado en redes neuronales}},
  author    = {Ezquerro, Ana and Gómez-Rodríguez, Carlos and Vilares, David},
  institution = {University of A Coruña},
  year      = {2023},
  url       = {https://ruc.udc.es/dspace/handle/2183/33269}
}

@inproceedings{ezquerro-2023-challenges,
  title     = {{On the Challenges of Fully Incremental Neural Dependency Parsing}},
  author    = {Ezquerro, Ana and Gómez-Rodríguez, Carlos and Vilares, David},
  booktitle = {Proceedings of IJCNLP-AACL 2023},
  year      = {2023}
}
