Multilingual Transformer Ensembles for Portuguese Natural Language Tasks

This is the source code for the model submitted by the Deep Learning Brasil team to the II Evaluation of Semantic Textual Similarity and Textual Inference in Portuguese (ASSIN 2), held in 2019 during the Symposium in Information and Human Language Technology.

At the time, a submission produced by this code achieved the best scores of all submissions for the entailment task. That submission can be reproduced with the configuration file settings/roberta-bert-multilingual-5folds.yml.

We described our approach in the paper “Multilingual Transformer Ensembles for Portuguese Natural Language Tasks”.

The complete test results of all our experiments are in the file reports/full_report.csv.

Setup

Assuming you have already installed Docker and nvidia-docker on your system, clone this repository and run:

PREFIX=roberta-portuguese CUDA_VISIBLE_DEVICES=0 bash scripts/start.sh

or, if you can only run Docker as root:

sudo PREFIX=roberta-portuguese CUDA_VISIBLE_DEVICES=0 bash scripts/start.sh

This command runs every fine-tuning procedure specified in the configuration files in the settings folder over all datasets (assin2, assin-ptpt and assin-ptbr). Configuration files whose names do not start with PREFIX are ignored. If the Docker image ruanchaves/assin:2.0 does not exist, it will be built.
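The PREFIX rule can be illustrated with a short Python sketch. It is not the logic inside scripts/start.sh: the helper names `matches_prefix` and `select_configs` are hypothetical, and the `.yml` extension is assumed from settings/roberta-bert-multilingual-5folds.yml.

```python
from pathlib import Path
from typing import Iterable, List


def matches_prefix(filenames: Iterable[str], prefix: str) -> List[str]:
    """Keep only the file names that start with `prefix`, sorted alphabetically."""
    return sorted(name for name in filenames if name.startswith(prefix))


def select_configs(settings_dir: str, prefix: str) -> List[str]:
    """Apply the PREFIX rule to the *.yml files of a settings folder.

    Hypothetical helper: the real selection happens in scripts/start.sh.
    """
    return matches_prefix((p.name for p in Path(settings_dir).glob("*.yml")), prefix)


if __name__ == "__main__":
    # Run from the repository root; nothing is printed if there is no settings folder.
    settings = Path("settings")
    if settings.is_dir():
        print(select_configs(str(settings), "roberta-portuguese"))
```

Under this rule the match is a plain string prefix, so PREFIX=roberta would select both roberta-portuguese* and roberta-bert-multilingual* files, while a longer prefix such as roberta-bert-multilingual-5folds narrows the run to a single configuration.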

Depending on your resources, you may want to change the maximum number of parallel workers allowed in each configuration file. As a rule of thumb, each worker consumes at most 8 GB of GPU memory.
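The 8 GB rule of thumb turns into a simple bound: divide the available GPU memory by 8 and round down. A minimal Python sketch follows; `max_parallel_workers` is a hypothetical helper, and the name of the worker setting inside the configuration files is not documented here, so the result has to be copied into that setting by hand.

```python
# Worst-case GPU memory per worker, taken from the rule of thumb above.
GB_PER_WORKER = 8


def max_parallel_workers(gpu_memory_gb: float, gb_per_worker: float = GB_PER_WORKER) -> int:
    """Number of workers whose worst-case memory use fits in `gpu_memory_gb`.

    Returns 0 when not even one worker is guaranteed to fit.
    """
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")
    # Floor division: partial workers do not count.
    return int(gpu_memory_gb // gb_per_worker)


if __name__ == "__main__":
    for memory in (11, 16, 24, 32):
        print(f"{memory} GB -> {max_parallel_workers(memory)} worker(s)")
```

For example, a 24 GB card allows 3 workers and an 11 GB card allows 1. A result of 0 only means the worst-case estimate does not fit; a worker that stays below the 8 GB ceiling may still run.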

All intermediate files are deleted by default. The final submissions are stored in a folder named submission_<timestamp>, where <timestamp> is the system time at which the fine-tuning procedure started.
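After several runs, the most recent results can be located with a sketch like the following. It assumes the submission_<timestamp> folders are created in the directory passed as `root` (their exact location is an assumption) and orders them by modification time, because the <timestamp> format is not specified; `latest_submission` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_submission(root: str = ".") -> Optional[Path]:
    """Return the most recently modified submission_* folder under `root`, or None.

    Ordering by modification time avoids relying on the <timestamp> format.
    """
    candidates = [p for p in Path(root).glob("submission_*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_submission("."))
```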

Associated Repositories

You may also want to look at the ruanchaves/elmo repository, which contains tests performed with ELMo and Portuguese word embeddings on the ASSIN datasets.

Citation

@inproceedings{rodrigues_assin2,
Author = {Ruan Chaves Rodrigues and Jéssica Rodrigues da Silva and Pedro Vitor Quinta de Castro and Nádia Félix Felipe da Silva and Anderson da Silva Soares},
Booktitle = {Proceedings of the {ASSIN 2} Shared Task: {E}valuating {S}emantic {T}extual {S}imilarity and {T}extual {E}ntailment in {P}ortuguese},
Pages = {[In this volume]},
Publisher = {CEUR-WS.org},
Series = {{CEUR} Workshop Proceedings},
Title = {Multilingual Transformer Ensembles for Portuguese Natural Language Tasks},
Year = {2020}}
