MorphDiv/TeDDi_sample

TeDDi

This is the repository for the Text Data Diversity Sample (TeDDi Sample), part of the project Non-randomness in Morphological Diversity: A Computational Approach Based on Multilingual Corpora, funded by the Swiss National Science Foundation.

This repository contains the corpus data and the code that processes and analyzes it. It is currently a work in progress.

If you use TeDDi, please cite as:

Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Pelloni, and Tanja Samardzic. 2022. TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1150–1158, Marseille, France. European Language Resources Association. Online: https://aclanthology.org/2022.lrec-1.123/

To contribute code or data to the repository, please first refer to our guidelines on contributing.

Different data formats are available for direct download.

Main Contributors (alphabetical order):

  • Bentz, Christian
  • Gutierrez-Vasques, Ximena
  • Moran, Steven
  • Samardžić, Tanja
  • Sozinova, Olga

Language-specific contributors (alphabetical order):

  • Kalessa, Jule (Paiwan)
  • Mächler, Alina
  • Rood, David S. (Wichita)
  • Roth, Rainer (Wari')

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

License: CC BY-NC-SA 4.0
