BDPROTO: a database of phonological inventories from ancient and reconstructed languages

BDPROTO

BDPROTO is a database of phonological inventories from ancient and reconstructed languages. The aggregated phonological inventory data and associated metadata are available in this directory as a flat CSV file, bdproto.csv. Bibliographic references for the data points are available in the sources.bib file.
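Because the data ship as a single flat table, one simple way to work with them is to group rows by inventory. The following is a minimal sketch, not an official loader: it assumes one row per segment, and the column names `BdprotoID` and `Phoneme` as well as the sample rows are illustrative assumptions, so check the header of bdproto.csv before relying on them.

```python
import csv
import io
from collections import defaultdict


def inventories_from_csv(handle, id_column="BdprotoID", segment_column="Phoneme"):
    """Group the rows of a flat inventory table into {inventory id: [segments]}.

    The default column names are assumptions for illustration only.
    """
    inventories = defaultdict(list)
    for row in csv.DictReader(handle):
        inventories[row[id_column]].append(row[segment_column])
    return dict(inventories)


# Made-up sample standing in for bdproto.csv (not real BDPROTO data).
SAMPLE = """BdprotoID,LanguageName,Phoneme
1,Example A,p
1,Example A,t
1,Example A,a
2,Example B,k
2,Example B,i
"""

sample_inventories = inventories_from_csv(io.StringIO(SAMPLE))
for inventory_id, segments in sample_inventories.items():
    # Print the inventory id, its size, and its segments.
    print(inventory_id, len(segments), " ".join(segments))
```

To run the same function on the real file, pass it a handle opened with `open("bdproto.csv", newline="", encoding="utf-8")` and adjust the column names to match the actual header.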

BDPROTO 1.0 is described in:

  • Marsico, Egidio, Sebastien Flavier, Annemarie Verkerk and Steven Moran. 2018. BDPROTO: A Database of Phonological Inventories from Ancient and Reconstructed Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 1654-1658. May 7-12, Miyazaki, Japan. Online: http://www.lrec-conf.org/proceedings/lrec2018/pdf/534.pdf.

An expanded version, BDPROTO 1.1, is described in:

The BDPROTO data are available under the Creative Commons Attribution-ShareAlike 4.0 International license. If you use the BDPROTO data in your research, we recommend using the most recent release in your analyses. We archive each release of BDPROTO in Zenodo.

The original source data (and project name) come from:

This legacy resource was converted into Unicode UTF-8 using the principles defined in Moran & Cysouw (2018) and following the PHOIBLE Unicode IPA conventions.

The original BDPROTO data are available in various formats, along with the extraction and transformation scripts, at: https://github.com/bdproto/bdproto-legacy. BDPROTO-legacy contains a convenience sample aimed at genealogical diversity, with no duplicate inventories for a given reconstruction.

Three additional resources have been compiled to update and extend the coverage of the original BDPROTO sample. The raw data for these three resources are in the src directory:

These resources include more recent reconstructions, so in some cases there is more than one inventory for a given reconstruction.
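If an analysis needs at most one inventory per reconstruction, it can help to first list the reconstructions that have several. The sketch below shows one way to do this, assuming each row carries a language identifier and an inventory identifier; the keys `Glottocode` and `BdprotoID` and the sample values are hypothetical placeholders, not actual BDPROTO records.

```python
from collections import defaultdict


def reconstructions_with_multiple_inventories(
    rows, language_key="Glottocode", inventory_key="BdprotoID"
):
    """Return {language id: sorted inventory ids} for languages with more than one inventory.

    The default key names are assumptions for illustration only.
    """
    ids_by_language = defaultdict(set)
    for row in rows:
        ids_by_language[row[language_key]].add(row[inventory_key])
    return {
        language: sorted(ids)
        for language, ids in ids_by_language.items()
        if len(ids) > 1
    }


# Made-up rows (one per segment); the codes below are placeholders.
sample_rows = [
    {"Glottocode": "aaaa0001", "BdprotoID": "1", "Phoneme": "p"},
    {"Glottocode": "aaaa0001", "BdprotoID": "1", "Phoneme": "a"},
    {"Glottocode": "aaaa0001", "BdprotoID": "3", "Phoneme": "p"},
    {"Glottocode": "bbbb0002", "BdprotoID": "2", "Phoneme": "k"},
]

multiple = reconstructions_with_multiple_inventories(sample_rows)
print(multiple)
```

How to choose among the competing inventories (for example, by source or publication date) is left to the analysis; the sketch only identifies where a choice is needed.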

The ancient-near-east inventories were collected as part of a project on ancient Near East languages at the Department of Comparative Linguistics at the University of Zurich. Additional inventories were also extracted from source references at the Department of Comparative Linguistics (we simply call this source uz). Work by The Hebrew University of Jerusalem includes phonological inventories from recent publications. This source is labeled huji.

For all of the data sources, we have gathered additional metadata (when available) including identifying information such as Glottolog codes and information about estimated time-depths, possible homelands, etc.

The phonological inventory data from the various raw input sources and their metadata are aggregated into a single flat-file table.

We have also collected and curated references for each data point and these are available in the sources.bib file.

Inferred geo-coordinates for homelands, when available, come from:

  • Wichmann, S., Müller, A., & Velupillai, V. 2010. Homelands of the world’s language families: A quantitative approach. Diachronica, 27(2), 247-276.

Preliminary studies based on BDPROTO have been presented at the following conferences:

  • Grossman, Eitan. 2021. Universals of phonological segment borrowing? Questions, evidence, methods, findings. Keynote address at the 43. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS). (Freiburg, Germany, 23-26 February 2021). Slides.

  • Moran, Steven and Eitan Grossman. 2021. Temporal bias: a new type of bias for typologists to worry about. 5th Usage-Based Linguistics Conference (Tel Aviv, Israel, July 5-7 2021). Slides.

Published papers making extensive use of BDPROTO include:

This repository contains the development version of the data, i.e., we continue to add to, edit, and refine BDPROTO. We are also making the BDPROTO data available in the Cross-Linguistic Data Format here:

A website is forthcoming.