Command-line scripts for processing lexicographical data from Wikidata.

Usage examples

Which labels contain spaces?

$ make labels.tsv
$ awk -F $'\t' '$3 ~ / / {print}' labels.tsv
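The `-F $'\t'` option is what makes this query work. The minimal sketch below uses three made-up rows in place of labels.tsv; the ID / language code / label column layout is an assumption for illustration. It writes the separator as `-F '\t'`, which awk itself expands to a tab, so it also runs in shells without bash's `$'...'` quoting.

```shell
#!/bin/sh
# Hypothetical rows standing in for labels.tsv: ID, language code, label.
# The three-column layout is assumed for illustration only.
sample_labels() {
  printf 'L1\ten\tice cream\n'
  printf 'L2\ten\tdog\n'
  printf 'L3\tde\tEis am Stiel\n'
}

# Split on tabs only: $3 is the whole label, so multi-word labels match.
sample_labels | awk -F '\t' '$3 ~ / / {print $1}'

# Default whitespace splitting: $3 is only the first word of the label,
# which can never contain a space, so nothing is printed.
sample_labels | awk '$3 ~ / / {print $1}'
```

The first command prints `L1` and `L3`. The second prints nothing, which is why the tab separator must be given explicitly.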

How often is each property used on lexemes and forms?

$ make properties.tsv
$ awk '{print $2}' properties.tsv | ./histogram
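`./histogram` is a script from this repository, and its exact output format is not described here. As a hedged approximation, assuming it tallies how often each input line occurs, the same count can be produced with standard tools. The sample rows and their entity-ID / property-ID layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical rows standing in for properties.tsv: entity ID, property ID.
sample_properties() {
  printf 'L1\tP5185\n'
  printf 'L1-F1\tP898\n'
  printf 'L2\tP5185\n'
  printf 'L3\tP5185\n'
  printf 'L3-F2\tP898\n'
  printf 'L4\tP5137\n'
}

# Count each property ID and list the most frequent first.
# "sort | uniq -c | sort -rn" is a stand-in for ./histogram, not its source.
count_properties() {
  sample_properties | awk '{print $2}' | sort | uniq -c | sort -rn
}

count_properties
```

`uniq -c` only merges adjacent duplicates, so the first `sort` is required; the final `sort -rn` orders the counts from highest to lowest.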

How often is each language code used?

$ make languages.tsv

The following extended examples require wikidata-cli to be installed.

How often is each property used, shown together with its label?

$ make properties.tsv plabels.tsv
$ awk '{print $2}' properties.tsv | sort | join plabels.tsv - | ./histogram
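`join` only pairs lines correctly when both inputs are sorted on the join field under the same collation. The minimal sketch below uses hypothetical sample data, sorts both sides under `LC_ALL=C`, and passes `-t` with a tab so multi-word labels stay in one field. Whether the Makefile already sorts plabels.tsv is not stated here, and `sort | uniq -c` again only stands in for `./histogram`.

```shell
#!/bin/sh
# Hypothetical stand-ins for plabels.tsv (property ID, label) and for the
# property-ID column of properties.tsv; the real files come from make.
export LC_ALL=C          # sort and join must agree on collation order
tab=$(printf '\t')

sample_plabels() {
  printf 'P898\tIPA transcription\n'
  printf 'P5185\tgrammatical gender\n'
}
sample_property_ids() {
  printf 'P5185\nP898\nP5185\nP5185\n'
}

labelled_counts() {
  tmp=$(mktemp) || return 1
  # join needs both inputs sorted on the join field, so sort the labels too.
  sample_plabels | sort -t "$tab" -k1,1 > "$tmp"
  # Attach each label to every occurrence of its ID, then count the pairs.
  sample_property_ids | sort \
    | join -t "$tab" "$tmp" - \
    | sort | uniq -c | sort -rn
  rm -f "$tmp"
}

labelled_counts
```

If the label file were not sorted the same way, `join` could silently skip matches (GNU `join` may also warn about unsorted input), so the counts would come out too low.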