Skip to content

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

License

Notifications You must be signed in to change notification settings

gambolputty/german-nouns

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

99 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

German nouns

A comma seperated list of ~100 thousand German nouns and their grammatical properties (tense, number, gender) as CSV file. Plus a module to look up the data and parse compound words. Compiled from the WiktionaryDE.

The list can be found here: german_nouns/nouns.csv

If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:

Installation

pip install german-nouns

Lookup words

from pprint import pprint
from german_nouns.lookup import Nouns

nouns = Nouns()

# Lookup a word
word = nouns['Fahrrad']
pprint(word)

# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# parse compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)

# Output:
['Vermögen', 'Bildung'] # Now lookup nouns['Vermögen'] etc.

Compiling the list

To compile the list yourself, you need Python 3.8+ and Poetry installed.

1. Clone the repository and install dependencies with Poetry:

$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install

2. Compile the list of nouns from a Wiktionary XML file:

Find the latest XML-dump files here: https://dumps.wikimedia.org/dewiktionary/latest, for example this one and download it. Then execute:

$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2

The CSV file will be saved here: german_nouns/nouns.csv.

Remove german_nouns/index.txt to let the script recreate the word-index when using the lookup methods.


License: CC BY-SA 4.0

About

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

Topics

Resources

License

Stars

Watchers

Forks

Languages