
A Ruby parser for the GCIDE English word dictionary that generates friendly structured JSON files for easy mass database import. Includes other resources if you need more data for an English dictionary database.


javierjulio/dictionary


English Dictionary


This is a minimally tested and incomplete parser for Webster's Unabridged English Dictionary. It works from the modified GCIDE XML, which categorizes content to make it easy to find and parse. I did a lot of research to find a machine-readable English dictionary for a project where I didn't want to rely on a third-party API (e.g. Wordnik).

Generate Simple JSON

From the project directory, run the following:

ruby parse.rb

This will generate a JSON file for each GCIDE XML file. Each object key is a unique word, and its value is an object containing that word's definitions (an array of objects, each with a definition, part of speech, field, and sequence). The files (excluding obsolete content) will contain ~99k unique words and ~160k definitions.
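As a rough illustration of how the generated files could be consumed for a mass database import, here is a minimal sketch. The key names (`definitions`, `definition`, `part_of_speech`, `field`, `sequence`) and the sample entries are assumptions based on the description above, not confirmed output of `parse.rb`, so check a generated file and adjust the names before relying on this.

```ruby
require "json"

# Made-up sample data that mirrors the structure described above.
# The key names are assumptions; compare against a real generated file.
SAMPLE_JSON = <<~JSON
  {
    "aardvark": {
      "definitions": [
        { "definition": "Sample definition one.", "part_of_speech": "n.", "field": "Zool.", "sequence": 1 }
      ]
    },
    "abandon": {
      "definitions": [
        { "definition": "Sample definition two.", "part_of_speech": "v. t.", "field": null, "sequence": 1 },
        { "definition": "Sample definition three.", "part_of_speech": "n.", "field": null, "sequence": 2 }
      ]
    }
  }
JSON

# Flatten the word => { definitions: [...] } hash into one row per
# definition, which is a convenient shape for bulk inserts.
def flatten_entries(words)
  words.flat_map do |word, entry|
    entry.fetch("definitions", []).map do |d|
      {
        word: word,
        definition: d["definition"],
        part_of_speech: d["part_of_speech"],
        field: d["field"],
        sequence: d["sequence"]
      }
    end
  end
end

words = JSON.parse(SAMPLE_JSON)
rows = flatten_entries(words)
puts "#{words.size} words, #{rows.size} definitions"
```

To use this with real output, replace `SAMPLE_JSON` with the contents of a generated file, e.g. `File.read(path)`, where `path` points at one of the JSON files produced by `ruby parse.rb`.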

Resources

GCIDE

After reviewing all of these resources, I went with parsing the GCIDE XML first. The next best option seems to be the Wiktionary TSV.

Wiktionary TSV

Webster's Unabridged Dictionary (1913 - public domain)

Moby Word Lists
