English-Chinese-Japanese translation dataset of the terms in Genshin Impact

xicri/genshin-langdata

genshin-langdata

This repository contains the translation dataset for Genshin Dictionary (GitHub) and Genshin Machine Translation (GitHub).

Just want to access translation data programmatically?

Use the API: https://dataset.genshin-dictionary.com/words.json

API documentation (currently available only in Japanese; an English version is planned)
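
As a rough illustration of consuming the API from Node.js 18+ (which ships a global `fetch`), the sketch below downloads words.json and searches it by English name. The helper names `loadWords` and `findByEnglish` are hypothetical, and the code assumes the response is a JSON array of entry objects with an `en` field; check the API documentation for the actual schema.

```javascript
const WORDS_URL = "https://dataset.genshin-dictionary.com/words.json";

// Download and parse the dataset. Assumption: the body is a JSON array of entries.
async function loadWords(url = WORDS_URL) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Case-insensitive substring search on the (assumed) `en` field.
// Entries without a string `en` value are skipped.
function findByEnglish(words, term) {
  const needle = term.toLowerCase();
  return words.filter(
    (word) => typeof word.en === "string" && word.en.toLowerCase().includes(needle)
  );
}
```

With these helpers, `loadWords().then((words) => findByEnglish(words, "peak"))` would resolve to the entries whose English name contains "peak".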

Development

The translation dataset for Genshin Dictionary is in the dataset/ directory. The dataset is written in JSON5.

Directory structure

dataset/
 ├ dictionary/ ― Dataset for Genshin Dictionary. Also used for Genshin Machine Translation.
 │ ├ artifacts.json5
 │ ├ characters.json5
 │ ︙
 │
 ├ translator/ ― Additional translation dataset for Genshin Machine Translation. This is not used for Genshin Dictionary.
 │ ├ characters.json5
 │ ├ domains.json5
 │ ︙
 │
 └ tags.json ― List of tags attached to each word in Genshin Dictionary.

JSON5 format

See the API documentation (currently available only in Japanese; an English version is planned).

pinyins

When you add a Chinese pronunciation in pinyin, you can write tone numbers (e.g. qia3) in the source JSON5 files. They are converted to tone marks (e.g. qiǎ) at build time.

For example, the following source entry:

  {
    // ...
    zhCN: "天云峠",
    pinyins: [{ char: "峠", pron: "qia3" }],
    // ...
  },

is built into:

  {
    // ...
    "zhCN": "天云峠",
    "pinyins": [{ "char": "峠", "pron": "qiǎ" }],
    // ...
  },
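
The conversion from tone numbers to tone marks can be sketched as follows. This is not the repository's build code; it is a minimal illustration using the standard pinyin placement rule (the mark goes on "a" or "e" if present, otherwise on the "o" of "ou", otherwise on the last vowel). The function name `toneNumberToMark` is hypothetical, and the sketch assumes a single lowercase syllable, with "v" accepted as a stand-in for "ü" and tone 5 meaning the neutral tone.

```javascript
// Tone-marked forms of each vowel, indexed by tone number 1-4.
const TONE_MARKS = {
  a: ["ā", "á", "ǎ", "à"],
  e: ["ē", "é", "ě", "è"],
  i: ["ī", "í", "ǐ", "ì"],
  o: ["ō", "ó", "ǒ", "ò"],
  u: ["ū", "ú", "ǔ", "ù"],
  ü: ["ǖ", "ǘ", "ǚ", "ǜ"],
};

function toneNumberToMark(syllable) {
  const match = /^([a-zü]+)([1-5])$/.exec(syllable);
  if (!match) return syllable; // no trailing tone number: leave unchanged

  const letters = match[1].replace(/v/g, "ü");
  const tone = Number(match[2]);
  if (tone === 5) return letters; // neutral tone carries no mark

  // Placement rule: "a" or "e" first, then the "o" in "ou", then the last vowel.
  let index = letters.search(/[ae]/);
  if (index === -1) index = letters.indexOf("ou");
  if (index === -1) {
    for (let i = letters.length - 1; i >= 0; i--) {
      if (letters[i] in TONE_MARKS) {
        index = i;
        break;
      }
    }
  }
  if (index === -1) return letters; // no vowel to mark

  const marked = TONE_MARKS[letters[index]][tone - 1];
  return letters.slice(0, index) + marked + letters.slice(index + 1);
}

console.log(toneNumberToMark("qia3")); // prints "qiǎ"
```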

Validation

Validating the JSON5 files locally is not mandatory, because validation runs automatically on GitHub Actions when you open a pull request. However, if you want to validate the JSON5 files on your local machine, follow the instructions below.

You need the following:

  • Node.js: the latest LTS version is recommended
  • npm: the latest version is recommended
  • (Windows only) PowerShell 7+
    • Some npm scripts need && support

To run validation:

$ cd /path/to/genshin-langdata
$ npm ci
$ npm test
$ npm run lint

Utility scripts

npm run todo lists the words that do not yet have a Chinese translation.

Example:

$ npm run todo

> todo
> node scripts/todo.js

# Words without Chinese translation

  ## characters.json5
    - Snezhevna (シュナイツェフナ)
    - Snezhevich (シュナイツェビッチ)
    ...

  ## quests.json5
    - Break the Sword Cemetery Seal (剣塚封印を探索)
    - Fishing For Jade (海上拾玉)
    ...
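
To show the kind of check behind a report like the one above, here is a small sketch that groups entries by file and lists those lacking a `zhCN` value. It is not the actual scripts/todo.js; the function name `listUntranslated`, the input shape (an object mapping file names to arrays of already-parsed entries), and the `en` and `ja` field names are assumptions made for illustration.

```javascript
// Build a plain-text report of entries that have no Chinese translation.
// `entriesByFile` maps a file name to an array of parsed entries.
function listUntranslated(entriesByFile) {
  const lines = ["# Words without Chinese translation", ""];
  for (const [file, entries] of Object.entries(entriesByFile)) {
    const missing = entries.filter((entry) => !entry.zhCN);
    if (missing.length === 0) continue; // nothing to report for this file
    lines.push(`  ## ${file}`);
    for (const entry of missing) {
      lines.push(`    - ${entry.en} (${entry.ja})`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Sample data for illustration only.
const sample = {
  "characters.json5": [
    { en: "Snezhevna", ja: "シュナイツェフナ" },
    { en: "Snezhevich", ja: "シュナイツェビッチ" },
  ],
  "locations.json5": [{ en: "Amakumo Peak", ja: "天雲峠", zhCN: "天云峠" }],
};
console.log(listUntranslated(sample));
```

Files in which every entry already has a `zhCN` value are omitted from the report, so in the sample only characters.json5 is listed.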