dongyuwei/web-pinyin-ime

web-pinyin-ime

An online pinyin input method.

Online demo

https://dongyuwei.github.io/web-pinyin-ime/

The pinyin dict source

https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/jni/data/rawdict_utf16_65105_freq.txt

You can download the Android PinyinIME source via this link: https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+archive/refs/heads/master.tar.gz

It is licensed under the Apache License, Version 2.0, see: https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/NOTICE

Both rawdict_utf16_65105_freq.txt and NOTICE are included in the ./ime/src/script directory.

How the pinyin dict is processed:

  • Convert the file to UTF-8: iconv -f UTF-16 -t UTF-8 rawdict_utf16_65105_freq.txt > google_pinyin_rawdict_utf8_65105_freq.txt
  • Transform the dict into ./ime/src/pinyin/google_pinyin_dict_utf8_55320.ts using the Node.js script ./ime/src/script/dict_preprocess.js.
  • The transform step also builds a packed trie, which enables pinyin prefix input.
  • The final pinyin dict, ./ime/src/pinyin/google_pinyin_dict_utf8_55320.ts, contains both the transformed pinyin data and the prebuilt packed trie.
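
To illustrate why a trie enables prefix input, here is a minimal sketch of an ordinary (unpacked) prefix trie. It is not the repository's packed trie: the class names, the `insert`/`search` methods, and the frequency values are all assumptions made up for this example.

```typescript
// Hypothetical, unpacked prefix trie for illustration only.
// The packed trie produced by dict_preprocess.js may be structured differently.
type Entry = { word: string; freq: number };

class TrieNode {
  children: { [ch: string]: TrieNode } = {};
  entries: Entry[] = [];
}

class PinyinTrie {
  private root = new TrieNode();

  // Store a word under its concatenated pinyin key, e.g. "xihongshi".
  insert(key: string, word: string, freq: number): void {
    let node = this.root;
    for (let i = 0; i < key.length; i++) {
      const ch = key.charAt(i);
      if (!node.children[ch]) {
        node.children[ch] = new TrieNode();
      }
      node = node.children[ch];
    }
    node.entries.push({ word, freq });
  }

  // Return every word whose key starts with `prefix`, highest frequency first.
  search(prefix: string): string[] {
    let node = this.root;
    for (let i = 0; i < prefix.length; i++) {
      const next = node.children[prefix.charAt(i)];
      if (!next) return [];
      node = next;
    }
    // Walk the whole subtree below the prefix node.
    const found: Entry[] = [];
    const stack: TrieNode[] = [node];
    while (stack.length > 0) {
      const current = stack.pop() as TrieNode;
      for (const entry of current.entries) found.push(entry);
      for (const ch of Object.keys(current.children)) {
        stack.push(current.children[ch]);
      }
    }
    return found.sort((a, b) => b.freq - a.freq).map((e) => e.word);
  }
}

// Made-up frequencies, chosen only to make the ordering visible.
const trie = new PinyinTrie();
trie.insert('xihuan', '喜欢', 300);
trie.insert('xihu', '西湖', 200);
trie.insert('xihongshi', '西红柿', 100);
```

With this sample data, `trie.search('xih')` reaches the node for `xih` and collects all three words below it, while `trie.search('xiho')` only reaches the branch that leads to 西红柿.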

Dev prerequisites

  • Node.js (tested with v14.17.0)
  • pnpm (tested with 6.17.2)

Make sure Node.js and pnpm are installed, then install the npm packages: cd ime && pnpm install

For dev

pnpm run dev

The core logic is located in ./ime/src/pinyin/ime_engine.js and ./ime/src/pinyin/IME.tsx. If you make any changes, make sure to run cd ime && pnpm test; see ./ime/src/pinyin/ime_engine.test.ts.

Tests

pnpm run test

See ./ime/src/pinyin/ime_engine.test.js. Supported input modes:

  • Full pinyin
  • First-letter (abbreviation) matching
  • Pinyin prefix matching

import getCandidates from './ime_engine';

it('should get candidates with full pinyin', () => {
  expect(getCandidates('xihongshi')).toEqual(['西红柿']);
});

it('should get sorted candidates with abbr of pinyin(First chars of pinyin)', () => {
  // `xhs` may be an abbreviation of `xin hua she`, `xi hong shi`, etc.
  expect(getCandidates('xhs')).toEqual([
    '新华社',
    '西红柿',
    '小和尚',
    '小护士',
    '巡回赛',
  ]);
});

it('should get sorted candidates with pinyin prefix', () => {
  // `xih` may be a prefix of `xi huan`, `xi huan ni`, `xi hong shi`, etc.
  expect(getCandidates('xih')).toEqual([
    '喜欢',
    '喜欢你',
    '西湖',
    '喜好',
    '细化',
    '西红柿',
    '喜欢吃',
    '稀罕',
    '喜欢听',
    '熄火',
    '西汉',
    '洗好',
    '嘻哈',
    '喜获',
    '喜欢什么',
    '喜欢自己',
    '西海岸',
    '西化',
  ]);

  expect(getCandidates('xiho')).toEqual(['西红柿']);
  expect(getCandidates('xihon')).toEqual(['西红柿']);
  expect(getCandidates('xihong')).toEqual(['西红柿']);
  expect(getCandidates('xihongs')).toEqual(['西红柿']);
  expect(getCandidates('xihongsh')).toEqual(['西红柿']);
  expect(getCandidates('xihongshi')).toEqual(['西红柿']);
});
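
The three input modes exercised by these tests can be pictured as comparisons against two keys derived from a word's pinyin syllables. The sketch below is an assumption about how such matching could work, not the engine's actual code; `fullKey`, `abbrKey`, and `matches` are hypothetical names.

```typescript
// Hypothetical helpers; ime_engine may derive and match its keys differently.

// "xi", "hong", "shi" -> "xihongshi"
function fullKey(syllables: string[]): string {
  return syllables.join('');
}

// "xi", "hong", "shi" -> "xhs" (first letter of each syllable)
function abbrKey(syllables: string[]): string {
  return syllables.map((s) => s.charAt(0)).join('');
}

// True when the input equals the abbreviation, or is a prefix of the
// full pinyin (a complete full-pinyin input is the longest such prefix).
function matches(input: string, syllables: string[]): boolean {
  return abbrKey(syllables) === input || fullKey(syllables).indexOf(input) === 0;
}

const xihongshi = ['xi', 'hong', 'shi'];
const xihuan = ['xi', 'huan'];
```

Under these assumptions, `matches('xhs', xihongshi)` and `matches('xih', xihongshi)` are both true, while `matches('xhs', xihuan)` is false because `xihuan` has only two syllables.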

Build

pnpm run build

How to customize?

  • You can build a customized UI on top of the existing pinyin input method engine (see ime_engine.js). It exposes a single, simple API: getCandidates(inputString).

  • For inspiration, see the reference implementation (the IME React component in IME.ts) and the unit test cases in ime_engine.test.ts.
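
As a rough sketch of what a custom UI layer might add on top of getCandidates(inputString), the example below pages through candidates and maps a digit key to a selection. `candidatePage`, `pickByDigit`, and `stubEngine` are hypothetical names; the stub stands in for the real engine so the snippet is self-contained, and a real UI would import getCandidates from ./ime_engine instead.

```typescript
// The only API assumed from the engine: a string in, a list of candidates out.
type GetCandidates = (input: string) => string[];

// Stand-in engine for this sketch only; it returns the candidates
// listed for `xhs` in the unit tests and nothing for other inputs.
const stubEngine: GetCandidates = (input) =>
  input === 'xhs' ? ['新华社', '西红柿', '小和尚', '小护士', '巡回赛'] : [];

// Return one page of candidates (page is 0-based).
function candidatePage(
  engine: GetCandidates,
  input: string,
  page: number,
  pageSize: number
): string[] {
  const start = page * pageSize;
  return engine(input).slice(start, start + pageSize);
}

// Map a 1-based digit key to a candidate on the current page.
function pickByDigit(pageItems: string[], digit: number): string | undefined {
  return pageItems[digit - 1];
}

const firstPage = candidatePage(stubEngine, 'xhs', 0, 3);
const secondPage = candidatePage(stubEngine, 'xhs', 1, 3);
```

Keeping the engine behind a function type like `GetCandidates` lets the UI code be tested with a stub, independently of the dictionary.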

About

Online pinyin input method, based on the open-source Google Pinyin dictionary.
