SITE: Simple IT English

English is the only language of the IT community. As a non-native speaker, I know how painful it is to learn English and memorize English vocabulary. Many talented engineers are kept out of the community because they lack English skills. Most published IT English vocabulary books are either unrelated to the IT community or too academic. I am trying to create a basic English dictionary drawn from the community, for programmers and software engineers. The assumption is that developers should be able to:

  1. [ x ] Read and write posts on StackOverflow.com

  2. [ x ] Read HackerNews

  3. [ ] Read and write comments and READMEs on github.com

Word Source:

| Source | Newest Post | Oldest Post | Row Count | Size |
| --- | --- | --- | --- | --- |
| HackerNews comments | 2015-10-13 08:44:02 UTC | 2006-10-09 19:51:01 UTC | 8399417 | 3.41 GB |
| HackerNews stories | 2015-10-13 08:44:34 UTC | 2006-10-09 18:21:51 UTC | 1959809 | 402.71 MB |
| StackOverflow answers | 2019-09-01 05:22:21.463 UTC | 2008-08-01 13:16:49.127 UTC | 27665009 | 22.27 GB |
| StackOverflow questions | 2019-09-01 05:23:41.743 UTC | 2008-08-03 21:38:52.623 UTC | 18154493 | 28.13 GB |

48.8 GB processed

Process for cleaning and selecting words:

  1. [ x ] Select the top 16000 most frequently used words from StackOverflow and HackerNews. Thanks to the Google Cloud public datasets and BigQuery, which saved me several days and coffees.
  2. [ x ] Keep only English words, using a word list.
  3. [ x ] Exclude overly simple English words, using the VOA Special English and common 3000 word lists.
  4. [ x ] Group word inflections under their root with the nltk stem module.
  5. [ ] Fetch meanings from the opted dictionary.
  6. [ ] Fetch example sentences from StackOverflow posts.
  7. [ x ] Output a markdown file.
  8. [ x ] Output an anki package file.
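
Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration, not the project's actual script: the names `select_words`, `naive_stem`, `english_words` and `simple_words` are hypothetical, and `naive_stem` is a crude suffix stripper standing in for a real stemmer so that the example runs with the standard library alone.

```python
import re
from collections import Counter, defaultdict

# Tokenizer assumption: a "word" is a run of ASCII letters.
TOKEN_RE = re.compile(r"[a-z]+")


def naive_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def select_words(posts, english_words, simple_words, top_n=16000, stem=naive_stem):
    """Count words, keep the top_n, filter them, and group them by stem."""
    # Step 1: word frequencies over all posts.
    counts = Counter()
    for post in posts:
        counts.update(TOKEN_RE.findall(post.lower()))

    groups = defaultdict(list)
    for word, _ in counts.most_common(top_n):
        # Step 2: keep English words; step 3: drop the overly simple ones.
        if word in english_words and word not in simple_words:
            # Step 4: group inflections under a shared stem.
            groups[stem(word)].append(word)
    return dict(groups)
```

With nltk installed, passing `stem=nltk.stem.PorterStemmer().stem` should give better groupings than the naive stand-in, which for example leaves `parse` and `parsing` in separate groups.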

Contribute:

readme.md is generated by a script. Do not edit this readme directly. Edit the Python script and the corpus text instead.
