jmmgr/Cantonese-sentences


You can find all the sentences collected so far at: cantonese-sentences.herokuapp.com/sentences

What is this project?

This is a web page that stores sentences in Cantonese with their English
translation and romanization. In the future I would like to add audio.

Why this project?

I want to store these sentences online because I want to create a
memrise.com course for Cantonese, since the existing courses all focus on
learning Cantonese characters instead of whole sentences.

Where did I get the sentences?

At http://tatoeba.org/eng you can find a lot of sentences with their
translations. You can download these files:

  http://tatoeba.org/files/downloads/sentences.csv
  This file contains: id \t language \t sentence
  We are only interested in the languages "eng" and "yue" (Cantonese).

  http://tatoeba.org/files/downloads/links.csv
  This file contains: id1 \t id2
  Each link relates a sentence to one of its translations, so this file
  tells us how to connect the sentences to their versions in other languages.
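
As an illustration of the sentences.csv format described above, here is a minimal Java sketch that keeps only the "eng" and "yue" rows. The class SentenceFilter, the record Sentence and the method keepEngAndYue() are hypothetical names chosen for this example; they are not taken from methods.Java, and the sketch assumes Java 16 or newer (for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not the code in methods.Java. */
final class SentenceFilter {
    /** One row of sentences.csv: id, language code, sentence text. */
    record Sentence(int id, String language, String text) {}

    private static final Set<String> WANTED = Set.of("eng", "yue");

    /** Parses tab-separated lines and keeps only English and Cantonese rows. */
    static List<Sentence> keepEngAndYue(List<String> lines) {
        List<Sentence> kept = new ArrayList<>();
        for (String line : lines) {
            // Limit of 3 so that a tab inside the sentence text is not split off.
            String[] parts = line.split("\t", 3);
            if (parts.length < 3) {
                continue; // skip rows that do not have all three fields
            }
            if (WANTED.contains(parts[1])) {
                kept.add(new Sentence(Integer.parseInt(parts[0]), parts[1], parts[2]));
            }
        }
        return kept;
    }
}
```

The lines could come from something like Files.readAllLines() on the downloaded file, although for a file of this size reading it line by line is probably the safer choice.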

Migrate from files to our own database.

We are going to use PostgreSQL (because it is supported on Heroku).
In the file methods.Java you can find all the methods we have used to make the
migrations. Let's explain the steps:
  1- Load all the sentences into our db:
    Create the table sentences (first you have to create a db called cantonese).
      CREATE TABLE sentences(
        id integer,
        language varchar(3),
        sentence varchar(255),
        PRIMARY KEY(id)
        );
    Populate the table with the method readSentences().
    It only inserts a row when the language is eng or yue.

  2- Create the table relations:
      CREATE TABLE relations(
        relation1 integer,
        relation2 integer
        );
    Populate it with the method readRelations().
    This method only inserts a row if both ids of the relation exist in our
    sentences table. The first id will be the Cantonese sentence and the
    second id its English translation.

  3- Create the table cantonese:
      CREATE TABLE cantonese(
        id integer,
        language varchar(3),
        sentence varchar(255),
        PRIMARY KEY(id)
        );
    Populate it with the method copyCantonese().
    First we need to save the whole relations table to a file
    (with the method writeLinks()). Then we read all the relations
    and, when they exist in the table sentences, we insert them into
    the table cantonese.
    So the table cantonese will only contain Cantonese sentences that
    have an English translation (and those English translations).

  4- Create the romanization:
    It is important for us to have the romanization of the sentences so we know
    how to read them. For this we have used the method translateRomanization().
    It makes a call to http://popupcantonese.com/adso/pinyin.php?text=而家中午
    This returns the romanization, which we insert into the table cantonese.
    We also insert the relations in the table.

    We had problems with the encoding of the Chinese characters, so we
    developed the method convertIntoUTF_8() so that we can make the requests.
    For example, it converts 而家中午 into %E8%80%8C%E5%AE%B6%E4%B8%AD%E5%8D%88
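
The conversion shown in step 4 is UTF-8 percent-encoding, which the Java standard library can also produce. The sketch below is not the project's convertIntoUTF_8() method; it is an alternative using java.net.URLEncoder (the Charset overload needs Java 10 or newer). The class RomanizationUrl and its methods are hypothetical names, and no HTTP request is made here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not the code in methods.Java. */
final class RomanizationUrl {
    // Endpoint taken from the README; whether it is still reachable is not checked.
    private static final String ENDPOINT =
            "http://popupcantonese.com/adso/pinyin.php?text=";

    /** Percent-encodes the text as UTF-8 bytes, e.g. U+800C becomes %E8%80%8C. */
    static String encode(String text) {
        // Note: URLEncoder uses form encoding, so a space becomes "+".
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    /** Builds the request URL for one Cantonese sentence. */
    static String requestUrl(String sentence) {
        return ENDPOINT + encode(sentence);
    }
}
```

Calling RomanizationUrl.encode("而家中午") should give the same string as the example in step 4.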

So far our database holds 2558 sentences in Cantonese, 8394 sentences in total
including translations, and 5865 relations. Since Heroku only accepts 10,000
rows, we are going to create a new table and migrate all the content into it.

The table is going to be:
    CREATE TABLE sentences(
      id_cantonese integer,
      cantonese varchar(255),
      id_english integer,
      english varchar(255),
      id_romanization integer,
      romanization varchar(255),
      cantonese_audio varchar(255)
      );
We add cantonese_audio just in case. I'm going to create this table with a
Rails scaffold to get the whole structure:
    rails generate scaffold sentence id_cantonese:integer cantonese:string \
      id_english:integer english:string id_romanization:integer \
      romanization:string cantonese_audio:string
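
To show how rows for the new table could be assembled, here is a minimal Java sketch that joins each relation with its two sentences. It is only an assumption about how the migration might work, not the author's code: the class FlatRows, the records Relation and FlatRow and the method flatten() are hypothetical, and the romanization and audio columns are left out for brevity. It assumes Java 16 or newer (for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not the code in methods.Java. */
final class FlatRows {
    /** One row of the relations table: Cantonese id first, English id second. */
    record Relation(int cantoneseId, int englishId) {}

    /** One row of the new table, without romanization and audio. */
    record FlatRow(int idCantonese, String cantonese, int idEnglish, String english) {}

    /** Builds one flat row per relation whose two sentences are both known. */
    static List<FlatRow> flatten(Map<Integer, String> sentencesById,
                                 List<Relation> relations) {
        List<FlatRow> rows = new ArrayList<>();
        for (Relation r : relations) {
            String cantonese = sentencesById.get(r.cantoneseId());
            String english = sentencesById.get(r.englishId());
            if (cantonese == null || english == null) {
                continue; // skip relations that point at a missing sentence
            }
            rows.add(new FlatRow(r.cantoneseId(), cantonese, r.englishId(), english));
        }
        return rows;
    }
}
```

With this layout each translation pair takes a single row, instead of two sentence rows plus one relation row.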

About

Web page and REST API with sentences for learning Cantonese
