A GLR Parser for Natural Language Processing and Translation
GLRParser is not just a parser. It's
- Natural Language Parser which handles ambiguous grammars
- Unification Engine which handles unification of features
- Translation Engine for Syntax-Based Translation of Natural Languages
For a quick start, you can use following commands to install and run an interactive demo for English to Turkish Translation:
pip install GLRParser
python -m GLRParser.main
In interactive demo, you can enter an English sentence to get Turkish translation(s):
Grammar load time: 806,295 mics
Number of rules: 24915
Number of states: 28047
Number of symbols: 5738
Number of NonTerm symbols: 159
Enter Sent> who do you think you are
kim olduğunuzu düşünüyorsunuz
Enter Sent> as long as she is happy i will be happy
mutlu olduğu sürece mutlu olacağım
Enter Sent> his sudden departure had demonstrated how unreliable he was
ani ayrılışı ne kadar güvenilmez olduğunu göstermişti
Enter Sent> attacks were threatening to destabilize the government
saldırılar yönetimi istikrarsızlaştırmakla tehdit ediyordu
Enter Sent> if i had come early she wouldn't have missed her bus
erken gelmiş olsaydım otobüsünü kaçırmış olmazdı
erken gelmiş olsaydım otobüsünü özlemiş olmazdı
You can also visit following url to try interactive translations: https://mdolgun.pythonanywhere.com/
For a list of sample translations check the file: https://github.com/mdolgun/GLRParser/blob/master/GLRParser/grm/main.out.txt
For detailed information about the features and the grammar syntax, you can refer to wiki page: https://github.com/mdolgun/GLRParser/wiki
Sample code for parsing and translation should be like:
from GLRParser import Parser, ParseError, GrammarError, Tree
try:
parser = Parser() # initialize parser object
parser.parse_grammar("GLRParser\grm\simple_trans.grm") # load grammar from a file
sent = "i saw the man in the house with the telescope" # sentence to parse
parser.compile() # constructs parsing tables
parser.parse(sent) # parse the sentence
tree = parser.make_tree() # generates parse forest
ttree = parser.trans_tree(tree) # translate the parse forest
print(ttree.pformatr()) # pretty-print the translated parse forest
for trans in ttree.enum(): # enumerate and print all alternative translations in the parse forest
print(trans.replace(" -","")) # concat suffixes
except GrammarError as ge:
print(ge)
except ParseError as pe:
print(pe))
Simple grammar for English -> Turkish translation (see simple_trans.grm)
S -> NP VP : NP VP
S -> S in NP : NP -de S
S -> S with NP : NP -la S
NP -> i :
NP -> the man : adam
NP -> the telescope : teleskop
NP -> the house : ev
NP -> NP-1 in NP-2 : NP-2 -deki NP-1
NP -> NP-1 with NP-2 : NP-2 -lu NP-1
VP -> saw NP : NP -ı gördüm
Given the above grammar and input string:
i saw the man in the house with the telescope
It produces a parse forest, and 5 alternative translations (of which two are identical):
1. teleskopla evde adamı gördüm
2. teleskopla evdeki adamı gördüm
3. teleskoplu evde adamı gördüm
4. teleskoplu evdeki adamı gördüm
5. teleskoplu evdeki adamı gördüm
The semantic interpretations are:
1. saw(in the house) saw(with the telescope)
2. man(in the house) saw(with the telescope)
3. saw(in the house) house(with the telescope)
4. man(in the house) man(with the telescope)
5. man(in the house) house(with the telescope)