tokenizer
A grammar describes the syntax of a programming language and may be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? The front end of a compiler combines a lexer and a parser, built for a specific grammar.
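As a rough illustration of the lexer/parser split described above, here is a minimal Python sketch for a toy arithmetic language. The grammar, the token names (`NUMBER`, `OP`, `SKIP`), and the helper functions `tokenize` and `parse` are all assumptions made for this example; they are not taken from any repository listed under this topic.

```python
import re
from typing import NamedTuple

# Toy grammar (an assumption for this sketch), in an EBNF-like notation:
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor ("*" factor)*
#   factor ::= NUMBER | "(" expr ")"


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token rules; SKIP matches whitespace that the lexer discards.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+*()-]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[Token]:
    """Lexer: turn raw text into a flat list of tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens: list[Token]):
    """Parser: check the tokens against the grammar and build a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok.kind == "NUMBER":
            return int(tok.text)
        if tok.text == "(":
            node = expr()
            if advance().text != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok.text!r}")

    def term():
        node = factor()
        while (tok := peek()) is not None and tok.text == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while (tok := peek()) is not None and tok.text in ("+", "-"):
            advance()
            node = (tok.text, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek().text!r}")
    return tree


if __name__ == "__main__":
    print(tokenize("1 + 2 * 3"))
    print(parse(tokenize("1 + 2 * 3")))
```

The lexer only decides what each piece of text is; the parser alone decides whether the sequence is valid, which is why `tokenize("1 +")` succeeds while `parse(tokenize("1 +"))` raises a `SyntaxError`.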
Here are 1,075 public repositories matching this topic...
Vietnamese tokenizer (Maximum Matching and CRF)
Updated Jan 27, 2014 - Python
Ruben's master's thesis
Updated Oct 19, 2015 - Python
Updated Oct 26, 2015 - Java
A small ECMAScript parser, tokenizer and minifier written in JavaScript.
Updated Apr 21, 2016 - JavaScript
Local login app using ExpressJS.
Updated May 12, 2016 - JavaScript
Using VnTokenizer to tokenize documents in Java
Updated Jul 29, 2016 - Java
Updated Sep 18, 2016 - Python
Break down a corpus of text into lines and tokens
Updated Nov 20, 2016 - JavaScript
USC-Foundations of Artificial Intelligence Codes
Updated Dec 15, 2016 - Java
regular language tools - automata-based tokenizer, LL(1) parser
Updated Dec 17, 2016 - Python
Given a collection of documents, this project tokenizes and stems every word in the collection. The implementation is written in Java.
Updated Feb 16, 2017 - Java