
ParseJS

ParseJS is a simple library I made (and bug-fixed) in the span of two (2) days. Currently, ParseJS (and ParseTS) remain *mostly* complete.

Example in action.

For an example of how you could use this library, check out this Brain**** interpreter I made on SoloLearn's Web-Development code playground.

Installation.

The suggested method of installation is to download the repository as a ZIP and extract its contents.
But if you don't want to keep the file on your drive, you can use the CDN (Content Delivery Network) script import:

<script src="https://cdn.jsdelivr.net/gh/CalinZBaenen/ParseJS@main/src/parse_string.js"></script>
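Once that script has loaded, your page's own scripts can call the function directly. A minimal sketch, assuming the file exposes parse_string as a global (it is loaded as a plain, non-module script):

<script>
  // `parse_string` is assumed to be defined globally by the CDN script imported above.
  console.log(parse_string("Sonic says hi", ["Sonic"]));
</script>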

Description.

English.

The parse_string function of ParseJS takes in a block of text and a list of keywords (tokens*), scans the text you provided letter by letter, and returns a list.
If a keyword begins with the letter the function is currently looking at, it checks whether the sequence of letters ahead of the current letter (in combination with the current one) spells out a valid keyword. If it does, a symbol representing the token found is inserted into the list; otherwise the current letter is inserted instead.

Programmernese.

The parse_string function of ParseJS takes in a string (str) and an array of strings (toks) and returns an array (parsed_array) of string OR symbol (Array<string|symbol>). parse_string iterates over str, and if there is a string (keyword*) in toks that begins with the current character, it checks whether the following sequence of characters forms a valid keyword. If a valid keyword is found, a symbol (Symbol.for(tok)) is inserted into parsed_array; otherwise the current character is inserted instead.
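If it helps to see that logic written out, here is a rough sketch of the longest-match scan described above. It is only an illustration (the name parse_string_sketch and its exact structure are made up for this example), not the library's actual source:

// A simplified sketch of the scanning rule described above (not the real implementation).
function parse_string_sketch(str, toks) {
  const parsed_array = [];
  let i = 0;
  while(i < str.length) {
    // Collect every keyword that the text at position `i` actually spells out...
    const matches = toks.filter(tok => str.startsWith(tok, i));
    if(matches.length > 0) {
      // ...and prefer the longest one (e.g. "test12" beats "test1" and "test").
      const tok = matches.reduce((a, b) => (b.length > a.length ? b : a));
      parsed_array.push(Symbol.for(tok));
      i += tok.length;
    } else {
      // No keyword starts here, so the character is kept as-is.
      parsed_array.push(str[i]);
      i += 1;
    }
  }
  return parsed_array;
}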

Examples

Sonic The Hedgehog.

[Images: Sonic the Hedgehog, Tails the two-tailed Fox, and Knuckles the Echidna.]

A parse_string test that uses Sonic characters as tokens. - example.png

So, we have this code. Let's walk through what's going on, and why we get the output we do.

// We tell `parse_string` that we want it to read "Knuckles, Tails, Amy, Sonic", but only
// search for "Sonic", "Tails", and "Knuckles".
parse_string("Knuckles, Tails, Amy, Sonic", [
  "Sonic", // "Sonic" is a keyword because it is included in this list.
  "Tails", // Same for "Tails".
  "Knuckles" // Ditto.
]);

This produces the output:

[
  Symbol.for("Knuckles"), // "Knuckles" was a keyword - `parse_string` found "Knuckles".
  
  ',', // The character right after "Knuckles".
  ' ', // Character after the character after "Knuckles". -- This isn't a keyword, so it's left alone.
  
  Symbol.for("Tails"),
  
  ',',
  ' ',
  'A', // "Amy" isn't a keyword, so her name is left alone.
  'm',
  'y',
  ',',
  ' ',
  
  Symbol.for("Sonic") // Sonic's at the end, but he was still found, so his name is "tokenized".
]
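One nice side effect of using Symbol.for is that the same keyword always maps to the same symbol, so entries in the result can be compared against Symbol.for("...") directly. For instance, reusing the call above:

const parsed = parse_string("Knuckles, Tails, Amy, Sonic", [
  "Sonic",
  "Tails",
  "Knuckles"
]);
// `Symbol.for` always returns the same symbol for the same string, so this check works.
console.log(parsed[0] === Symbol.for("Knuckles")); // true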

Testing. Testing! One (1). Two (2). Three (3).

A parse_string test that uses keywords that are variants of each other. - example2.png

Now, let's do some more fiddling around...

parse_string("test12 test1 test2 test", [
  "test",
  "test1",
  "test12",
  "test2"
]);

Ok, so, let's walk through this.

t

Well, we have t. That's a good start. Now we look to see what keywords start with t-
Oh... well, that's strange. It looks like we have four "candidates".
Let's remove the obvious loser: test2. (The text ahead reads est1..., not est2, so test2 can never match here.)

Now, we still have three possible candidates: test, test1, and test12.
How do we know which one to pick? - Simple. First we sort the candidates by length: test12, test1, test. Then we arrange the candidates in dictionary order, which in this case doesn't change anything.
So, now what? Well, let's scan ahead: if the next characters are est1, then we could use test1 - BUT if the next characters are est12, we could use test12.
Since test12 is longer than test1 or test, it takes precedence. That is, parse_string prefers the longer token because it's more confident that prediction is correct.
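Putting the longest-match rule together for the whole string, the call above should give a result along these lines (derived by hand from the rule just described, not pasted from an actual run):

[
  Symbol.for("test12"), // "test12" wins over "test1" and "test" at the start.
  ' ',
  Symbol.for("test1"),  // Scanning ahead here finds "test1", but not "test12".
  ' ',
  Symbol.for("test2"),
  ' ',
  Symbol.for("test")    // Only the plain "test" is left at the end.
]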

Cleaning up the clutter!

As of the latest patch, 0.051, you can remove the extra letters from the returned list.
It turns THIS output:
Sonic's test has some left over letters. - example.png
into THIS output:
Sonic's test only leaves the symbols behind. - example3.png

So... How do we get this cleaner output?
Well, when you pass in your text and the keywords you want to find, you can also pass in a boolean (a yes-or-no value) that indicates whether you want to keep the clutter. For backwards compatibility, this option is true (yes*) by default.
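For example (a hedged sketch, assuming the flag is simply the third argument to parse_string; the text above only says it is passed alongside the text and keywords), passing false drops the loose characters:

parse_string("Knuckles, Tails, Amy, Sonic", [
  "Sonic",
  "Tails",
  "Knuckles"
], false); // `false` = "no, don't keep the clutter".
// Expected shape of the result: [Symbol.for("Knuckles"), Symbol.for("Tails"), Symbol.for("Sonic")]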

Bugs

