Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Cookbook #18

Open
jeffrose opened this issue Sep 16, 2019 · 18 comments
Open

Discussion: Cookbook #18

jeffrose opened this issue Sep 16, 2019 · 18 comments

Comments

@jeffrose
Copy link
Contributor

You may want to start a cookbook, similar to Ramda's that shows ways of solving common problems.

While I have not worked with arcsecond for very long, I've already encountered issues where the basic use cases were easy to solve but the more advanced use cases forced me to take a completely different approach.

For example, padded digits. This works fine for most use cases...
const paddedDigits = sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] );
... until your input is only 0, then it fails. What's the best way to resolve that problem? Use choice with a char( '0' ) or is there another approach that's better?

Another example is parsing query strings. A basic parser is easy to throw together using sepBy, letters, and digits, but if you want to fully support the spec, you need to take a complete different approach to meet all the edge cases.

Now that you have a robust library in place, I think developer guidance is key to adoption.

@francisrstokes
Copy link
Owner

Hey Jeff - completely agree.

In the 2.x rewrite I did recently I tried to get somewhere close to this with the tutorial, but with this tech there are many gotchas and pieces of accumulated wisdom.

I would like to keep this issue open as a place to track what could make it's way into such a cookbook.

To answer your question about padded digits, I would probably go with a regex here:

const paddedDigits = regex(/^-?0*[0-9]+/);

However, an approach more similar to your original would look like:

const paddedDigits = sequenceOf([
  possibly(char('-')),
  many(char('0'))
    .map(x => x.join(''))
    .chain(zeroResult => {
      if (zeroResult !== '') {
        return possibly(digits).map(digitResult => {
          if (digitResult) {
            return zeroResult + digitResult;
          }
          return zeroResult;
        });
      }
      return digits;
    }),
]).map(x => x.join(''));

This is a bit gnarly, but uses .chain to check if any padding zeros were parsed, and then return either digits or possibly(digits) depending on that result. Basically any time your parsing is dependent on the result, you're going to need to use chain. Luckily regex is a whole beast that takes care of some of that so you can express it much simpler.

Hope this helps a bit!

@jeffrose
Copy link
Contributor Author

jeffrose commented Sep 18, 2019

Using regex felt like a cheat to me under the circumstances, since I was trying to familiarize myself with the API.
I can tell you that this did work for me...

const paddedDigits = choice( [
    sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] ),
    char( '0' ).map( ( value ) => [ null, null, value ] )
] );

The one advantage this has over using regex is it lets you make decisions on the padding size, etc. by breaking out the padding from the value.
A more robust solution could look like...

const
    optionalSign = possibly( choice( [ char( '-' ), char( '+' ) ] ) ),
    zero = char( '0' ),
    zeroCount = possibly( many( zero ) ).map( ( value ) => value.length ),
    paddedDigits = choice( [
        sequenceOf( [ optionalSign, zeroCount, digits ] ),
        zero.map( ( value ) => [ null, 0, value ] )
    ] );
console.log( paddedDigits.run( '0003' ).result );   // [ null, 3, '3' ]
console.log( paddedDigits.run( '-0010' ).result );  // [ '-', 2, '10' ]
console.log( paddedDigits.run( '+00005' ).result ); // [ '+', 4, '5' ]

@IMax153
Copy link
Contributor

IMax153 commented Oct 11, 2019

Hello Francis,

First of all, let me say that I am thoroughly enjoying the LLJS series on YouTube. I am very much looking forward to the next installment of the VM series.

I have been working on implementing a parser for the Lua programming language with the arcsecond library, but have been running into several issues that I have been unable to solve by referring to your examples/tutorial. I apologize if my question is naive, but I have only just begun using the library.

The most prominent issue occurs in the situation where two choice parsers implement one another and continuously call one another, resulting in an infinite loop. I have read your response to the issue posed by @justgage, and ensured that I have not implemented any parsers in that manner.

Hopefully you can get an idea of what I am referring to from the example code below, which I tried to simplify as much as possible.

// Variable
export const variable = recursiveParser(() =>
  whitespaceSurrounded(
    choice([
      sequenceOf([expressionPrefix, dot, name]),
      sequenceOf([expressionPrefix, betweenBrackets(expression)]),
    ]),
  ),
);

// Expression Prefix
export const expressionPrefix = recursiveParser(() =>
  choice([self, globalVariable, betweenParentheses(expression), variable]),
);

Very curious to hear your thoughts.

@francisrstokes
Copy link
Owner

Hi @IMax153.
It's tricky to say for sure without context, so just a couple of general things to keep in mind:

With choice, the main thing you need to be careful with is that it only works properly with parsers that can fail. An example of a kind of parser that cannot fail is many(something) - if it doesn't find any somethings, then no problem - the result is an empty array and not a failure. That doesn't mix properly with choice because choice relies on a parser failing before it can try the next one.

The other thing is try to look out for places that you might be defining Left recursion. You might need to change how you write the parser to ensure it doesn't get into an infinite loop.

Hope one or both of those help.

@jeffrose
Copy link
Contributor Author

jeffrose commented Nov 19, 2019

Here is my recipe for parsing document.cookie. It complies with most forgiving cookie spec since it's reading and not writing.

import { char, everythingUntil, regex, sepBy, sequenceOf, str, takeRight } from 'arcsecond';
import { fromPairs } from 'ramda';

const
    validCharacters = /^[^;,\s]*/,
    percentEncoded = /(%[0-9A-Za-z]{2})+/g,
    percentDecode = ( value ) => value.replace( percentEncoded, decodeURIComponent ),
    equalsSign = char( '=' ),
    cookie = sequenceOf( [
        everythingUntil( equalsSign ).map( percentDecode ),
        takeRight( equalsSign )( regex( validCharacters ) ).map( percentDecode )
    ] ),
    cookies = sepBy( str( '; ' ) )( cookie ).map( fromPairs );

console.log( cookies.run( 'a=123; b=456; c=%20' ).result );
//=> { "a": "123", "b": "456", "c": " " }

@francisrstokes, I originally had takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode ) but that didn't work. Should it have?

@francisrstokes
Copy link
Owner

francisrstokes commented Dec 5, 2019

Hey Jeff,

Sorry for the late response.
This is really nice - and very succinct. Perfect for a cookbook.

I'm not sure I understand where the
takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode )
parser would go - does it replace the first item in the cookie sequenceOf?

@jeffrose
Copy link
Contributor Author

jeffrose commented Dec 5, 2019

Hello Francis,

No problem.

Sorry, I wasn't clear. Instead of takeRight( equalsSign )( regex( validCharacters ) ).map( percentDecode ), which is responsible for parsing the cookie value, I had takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode ).

I thought it would work because at that point, it should be parsing each individual cookie, e.g. a=123. everythingUntil( equalsSign ).map( percentDecode ) parses the cookie name, leaving =123 behind. I wanted it to take anything after the = and use that as the cookie value, making it an extremely forgiving parser. While introducing validCharacters is not necessarily a bad thing, the code would be slightly more succinct without it.

@francisrstokes
Copy link
Owner

Ah I see. The reason it doesn't work is because the end of input really is the actual end of input, so it will parse until the end of the string if there are more key/values to come.

@francisrstokes
Copy link
Owner

To get concrete and start putting this into the docs - I think we can create a new folder in the project directory for cookbook, with an index.md that gives a brief introduction to the idea (more real world examples that help people get to grips with parser combinators), and then link off to the different sections.

Then each example can be a markdown file, which introduces the parser, shows the code, and gives a breif overview of any concepts involved (if chain is used for example, why, and what does it do).

Would you be up for kicking that off with a PR @jeffrose? We can work together on the intro page and formatting and all that kind of stuff.

@jeffrose
Copy link
Contributor Author

jeffrose commented Dec 7, 2019

@francisrstokes, sounds good. I am currently out of town, but when I get back I will open a PR.

@IMax153
Copy link
Contributor

IMax153 commented Dec 7, 2019

@francisrstokes @jeffrose I would also be interested in contributing examples. I don’t want to steal Jeff’s thunder though, so please let me know how I can help. Perhaps I could work through some other examples for the library? Is there anything you are interested in showcasing?

@IMax153
Copy link
Contributor

IMax153 commented Dec 7, 2019

Two issues I continually ran across when I was trying to use this library for parsing Lua were operator precedence and left-recursion.

Perhaps I could look into how existing work could be used to handle those scenarios? I know it is notoriously difficult to handle these situations with top-down parsing, but it would be an interesting experiment.

@francisrstokes
Copy link
Owner

@IMax153 Indeed - they are some of the trickiest topics in parsing, and ones people tend to bump into pretty quickly. Having a reference for those would be great.

@IMax153
Copy link
Contributor

IMax153 commented Dec 9, 2019

@francisrstokes So I created an example of parsing binary expressions that supports associativity and precedence (see below). Is your vision for these examples to be markdown documents that showcase how the library can be utilized to accomplish these tasks? Or would you prefer I submit a PR to add this example to the examples directory of the project?

import util from 'util';

import {
  between,
  char,
  choice,
  digits,
  optionalWhitespace,
  many,
  many1,
  recursiveParser,
  sequenceOf,
} from "../..";

// These parsers are based heavily on the work documented at
// https://blog.jcoglan.com/2017/07/07/precedence-and-associativity-in-recursive-descent/

// Utility method to print out the entire array of parsed expressions to the console
const printResult = array =>
  console.log(util.inspect(array, { maxArrayLength: null, depth: null }));

const whitespaceSurrounded = parser =>
  between(optionalWhitespace)(optionalWhitespace)(parser);

const betweenParentheses = parser =>
  between(whitespaceSurrounded(char('(')))(whitespaceSurrounded(char(')')))(parser);

const plus = char('+');
const minus = char('-');
const times = char('*');
const divide = char('/');

const binaryExpression = op => parser =>
    sequenceOf([
      whitespaceSurrounded(parser),
      many1(sequenceOf([whitespaceSurrounded(op), whitespaceSurrounded(parser)]))
    ])
    .map(([initialTerm, expressions]) =>
      // Flatten the expressions
      [initialTerm, ...expressions].reduce((acc, curr) =>
        // Reduce the array into a left-recursive tree
        Array.isArray(curr) ? [curr[0], acc, curr[1]] : curr
      )
    );

const factor = recursiveParser(() => choice([digits, betweenParentheses(expression)]));
const term = recursiveParser(() => choice([multiplicationOrDivision, factor]));
const expression = recursiveParser(() => choice([additionOrSubtraction, term]));
const additionOrSubtraction = binaryExpression(choice([plus, minus]))(term);
const multiplicationOrDivision = binaryExpression(choice([times, divide]))(factor);

// Handle precedence of multiplication and division over addition and subtraction
printResult(many(expression).run('9 + 5 - 4 * 4 / 3').result);
// Result: [ [ '-', [ '+', '9', '5' ], [ '/', [ '*', '4', '4' ], '3' ] ] ]

// Handle precedence of expressions in parentheses
printResult(many(expression).run('9 + (5 - 4) * (4 / 3)').result)
// Result: [ [ '+', '9', [ '*', [ '-', '5', '4' ], [ '/', '4', '3' ] ] ] ]

@francisrstokes francisrstokes changed the title Cookbook Discussion: Cookbook Dec 10, 2019
jeffrose added a commit to jeffrose/arcsecond that referenced this issue Dec 11, 2019
@IMax153
Copy link
Contributor

IMax153 commented Dec 13, 2019

Hey guys - just wanted to check ahead of Jeff’s pull request to create the Cookbook file. Is the example above succinct enough for the cookbook?

@francisrstokes
Copy link
Owner

Hey Max - yes I believe so. I merged Jeff's example this morning, so if you want to make a PR, you can use what is there now in master to add your example.

Two things: I would strip out anything not directly related to your example though, just to cut down on noise - so the util/printResult stuff.
And make sure there is couple of sentences explaining the problem and the approach, to put the code in context.

And of course thanks for taking the interest and time to do this 👍

@crapthings
Copy link

Hello, how to match newline \n?

I want to match something like this

   \n hello \n kitty        \n  test     \n something

the whitespace can be everywhere, use newline \n to sep value

I've tried

const A = require('arcsecond')

const test = A.sepBy(A.char('\n'))(A.between(A.optionalWhitespace)(A.optionalWhitespace)(A.letters))

const result = test.run('big \n list \n of \n                  words')

console.dir(result, { depth: null })

but it won't work

@francisrstokes
Copy link
Owner

Hey @crapthings - it's not working because the newlines you want to use to separate are being caught by the optionalWhitespace. The easy solution is to create an optional whitespace parser that doesn't include newlines. Something like this:

const customWhitespace = A.possibly(A.regex(/^( |\t|\r)*/));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants