Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where to look for potential JSON parser improvements #17

Open
turanct opened this issue Dec 12, 2020 · 1 comment
Open

Where to look for potential JSON parser improvements #17

turanct opened this issue Dec 12, 2020 · 1 comment

Comments

@turanct
Copy link
Collaborator

turanct commented Dec 12, 2020

1. whitespace parser

zeroOrMore(satisfy(isCharCode([0x20, 0x0A, 0x0D, 0x09])))

it's potentially faster to do takeWhile on the stream, with the same predicate, skipping the zeroOrMore combinator and the satisfy.

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L154-L158

2. between

we rely heavily on between, which is based on sequence and bind. If we would find the tiniest speed improvement in those, we would make the parser a lot faster.

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L96-L103

3. sepby

we rely on this function often in the JSON parser too. It is built in terms of sepBy1 which is written in a "readable" way, but not a really efficient way:

function sepBy1(Parser $separator, Parser $parser): Parser
{
    $prepend = fn($x) => fn(array $xs): array => array_merge([$x], $xs);
    $label = $parser->getLabel() . ", separated by " . $separator->getLabel();
    return pure($prepend)->apply($parser)->apply(many($separator->sequence($parser)))->label($label);
}
  • The prepend uses array_merge to prepend a single element, could probably be faster with array_unshift
  • although the applicative is really readable here, it's also a complex operation under the hood, and like the sequence and bind the tiniest improvement here would probably make a big difference

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L99-L102
https://github.com/mathiasverraes/parsica/blob/main/src/combinators.php#L513-L519

@ToonGagor
Copy link

ToonGagor commented Jan 2, 2021

Another improvement would be the string parser:

Screenshot 2021-01-02 at 16 19 57

Screenshot 2021-01-02 at 16 21 10

Screenshot 2021-01-02 at 16 20 53

It's faster to parse a bunch of nested objects, than it is to parse a list of simple strings.

https://github.com/mathiasverraes/parsica/blob/ddefc569877a1d9527b94ce5d833d2af0aa26d6f/src/JSON/JSON.php#L180-L202

We use zeroOrMore in combination with choice, which is a slow process of trying parsers and moving on to the next one. Maybe we can make this faster with the takeWhile stream method or something like it.

mhsdesign added a commit to mhsdesign/parsica that referenced this issue Nov 21, 2021
related: parsica-php#17
inspired by this regex:
```
/^
  "[^"\\]*              # double quoted strings with possibly escaped double quotes
    (?:
      \\.               # escaped character (quote)
      [^"\\]*           # unrolled loop following Jeffrey E.F. Friedl
    )*
  "
/x
```
i implemented an unrolled loop in combination with takeWhile. (the last use of the deprecated zeroOrMore was also removed)
this leads to a big performance boost from ca ~250%(without xdebug) - ~330%(with xdebug) - both without opcache.

before:
bench_json_encode: 4.133μs
bench_Parsica_JSON: **4.109ms**
bench_basemax_jpophp: 432.128μs

after:
bench_json_encode: 3.744μs
bench_Parsica_JSON: **1.662ms**
bench_basemax_jpophp: 401.154μs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants