✨ Prepare tokenizers for `stringMatching` #3920

dubzzz · 2023-05-29T13:13:03Z

In order to be able to implement a `stringMatching` arbitrary as requested in #2980, we first need to be able to understand a regex. Understanding a regex can be achieved by tokenizing it.

This PR first adds a basic regex tokenizer that reads a regex and translates it into an AST. This AST will be the entry point of our `stringMatching`. So far our tokenizer performs poorly on square-bracket and parenthesis expressions, and in unicode mode, but work is ongoing to fully support them.
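To illustrate what "tokenizing a regex into an AST" means, here is a minimal sketch. It is not the `TokenizeRegex.ts` helper added by this PR: the `RegexNode` shapes and the `tokenizeSequence` and `tokenizeRegex` functions below are hypothetical names chosen for the example. The sketch only handles literal characters, `.`, the quantifiers `*`, `+`, `?` and top-level `|`, and it deliberately rejects brackets, parentheses, escapes and anchors.

```typescript
// Hypothetical AST shapes for illustration only (not fast-check's internal types).
type RegexNode =
  | { type: 'Char'; value: string }
  | { type: 'AnyChar' }
  | { type: 'Repeat'; min: number; max: number; node: RegexNode }
  | { type: 'Sequence'; nodes: RegexNode[] }
  | { type: 'Alternation'; options: RegexNode[] };

// Quantifiers supported by this sketch, with their repetition bounds.
const QUANTIFIERS = new Map<string, { min: number; max: number }>([
  ['*', { min: 0, max: Infinity }],
  ['+', { min: 1, max: Infinity }],
  ['?', { min: 0, max: 1 }],
]);

// Syntax this sketch refuses to read: groups, classes, braces, escapes, anchors.
const UNSUPPORTED = '\\()[]{}^$';

// Tokenize one alternative (a source without '|') into a single node.
function tokenizeSequence(source: string): RegexNode {
  const nodes: RegexNode[] = [];
  for (const c of source) {
    const quantifier = QUANTIFIERS.get(c);
    if (quantifier !== undefined) {
      // A quantifier applies to the node read just before it.
      const previous = nodes.pop();
      if (previous === undefined) throw new Error(`Nothing to repeat before '${c}'`);
      if (previous.type === 'Repeat') throw new Error('Lazy or stacked quantifiers are unsupported');
      nodes.push({ type: 'Repeat', min: quantifier.min, max: quantifier.max, node: previous });
    } else if (UNSUPPORTED.includes(c)) {
      throw new Error(`Unsupported syntax '${c}' in this sketch`);
    } else if (c === '.') {
      nodes.push({ type: 'AnyChar' });
    } else {
      nodes.push({ type: 'Char', value: c });
    }
  }
  const single = nodes[0];
  return nodes.length === 1 && single !== undefined ? single : { type: 'Sequence', nodes };
}

// Entry point: split on top-level '|' (safe here because groups and escapes are rejected).
function tokenizeRegex(regex: RegExp): RegexNode {
  const options = regex.source.split('|').map(tokenizeSequence);
  const single = options[0];
  return options.length === 1 && single !== undefined ? single : { type: 'Alternation', options };
}
```

With these assumptions, `tokenizeRegex(/ab*|c/)` returns an `Alternation` whose first option is a `Sequence` of `Char 'a'` followed by a `Repeat` of `Char 'b'`, and whose second option is `Char 'c'`. A generator can then walk such a tree to produce strings, which is why the AST serves as the entry point.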

Category:

Potential impacts:

We initially did not want to go for regexes: they are very rich, and supporting them would have required lots of features to be implemented and carefully checked. But as globs were not really designed for string matching, we went back to regexes.

codesandbox-ci · 2023-05-29T13:15:06Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 6f2a9c8:

Sandbox	Source
Vanilla	Configuration

codecov · 2023-05-29T14:12:47Z

Codecov Report

Merging #3920 (6f2a9c8) into main (1287515) will decrease coverage by 0.24%.
The diff coverage is 89.83%.

@@            Coverage Diff             @@
##             main    #3920      +/-   ##
==========================================
- Coverage   95.16%   94.92%   -0.24%     
==========================================
  Files         205      207       +2     
  Lines        5314     5560     +246     
  Branches     1123     1230     +107     
==========================================
+ Hits         5057     5278     +221     
- Misses        241      266      +25     
  Partials       16       16

Flag	Coverage Δ
unit-tests	`94.92% <89.83%> (-0.24%)`	⬇️
unit-tests-14.x-Linux	`?`
unit-tests-16.x-Linux	`94.92% <89.83%> (-0.24%)`	⬇️
unit-tests-18.x-Linux	`94.92% <89.83%> (-0.24%)`	⬇️
unit-tests-latest-Linux	`94.92% <89.83%> (-0.24%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...heck/src/arbitrary/_internals/helpers/ReadRegex.ts	`88.13% <88.13%> (ø)`
.../src/arbitrary/_internals/helpers/TokenizeRegex.ts	`91.40% <91.40%> (ø)`

dubzzz added 7 commits May 29, 2023 11:15

Add some tests

394e832

add some more tests

2429beb

add some more tests

e8a6896

add some more

9306bfb

introduce mode

fd684cb

some more tests and better basic [

fa8cfb1

dubzzz changed the title ~~✨ Prepare tokenizers for stringMatching~~ ✨ Prepare tokenizers for stringMatching May 29, 2023

dubzzz added 5 commits May 29, 2023 13:37

revampt [] extraction

2f28e1b

better support for [] tokens

204996a

better test split

965525a

Merge remote-tracking branch 'origin/main' into string-matching

1585f37

versions

6f2a9c8

dubzzz merged commit 768d96d into main May 29, 2023

dubzzz deleted the string-matching branch May 29, 2023 14:17
