
[Benchmark] Excessive (and expensive) backtracking #483

Open
milseman opened this issue Jun 15, 2022 · 5 comments
Comments

@milseman
Collaborator

From the forums:

import Foundation  // for NSRegularExpression and Date

let pattern = "^ +A"
let regex1 = try! NSRegularExpression(pattern: pattern)
let regex2 = try! Regex(pattern)
let numberOfSpaces = 50000
let testString = String(repeating: " ", count: numberOfSpaces)

var date = Date()
print(regex1.firstMatch(in: testString, range: .init(location: 0, length: numberOfSpaces)) == nil)
print(-date.timeIntervalSinceNow)

date = Date()  // reset so the second timing doesn't include the first
print(try! regex2.firstMatch(in: testString) == nil)
print(-date.timeIntervalSinceNow)
@milseman
Collaborator Author

@rctcwyvrn, can you add this to the benchmark suite you're working on? Lots of opportunities exposed here.

We don't have a compiler IR yet, so we don't want to layer on excess complexity while the execution model is still settling down. But, there's a ton of low-hanging fruit here.


Processor.consume() calculates the remaining size of the input slice on every call. Switching to index(offsetBy:limitedBy:) removes this extra linear measurement, which was the source of the quadratic factor. This change alone gave me a very easy 7x improvement on this specific benchmark (locally, via package build), and the win only grows with input length.
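The shape of that fix can be sketched outside the engine with stdlib string APIs (hypothetical helper names, not the real `Processor` code). The first version pays O(remaining) on every consume just to bounds-check; the second lets the stdlib stop at the limit, so each consume costs only O(n):

```swift
// Quadratic pattern: measuring the remaining slice on every call
// costs O(remaining) per consume.
func consumeMeasuring(_ input: String, from index: String.Index, by n: Int) -> String.Index? {
  let remaining = input.distance(from: index, to: input.endIndex) // O(remaining)
  guard n <= remaining else { return nil }
  return input.index(index, offsetBy: n)
}

// Linear pattern: index(offsetBy:limitedBy:) returns nil when advancing
// would pass the limit, with no separate measurement pass.
func consumeLimited(_ input: String, from index: String.Index, by n: Int) -> String.Index? {
  input.index(index, offsetBy: n, limitedBy: input.endIndex)
}
```

Both return `nil` on overrun and the same index otherwise, so swapping them in is behavior-preserving; only the cost model changes.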

@Azoy, can you try to spin up a PR incorporating this patch and make sure we're testing the boundaries correctly?


firstMatch is needlessly spinning up a new executor and processor for every single input position, only to fail right away on the anchor. We should integrate first-match with the processor so it can just reset its state.

Reusing the same executor gives an additional 10% improvement on top of the 7x above. It still creates a new processor every time, though; avoiding that would buy us roughly another 30% beyond that.

@rctcwyvrn Can you work with me on integrating it? Can you also work with me on crafting targeted benchmarks that demonstrate these prominently, as well as on reusing the same processor?


After that, we're getting absolutely killed by ARC, which accounts for ~70% of the remaining time after the above improvements.

Part of this is how we store Characters to compare against in registers, while we should instead store trivial values when possible. There are a few ways to go about this and I'll try to think of a simple way that's unlikely to conflict with future directions. This will also speed up creating a new processor.
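One possible shape for that (a hypothetical type, not the engine's real register representation): collapse the payload to a trivial value when the Character is a single ASCII scalar, so copying the register generates no retain/release traffic, and fall back to the full Character otherwise.

```swift
// Hypothetical register payload for "match this character".
enum MatchPayload {
  case ascii(UInt8)         // trivial: no ARC traffic on copy
  case character(Character) // general case: may be refcounted

  init(_ c: Character) {
    if let a = c.asciiValue {
      self = .ascii(a)
    } else {
      self = .character(c)
    }
  }

  func matches(_ c: Character) -> Bool {
    switch self {
    case .ascii(let a): return c.asciiValue == a
    case .character(let stored): return stored == c
    }
  }
}
```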

We still end up ARCing the executor after the above fixes, and somehow Processor.matchSeq heavily ARCs itself. We also heavily ARC MEProgram and others.

And there is a lot of needless ARC traffic for things like regex.initialOptions.semanticLevel inside inner loops, which we can hoist or code around.

@rctcwyvrn, would you be interested in tracking these down with me and fixing them?


Intermediate-term, we should peephole optimize a single-scalar normalization-invariant Character match into a consumeScalar instruction (where the scalar value is encoded into the instruction's payload) followed by a grapheme-boundary check. This is broadly useful and should improve pretty much every regex in practice with verbatim components.
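A sketch of what the peepholed sequence would check, using stdlib views in place of real instructions (the helper name is made up): compare the raw scalar, then confirm the scalar successor is also a grapheme boundary, so a scalar that is merely the prefix of a larger cluster doesn't match.

```swift
// Hypothetical "consumeScalar" semantics: match one scalar, then verify
// we're still on a grapheme cluster boundary.
func consumeScalar(_ scalar: Unicode.Scalar, in input: String, at index: String.Index) -> String.Index? {
  let scalars = input.unicodeScalars
  guard index < input.endIndex, scalars[index] == scalar else { return nil }
  let next = scalars.index(after: index)
  // The character-view successor marks the end of the current grapheme
  // cluster. If it lies beyond `next`, the scalar was only a prefix of a
  // larger cluster (e.g. "e" followed by combining U+0301) and must not match.
  guard next == input.index(after: index) else { return nil }
  return next
}
```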

@Azoy and @natecook1000, can you think through where we'd need to check for grapheme cluster boundaries? Where or how could those checks be elided? And how could we make them faster?

Similarly for verbatim sequences.


We should explore instructions for simple repetitions, especially those around a single-scalar Character. This benchmark doesn't yet demonstrate the need, but hopefully after tackling the other low-hanging fruit this becomes more salient.

We can also exploit the fact that grapheme breaking for . or any newline-excluding class is trivial on ASCII-only inputs.
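As a sketch of that fast path (hypothetical helper, operating on raw UTF-8 bytes): on ASCII bytes, every byte is its own grapheme cluster, so a `.`-style repetition can advance one byte at a time with no grapheme-breaking machinery, bailing to the general path on the first non-ASCII byte.

```swift
// Count how far a "." repetition can scan assuming ASCII input:
// advance byte-by-byte until a newline (excluded by ".") or a
// non-ASCII byte (where real grapheme breaking is needed).
func asciiDotStarPrefix(of utf8: [UInt8]) -> Int {
  var i = 0
  while i < utf8.count {
    let b = utf8[i]
    guard b < 0x80 else { break }          // non-ASCII: fall back to slow path
    if b == 0x0A || b == 0x0D { break }    // "." excludes newlines
    i += 1
  }
  return i
}
```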

Similarly, we need much cheaper save point representations and backtracking.


We should switch at some point to favoring contiguous, validly encoded UTF-8: branch once at the top rather than during every read. This will require some more design and architecture work.

Currently, the low-hanging fruit will drown out the improvements here and this will add complexity, so it makes sense to wait until we tackle those other issues.


After we have an IR, we should do post-dominating terminal analysis so that we can replace consume loops with scanning, eliding needless save points and backtracking.

Similarly for eliding grapheme boundary checks and save points. It makes sense to wait until we settle our execution model a little before doing this.

Similarly for anchor analysis and exiting early for first-match style operations.

@milseman
Collaborator Author

20x improvement to this micro (nano?) benchmark we extracted from this report: #494

@milseman
Collaborator Author

A compounding 2x in #495

@milseman
Collaborator Author

A compounding 2-7x, depending, in #497

@ignatz

ignatz commented Nov 17, 2023

Sorry for walking into this old thread, I'm happy to move the conversation to other channels, I'm just not entirely sure where would be best and this thread came up when looking for known issues.

Context: I'm evaluating Swift on the server side (Linux), and I wrote a simple program. I was very excited to discover the RegexBuilder APIs and got up and running quickly. However, I noticed that the regexes were noticeably slower than my prior art. To be specific, take a simple toy case like

let msg =
  "FLRD0056B>OGFLR,qAS,LSZI2:/215553h4730.50N\\00757.08En000/000/A=001975 !W56! id3ED0056B -019fpm +0.0rot 22.5dB -9.0kHz gps1x1"
  // "\\" keeps the literal backslash; a bare \0 would embed a NUL escape
let re = /(?<time>\d+)h/
let N = 1_000_000
for _ in 0..<N {
  let match = try? re.firstMatch(in: msg)
  _ = match?.time  // use the capture so the call isn't discarded
}

built and run in release mode:

$ swift build -c release && /usr/bin/time ./.build/release/parser
Building for production...
[1/1] Compiling plugin GenerateManual
Build complete! (0.11s)
5.67user 0.00system 0:05.69elapsed 99%CPU (0avgtext+0avgdata 16640maxresident)k
0inputs+0outputs (0major+634minor)pagefaults 0swaps

It takes 5.7 seconds, which was quite a bit slower than the other runtimes/regex engines I tested:

runtime        time (s)   speedup over swift
swift          5.69       1.0x
dart (native)  0.29       19.6x
dart (jit)     0.40       14.2x
kotlin         0.17       33.5x
python         0.40       14.2x

Arguably, the performance of the standard library regex engine doesn't say much about the language's performance itself (and vice versa), but the factors were certainly larger than what I had expected, and larger than the gaps between the tested alternatives.
