
[Benchmark] Excessive (and expensive) backtracking #483

Open
milseman opened this issue Jun 15, 2022 · 5 comments
Comments

@milseman
Collaborator

From the forums:

import Foundation  // for NSRegularExpression and Date

let pattern = "^ +A"
let regex1 = try! NSRegularExpression(pattern: pattern)
let regex2 = try! Regex(pattern)
let numberOfSpaces = 50000
let testString = String(repeating: " ", count: numberOfSpaces)

var date = Date()
print(regex1.firstMatch(in: testString, range: .init(location: 0, length: numberOfSpaces)) == nil)
print(-date.timeIntervalSinceNow)

date = Date()  // reset so the second timing doesn't include the first
print(try! regex2.firstMatch(in: testString) == nil)
print(-date.timeIntervalSinceNow)
@milseman
Collaborator Author

@rctcwyvrn, can you add this to the benchmark suite you're working on? Lots of opportunities exposed here.

We don't have a compiler IR yet, so we don't want to layer on excess complexity while the execution model is still settling down. But, there's a ton of low-hanging fruit here.


Processor.consume() calculates the remaining size of the input slice on every call. Switching to index(offsetBy:limitedBy:) removes this extra linear measurement, which was the source of the quadratic factor. This change alone gave me a very easy 7x improvement on this specific benchmark (locally, via package build), and the win only grows with input length.
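The shape of that fix can be sketched outside the engine with stdlib string APIs (hypothetical helper names, not the real `Processor` code). The first version pays O(remaining) on every consume just to bounds-check; the second lets the stdlib stop at the limit, so each consume costs only O(n):

```swift
// Quadratic pattern: measuring the remaining slice on every call
// costs O(remaining) per consume.
func consumeMeasuring(_ input: String, from index: String.Index, by n: Int) -> String.Index? {
  let remaining = input.distance(from: index, to: input.endIndex) // O(remaining)
  guard n <= remaining else { return nil }
  return input.index(index, offsetBy: n)
}

// Linear pattern: index(offsetBy:limitedBy:) returns nil when advancing
// would pass the limit, with no separate measurement pass.
func consumeLimited(_ input: String, from index: String.Index, by n: Int) -> String.Index? {
  input.index(index, offsetBy: n, limitedBy: input.endIndex)
}
```

Both return `nil` on overrun and the same index otherwise, so swapping them in is behavior-preserving; only the cost model changes.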

@Azoy, can you try to spin up a PR incorporating this patch and make sure we're testing the boundaries correctly?


firstMatch is needlessly spinning up a new executor and processor for every single input position, only to fail right away on the anchor. We should integrate first-match with the processor so it can just reset its state.

Reusing the same executor gives an additional 10% improvement on top of the 7x above. It still creates a new processor every time, though; avoiding that would buy us roughly another 30% beyond that.

@rctcwyvrn Can you work with me on integrating it? Can you also work with me on crafting targeted benchmarks that demonstrate these prominently, as well as on reusing the same processor?


After that, we're getting absolutely killed by ARC, which accounts for ~70% of the remaining time after the above improvements.

Part of this is how we store Characters to compare against in registers, while we should instead store trivial values when possible. There are a few ways to go about this and I'll try to think of a simple way that's unlikely to conflict with future directions. This will also speed up creating a new processor.
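One possible shape for that (a hypothetical type, not the engine's real register representation): collapse the payload to a trivial value when the Character is a single ASCII scalar, so copying the register generates no retain/release traffic, and fall back to the full Character otherwise.

```swift
// Hypothetical register payload for "match this character".
enum MatchPayload {
  case ascii(UInt8)         // trivial: no ARC traffic on copy
  case character(Character) // general case: may be refcounted

  init(_ c: Character) {
    if let a = c.asciiValue {
      self = .ascii(a)
    } else {
      self = .character(c)
    }
  }

  func matches(_ c: Character) -> Bool {
    switch self {
    case .ascii(let a): return c.asciiValue == a
    case .character(let stored): return stored == c
    }
  }
}
```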

We still end up ARCing the executor after the above fixes, and somehow Processor.matchSeq heavily ARCs itself. We also heavily ARC MEProgram and others.

And there is a lot of needless ARC traffic for things like regex.initialOptions.semanticLevel inside inner loops, which we can hoist or code around.

@rctcwyvrn, would you be interested in tracking these down with me and fixing them?


Intermediate-term, we should peephole optimize a single-scalar normalization-invariant Character match into a consumeScalar instruction (where the scalar value is encoded into the instruction's payload) followed by a grapheme-boundary check. This is broadly useful and should improve pretty much every regex in practice with verbatim components.
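A sketch of what the peepholed sequence would check, using stdlib views in place of real instructions (the helper name is made up): compare the raw scalar, then confirm the scalar successor is also a grapheme boundary, so a scalar that is merely the prefix of a larger cluster doesn't match.

```swift
// Hypothetical "consumeScalar" semantics: match one scalar, then verify
// we're still on a grapheme cluster boundary.
func consumeScalar(_ scalar: Unicode.Scalar, in input: String, at index: String.Index) -> String.Index? {
  let scalars = input.unicodeScalars
  guard index < input.endIndex, scalars[index] == scalar else { return nil }
  let next = scalars.index(after: index)
  // The character-view successor marks the end of the current grapheme
  // cluster. If it lies beyond `next`, the scalar was only a prefix of a
  // larger cluster (e.g. "e" followed by combining U+0301) and must not match.
  guard next == input.index(after: index) else { return nil }
  return next
}
```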

@Azoy and @natecook1000, can you think through where we'd need to check for grapheme cluster boundaries? Where or how could those checks be elided? And how could we make them faster?

Similarly for verbatim sequences.


We should explore instructions for simple repetitions, especially those around a single-scalar Character. This benchmark doesn't yet demonstrate the need, but hopefully after tackling the other low-hanging fruit this becomes more salient.

We can also exploit the fact that grapheme breaking for . or any newline-excluding class is trivial on ASCII-only inputs.
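As a sketch of that fast path (hypothetical helper, operating on raw UTF-8 bytes): on ASCII bytes, every byte is its own grapheme cluster, so a `.`-style repetition can advance one byte at a time with no grapheme-breaking machinery, bailing to the general path on the first non-ASCII byte.

```swift
// Count how far a "." repetition can scan assuming ASCII input:
// advance byte-by-byte until a newline (excluded by ".") or a
// non-ASCII byte (where real grapheme breaking is needed).
func asciiDotStarPrefix(of utf8: [UInt8]) -> Int {
  var i = 0
  while i < utf8.count {
    let b = utf8[i]
    guard b < 0x80 else { break }          // non-ASCII: fall back to slow path
    if b == 0x0A || b == 0x0D { break }    // "." excludes newlines
    i += 1
  }
  return i
}
```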

Similarly, we need much cheaper save point representations and backtracking.


We should switch at some point to favoring contiguous, validly encoded UTF-8: branch once at the top rather than during every read. This will require some more design and architecture work.

Currently, the low-hanging fruit will drown out the improvements here and this will add complexity, so it makes sense to wait until we tackle those other issues.


After we have an IR, we should do post-dominating terminal analysis so that we can replace consume loops with scanning, eliding needless save points and backtracking.

Similarly for eliding grapheme boundary checks and save points. It makes sense to wait until we settle our execution model a little before doing this.

Similarly for anchor analysis and exiting early for first-match style operations.

@milseman
Collaborator Author

20x improvement to this micro (nano?) benchmark we extracted from this report: #494

@milseman
Collaborator Author

A compounding 2x in #495

@milseman
Collaborator Author

A compounding 2-7x, depending, in #497

@ignatz

ignatz commented Nov 17, 2023

Sorry for walking into this old thread, I'm happy to move the conversation to other channels, I'm just not entirely sure where would be best and this thread came up when looking for known issues.

Context: I'm evaluating Swift on the server side (Linux), and I wrote a simple program. I was very excited to discover the RegexBuilder APIs and got up and running quickly. However, I noticed that the regexes were noticeably slower than my prior art. To be specific, take a simple toy case like

let msg =
  "FLRD0056B>OGFLR,qAS,LSZI2:/215553h4730.50N\\00757.08En000/000/A=001975 !W56! id3ED0056B -019fpm +0.0rot 22.5dB -9.0kHz gps1x1"
  // "\\" keeps the literal backslash; a bare \0 would embed a NUL escape
let re = /(?<time>\d+)h/
let N = 1_000_000
for _ in 0..<N {
  let match = try? re.firstMatch(in: msg)
  _ = match?.time  // use the capture so the call isn't discarded
}

built and run in release mode:

$ swift build -c release && /usr/bin/time ./.build/release/parser
Building for production...
[1/1] Compiling plugin GenerateManual
Build complete! (0.11s)
5.67user 0.00system 0:05.69elapsed 99%CPU (0avgtext+0avgdata 16640maxresident)k
0inputs+0outputs (0major+634minor)pagefaults 0swaps

It takes 5.7 seconds, which was quite a bit slower than the other runtimes/regex engines I tested:

runtime        time (s)   speedup over swift
swift          5.69       1.0x
dart (native)  0.29       19.6x
dart (jit)     0.40       14.2x
kotlin         0.17       33.5x
python         0.40       14.2x

Arguably, the performance of the standard library regex engine doesn't say much about the language's performance itself (and vice versa), but the factors were certainly larger than what I had expected, and larger than the gaps between the tested alternatives.
