Skip to content

This has a function to generate a recursive regex for matching curly brackets, and a performance tester for that regex.

Notifications You must be signed in to change notification settings

dehlirious/regexFun

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 

Repository files navigation

Deeply Nested Structure Generator

This repository hosts a collection of PHP tools adept at crafting and evaluating the performance of deeply nested curly regex patterns. It includes facilities for constructing nested regex expressions, gauging the performance repercussions of executing these regex patterns, and generating intricately nested JavaScript functions as test subjects for the regex.

The methodology employs a form of synthetic recursion within regex patterns, a necessary approach since regex engines typically lack genuine recursion capabilities or stack-based memory. This limitation makes the task of matching structures with arbitrary levels of nesting particularly challenging.

Below is an illustration of how layers are added to create increasingly complex nested structures:

{[^{}]*} < One layer
{(?:[^{}]|{[^{}]*})*} < Two layers
{(?:[^{}]|{(?:[^{}]|{[^{}]*})*})*} < Three layers
{(?:[^{}]|{(?:[^{}]|{(?:[^{}]|{[^{}]*})*})*})*} < Four layers
{(?:[^{}]|{(?:[^{}]|{(?:[^{}]|{(?:[^{}]|{[^{}]*})*})*})*})*} < Five layers
{((?:[^{}]|{(?:[^{}]|{[^{}]*})*})*?)}

The basic element of the structure is explained as follows:

{: An opening curly brace.
(?:[^{}]|{[^{}]*})*: A non-capturing group that matches:
    [^{}]: Any character that is not a curly brace.
    |: OR
    {[^{}]*}: A pair of curly braces containing any characters that are not curly braces (no nesting).
}: A closing curly brace.

Through these tools and the outlined method, users can create and examine the intricacies and performance implications of regex patterns dealing with deeply nested structures. Please note: This recursive regex function is more for testing than for use. I have issues with it personally, other than when using the base version

generateNestedJS

A JavaScript function that generates a function with a deeply nested structure, incorporating various control structures like if, while, do, and foreach.

Usage

$testString = generateNestedJS(200);

generateNestedRegex

A PHP function designed to craft regex patterns featuring user-defined levels of nested curly braces.

Usage

$numberOfAdditionalLayers = 5;
$regexPattern = generateNestedRegex($numberOfAdditionalLayers);
echo $regexPattern;

measureRegexPerformance

A PHP function that measures the performance of regex patterns with increasing layers of nesting. It applies each pattern to a test string, measures the time taken for each, and records the fastest, slowest, and average times.

Usage

$testString = "your test string here";
$maxLayers = 10;
$iterationsPerLayer = 100;
$performanceTimings = measureRegexPerformance($testString, $maxLayers, $iterationsPerLayer);
foreach ($performanceTimings as $layers => $timing) {
    echo "Layers: $layers, Fastest Time: {$timing['fastest']}s, Slowest Time: {$timing['slowest']}s, Average Time: {$timing['average']}s\n";
}

Performance Visualizations

This study, conducted using preg_match_all on PHP 8.3, explores the relationship between the depth of nested structures in functions and regex patterns, and their impact on processing times.

Key findings include:

  • Performance Dynamics: When regex complexity exceeds that of function layers, processing tends to be quicker. Conversely, a higher count of function layers paired with fewer regex layers results in slower processing times.
  • Optimal Layer Ratio: Interestingly, regex processing speeds up with more recursive layers, especially noticeable with a high count of function layers.
  • Limitation Observed: Beyond 62 layers, regex matching ceased, indicated by a compilation error due to excessively nested parentheses.

100-Layer Function vs. 40-Layer Regex (50 iterations):

3GITCapture

40-Layer Function vs. 50-Layer Regex (50 iterations):

0GITCapture

This demonstrates that it slopes off near the end, when there are more regex layers than function layers.

400-Layer Function vs. 50-Layer Regex (50 iterations):

1GITCapture

This demonstrates that only having two layers of regex but a large quantity of function layers, takes up a LOT of time.

200-Layer Function vs. 50-Layer Regex (50 iterations):

5GITCapture

2000-Layer Function vs. 50-Layer Regex (50 iterations):

6GITCapture

The visual aids aim to provide a clearer understanding of the dynamics.

Consider these insights carefully when working with nested structures in regex and functions.

About

This has a function to generate a recursive regex for matching curly brackets, and a performance tester for that regex.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages