This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

Bug in PerlRegExpConverter #38

Open
broudy3 opened this issue Sep 7, 2014 · 2 comments


@broudy3

broudy3 commented Sep 7, 2014

I have a problem with the following regular expression:
\G(((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object))\s*)

This regular expression gives me a different result in Phalanger. I assume the problem lies in converting the pattern to .NET. The pattern is converted to:
\G(?<an0ny_1>((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object))\s*)

I think a group name is missing after:
\G(?<an0ny_1>(

The same problem occurs with this regular expression:
/((x)y)/
When I match it against 'xy', I get wrong results:
preg_match('/((x)y)/', 'xy', $matches);
$matches[1] == 'x', but it should be 'xy'
$matches[2] == 'xy', but it should be 'x'
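For comparison, PCRE (and therefore PHP) numbers capturing groups by the position of their opening parenthesis, so in /((x)y)/ the outer group is 1 and the inner group is 2. Python's `re` module follows the same rule, so the snippet below is a hedged stand-in for what preg_match should report. It is not PHP or Phalanger code.

```python
import re

# Groups are numbered by the order of their opening '(': outer = 1, inner = 2.
m = re.match(r"((x)y)", "xy")
print(m.group(1))  # xy
print(m.group(2))  # x
```

The swapped values are consistent with how .NET numbers groups: unnamed groups are numbered first, and named groups come after them. If the converter renames only the outer group (to an0ny_1, as in the converted pattern above) and leaves (x) unnamed, .NET would make (x) group 1 and the outer group 2. This is a plausible reading of the symptom, not something confirmed in this thread.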

@proff
Contributor

proff commented Sep 15, 2014

Not well tested yet...

diff -r cb4f50629489 Phalanger/ClassLibrary/RegExpPerl.cs
--- a/Phalanger/ClassLibrary/RegExpPerl.cs  Thu Sep 11 15:06:26 2014 +0400
+++ b/Phalanger/ClassLibrary/RegExpPerl.cs  Mon Sep 15 23:22:18 2014 +0400
@@ -2265,8 +2265,7 @@
                                             result.Append('>');
                                             continue;
                                         }
-                                        else
-                                        if (i + 2 < perlExpr.Length && perlExpr[i + 2] == ':')
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
                                         {
                                             // Pseudo-group, don't count.
                                             --group_number;
@@ -2284,6 +2283,27 @@
                            case 1:
                                 if (ch == '?')
                                     inner_state = 2;
+                                else if (ch == '(')
+                                {
+                                    ++group_number;
+                                    if (i + 1 < perlExpr.Length)
+                                    {
+                                        if (perlExpr[i + 1] != '?')
+                                        {
+                                            ++i;
+                                            result.Append("(?<");
+                                            result.Append(AnonymousGroupPrefix);
+                                            result.Append(group_number);
+                                            result.Append('>');
+                                            continue;
+                                        }
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
+                                        {
+                                            // Pseudo-group, don't count.
+                                            --group_number;
+                                        }
+                                    }
+                                }
                                 else if (ch != '(')// stay in inner_state == 1, because this can happen: ((?<blah>...))
                                     inner_state = 0;
                                 break;
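As far as can be read from the diff, the patch lets the `case 1` branch (reached right after a `(`) count and rename a second `(` too, and treats `(?!` and `(?=` as uncounted pseudo-groups alongside `(?:`. The underlying idea can be sketched independently of Phalanger's state machine. Walk the pattern once, skip escaped characters and character classes, count every group PCRE would number, and rename only plain `(` groups. This is a hypothetical Python illustration, not the RegExpPerl.cs code. The function names are made up, `PREFIX` assumes the `an0ny_` prefix visible in the converted patterns, and constructs such as `\Q...\E` or `(?x)` comments are ignored.

```python
PREFIX = "an0ny_"  # assumption: mirrors the prefix seen in the converted patterns


def _opens_numbered_group(pattern: str, i: int) -> bool:
    """True if the '(' at index i starts a group that PCRE numbers."""
    rest = pattern[i + 1:]
    if not rest.startswith("?"):
        return True  # a plain '(' is always capturing
    # Named groups are numbered too: (?P<name>...), (?'name'...), (?<name>...).
    if rest.startswith("?P<") or rest.startswith("?'"):
        return True
    if rest.startswith("?<") and not rest.startswith(("?<=", "?<!")):
        return True
    # (?:, (?=, (?!, (?<=, (?<!, (?>, inline flags, ... are not numbered.
    return False


def name_anonymous_groups(pattern: str) -> str:
    """Hypothetical helper: give every plain capturing group an explicit name."""
    out = []
    group = 0
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character such as \( or \G: copy both chars, never a group.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            # Copy '[', an optional '^', and a leading ']' (literal in PCRE).
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            out.append(pattern[i:j])
            in_class = True
            i = j
            continue
        elif ch == "(" and _opens_numbered_group(pattern, i):
            group += 1
            if pattern[i + 1:i + 2] != "?":
                # Plain group: name it after its PCRE number, even after another '('.
                out.append(f"(?<{PREFIX}{group}>")
                i += 1
                continue
            out.append(ch)  # already-named group keeps its name but is counted
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(name_anonymous_groups("((x)y)"))  # (?<an0ny_1>(?<an0ny_2>x)y)
```

Counting named groups without renaming them keeps the generated numbers aligned with PCRE's numbering, which is presumably why the patch increments `group_number` first and only decrements it for pseudo-groups.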

@broudy3
Author

broudy3 commented Sep 15, 2014

Sorry, my mistake: I didn't notice that GitHub had altered the first regular expression. The correct one is:
\G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)

It is converted to:
\G(?<an0ny_1>\((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object)\)\s*)

A group name is missing after \G(?<an0ny_1>\(( and before int, so it should look like this:
\G(?<an0ny_1>\((?<an0ny_2>int(?<an0ny_3>eger)?|bool(?<an0ny_4>ean)?|float|double|real|string|binary|array|object)\)\s*)

Am I right?
Please try to fix this case as well. Thank you.
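To see what the corrected pattern should capture, here is a hedged Python illustration (not PHP). Python's `re` numbers groups the same way PCRE does but does not support `\G`, so the sketch drops it and uses `re.match`, which also anchors at the start of the string. The subject string "(integer) x" is an arbitrary example.

```python
import re

# The corrected pattern from the comment above, minus the leading \G
# (Python's re does not support \G).
PATTERN = r"(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)"

m = re.match(PATTERN, "(integer) x")
# 1: cast plus trailing space, 2: type name, 3: optional "eger", 4: "ean" (unmatched)
print(m.groups())  # ('(integer) ', 'integer', 'eger', None)
```

Group 2 here is the (int...|object) alternation, which matches the an0ny_2 name proposed above.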
