New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser: simplify array expressions #9243
Conversation
parsing/parser.mly
Outdated
-> array_dim * (arg_label * expression) list; | ||
name: | ||
Lexing.position * Lexing.position -> 'dot -> bool -> paren_kind -> array_dim | ||
-> Longident.t Location.loc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some documentation of what those two functions are doing could help (we can sort of guess, but only sort of). Maybe just giving an example expressed with concrete syntax instead of AST fragments? If I understand correctly, index
builds the argument to the indexing function, and name
is the name of the indexing function. (This documentation suggests that maybe name
is more fundamental and should be placed first in the definition?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some documentation and examples.
parsing/parser.mly
Outdated
[Nolabel, arr; | ||
Nolabel, ghexp(Pexp_array coords); | ||
Nolabel, newval])) | ||
let array_name loc _ assign paren_kind n = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that the 'dot
parameter is ignored here means that it only makes sense for one of your two instances. Maybe it should not be part of the record definition, and instead dotop_family
should not be a single family, but families indexed over the choice of dot? (That said, I'm not sure how to factor the grammar with this different representation, as currently dot
is chosen by array_{get,set}
.)
parsing/parser.mly
Outdated
@@ -2130,6 +2105,30 @@ let_pattern: | |||
{ $1 } | |||
; | |||
|
|||
%inline array_set(dot, left, mid, right): | |||
| simple_expr dot left mid right LESSMINUS expr | |||
{ $1, $2, $3, $4, Some $7 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use named parameters instead of positions, especially for $7
.
I think it is weird that there are four parameters that are always used in the same way in the grammar. Could they be packed into a single symbol that returns a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difficulty is that the other function in the cartesian product would need to produce the symbol variants, rather than use there sub symbol directly. There might be a better characterization, but it doesn't seem that straightforward.
parsing/parser.mly
Outdated
} | ||
; | ||
%inline all_array_expr(op, dot, mid): | ||
| op(dot, LPAREN { Paren }, mid, RPAREN) { $1 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to approach this could be
%inline parens(mid): | LPAREN v=mid RPAREN { (Paren, v) }
%inline all_array_expr(op, dot, mid)
| op(dot, parens(mid)) { $1 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current state of the implementation is good, thanks!
Changes
Outdated
@@ -139,6 +139,9 @@ Working version | |||
- #9211, #9215: fix Makefile dependencies of compilerlibs archives and dynlink | |||
(Gabriel Scherer, review by Vincent Laviron and David Allsopp) | |||
|
|||
-#9243, simplify parser rules for array indexing operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a space is missing before the PR number
2398838
to
a841284
Compare
Note: the increase of the parser size is unexpected, I will investigate it before merging. |
Any news? |
a841284
to
c840be6
Compare
I did not succeed in decreasing the change in the size of the generated parse with the initial code. However, simplifying the grammar by removing higher-order rule worked better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for a typo in a comment and a small question.
parsing/parser.mly
Outdated
|
||
For instance, for builtin arrays, if Clflags.unsafe is set, | ||
* [ a.[index] ] => [String.unsafe_get] | ||
* [ a.{x;y} <- 1 ] => [ Bigarray.Array2.unsafe_set] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* [ a.{x;y} <- 1 ] => [ Bigarray.Array2.unsafe_set] | |
* [ a.{x,y} <- 1 ] => [ Bigarray.Array2.unsafe_set] |
parsing/parser.mly
Outdated
let assign = if assign then "<-" else "" in | ||
let mid = match n with | ||
| Many -> ";.." | ||
| One | Two | Three -> "" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong. Surely the Two
and Three
cases should return ";.."
?
If I understand correctly, the user_index
function below means that they could simply be assert false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the problem with impossible cases, I am not sure if they ought to be "", ";" and ";;", or ";.." if they were possible. An assert false
will be simpler and clearer indeed.
c840be6
to
b271d3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved again.
It's been 12 hours and the Travis jobs are still pending. Is it finally the time to move away from Travis completely? (cc @dra27 who knows about this stuff) |
The tests finished 20 hours after the last push. Merging. |
I guess this is "continuous integration" for some value of "continuous"... When I was a wee lad, I was told about batch processing, where you submit your computing job on punched cards or tape and get a listing with "Syntax error" on it the day after. Plus ça change, plus c'est la même chose, as they say in English. |
Currently, there are 27 productions related to array indexing in the parser, corresponding to the cartesian product:
(3 basic rule: get + set + missing parentheses) * (3 parenthesis kind) * (3 indexing families)
Most of those productions are closely related to each other. This PR proposes to replaces those 27 productions by
2
main production (one for get and the other for set), and 4 helper rules (2 for for the first factor of the cartesian product above and one for each of the remaining two factors).Along the way, the helper functions for building array expressions have been reorganized to make it clearer that constructing an array indexing function is mostly a question of finding the right name, and determining how to transform the index expression.