Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser: simplify array expressions #9243

Merged
merged 1 commit into from Jan 30, 2021

Conversation

Octachron
Copy link
Member

Currently, there are 27 productions related to array indexing in the parser, corresponding to the cartesian product:

(3 basic rule: get + set + missing parentheses) * (3 parenthesis kind) * (3 indexing families)

Most of those productions are closely related to each other. This PR proposes to replaces those 27 productions by 2 main production (one for get and the other for set), and 4 helper rules (2 for for the first factor of the cartesian product above and one for each of the remaining two factors).

Along the way, the helper functions for building array expressions have been reorganized to make it clearer that constructing an array indexing function is mostly a question of finding the right name, and determining how to transform the index expression.

parsing/parser.mly Show resolved Hide resolved
-> array_dim * (arg_label * expression) list;
name:
Lexing.position * Lexing.position -> 'dot -> bool -> paren_kind -> array_dim
-> Longident.t Location.loc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some documentation of what those two functions are doing could help (we can sort of guess, but only sort of). Maybe just giving an example expressed with concrete syntax instead of AST fragments? If I understand correctly, index builds the argument to the indexing function, and name is the name of the indexing function. (This documentation suggests that maybe name is more fundamental and should be placed first in the definition?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some documentation and examples.

[Nolabel, arr;
Nolabel, ghexp(Pexp_array coords);
Nolabel, newval]))
let array_name loc _ assign paren_kind n =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the 'dot parameter is ignored here means that it only makes sense for one of your two instances. Maybe it should not be part of the record definition, and instead dotop_family should not be a single family, but families indexed over the choice of dot? (That said, I'm not sure how to factor the grammar with this different representation, as currently dot is chosen by array_{get,set}.)

@@ -2130,6 +2105,30 @@ let_pattern:
{ $1 }
;

%inline array_set(dot, left, mid, right):
| simple_expr dot left mid right LESSMINUS expr
{ $1, $2, $3, $4, Some $7 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use named parameters instead of positions, especially for $7.

I think it is weird that there are four parameters that are always used in the same way in the grammar. Could they be packed into a single symbol that returns a tuple?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difficulty is that the other function in the cartesian product would need to produce the symbol variants, rather than use there sub symbol directly. There might be a better characterization, but it doesn't seem that straightforward.

}
;
%inline all_array_expr(op, dot, mid):
| op(dot, LPAREN { Paren }, mid, RPAREN) { $1 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to approach this could be

%inline parens(mid): | LPAREN v=mid RPAREN { (Paren, v) }

%inline all_array_expr(op, dot, mid)
| op(dot, parens(mid)) { $1 }

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current state of the implementation is good, thanks!

Changes Outdated
@@ -139,6 +139,9 @@ Working version
- #9211, #9215: fix Makefile dependencies of compilerlibs archives and dynlink
(Gabriel Scherer, review by Vincent Laviron and David Allsopp)

-#9243, simplify parser rules for array indexing operations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a space is missing before the PR number

@Octachron
Copy link
Member Author

Note: the increase of the parser size is unexpected, I will investigate it before merging.

@xavierleroy
Copy link
Contributor

Note: the increase of the parser size is unexpected, I will investigate it before merging.

Any news?

@Octachron
Copy link
Member Author

I did not succeed in decreasing the change in the size of the generated parse with the initial code. However, simplifying the grammar by removing higher-order rule worked better.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for a typo in a comment and a small question.


For instance, for builtin arrays, if Clflags.unsafe is set,
* [ a.[index] ] => [String.unsafe_get]
* [ a.{x;y} <- 1 ] => [ Bigarray.Array2.unsafe_set]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [ a.{x;y} <- 1 ] => [ Bigarray.Array2.unsafe_set]
* [ a.{x,y} <- 1 ] => [ Bigarray.Array2.unsafe_set]

let assign = if assign then "<-" else "" in
let mid = match n with
| Many -> ";.."
| One | Two | Three -> "" in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. Surely the Two and Three cases should return ";.."?
If I understand correctly, the user_index function below means that they could simply be assert false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the problem with impossible cases, I am not sure if they ought to be "", ";" and ";;", or ";.." if they were possible. An assert false will be simpler and clearer indeed.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved again.

@gasche
Copy link
Member

gasche commented Jan 30, 2021

It's been 12 hours and the Travis jobs are still pending. Is it finally the time to move away from Travis completely? (cc @dra27 who knows about this stuff)

@gasche
Copy link
Member

gasche commented Jan 30, 2021

The tests finished 20 hours after the last push. Merging.

@gasche gasche merged commit e30cca3 into ocaml:trunk Jan 30, 2021
@xavierleroy
Copy link
Contributor

The tests finished 20 hours after the last push.

I guess this is "continuous integration" for some value of "continuous"...

When I was a wee lad, I was told about batch processing, where you submit your computing job on punched cards or tape and get a listing with "Syntax error" on it the day after. Plus ça change, plus c'est la même chose, as they say in English.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants