Formal grammar and parser #4613

Draft
wants to merge 16 commits into
base: master
Member

@bentsherman bentsherman commented Dec 21, 2023

This PR adds a custom parser for Nextflow scripts and uses it instead of the Groovy parser. The Nextflow parser is generated from an ANTLR grammar, which currently contains a subset of Groovy syntax with some additional rules for processes, workflows, and include statements.

To bypass the Groovy parser, we invoke the GroovyShell with a placeholder script that simply wraps the actual script in a string expression. Then in an AST transform, we extract the string value, parse it with the Nextflow parser, and insert the resulting Groovy AST into the placeholder script.
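
To make the placeholder trick concrete, here is a minimal sketch of the same idea using Python's standard `ast` module instead of Groovy. It is only an analogy, not the code in this PR: `parse_toy` and `run_via_placeholder` are hypothetical names, and the toy `let NAME = INT` grammar merely stands in for the Nextflow grammar.

```python
import ast


def parse_toy(source):
    """Hypothetical stand-in for the custom parser.

    Accepts lines of the form `let NAME = INT` and returns host-language
    (Python) AST assignment nodes, reporting its own syntax errors.
    """
    body = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        ok = (
            len(parts) == 4
            and parts[0] == "let"
            and parts[1].isidentifier()
            and parts[2] == "="
            and parts[3].isdecimal()
        )
        if not ok:
            raise SyntaxError(f"line {lineno}: expected `let NAME = INT`, got {line!r}")
        body.append(
            ast.Assign(
                targets=[ast.Name(id=parts[1], ctx=ast.Store())],
                value=ast.Constant(value=int(parts[3])),
            )
        )
    return body


def run_via_placeholder(source):
    # 1. Wrap the real script in a string expression so the host parser accepts it.
    placeholder = repr(source)
    tree = ast.parse(placeholder)

    # 2. "AST transform": pull the original text back out of the placeholder.
    expr = tree.body[0]
    assert isinstance(expr, ast.Expr) and isinstance(expr.value, ast.Constant)
    script = expr.value.value

    # 3. Parse it with the custom parser and splice the result into the placeholder.
    tree.body = parse_toy(script)
    ast.fix_missing_locations(tree)

    # 4. Let the host compiler and runtime take over from here.
    scope = {}
    exec(compile(tree, "<toy-script>", "exec"), scope)
    return {k: v for k, v in scope.items() if k != "__builtins__"}


print(run_via_placeholder("let x = 3\nlet y = 40"))  # prints {'x': 3, 'y': 40}
```

The host parser only ever sees a string literal, so the syntax rules and error messages come entirely from the custom parser, while compilation and execution are still delegated to the host language.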

This approach allows us to control the parsing process -- including the syntax and detecting syntax errors -- while still leveraging the Groovy compiler for execution. In other words, we can define whatever grammar we want, as long as we can "compile" it into a Groovy AST. If you look at AstBuilder, you'll see that it converts processes / workflows / includes into the same Groovy AST structures produced by NextflowDSLImpl.

The hack I'm using to make this work seems fine, but a more robust solution might be to use internal Groovy classes so that we can pass our AST directly to the Groovy compiler, instead of going through the GroovyShell and AST transforms. It will take time to understand which components we would need to rip out. But the advantage is that we don't have to implement our own compiler backend.

I developed this code in a separate project and only just now incorporated it into Nextflow. I haven't tested extensively so there are likely some issues around the edges. Just wanted to finish a basic prototype before the holidays.

TODO:

  • Figure out how to use the generated lexer/parser without manual copy
  • Bring AstBuilder to parity with NextflowDSLImpl
  • Restore backwards compatibility
  • Pass unit tests and integration tests

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

@bentsherman
Member Author

Some sweets-infused holiday thoughts... right now I am just producing the same AST expected by the runtime to keep this PR as simple as possible. But, like I said, we can produce whatever Groovy AST we want, so we could produce Groovy code that more effectively enables new features like static types, default arguments, etc.

The main example I'm thinking of is the annotation API (see nextflow-io/rnaseq-nf#24). I originally designed it as user-facing code, but it could also be an intermediate representation that is produced by the parser. If we "compile" the process and workflow definitions to actual function definitions, then we can more easily leverage the Groovy type checking.

This is just an example. We may not need the annotation API exactly, but it would be good to explore alternative AST representations, perhaps in a second iteration.

@bentsherman
Member Author

Another aside... GraalVM implements an AST model for every language that it supports. Here is the Graal Python AST source code.

So we could also have the parser produce a Graal/Python AST and thereby allow the pipeline code to use Python semantics instead of Groovy semantics.

We would need to design a DSL syntax for processes and workflows that would make sense with Python. Likely it would look more like Snakemake. Using native Python syntax (i.e. functions with decorators) is also an option but would likely be more verbose. We would still need to implement our own IDE tooling, but centered around Python syntax instead of Groovy syntax.
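
As a rough illustration of the "functions with decorators" option, the sketch below shows what a process definition might look like in plain Python. Everything here is hypothetical: the `process` decorator, `ProcessDef`, and `REGISTRY` are invented for this example and do not correspond to any existing Nextflow, Snakemake, or GraalVM API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry of declared processes, keyed by function name.
REGISTRY = {}


@dataclass
class ProcessDef:
    name: str
    inputs: dict            # input name -> declared type, taken from annotations
    outputs: list           # output file patterns
    script: Callable[..., str]


def process(outputs):
    """Hypothetical decorator that records a function as a process definition."""
    def wrap(fn):
        inputs = {k: v for k, v in fn.__annotations__.items() if k != "return"}
        REGISTRY[fn.__name__] = ProcessDef(fn.__name__, inputs, list(outputs), fn)
        return fn
    return wrap


@process(outputs=["out.sam"])
def align(reads: str, index: str) -> str:
    # The function body returns the shell command to run for this task.
    return f"bowtie2 -x {index} -U {reads} -S out.sam"


print(align("sample.fq", "genome"))  # prints bowtie2 -x genome -U sample.fq -S out.sam
```

Inputs come from the function signature and outputs from the decorator arguments, which shows where the extra verbosity comes from compared to a dedicated DSL block.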

The point is, if we rely on the semantics (and compiler backend) of an existing language, it doesn't have to be Groovy. It could easily be any language supported by GraalVM.

@bentsherman changed the title from "Nextflow parser" to "Formal grammar and parser" on Apr 29, 2024