Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config parser (and loader) #4744

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from
Draft

Config parser (and loader) #4744

wants to merge 18 commits into from

Conversation

bentsherman
Copy link
Member

@bentsherman bentsherman commented Feb 14, 2024

Close #2723

This PR explores one possible solution to the config overhaul by implementing a custom parser for config files, using a similar approach as #4613.

The linked issue initially proposed adopting a new config format. However, most of the proposed alternatives do not have all of the capabilities that we need out of the box or have their own downsides. I think the actual problem is that the config file has been stretched to cover use cases for which it wasn't originally designed.

Between this PR, the module config (#4510), and moving the params definition from the config file to the params schema, I think this will relieve enough pressure from the config file that we can keep it.

This custom parser enforces a much simpler syntax:

  • Only three statements are allowed: assignments, blocks, and config inclusion.
  • The left-hand side of an assignment can only be a path expression (foo.bar.baz ...).
  • The right-hand side can be an arbitrary expression, including closures.
  • Within a closure expression, arbitrary Groovy code is still allowed, although right now there is only a minimal subset in the parser.

As a result, the config parser is much simpler and the code is easier to read. There is no more dependence on metaclasses, instead there is a simple builder DSL for executing a config file and producing a map.

The vast majority of existing configuration is still supported, with only a few minor changes needed:

  • Include statements are sometimes wrapped in a try-catch to prevent a missing config file from failing the pipeline. For this we can either provide both includeConfig and requireConfig, or demote the error to a warning when the config file is remote

  • Custom functions like check_max are no longer supported. Instead, I suggest that we import the lib directory into config files as we do for scripts, so that these helper functions can simply be defined there

  • Conditional statements are no longer supported. With module config they should no longer be needed

  • Params should be defined only in nextflow_schema.json to prevent issues like Allow custom configs params to be parsed before nextflow.config #2662 where a param is referenced before params are fully resolved. More of a best practice than a hard requirement. We can still support params defined in the config (likely with a warning), and overriding param default values in a profile is still a valid use case.

All of these issues can be resolved independently of this PR, in case we need to smooth the migration process.

It is still possible to execute arbitrary code with side effects, just not as easily as before. We can't really avoid this if we want to support dynamic expressions:

foo.bar = "${println 'pwned'}"

TODO:

  • add custom lazy resolution of params
  • review subset of Groovy that is supported within closures
  • support lib directory for using helper functions in config files
  • explore Java security manager to prevent side effects in dynamic strings

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Copy link

netlify bot commented Feb 14, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit a6d89cd
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/66157b72ef7761000824c06f
😎 Deploy Preview https://deploy-preview-4744--nextflow-docs-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@bentsherman bentsherman changed the title Add custom config parser Config parser Feb 15, 2024
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman
Copy link
Member Author

Research updates:

  • Groovy compiler provides an interface to use a custom parser (in order to ease the transition from Antlr2 to Antlr4), this can be used to inject our custom parser without much hassle
  • Config schema effort can be pursued independently of the parser, see Config schema #4201
  • Might be able to use the Java security manager to prevent execution of unsafe code in the config file such as file I/O, subprocesses

@bentsherman
Copy link
Member Author

Here's an example to illustrate how the config parser "executes" the config file. Given this config:

foo.bar = 'hello'

foo {
  bar = 'hello'
}

includeConfig 'foobar.config'

The parser transforms this code into the following:

assign(['foo', 'bar'], 'hello')

block('foo', {
  assign(['bar'], 'hello')
}

includeConfig('foobar.config')

Basically each statement is translated to a DSL method (see ConfigDsl). This "hidden DSL" allows us to control how each statement is executed. In other words, the config parser + hidden DSL implement a simple interpreter for the Nextflow config language, separating the config language from the config execution.

I think this is a good model for how we might separate the Nextflow language from the runtime. Instead of hooking directly into Groovy methods and relying on Groovy MOP, metaclass, etc, we can introduce a hidden DSL which allows us to better control how a given statement (e.g. invoking a process in a workflow) is executed by the runtime.

@bentsherman
Copy link
Member Author

Java security manager is being removed. I will keep an eye out for other sandboxing techniques, but in the meantime, a simple solution is to provide a "safe" execution mode where assignment expressions are wrapped in string literals.

@bentsherman bentsherman changed the title Config parser Config parser (and loader) Mar 23, 2024
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@ewels
Copy link
Member

ewels commented Apr 7, 2024

Really like the way that this is going, great work 👏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evolution of Nextflow configuration file
2 participants