Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Check for unpaired braces/brackets/parentheses #599

Open
AlexSCFraser opened this issue Sep 26, 2023 · 3 comments
Open

Comments

@AlexSCFraser
Copy link

I was having trouble with a CI pipeline this weekend and realized that my problem was that I had commented out only one part of a multi-line command in my CI YAML file. It would have easily been caught by a simple check for unpaired parentheses (or alternately unbalanced single quotes, but that's a character that sometimes gets used singly normally). Looking at the docs, it doesn't look like this is a configuration option that can be enabled?

I think it would be a useful feature, simply because it's so common to embed short commands from other languages to perform setup tasks in a YAML file and I would imagine the usual case of unbalanced braces/brackets/parentheses present in a YAML file is actually an error from an embedded scripting syntax.

I'm not sure if it would make sense be enabled by default, since it's a bit out of scope for pure YAML. Maybe enabled by default as a warning, but not an error? At the very least, it would make sense as an opt-in rule for people to turn on via the config file.

@adrienverge
Copy link
Owner

Hello Alex,

Do you have an example of such a YAML snippet (a minimalist one), and what you would expect from yamllint?

If I understand correctly (but I'm not sure), what you describe would be a syntax error, and reported by yamllint as such.

@AlexSCFraser
Copy link
Author

Here is a minimal example of essentially what caused my failed CI run.

---  # minimal example of unbalanced parentheses - SYNTAX ERROR

name:
  -run: >
    Rscript -e 'install.packages("BiocManager")'
    # -e 'BiocManager::install(c("remotes",
                               "rsaccharis"))' #  Should cause SYNTAX ERROR on this line

The problem I had was that the final line of embedded R syntax inside the rscript command was not commented out and caused a vague error it took me some time to find the root cause of.

According to yamllint, this is valid YAML. I assume it doesn't take issue with the extra close parentheses because it's simply interpreted as a generic string which is allowed to be anything.

My desired behaviour would be that the final line is registered as a syntax error at either the first or both close parentheses characters , ')', because they were not preceded by an equal number of open parentheses character, '('. The open and close parentheses can be separated by line breaks, however, since a long list might be split across lines.

Were the commented out line to be uncommented, it would no longer be marked as a syntax error, since the open parentheses would be present. The below snippet shows a hypothetical example of correct syntax.

---  # minimal example of balanced parentheses, should not cause an error

name:
  -run: >
    Rscript -e 'install.packages("BiocManager")'
    -e 'BiocManager::install(c("remotes",
                               "rsaccharis"))'

So one solution would be to optionally check inside the identified generic strings (since any embedded code would have to be completely contained under one correctly formatted YAML block of generic string syntax) for unbalanced parentheses/brackets/braces. If it's an optional feature, it could be coded to be character agnostic so a user can specify any pair of distinct start/stop characters meant to be balanced in the config file. Plus if the code is agnostic to what the start and stop characters to be balanced are, it's easy to set it up for {}, [], (), <>, or any other arbitrary pairing a user cares about as long as it uses distinct characters.

In terms of implementation, my understanding is that two popular approaches for linting this kind of balanced syntax are to either use a stack or just simple counting. The stack approaches pushes positions (e.g. line and column number) on the open character and then pops them off with a close character, which then reports an error either when a pop is attempted on an empty stack or the stack has members remaining at the end of the linted scope (Usually end of file, but in this case, when the string ends of the YAML block). The counting approach just assigns positive and negative value to the chars and keeps a running sum and reports and error if the sum is non-zero at the end of scope. The stack approach seems better since you can report the exact position of the offending character and would probably be the first approach I would try.

@adrienverge
Copy link
Owner

Thanks for details, I understand now. You're not talking about unpaired braces/brackets/parentheses in YAML syntax, but rather in the contents of some strings in the YAML document (that happen to be R code, in your example).

Yamllint aims to lint the form of YAML, not the content. We could imagine many new rules to check such parentheses parity, or that a string contains a substring, or that all keys respect a regex, etc. But the purpose of yamllint is to be focused on YAML itself, so I'm not favorable to adding such new content-oriented rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants