Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exact number of empty lines for document start/end #340

Open
wookietreiber opened this issue Oct 30, 2020 · 5 comments · May be fixed by #393
Open

exact number of empty lines for document start/end #340

wookietreiber opened this issue Oct 30, 2020 · 5 comments · May be fixed by #393

Comments

@wookietreiber
Copy link

I would like to enforce having exactly one empty line after document start and exactly one empty line before document end, i.e.:

good

---

foo: bar

...

bad

---
foo: bar
...
---


foo: bar


...

options

I have already forked and wanted to start hacking away, but then I realized from reading through the documentation again and looking at the code and tests, that there are actually two places where this feature could be added:

  1. directly with document-start and document-end

    I imagined something the likes of a new option empty-line, allowing either a bool or the string whatever, defaulting to whatever (compatibility). The behavior would have been:

    • the empty line option applies only if present: true
    • if true enforce a single empty line after document start / before document end, respectively
    • if false the empty line is forbidden
    • if whatever, don't care

    It would also be possible to turn these into an int to enforce an exact number of empty lines.

  2. with empty-lines which already contains max-start and max-end

    The feature could also be implemented with empty-lines and two new options called min-start and min-end.

guidance

Before implementing any of these, I would like to ask your guidance about how to implement this. In my opinion, adding this to document-start and document-end would be the most intuitive, but this could potentially conflict with max-start and max-end, while the conflict could also be delegated to the user. Please advise.

@adrienverge
Copy link
Owner

Hello Christian,

Thanks for your crystal-clear message.

  • I think empty-lines is not really related, because max-start and max-end refer to the file start / end, not the document start / end (a file can contain multiple YAML documents, and the --- can be on line 2 for instance).

                     ← empty-lines max-start
    ---
                     ← new rule or option
    foo:bar
                     ← new rule or option
    ...
                     ← empty-lines max-end
    
  • document-start and document-end seem the best way to go.

Your proposal with empty-line: int is cool, but I don't see how a user could enforce a range of values (e.g. I want at least 1 empty line, but not more than 2).

Maybe we could define options like this (inspired by rules braces, brackets, colons and commas)?

document-start:
  present: true
  min-empty-lines-after: 1
  max-empty-lines-after: 2

@wookietreiber
Copy link
Author

@adrienverge Thanks for the explanation! Your thoughts about [min|max]-empty-lines-[after|before] are very reasonable, so I'll get started on the PR for that.

@wookietreiber
Copy link
Author

@adrienverge I need your advice again :-)

At the moment, there are three types of rules, i.e. TYPE = 'comment|line|token'. For comparison, yamllint/rules/document_start.py uses token to check for DocumentStartToken, while yamllint/rules/empty_lines.py uses line to count empty lines.

It seems that line counting can't be done with the token-based parser, because it only goes up to nextnext and we'd need to peek further than that to be able to count the empty lines and, from looking at the pyyaml documentation, there does not seem to be a newline token that could be used.

To support the new [min|max]-empty-lines-[after|before] options, the diff might be a bit bigger than simply adding a little bit new logic to the existing rules. I can think of the following variants:

  1. Add a new rule TYPE for document-start and document-end, i.e. a mixture of both token-based parsing to check for DocumentStartToken and line-based parsing to check on the number of empty lines encountered.
  2. Add the new [min|max]-empty-lines-[after|before] options to empty-lines instead of document-start and document-end.
  3. Split document-start and document-end in two parts/rules, one that checks for the token and one that checks for the empty lines.

While still getting to know the code-base, I'm not sure about the implications and practicability of these variants. How do you want to proceed?

@adrienverge
Copy link
Owner

Hello @wookietreiber, indeed that's an interesting point.

  • It seems that line counting can't be done with the token-based parser, because it only goes up to nextnext and we'd need to peek further than that to be able to count the empty lines and, from looking at the pyyaml documentation, there does not seem to be a newline token that could be used.

    Are you sure? I did a simple test with:

    ---
    
    string
    

    ... and tokens seem to be:

    0:0 --> 0:0	StreamStartToken(encoding=None)
    0:0 --> 0:3	DocumentStartToken()
    2:0 --> 2:0	BlockMappingStartToken()
    
  • If it doesn't work, the best way to go is your point 1. What about a refactor that would allow any rule to be both/any of comment|line|token? For example:

     ID = 'document-start'
    -TYPE = 'token'
     CONF = {'present': bool}
     DEFAULT = {'present': True}
     
     
    -def check(conf, token, prev, next, nextnext, context):
    +def check_line(conf, line):
    +    # do stuff
    +
    +
    +def check_token(conf, token, prev, next, nextnext, context):
         if conf['present']:
             if (isinstance(prev, (yaml.StreamStartToken,
                                   yaml.DocumentEndToken,
    

@wookietreiber
Copy link
Author

It seems that line counting can't be done with the token-based parser, because it only goes up to nextnext and we'd need to peek further than that to be able to count the empty lines and, from looking at the pyyaml documentation, there does not seem to be a newline token that could be used.

Are you sure? I did a simple test with:

---

string

... and tokens seem to be:

0:0 --> 0:0	StreamStartToken(encoding=None)
0:0 --> 0:3	DocumentStartToken()
2:0 --> 2:0	BlockMappingStartToken()

@adrienverge So, if I understand this correctly, BlockMappingStartToken(2:0 --> 2:0) refers to the line where string starts, but doesn't actually include the string itself. There is no token for an empty line, because 1:0 is mentioned nowhere. I could theoretically infer from the line difference in DocumentStartToken(0:0 --> 0:3) and WhateverToken(2:0 --> 2:0) that there is one empty line at 1:0?

@wookietreiber wookietreiber linked a pull request Jul 5, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants