Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File that only contains contents is emptied #91

Closed
hartzell opened this issue Mar 9, 2023 · 5 comments
Closed

File that only contains contents is emptied #91

hartzell opened this issue Mar 9, 2023 · 5 comments

Comments

@hartzell
Copy link

hartzell commented Mar 9, 2023

I have a file, namespace/collection/meta/runtime.yml that is generated by the Ansible Galaxy collection init step, that starts off like so:

# Collections must specify a minimum required ansible version to upload
# to galaxy
# requires_ansible: '>=2.9.10'

# Content that Ansible needs to load from another location or that has
# been deprecated/removed
# plugin_routing:
#   action:

And continues with only blank or commented lines.

When I run yamlfmt -dstar '**/*yml' from the top of the directory tree, yamlfmt deletes all of the content in the file, resulting in a 0 byte file.

If I manually add a --- separator at the start of the file, then yamlfmt -dstar '**/*yml' does not strip all of the content. It does, however, strip the --- so the next run empties the file.

If I manually add a --- to the file and use this config file:

formatter:
  type: basic
  include_document_start: true

then the file is not emptied.

I see two things that might be addressed:

  • fix the yaml parser/writer so that it doesn't not strip comments from empty files; and
  • change the default for include_document_start to true (this gives me a work around).
@hartzell
Copy link
Author

hartzell commented Mar 9, 2023

This might be related to #74.....

@braydonk
Copy link
Collaborator

braydonk commented Mar 10, 2023

Hi @hartzell, thanks for opening an issue!

This is an unfortunate consequence of the way the yaml library opts to process comments. Comments in yaml.v3 are always associated with some element within the yaml file. Every comment is either a HeadComment, BlockComment, or FootComment. This is how they are preserved; a comment, based on its placement, is associated to a Node in the document and reproduced in the formatted output through their associations in the tree.
In your example, the document is (at least according to the parser) completely empty. It has comments, but because there are no nodes ever processed in your document the comments never get associated to anything and thus are not produced in the output.
This is why adding a --- token does preserve the comment initially. It is a node in the document, and the rest of the content of the file is parsed as a single comment that gets associated to it.

yaml.v3 will remove the document start if it determines there is no document. This might be something I can fix; I can experiment with adjusting this behaviour in the library. I can't say at the moment if it will have any other consequences.

Changing the behaviour of comments being stripped from a yaml file that consists only of comment content is something I probably will not pursue. It would require too many fundamental changes to parsing in the library, and I am trying to be careful not to touch the lexer and parser in yaml.v3 unless it is something absolutely crucial. However if I can safely add a feature that preserves the document start token on an otherwise empty document, your use case will probably work anyway. I will use this issue to track that work.

As for adjusting include_document_start to true by default, I opted to leave that default as false on purpose; I haven't been able to collect any real data, but in my experience including the document start is less popular in scenarios outside of multi-document files. I don't really have any mechanism to gather data on actual popularity, so I tend to choose defaults that either:

  • Avoid hacks that I had to add for certain requested features
  • Are otherwise as unintrusive as possible

So unless there is some resounding response that the default should be swapped, I think I will leave that default where it is.

@hartzell
Copy link
Author

That all makes sense, thank you for the investigation and the detailed explanation.

The challenge that I have is that, given the files that I'm working with (that are generated by external tools that I can't change) is that the only safe thing to do is run with include_document_start as true. But that's a tricky engineering problem (CI & human). The CI problem is that my CI job needs to set up a config file and pass it in on the command line, this could be solved by baking the file into the Docker image in which the CI job runs and adding the -conf switch to the invocation. The human version is trickier, I can't just document that the invocation needs to include e.g. -include_document_start, but rather "You need to add a configuration file that includes these lines.

Could yamlfmt by default check for a config file in the current working dir (and search up the tree)? Then I could commit the config file to the repo and the only risk is that someone would run the command from somewhere that's not in the tree?

@braydonk
Copy link
Collaborator

braydonk commented Mar 10, 2023

See this section of the README for an explanation of the priority order that yamlfmt checks for a config file. It will search in the working directory, but won't search the full tree. I decided that the use case of needing to find a .yamlfmt file arbitrarily in the working tree wasn't worth the added complexity and instead opted to provide the -conf flag to pass any path to a config file.

I do plan eventually to address #88, to pass configuration through command line flags. This won't be quite as simple as it sounds so I haven't prioritized it yet, but I do plan to allow for it in some fashion eventually.

@hartzell
Copy link
Author

Thank you for all the help here. This will work well for our needs. Appreciate all your efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants