Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with handling of comments at the start of files #74

Open
SarahFrench opened this issue Jan 3, 2023 · 4 comments
Open

Issues with handling of comments at the start of files #74

SarahFrench opened this issue Jan 3, 2023 · 4 comments
Labels
yaml_v3_problem A bug in the underlying yaml library. These issues are vastly harder to fix.

Comments

@SarahFrench
Copy link

Hi there! I came across your project when looking for a yaml formatter to use in a project and I encountered some unexpected behaviour that might be useful to you.

The project I'm contributing to the yaml is parsed using Ruby, so there are tags in the yaml to help Ruby map yaml to classes. The problems below aren't caused by those tags, but they make small problems more painful than they'd be otherwise (problem 2 below in particular).

Problem 1 - inconsistent handling of comments in files containing multiple yaml documents

If a file contains multiple yaml documents starting with --- then the comments are reordered differently at the start of the file versus elsewhere. This seems to be due to the first --- of the file being prioritised as line 1 of the output.

This file has two documents with a consistent format:

# Set.new([1,2]).to_yaml - first
--- !ruby/object:Set
hash:
  1: true
  2: true

# Set.new([1,2]).to_yaml - second
--- !ruby/object:Set
hash:
  3: true
  4: true

(Adapted from https://rhnh.net/2011/01/31/yaml-tutorial/ - example shows yaml output from Ruby code)

The formatter introduces differences between the comment positions in the first and subsequent entries:

---
!ruby/object:Set
# Set.new([1,2]).to_yaml - first
hash:
  1: true
  2: true

# Set.new([1,2]).to_yaml - second
---
!ruby/object:Set
hash:
  3: true
  4: true

Problem 2 - tags being moved above comments at start of file

In the example above you can see that the first !ruby/object:Set tag is 'pulled' up to the start of the file, above the comment which was originally line 1. This can be an issue if the yaml file starts with a copyright notice (or any other comment that needs to be at the top of the file) because the tag is separated from the yaml and this obscures how the file works.

Original file:

# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http:#www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set.new([1,2]).to_yaml - first
--- !ruby/object:Set
hash:
  1: true
  2: true

# Set.new([1,2]).to_yaml - second
--- !ruby/object:Set
hash:
  3: true
  4: true

A yamlfmt moves the tag away from the yaml:

---
!ruby/object:Set
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http:#www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set.new([1,2]).to_yaml - first
hash:
  1: true
  2: true

# Set.new([1,2]).to_yaml - second
---
!ruby/object:Set
hash:
  3: true
  4: true

Ideally (blocks of) comments at the start of a file would be given a special status and would still be the first lines of the file after formatting.


I appreciate you're a sole maintainer and the project is in v0.x.x at the moment, so I'm not expecting a fix in response! However I thought these examples and info about reproducing problems would be helpful to you.

Happy new year! 👋

@SarahFrench
Copy link
Author

SarahFrench commented Jan 3, 2023

And to add, for these examples my .yamlfmt file was:

formatter:
  type: basic
  indent: 2
  include_document_start: true

@braydonk
Copy link
Collaborator

braydonk commented Jan 4, 2023

Thank you for opening an issue @SarahFrench, and thank you for the in-depth detail!

I took a cursory look at this in the underlying yaml library. What I was able to determine is that what seems like an obviously consistent situation to the reader, namely the way these comments are placed around document-start tokens, is interpreted very differently by the underlying parser. Of the two problems, I suspect one is an unfortunate result of the YAML spec, but one problem may be a bug in the parser that I can potentially fix.

TLDR for the rest of these details: The issue with the comments at the head of the document may be reconcilable, but the one with comments on document starts further down the page may not. It all needs more research to be completely sure. Thank you for the issue and information, and I will keep this issue updated if I get anywhere!

Deeper technical stuff that is probably only interesting to my future self:

The YAML spec has 3 types of comment parsing patterns: HeadComment, LineComment, and FootComment. Each node in a yaml document can have any of all 3 of these.

What is happening under the hood in this scenario:

# Comment
---
a: 1

Is that the comment is being parsed as a HeadComment for the scalar value 1. This is the issue I think I may be able to fix. I have a strong suspicion that this comment really should be the head comment of the document start node and not the scalar value. I think this could be a parser bug.

The other annoying thing is this scenario of the following document:

a: 1

# Comment
---
a: 2

The comment in this scenario stays where it's supposed to after format. Unfortunately it's not because it's the HeadComment of the document node like we want; it's because it's the FootComment of the above mapping node a.

I will need to dedicate some more time to this next week to see what of these behaviours are "up to spec" so to speak, and what of them are due to bad parsing.

@djgoku
Copy link

djgoku commented Apr 12, 2023

input yaml:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.77.2
    hooks:
      # - id: infracost_breakdown
      - id: terraform_fmt
      # - id: terraform_providers_lock
      - id: terraform_validate
      # - id: terrascan

output yaml:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.77.2
    hooks:
      # - id: infracost_breakdown
      - id: terraform_fmt
      # - id: terraform_providers_lock
      - id: terraform_validate
        # - id: terrascan <-- two leading spaces

@braydonk
Copy link
Collaborator

Hi @djgoku, I think what you have there is unrelated to this issue. Would you mind opening a new one?

@braydonk braydonk added the yaml_v3_problem A bug in the underlying yaml library. These issues are vastly harder to fix. label Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
yaml_v3_problem A bug in the underlying yaml library. These issues are vastly harder to fix.
Projects
None yet
Development

No branches or pull requests

3 participants