Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Status of Single-line and Multiline Comments #19

Open
shawntabrizi opened this issue May 4, 2022 · 5 comments
Open

Status of Single-line and Multiline Comments #19

shawntabrizi opened this issue May 4, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@shawntabrizi
Copy link

shawntabrizi commented May 4, 2022

I have read in the documentation that that parsing and handling comments is one of the "sore points".

Specifically, it seems your library only parses and allows for inline comments, whereas it is pretty regular to see TOML files with comments over the lines they are modifying, or even multiline comments which act as separators between sections of TOML configuration.

Examples of comments over the lines can be found in the TOML specification: https://toml.io/en/

# integers
int1 = +99
int2 = 42
int3 = 0
int4 = -17

# hexadecimal with prefix `0x`
hex1 = 0xDEADBEEF
hex2 = 0xdeadbeef
hex3 = 0xdead_beef

# octal with prefix `0o`
oct1 = 0o01234567
oct2 = 0o755

And a basic Rust toml file uses an "island" comment:

[package]
name = "temp"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

In our project Substrate, we take advantage of multiline comments in toml like this:

# The list of dependencies below (which can be both direct and indirect dependencies) are crates
# that are suspected to be CPU-intensive, and that are unlikely to require debugging (as some of
# their debug info might be missing) or to require to be frequently recompiled. We compile these
# dependencies with `opt-level=3` even in "dev" mode in order to make "dev" mode more usable.
# The majority of these crates are cryptographic libraries.
#
# Note that this does **not** affect crates that depend on Substrate. In other words, if you add
# a dependency on Substrate, you have to copy-paste this list in your own `Cargo.toml` (assuming
# that you want the same list). This list is only relevant when running `cargo build` from within
# the Substrate workspace.
#
# If you see an error mentioning "profile package spec ... did not match any packages", it
# probably concerns this list.
#
# This list is ordered alphabetically.
[profile.dev.package]
blake2 = { opt-level = 3 }
blake2-rfc = { opt-level = 3 }
blake2b_simd = { opt-level = 3 }
chacha20poly1305 = { opt-level = 3 }
...

What are the current limitations around parsing and handling these TOML comments? Is this something that this library could support in the future?

If you have an idea how you would like it implemented, could you write some notes so maybe I (or others) could try a stab at it?

Thanks!

@LongTengDao
Copy link
Owner

LongTengDao commented May 4, 2022

Yes, of cause I also want to support over-line, "island" and multi-line comments.

The main difficulty I meet, is that there seems no good way to describe which does a comment belong to (while parsing or stringifying). The commens could be too whimsical.

Think about this:

# overall

# a
# a.a
a.a = 1
# a.b
a.b.x = 1
a.b.y = 1
#a.b.z = 1

How would you treat that?

If we just remain the comments, without correct belonging relationship which most likely meeting the writer's intention, then it rarely help, and will break the compatibility when we finally find a good enough rule in the future.

I'm glad to discuss any possible idea, welcome to feedback.

@shawntabrizi
Copy link
Author

shawntabrizi commented May 5, 2022

Yeah, totally understood. The island comments seem to be the hardest to handle here, but I think that you could create a few rules which would capture most situations.

For example, today you only capture inline comments, and drop the rest.

You could also enable a feature where single / multi-line comments are attached to the nearest TOML line below it, and the rest would be dropped just as before.

So using your example above, we would get:

{
    a: {
        a: '1', [commentAbove('a')]: ' a \n a.a',
        b: {
            x: '1', [commentAbove('x')]: ' a.b',
            y: '1'
        }
    }
}

At least in this case, we capture what is arguably the most common kind of comment in the way that is most normally expected. Here we still drop the island comment, and the tricky comment at the end of the file, but this would be a large step forward from what is available today.

And, at the end, I believe more rules like this could be configurable as xOptions, allowing the user to play with the rules for their desired behavior.

@LongTengDao
Copy link
Owner

LongTengDao commented May 6, 2022

I think I need to confirm your final need about comment.

Firstly, @ltd/j-toml support any valid comment parsing; problem is how to remain it when stringify.

Here are two main cases to stringify a TOML document with comments: 1) a table parsed from existing file (which means you only want to change some value, and want to remain the comments without changes); 2) a brand new table with new structure, or change existing comments info.

If only 1st case is usually needed, then the mechanisms can be in black box; if both, then here must be a good enough api, and I need to design it systematically.

@LongTengDao
Copy link
Owner

# overall

# a
# a.a
a.a = 1
# a.b
a.b.x = 1
a.b.y = 1
#a.b.z = 1
{
    [commentsInThis]: [ [' overall'], [' a',' a.a'], 'a', ['a.b.z = 1'] ],
    a: {
        [commentsInThis]: [ 'a', [' a.b'], 'b' ],
        a: 1,
        b: {
            x: 1,
            y: 1,
        },
    },
}

Does this look good enough? (Won't distinguish over-line comment and "island" comment, which can be designed, but I found it's easy to see mixing up in your samples.)

@shawntabrizi
Copy link
Author

@LongTengDao first, thank you for your time to discuss this with me.

I think this might work. As I understand this syntax, you are keeping track of the comments and where they are relative to certain keys. In this case a and b in those commentsInThis refer to the location of the key, and then the comments exist positionally around them.

Ultimately, I would need to test this logic on some of the files by doing parse -> stringify to see the diff, but yes, I think this would already be a big improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants