Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do I round trip ron files preserving formatting and comments? #216

Open
vi opened this issue Apr 22, 2020 · 22 comments
Open

How do I round trip ron files preserving formatting and comments? #216

vi opened this issue Apr 22, 2020 · 22 comments
Labels

Comments

@vi
Copy link

vi commented Apr 22, 2020

Are there Ron parsers for making automated edits to Ron files?

As far as I understand, whitespace-preserving parsing should be based on something other that Serde. Does Ron have some Ron-specific crate that goes beyond Serde's API?

@kvark
Copy link
Collaborator

kvark commented Apr 22, 2020

Oh interesting! I'm not aware of anything like that, no.

@xMAC94x
Copy link

xMAC94x commented Sep 29, 2020

I would also like to see a feature to modify ron files preserving the comments. (format would be nice-to-have).
One option would be to edit the file directly.
However preserving comments (but not format) would be possible by saving comments to in deserialize, and print them back on serialize.
This could allow config files with alternatives

{
   // supported: http, https, ftp
   "protocol": "http"
}

@kvark
Copy link
Collaborator

kvark commented Sep 29, 2020

Could you tell more? What RON would you modify programmatically, and how, while preserving comments?

@xMAC94x
Copy link

xMAC94x commented Sep 29, 2020

I contribute to a rust game called veloren (https://gitlab.com/veloren/veloren/)
We are storing our settings file with the help of RON format.
E.g.

$cat server_settings.ron 
(
    gameserver_address: "0.0.0.0:14004",
    metrics_address: "0.0.0.0:14005",
    auth_server_address: Some("https://auth.veloren.net"),
    max_players: 100,
    world_seed: 59686,
    server_name: "Veloren Alpha aa",
    server_description: "dddd",
    start_time: 32400,
    admins: [

        "Pfau",
        "zesterer",
        "xMAC94x",
        "Timo",
        "Songtronix",
        "slipped",
        "Sharp",
        "Acrimon",
        "imbris",
        "YuriMomo",
        "Vechro",
        "AngelOnFira",
        "Nancok",
        "Qutrin",
        "Mckol",
        "Treeco",    ],
    map_file: None,
    persistence_db_dir: "saves",
    max_view_distance: Some(30),
    banned_words_files: [],
    max_player_group_size: 6,
    player_timeout: (
        secs: 2,
        nanos: 0,
    ),
)

now the user has the option to modify his settings during the game. e.g. by writing in the chat
/adminify foovar to add user foobar to the admin list.
When we add someone to the admin list we want to store this. so we are saving the information back to the file.

So we are reading and writing to our settings file.

In this process, all comments get lost.

However, it would be awsome, to allow comments.
E.g. we as developers, might want to put the default world_seed as a comment.

Or a user wants to temporarrily removes an admin from the list. but when he comments a person out, and restarts the game (and the file got saved again) the name is completly lost.

@kvark
Copy link
Collaborator

kvark commented Oct 2, 2020

Interesting, thank you for the info!
I wonder how difficult this would be. Is there a precedent in other formats (JSON/YAML/TOML) based on Serde that would preserve the comments?

@torkleyy
Copy link
Contributor

torkleyy commented Oct 6, 2020

Yes: https://github.com/ordian/toml_edit

This crate allows you to parse and modify toml documents, while preserving comments, spaces* and relative order* or items.

@torkleyy
Copy link
Contributor

torkleyy commented Oct 6, 2020

I don't think it's using serde though.

@vi
Copy link
Author

vi commented Oct 6, 2020

Maybe like with cbor rewrite, RON should have two crates: low-level serde-independent and high-level serde-based?

@kvark
Copy link
Collaborator

kvark commented Oct 6, 2020

A full rewrite on non-serde would also open up some possibilities, but that's a huge amount of work that we don't know how to allocate resources into. So this would unlikely happen until somebody motivated would do it.

@Boscop
Copy link

Boscop commented Nov 18, 2020

I'd also really like to have this feature, also for modifying user-written config files.

Btw, is there something like serde that preserves whitespace & comments, that any format can piggyback on? If not, how hard would it be to write?
So that not every format crate has to reinvent the wheel..

@Boscop
Copy link

Boscop commented Nov 18, 2020

One workaround I can think of, to get this feature with the current crates, (but with some overhead) is:

  1. Use serde to parse, then serialize it again to string.
  2. "zip" the original source with the re-serialized one, to generate a source map (containing spans, also keep comments)
  3. edit the data
  4. serialize the data to string
  5. inverse apply the source map to the serialized string to restore original structure

The source map would have to have a tree structure (matching the type nesting) so that it won't result in dumb string comparison when reapplying the source map.
It could be generated with a proc-macro from the types.


Actually I think the process can be simplified with a custom Serializer:

  1. Use serde to parse, then serialize it with a custom Serializer which works like this:
    It's like the normal RON serializer but injected with some additional code:
    At each nesting level, it compares the original string with the serialized string, and stores any discrepancies in whitespace & comments. After this, it returns this info to use later.
    (It's like a tree zipper, like a visitor that visits both trees in parallel.)
  2. edit the data
  3. Use the gathered whitespace&comment-diff info when serializing the data to string

This info can be stored in types that are isomorphic to the original types: Instead of the original types, each field has the type WhitespaceAndCommentData. These isomorphic types can be generated with a proc-macro.

@xMAC94x Would you be interested in this workaround? :)

@kvark
Copy link
Collaborator

kvark commented Nov 18, 2020

Might be worth looking at Serde alternatives for this, such as https://github.com/dtolnay/miniserde and https://github.com/not-fl3/nanoserde

@Boscop
Copy link

Boscop commented Nov 18, 2020

@kvark It looks like those 2 crates also don't preserve whitespace/comment information.

@xMAC94x
Copy link

xMAC94x commented Nov 20, 2020

@Boscop sounds like a feasible workaround. it would be cool if there exist a procmacro/Serializer or at least an example coding for that :)

@Boscop
Copy link

Boscop commented Nov 24, 2020

@xMAC94x I implemented this approach once, for my transpiler from Rust to "Rust with pythonic syntax" because I needed to have the exact spans of all tokens, to be able to generate a sourcemap, so that I could backtranslate the lines/columns of compilation errors back to lines/columns in the pythonic source code (in my cargo proxy).
(At the time there was no crate that could parse Rust and would give me the spans of all tokens, only a way to get TokenStreams and a parser on top of that, but it threw away comments and whitespace info.)
This approach worked very well!

@coderedart
Copy link

Just to express interest in this old issue, I want comments in the config file that explain what the specific option does. Ideally, it would just take the doc comments of the field and insert those into the serialized format. pretty much all existing crates like toml/yaml/json/xml parsers have atleast one issue asking for this. but it seems this is not happening as serde doesn't provide doc comments to the backend crates :(

having comments in the config file really helps users to be a little more daring and try out various options, so its a good thing to have imo. one of the core strengths of Rust is documentation of libraries. compared to older languages, documentation was part of the codebase for rust from early on and it makes sense for RON to consider it :)

@vi
Copy link
Author

vi commented Sep 22, 2021

but it seems this is not happening as serde doesn't provide doc comments to the backend crates :(

What if, to store data about comments and whitespace style, do one of:

  1. Store things in adjacent "hidden" elements in parent dicts. This would allow editing them prior to serialisation;
  2. Store things in root dictionary being serialized/deserialized. This would make attaching round-trippable deserialisation into exsting system easier, but less convenient for editing comments from code;
  3. Make deserialization produce two things: main object and special metadata block that can be optionally used for serialisation to fine-tune it?

@Zireael07
Copy link

Bump. Still an issue. Also, serde independence (or ability to use something smaller) would be very welcome.

@tryoxiss
Copy link

Given ron is designed to support all of serdes data model, and serde is a rust standard, I think serde independence is likely a bad idea, at least for the main parser (although an altnerate fork such as ron-noserde maybe, but as mentioned that is a lot of work).

I think the work around suggested by xMAC94x could be good! Though I highly suspect that a non-destructive serialisation would come with performance downsides, so it should be a feature that swaps the standard functions for a non-destructive set.

@vi
Copy link
Author

vi commented Jan 25, 2024

altnerate fork such as ron-noserde

Maybe it should be something like ron-lowlevel, which can be used directly (for tricky cases) or through ron facade (for basic use cases), with ron depending on ron-lowlevel.


serde is a rust standard

Serde is something like "greatest common divisor" of all parsing formats - it allows swapping formats for simple cases, but when you need something tricky (human-editable files, super performance, support of somewhat incorrect files), you often skip Serde and work directly with a low-level library.

Each format being represented as a Serde library (if possible) is a good idea; formats being limited to what Serde can is not, even for formats which are directly inspired by Serde.

@juntyr
Copy link
Member

juntyr commented Jan 25, 2024

serde is a rust standard

Serde is something like "greatest common divisor" of all parsing formats - it allows swapping formats for simple cases, but when you need something tricky (human-editable files, super performance, support of somewhat incorrect files), you often skip Serde and work directly with a low-level library.

Each format being represented as a Serde library (if possible) is a good idea; formats being limited to what Serde can is not, even for formats which are directly inspired by Serde.

I think that's a good way to think about it. @torkleyy in the past put some work into a rewrite of ron that has its own AST parsing with more functionality and then uses that to also provide serde features. As we are trying to make ron more robust, in particular to enhance its Value type, I do think we'll end up somewhere in this direction. Especially since I've been working on fuzzing all of ron, including its interaction with serde attributes, I've realised just how much the current serde limitations also impact ron. While serde could be extended to fix most of these issues, now that impl trait in trait function return types is stable, this may take a while. Once I have some more time and energy to put into ron again (or if anyone wants to push this feature earlier), slowly transitioning towards a non-serde-driven parser might be a good direction to take.

@tryoxiss
Copy link

tryoxiss commented Jan 25, 2024

altnerate fork such as ron-noserde

Maybe it should be something like ron-lowlevel, which can be used directly (for tricky cases) or through ron facade (for basic use cases), with ron depending on ron-lowlevel.

serde is a rust standard

Serde is something like "greatest common divisor" of all parsing formats - it allows swapping formats for simple cases, but when you need something tricky (human-editable files, super performance, support of somewhat incorrect files), you often skip Serde and work directly with a low-level library.

Each format being represented as a Serde library (if possible) is a good idea; formats being limited to what Serde can is not, even for formats which are directly inspired by Serde.

and

serde is a rust standard

Serde is something like "greatest common divisor" of all parsing formats - it allows swapping formats for simple cases, but when you need something tricky (human-editable files, super performance, support of somewhat incorrect files), you often skip Serde and work directly with a low-level library.
Each format being represented as a Serde library (if possible) is a good idea; formats being limited to what Serde can is not, even for formats which are directly inspired by Serde.

I think that's a good way to think about it. @torkleyy in the past put some work into a rewrite of ron that has its own AST parsing with more functionality and then uses that to also provide serde features. As we are trying to make ron more robust, in particular to enhance its Value type, I do think we'll end up somewhere in this direction. Especially since I've been working on fuzzing all of ron, including its interaction with serde attributes, I've realised just how much the current serde limitations also impact ron. While serde could be extended to fix most of these issues, now that impl trait in trait function return types is stable, this may take a while. Once I have some more time and energy to put into ron again (or if anyone wants to push this feature earlier), slowly transitioning towards a non-serde-driven parser might be a good direction to take.

I did not realise that, I thought serde did some of the heavier lifting for parts. As long as there is a simple serde interface I would have no problems of it being independent now that I know that. (Even though I see myself working with lowlevel-ron directory a considerable amount for non-destructive serialisation, a serde interface is very nice to have for more plug-and-play use cases).

And having serde independce would allow for non-destructive serialisation without workarounds, which is good imo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants