Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove BinRead::after_parse and add offset-table helpers #210

Closed
wants to merge 3 commits into from

Conversation

csnover
Copy link
Collaborator

@csnover csnover commented Jul 2, 2023

This trait method exists pretty much exclusively to support deferred reads of FilePtr values but causes ongoing trouble
elsewhere and the benefits do not seem to outweigh the problems:

  1. It requires T::Args to be cloneable in many cases where it should not be the case, which causes confusion, has a strong chance of causing accidental slowness, and makes it unnecessarily hard to move from imports;
  2. An analysis I ran of binrw users on GitHub showed that pretty much all cases of FilePtr were using FilePtr::parse or deref_now, so any potential performance benefit does not seem to be realised by real-world projects;
  3. Because there is no hard requirement to call after_parse and it mostly will not break anything, it is too easy for
    users to write custom implementations that do not do this and so are subtly broken. From the same GH analysis, there
    was only one case where I found someone who wrote a custom implementation that correctly called after_parse;
  4. Since after_parse does not build a stack of objects to revisit later, its ability to avoid non-linear reads of data is limited to at most one struct or enum at a time anyway.

Given these things (and probably others that I forget), IMO the existence of this feature is not justified. Instead, I think that a design that reads offsets into a Vec<{integer}> and then iterates over them later to convert into Vec<T> is
preferable.

To that end, this patch set includes some new helper functions in file_ptr that enable efficient offset table pattern, with the added benefit of (I think?) solving the absolute/relative offset issue: now, a user puts the data field in the correct place in the struct and it Just Works, or else can use #[br(seek_before)] on the data field if the offsets are relative to a different position in the file, which means the offset position can be relative or absolute and there is no need for special offset plumbing.

I think this also suggests a path forward for offset table writing, where a temporary marker for the offset table can be created that zero-fills the space, and then the data field and the marker are passed to a helper function that writes the data and then writes out the offset table, but that isn’t something I’ve actually prototyped yet.

This trait method exists pretty much exclusively to support
deferred reads of `FilePtr` values but causes ongoing trouble
elsewhere and the benefits do not seem to outweight the
problems:

1. It requires `T::Args` to be cloneable in many cases where it
   should not be the case, which causes confusion, has a strong
   chance of causing accidental slowness, and makes it
   unnecessarily hard to move from imports;
2. An analysis I ran of binrw users on GitHub showed that pretty
   much all cases of `FilePtr` were using `FilePtr::parse` or
   `deref_now`, so any potential performance benefit does not
   seem to be realised by real-world projects;
3. Because there is no hard requirement to call `after_parse`
   and it mostly will not break anything, it is too easy for
   users to write custom implementations that do not do this
   and so are subtly broken. From the same GH analysis, there
   was only one case where I found someone who wrote a custom
   implementation that correctly called `after_parse`;
4. Since `after_parse` does not build a stack of objects to
   revisit later, its ability to avoid non-linear reads of data
   is limited to at most one struct or enum at a time anyway.

Given these things (and probably others that I forget), IMO
the existence of this feature is not justified. Instead, I
think that a design that reads offsets into a `Vec<{integer}>`
and then iterates over them later to convert into `Vec<T>` is
preferable; a subsequent patch includes some helper functions
to do this, but also right now it can be done (with some verbosity)
using the `args_iter` helper.

Closes jam1garner#17, jam1garner#119.
Fixes jam1garner#185, jam1garner#197.
@csnover csnover requested a review from jam1garner July 2, 2023 03:47
@csnover
Copy link
Collaborator Author

csnover commented Sep 21, 2023

This is merged now.

@csnover csnover closed this Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant