Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve finding edges to data segments #45

Open
fitzgen opened this issue Apr 26, 2018 · 6 comments
Open

Improve finding edges to data segments #45

fitzgen opened this issue Apr 26, 2018 · 6 comments
Labels
enhancement New feature or request wasm

Comments

@fitzgen
Copy link
Member

fitzgen commented Apr 26, 2018

Ok, clearly this wasm's size has a lot going on in the data section:

~/twiggy
$ twiggy top -n 10 ~/gutenberg-parser-rs/bindings/wasm/gutenberg_post_parser.debug.wasm 
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────
          6285 ┊     6.49% ┊ data[4]
          3653 ┊     3.77% ┊ gutenberg_post_parser::parser::block::h482c2450268e1452
          3485 ┊     3.60% ┊ <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter::h16c80f212f4e48bd
          2726 ┊     2.82% ┊ "function names" subsection
          2593 ┊     2.68% ┊ data[2807]
          1900 ┊     1.96% ┊ data[5]
          1796 ┊     1.86% ┊ data[2804]
          1771 ┊     1.83% ┊ data[0]
          1749 ┊     1.81% ┊ data[2749]
          1486 ┊     1.54% ┊ data[38]

Which functions are using this data?

~/twiggy
$ twiggy paths ~/gutenberg-parser-rs/bindings/wasm/gutenberg_post_parser.debug.wasm 'data[4]' 'data[2807]' 'data[5]' 'data[2804]' 'data[0]' 'data[2749]' 'data[38]'
 Shallow Bytes │ Shallow % │ Retaining Paths
───────────────┼───────────┼────────────────
          1771 ┊     1.83% ┊ data[0]
          6285 ┊     6.49% ┊ data[4]
          1900 ┊     1.96% ┊ data[5]
          1486 ┊     1.54% ┊ data[38]
          1749 ┊     1.81% ┊ data[2749]
          1796 ┊     1.86% ┊ data[2804]
          2593 ┊     2.68% ┊ data[2807]

No edges to the data segments found. I don't believe that is accurate.

@fitzgen fitzgen added enhancement New feature or request wasm labels Apr 26, 2018
@fitzgen
Copy link
Member Author

fitzgen commented Apr 26, 2018

input test case

@fitzgen
Copy link
Member Author

fitzgen commented May 4, 2018

This should help us get useful information about data segments: https://reviews.llvm.org/D46417

@srenatus
Copy link
Contributor

🔍 From sprinkling println! on it, it looks like for the input test case provided, let I32Const(base) never matches:

if let I32Const(base) = code[i - 1] {
  if let Some(data_id) = items.get_data(base as u32 + off) {
    items.add_edge(body_id, data_id);
  }
}

However, I'm just beginning to wrap my mind around what's happening there... currently, I fail to understand the meaning (and necessity) of base -- it seems like offset would be all that is needed to load something from the linear data (i.e., all the data segments concatenated), and hence enough to determine if a particular data segment is needed.

Also, am I right assuming that optimizing data segments isn't all that simple -- we cannot drop something without changing all existing offsets. I suppose it could be overwritten with zeros, and thus be end up being better to compress? (And additionally, it's hard to determine when the default data segment content from the wasm file is used, or when it is overwritten (using the store instructions), and loaded from that, isn't it?)

@srenatus
Copy link
Contributor

Update -- I've found a reference for my 💭 points above: https://github.com/rust-lang-nursery/rust-wasm/issues/21

@fitzgen
Copy link
Member Author

fitzgen commented May 29, 2018

However, I'm just beginning to wrap my mind around what's happening there... currently, I fail to understand the meaning (and necessity) of base -- it seems like offset would be all that is needed to load something from the linear data (i.e., all the data segments concatenated), and hence enough to determine if a particular data segment is needed.

Accessing array[i] -- where array is some global data array and i is our static index into it -- will translate roughly into

i32.const $address_of_array
i32.load $i

The i is the static offset in that code snippet.

The thing is, this will only work for static indices, where the offset is embedded in the instruction. Any kind of dynamic indexing into the array won't be seen by this (very simple, conservative) analysis. I suspect that this is what is happening in practice.

Also, am I right assuming that optimizing data segments isn't all that simple -- we cannot drop something without changing all existing offsets. I suppose it could be overwritten with zeros, and thus be end up being better to compress? (And additionally, it's hard to determine when the default data segment content from the wasm file is used, or when it is overwritten (using the store instructions), and loaded from that, isn't it?)

The only way to safely remove them is by having and parsing relocations from the linker. Twiggy is just performing (or at least attempting to perform) a conservative analysis for informative purposes. It can't be complete unless both (1) relocations are present, and (2) we teach twiggy how to understand them. That said, it would be cool to add that feature, and it would also be cool to extend our conservative analysis to understand more cases.

In general, I think a full abstract interpreter for wasm would be super useful here, although that definitely belongs in a project of its own :)

@srenatus
Copy link
Contributor

@fitzgen thanks for shedding some light on this 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wasm
Projects
None yet
Development

No branches or pull requests

2 participants