Improve finding edges to data segments #45

fitzgen · 2018-04-26T22:20:58Z

Ok, clearly this wasm's size has a lot going on in the data section:

~/twiggy
$ twiggy top -n 10 ~/gutenberg-parser-rs/bindings/wasm/gutenberg_post_parser.debug.wasm 
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────
          6285 ┊     6.49% ┊ data[4]
          3653 ┊     3.77% ┊ gutenberg_post_parser::parser::block::h482c2450268e1452
          3485 ┊     3.60% ┊ <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter::h16c80f212f4e48bd
          2726 ┊     2.82% ┊ "function names" subsection
          2593 ┊     2.68% ┊ data[2807]
          1900 ┊     1.96% ┊ data[5]
          1796 ┊     1.86% ┊ data[2804]
          1771 ┊     1.83% ┊ data[0]
          1749 ┊     1.81% ┊ data[2749]
          1486 ┊     1.54% ┊ data[38]

Which functions are using this data?

~/twiggy
$ twiggy paths ~/gutenberg-parser-rs/bindings/wasm/gutenberg_post_parser.debug.wasm 'data[4]' 'data[2807]' 'data[5]' 'data[2804]' 'data[0]' 'data[2749]' 'data[38]'
 Shallow Bytes │ Shallow % │ Retaining Paths
───────────────┼───────────┼────────────────
          1771 ┊     1.83% ┊ data[0]
          6285 ┊     6.49% ┊ data[4]
          1900 ┊     1.96% ┊ data[5]
          1486 ┊     1.54% ┊ data[38]
          1749 ┊     1.81% ┊ data[2749]
          1796 ┊     1.86% ┊ data[2804]
          2593 ┊     2.68% ┊ data[2807]

No edges to the data segments found. I don't believe that is accurate.

The text was updated successfully, but these errors were encountered:

fitzgen · 2018-04-26T22:21:36Z

input test case

fitzgen · 2018-05-04T02:19:33Z

This should help us get useful information about data segments: https://reviews.llvm.org/D46417

srenatus · 2018-05-28T14:21:17Z

🔍 From sprinkling println! on it, it looks like for the input test case provided, let I32Const(base) never matches:

if let I32Const(base) = code[i - 1] {
  if let Some(data_id) = items.get_data(base as u32 + off) {
    items.add_edge(body_id, data_id);
  }
}

However, I'm just beginning to wrap my mind around what's happening there... currently, I fail to understand the meaning (and necessity) of base -- it seems like offset would be all that is needed to load something from the linear data (i.e., all the data segments concatenated), and hence enough to determine if a particular data segment is needed.

Also, am I right assuming that optimizing data segments isn't all that simple -- we cannot drop something without changing all existing offsets. I suppose it could be overwritten with zeros, and thus be end up being better to compress? (And additionally, it's hard to determine when the default data segment content from the wasm file is used, or when it is overwritten (using the store instructions), and loaded from that, isn't it?)

srenatus · 2018-05-29T08:15:12Z

Update -- I've found a reference for my 💭 points above: https://github.com/rust-lang-nursery/rust-wasm/issues/21 ✅

fitzgen · 2018-05-29T16:55:33Z

However, I'm just beginning to wrap my mind around what's happening there... currently, I fail to understand the meaning (and necessity) of base -- it seems like offset would be all that is needed to load something from the linear data (i.e., all the data segments concatenated), and hence enough to determine if a particular data segment is needed.

Accessing array[i] -- where array is some global data array and i is our static index into it -- will translate roughly into

i32.const $address_of_array
i32.load $i

The i is the static offset in that code snippet.

The thing is, this will only work for static indices, where the offset is embedded in the instruction. Any kind of dynamic indexing into the array won't be seen by this (very simple, conservative) analysis. I suspect that this is what is happening in practice.

Also, am I right assuming that optimizing data segments isn't all that simple -- we cannot drop something without changing all existing offsets. I suppose it could be overwritten with zeros, and thus be end up being better to compress? (And additionally, it's hard to determine when the default data segment content from the wasm file is used, or when it is overwritten (using the store instructions), and loaded from that, isn't it?)

The only way to safely remove them is by having and parsing relocations from the linker. Twiggy is just performing (or at least attempting to perform) a conservative analysis for informative purposes. It can't be complete unless both (1) relocations are present, and (2) we teach twiggy how to understand them. That said, it would be cool to add that feature, and it would also be cool to extend our conservative analysis to understand more cases.

In general, I think a full abstract interpreter for wasm would be super useful here, although that definitely belongs in a project of its own :)

srenatus · 2018-05-30T08:49:34Z

@fitzgen thanks for shedding some light on this 😃

fitzgen added enhancement New feature or request wasm labels Apr 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve finding edges to data segments #45

Improve finding edges to data segments #45

fitzgen commented Apr 26, 2018

fitzgen commented Apr 26, 2018 •

edited

fitzgen commented May 4, 2018

srenatus commented May 28, 2018

srenatus commented May 29, 2018

fitzgen commented May 29, 2018

srenatus commented May 30, 2018

Improve finding edges to data segments #45

Improve finding edges to data segments #45

Comments

fitzgen commented Apr 26, 2018

fitzgen commented Apr 26, 2018 • edited

fitzgen commented May 4, 2018

srenatus commented May 28, 2018

srenatus commented May 29, 2018

fitzgen commented May 29, 2018

srenatus commented May 30, 2018

fitzgen commented Apr 26, 2018 •

edited