
walrus roundtrip inflates size of WebAssembly binaries #142

Open
RReverser opened this issue Nov 26, 2019 · 9 comments
Labels
bug Something isn't working

Comments

@RReverser
Member

Describe the Bug

Parsing and re-emitting WebAssembly, even without modifications, often produces a larger binary than the original.

Steps to Reproduce

  1. Download https://squoosh.app/optipng.4e77b.wasm (this is just a public example, but the problem can be reproduced on pretty much arbitrary real-world Wasm files).
  2. Run the following code:

    let mut module = walrus::ModuleConfig::new()
        .generate_producers_section(false)
        .parse_file("optipng.4e77b.wasm")?;
    module.producers.clear();
    module.emit_wasm_file("optipng.out.wasm")?;

  3. Compare the input and output sizes.

Expected Behavior

The output is the same size as (or maybe even smaller than) the original.

Actual Behavior

The input file is 286,727 bytes and the output is 290,568 bytes, making it 3,841 bytes (~3.75 KB) larger.

Additional Context

wasm-objdump -h output for the input:

     Type start=0x0000000b end=0x00000101 (size=0x000000f6) count: 33
   Import start=0x00000104 end=0x000006d0 (size=0x000005cc) count: 68
 Function start=0x000006d3 end=0x00000976 (size=0x000002a3) count: 673
   Global start=0x00000978 end=0x0000099c (size=0x00000024) count: 7
   Export start=0x0000099f end=0x00000b3d (size=0x0000019e) count: 26
     Elem start=0x00000b40 end=0x00000cce (size=0x0000018e) count: 1
     Code start=0x00000cd2 end=0x0003c473 (size=0x0003b7a1) count: 673

and output:

     Type start=0x0000000e end=0x00000104 (size=0x000000f6) count: 33
   Import start=0x0000010a end=0x000006d6 (size=0x000005cc) count: 68
 Function start=0x000006dc end=0x0000097f (size=0x000002a3) count: 673
   Global start=0x00000985 end=0x000009a9 (size=0x00000024) count: 7
     Elem start=0x00000b53 end=0x00000d1d (size=0x000001ca) count: 1
     Code start=0x00000d23 end=0x0003d372 (size=0x0003c64f) count: 673
     Data start=0x0003d378 end=0x00046f08 (size=0x00009b90) count: 54

As you can see, most sections remained unchanged, but Elem and Code became larger.

Compressed sizes differ as well (even if less so): e.g. with Brotli, the input is 103,410 bytes and the output is 103,887 bytes.

@RReverser RReverser added the bug Something isn't working label Nov 26, 2019
@alexcrichton
Collaborator

A curious bug!

I don't think your input/output are for the same file, using the input file in the link you gisted above I get:

input

optipng.4e77b.wasm:     file format wasm 0x1                           
                                                                       
Sections:                                                              
                                                                       
     Type start=0x0000000b end=0x00000101 (size=0x000000f6) count: 33  
   Import start=0x00000104 end=0x000006d0 (size=0x000005cc) count: 68  
 Function start=0x000006d3 end=0x00000976 (size=0x000002a3) count: 673 
   Global start=0x00000978 end=0x0000099c (size=0x00000024) count: 7   
   Export start=0x0000099f end=0x00000b3d (size=0x0000019e) count: 26  
     Elem start=0x00000b40 end=0x00000cce (size=0x0000018e) count: 1   
     Code start=0x00000cd2 end=0x0003c473 (size=0x0003b7a1) count: 673 
     Data start=0x0003c477 end=0x00046007 (size=0x00009b90) count: 54  

output

optipng.out.wasm:       file format wasm 0x1

Sections:

     Type start=0x0000000e end=0x00000104 (size=0x000000f6) count: 33
   Import start=0x0000010a end=0x000006d6 (size=0x000005cc) count: 68
 Function start=0x000006dc end=0x0000097f (size=0x000002a3) count: 673
   Global start=0x00000985 end=0x000009a9 (size=0x00000024) count: 7
   Export start=0x000009af end=0x00000b4d (size=0x0000019e) count: 26
     Elem start=0x00000b53 end=0x00000d1d (size=0x000001ca) count: 1
     Code start=0x00000d23 end=0x0003d372 (size=0x0003c64f) count: 673
     Data start=0x0003d378 end=0x00046f08 (size=0x00009b90) count: 54

The differences here are small and confined to the Elem and Code sections. The Elem growth seems entirely accounted for by walrus renumbering function indices. The Code section is presumably similar. Walrus sorts functions biggest-first to give engines with streaming compilation a chance to start on the big functions early, and it looks like that happens to cause the size to increase here, presumably because small functions are referenced far more often than large functions.
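The index-renumbering cost can be illustrated with a standalone sketch (not walrus code): function indices in call instructions are encoded as unsigned LEB128, so an index that crosses the 127 or 16,383 boundary costs an extra byte at every call site that references it.

```rust
// Byte length of an unsigned LEB128-encoded integer: 7 bits per byte,
// so indices 0..=127 take 1 byte, 128..=16383 take 2 bytes, and so on.
fn leb128_len(mut n: u32) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn main() {
    // A frequently-called small function renumbered from index 100 to 200
    // gains one byte per call site.
    assert_eq!(leb128_len(100), 1);
    assert_eq!(leb128_len(200), 2);
    println!(
        "index 100 -> {} byte(s), index 200 -> {} byte(s)",
        leb128_len(100),
        leb128_len(200)
    );
}
```

With 673 functions and hundreds of call sites, even a one-byte-per-reference shift can plausibly account for a few kilobytes.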

Walrus isn't really a size-optimizing compiler. It's possible that we could add an entirely new emission mode to optimize for this though.

@RReverser
Member Author

> I don't think your input/output are for the same file, using the input file in the link you gisted above I get:

Sorry, I'm not sure what you mean here. Your output seems to match mine at first glance? (Except I accidentally omitted the last line of the wasm-objdump output for the input when copy-pasting.)

> and it looks like that happens to cause the size to increase here, presumably because small functions are referenced far more often than large functions

Yeah, could be, although I'm surprised it makes such a big difference.

I think size is often important for Wasm users, so adding such a mode would be beneficial; but if not, maybe it's worth documenting this as an explicit non-goal in the docs or README?

@RReverser
Member Author

I also wonder if it would be possible to make this reordering an optional pass, so that by default functions would appear in their original order and the roundtrip would be preserved more closely. This would basically delegate the ordering of most functions to the original Wasm compiler, which might itself choose to optimise for size or speed.

@fitzgen
Member

fitzgen commented Nov 26, 2019

That would require keeping track of the original order, which we don't currently do.

@RReverser
Member Author

@fitzgen As far as I understand from the code, functions are already added to the arena in their original order during parsing, and that order is preserved throughout most basic transformations.

The actual reordering seems to happen in just one place, during the emit phase:

    functions.sort_by_key(|(id, _, size)| (cmp::Reverse(*size), *id));
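For illustration, here is a standalone sketch of what making the sort opt-in could look like (the `sort_largest_first` flag is hypothetical, not a real walrus option):

```rust
use std::cmp;

// Hypothetical sketch: `sort_largest_first` is NOT a real walrus option.
// If emission kept arena order by default, the biggest-first sort could be
// opt-in, and the original compiler's ordering would round-trip unchanged.
fn order_functions(
    mut functions: Vec<(u32, &'static str, usize)>, // (id, name, size)
    sort_largest_first: bool,
) -> Vec<(u32, &'static str, usize)> {
    if sort_largest_first {
        // Current walrus behavior: biggest functions first, ties broken by id.
        functions.sort_by_key(|(id, _, size)| (cmp::Reverse(*size), *id));
    }
    functions // otherwise the original (arena/parse) order is preserved
}

fn main() {
    let fns = vec![(0, "small", 10), (1, "big", 500), (2, "medium", 50)];
    let sorted = order_functions(fns.clone(), true);
    assert_eq!(sorted[0].0, 1); // the big function is emitted first
    let original = order_functions(fns, false);
    assert_eq!(original[0].0, 0); // arena order preserved
}
```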

@alexcrichton
Collaborator

Yes, the difference is that the input binary you listed above doesn't look like it has a data section, but the data section is there (and is the same size in both).

I don't really think this warrants any notes in the README or anything like that; it's not that we're size-averse, but we don't specifically optimize for size in walrus.

This is a micro-optimization for one use case where re-sorting functions causes bloat. If you want to 100% optimize for size, that's not really what walrus is built for, and that's why there are tools like wasm-opt -Os which probably account for this.

@RReverser
Member Author

> doesn't look like it has a data section, but the data section is there (and is the same size in both)

Oh yeah, sorry; as I said, it was just a copy-paste mistake.

> I don't really think this warrants any notes in the README or anything like that; it's not that we're size-averse, but we don't specifically optimize for size in walrus.

To me, it's not so much about intentionally optimising for size as about the roundtrip producing different results. Other tools usually either preserve the input by default or, if they also use a different internal IR (such as Binaryen), clearly document that a roundtrip can produce very different Wasm than the input.

I think in this case, at a minimum, it would make sense to follow what Binaryen does and just document this constraint. On the other hand, the fix to keep the original order (and make reordering a separate option) looks simple enough that it seems worth just implementing instead.

@RReverser
Member Author

I experimented with this a bit, and while disabling function reordering brings the size a bit closer to the original, it seems the main overhead remains and is caused by locals reordering instead.
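A plausible explanation (an assumption on my part, not confirmed from the walrus source) is that local.get/local.set operands are also LEB128-encoded, so which locals receive the small indices directly affects code size; assigning the smallest indices to the most-used locals minimizes the total. A quick standalone sketch:

```rust
// Byte length of an unsigned LEB128-encoded integer.
fn leb128_len(mut n: u32) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

// use_counts[i] = how often local index i is referenced in a function body.
// Returns the total bytes spent on local.get/local.set index operands.
fn encoded_operand_bytes(use_counts: &[u32]) -> usize {
    use_counts
        .iter()
        .enumerate()
        .map(|(idx, &count)| leb128_len(idx as u32) * count as usize)
        .sum()
}

fn main() {
    // 200 locals; the last one is by far the hottest.
    let mut naive = vec![1u32; 200];
    naive[199] = 1000; // hot local stuck at a 2-byte index
    let mut hot_first = vec![1u32; 200];
    hot_first[0] = 1000; // hot local renumbered to index 0 (1 byte)
    let a = encoded_operand_bytes(&naive);
    let b = encoded_operand_bytes(&hot_first);
    assert!(b < a); // hot local at a small index -> fewer bytes overall
    println!("naive: {} bytes, hot-first: {} bytes", a, b);
}
```

Under this assumption, a locals reordering that doesn't match the original compiler's usage-aware numbering would add bytes across every function body, which fits the observed Code section growth.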

@alexcrichton
Copy link
Collaborator

Yeah, I think it's reasonable to document that walrus will not exactly round-trip a file when it's read and then serialized; that's not a design goal of this crate.
