This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Brotli and/or zstd, too? #1

Closed
trevyn opened this issue Feb 20, 2021 · 9 comments

Comments

@trevyn

trevyn commented Feb 20, 2021

@leafac Thanks a bijillion for starting this project; the outline looks great and I trust that you in particular have spent some time thinking about the tradeoffs!

In vercel/pkg#837 (comment), you mention that the self-extracting archive uses gzip, which is great. It was unclear to me if this applies to the Node binary or the user assets or both, but either way, I would encourage offering brotli and/or zstd as configurable options for the compressor. Brotli in particular offers an extremely good compression ratio while still being very fast to decompress — the tradeoff is longer compression time, but this is the ideal tradeoff for an app release build: it takes longer on the GitHub Actions runner, but results in a significantly smaller binary that is still near-instant to decompress.

If you need more convincing, I can pull up some hard numbers. :)

And thanks again for starting caxa! I love the name and story, and please provide some pronunciation tips for non-Portuguese-speakers! ;)

@leafac
Owner

leafac commented Feb 23, 2021

> In vercel/pkg#837 (comment), you mention that the self-extracting archive uses gzip, which is great. It was unclear to me if this applies to the Node binary or the user assets or both

It applies to both. The way caxa works is it puts the node executable under node_modules/.bin and then compresses the whole project folder.

> [...] but either way, I would encourage offering brotli and/or zstd as configurable options for the compressor.

I’m all in for that if these options don’t require extra software to uncompress. We’re relying on whatever the end user (not the developer/packager) has installed on their machine. tar is available everywhere. I know very little about brotli and nothing at all about zstd, so please educate me: What do you need to uncompress them?

> And thanks again for starting caxa! I love the name and story, and please provide some pronunciation tips for non-Portuguese-speakers! ;)

It’s what you’d expect. Google Translate gets it right 😃 📦

@trevyn
Author

trevyn commented Feb 24, 2021

Ahhhh, I see. Brotli and zstd have easy command-line utilities available (e.g. brew.sh/brotli, brew.sh/zstd), but these are not installed widely enough to rely on their presence on end-user machines.

The caxa technique could be used to build a (binary?) executable that embeds a Brotli decompressor (still very much a size win), but that is almost certainly a later optimization.

For reference, using node-v15.10.0-darwin-x64:

size in bytes (% of gzip)   command                               compress  decompress
77,396,144                  node (uncompressed)                   -         -
26,333,789 (100%)           gzip -9 node                          9 sec     0.4 sec
20,028,735 (76.1%)          zstd --ultra -22 node                 40 sec    0.5 sec
18,235,506 (69.2%)          brotli -q 11 --large_window=27 node   265 sec   0.7 sec
17,929,180 (68.1%)          xz -e node                            38 sec    1.5 sec
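
For anyone who wants to run a similar size comparison without installing the command-line tools, here is a minimal sketch using Node's built-in `zlib` module. It covers only gzip and Brotli, it leaves Brotli's window size at the default rather than using the large-window setting from the table, and `compareCompressors` is a hypothetical helper name, not something from caxa. It will not reproduce the timings above.

```typescript
import * as zlib from "node:zlib";

interface CompressionResult {
  name: string;
  bytes: number;
  ratioToGzip: number; // compressed size divided by the gzip size
}

// Hypothetical helper (not part of caxa): compress a buffer with gzip and
// Brotli at their highest quality settings and report the resulting sizes.
export function compareCompressors(input: Buffer): CompressionResult[] {
  const gzip = zlib.gzipSync(input, { level: 9 });
  const brotli = zlib.brotliCompressSync(input, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
      [zlib.constants.BROTLI_PARAM_SIZE_HINT]: input.length,
    },
  });

  // Sanity check: both outputs must decompress back to the original bytes.
  if (!zlib.gunzipSync(gzip).equals(input)) throw new Error("gzip round trip failed");
  if (!zlib.brotliDecompressSync(brotli).equals(input)) throw new Error("brotli round trip failed");

  return [
    { name: "gzip", bytes: gzip.length, ratioToGzip: 1 },
    { name: "brotli", bytes: brotli.length, ratioToGzip: brotli.length / gzip.length },
  ];
}
```

Passing it the contents of the local Node binary, for example `compareCompressors(fs.readFileSync(process.execPath))`, would give numbers in the spirit of the table, though Brotli at quality 11 takes a while on a file that large.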

(And I didn't realize that Google Translate will pronounce arbitrary text strings in different languages! That's awesome.)

@leafac
Owner

leafac commented Mar 8, 2021

Hey @trevyn,

Since we talked I released 1.0.0. In order to support Windows, I ended up using an approach that is slightly different, and I think that supporting other compression algorithms would be feasible.

Instead of relying on tar being available on the machine, the stubs used in the self-extracting archives are Go programs that include the uncompression routine. (See the README for more details.) For now we’re still going with plain-old tarballs, because they were easier to work with: Go has support for tarballs in the standard library, the Archiver Node.js package is a convenient way to work with tarballs in Node.js, and if people need to use the stub by some other means then it’s easy to create tarballs from the command line (I show how to do that in the README, and it has come in handy during development and testing).

That said, here’s how we could introduce other algorithms: In the packager (TypeScript), we would compress using other algorithms; and in the stub (Go), we would uncompress using the corresponding algorithm. That simple. We could even support multiple algorithms: The last part of the self-extracting archive is a JSON footer, which could indicate which algorithm the stub should use for uncompression.
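
A minimal sketch of that footer idea, under stated assumptions: the `compression` field, the `pack`/`unpack` names, and the "payload, newline, JSON footer" layout are all hypothetical and not caxa's actual format. The real stub is a Go program; the reading side is written in TypeScript here only to keep the sketch in one language.

```typescript
import * as zlib from "node:zlib";

type Algorithm = "gzip" | "brotli";

// Hypothetical footer shape; the `compression` field is an assumption.
interface Footer {
  compression: Algorithm;
}

const compressors: Record<Algorithm, (data: Buffer) => Buffer> = {
  gzip: (data) => zlib.gzipSync(data),
  brotli: (data) => zlib.brotliCompressSync(data),
};

const decompressors: Record<Algorithm, (data: Buffer) => Buffer> = {
  gzip: (data) => zlib.gunzipSync(data),
  brotli: (data) => zlib.brotliDecompressSync(data),
};

// Packager side: emit the compressed payload, a newline, then a one-line JSON footer.
export function pack(payload: Buffer, compression: Algorithm): Buffer {
  const footer: Footer = { compression };
  return Buffer.concat([
    compressors[compression](payload),
    Buffer.from("\n" + JSON.stringify(footer), "utf8"),
  ]);
}

// Stub side: the footer is everything after the last newline. JSON.stringify
// without indentation never emits a newline, so the last newline in the
// archive is always the separator, even if the payload contains 0x0A bytes.
export function unpack(archive: Buffer): Buffer {
  const separator = archive.lastIndexOf("\n");
  if (separator === -1) throw new Error("Missing footer");
  const footer = JSON.parse(archive.subarray(separator + 1).toString("utf8")) as Footer;
  const decompress = decompressors[footer.compression];
  if (decompress === undefined) {
    throw new Error(`Unsupported compression: ${String(footer.compression)}`);
  }
  return decompress(archive.subarray(0, separator));
}
```

Dispatching through a lookup table means a new algorithm only needs one entry on each side, and an archive that names an unknown algorithm fails with a clear error instead of being misread.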

I’m sure that there exist good libraries for other compression algorithms in both JavaScript & Go. To use external libraries in Go would require a slightly more sophisticated build setup, but that’s no big deal for the users because we distribute the stubs in compiled form.

Do you want to give this idea a try?

@trevyn
Author

trevyn commented Mar 8, 2021

Aha, interesting. If I re-wrote the stub in Rust and it worked well, would you be open to that? Might be able to get the size of the compiled stub down a little bit too.

@leafac
Owner

leafac commented Mar 8, 2021

Yeah, sure, in principle I’m okay with Rust. I know almost nothing about either language (Rust & Go) anyway…

My main concern is being welcoming to new contributors. It seems that Go is more popular than Rust: https://insights.stackoverflow.com/survey/2020#most-popular-technologies https://www.jetbrains.com/lp/devecosystem-2020/

In your experience, do you think that Rust vs Go would be an issue for other contributors?

Or do you think that Rust has significant technical advantages that outweigh this consideration?

The size of the binary stub isn’t a big issue: We’re already bundling Node.js, which is much bigger anyway. Also, I don’t think that the usual arguments in favor of Rust (the safety of the type system in relation to memory ownership, and so forth) apply here, because the stub is so simple and small.

At the end of the day, I think it’s a matter of being familiar to new contributors.

What do you say?

@trevyn
Author

trevyn commented Mar 8, 2021

This is a good point — Go is almost certainly easier for contributors with little Go experience than Rust is for contributors with little Rust experience.

I think the biggest immediate technical advantage for this use-case is that we'd probably be able to get close-to-C stub sizes. If this isn't a concern, then the advantages are more nebulous, but keep me in mind if you ever bump into a limitation of Go. ;)

Anyway, it looks pretty straightforward to pop Brotli into today's caxa; I'll give it a shot.

@leafac
Owner

leafac commented Mar 8, 2021

I’m looking forward to your Pull Request 🙌

@leafac
Owner

leafac commented Aug 25, 2021

@leafac
Owner

leafac commented Nov 21, 2023

Hi @trevyn,

Thanks for using caxa and for the conversation here.

I’ve been thinking about the broad strategy employed by caxa and concluded that there is a better way to solve the problem. In this new approach we depend on the user already having the software to extract the package, so we should use the broadly available .tar.gz and .zip formats.

It’s a different enough approach that I think it deserves a new name, and it’s part of a bigger toolset that I’m building, which I call Radically Straightforward · Package.

I’m deprecating caxa and archiving this repository. I invite you to continue the conversation in Radically Straightforward’s issues.

Best.

@leafac leafac closed this as completed Nov 21, 2023