
Compress world data to reduce loading time using socket compression options for large data transfer events. #5942

Closed · aaclayton opened this issue Sep 29, 2021 · 50 comments
Labels: tech-debt (Issues focused on the reduction of technical debt)

@aaclayton (Contributor)

Originally in GitLab by @manuelVo

Feature Summary

Compress the world data before transmission to reduce the loading time of Foundry.

Priority/Importance

The world data can get quite big rather quickly. For example, my Pathfinder 1 world with ~70 actors is already over 10 MB in size. This is (at least in part) because feats, spells, and classes contain lengthy description texts. Additionally, JSON, being a text format, is rather wasteful of space. Since Foundry needs to do a blocking wait for this data, the transmission time of the world data is added in full to the loading time, which can make Foundry slow to load over slower connections, especially when self-hosting, where it's not unusual for upload speeds not to exceed 1 MB/s; distributing this world to 6 other players would take a minute, and this cannot be mitigated via a CDN because the data is transmitted through the websocket. Compression can significantly reduce the amount of data that needs to be transmitted (the world data from above takes under 2 MB when compressed), thus improving loading times (the slow connection mentioned above would only need 12 seconds to transmit the data to all players instead of a minute).

Since compression libraries for JS are readily available (maybe socket.io even supports this natively already) this feature should be fairly simple to implement.

Edit: Compressing templates could be worthwhile as well, though that won't be as impactful as compressing the world data.

@aaclayton (Contributor Author)

It pains me when someone submits an issue saying "should be simple", because whenever those cursed words appear in an issue description, it's never simple.

Socket.io supports automatic compression of "large" messages (gated by default at 1024 bytes or greater), which was added in 1.4.0; see https://socket.io/blog/socket-io-1-4-0/

I have experimented with customizing the compression rules with some custom per-message deflate options, but (1) this did not work in my experimentation, and (2) the feature has several reported memory leaks associated with it, which deterred me from investigating further.

See websockets/ws#1617 and websockets/ws#1369

@aaclayton (Contributor Author)

Originally in GitLab by @manuelVo

How about a custom compression stack? There are several pure-JS compression libraries out there (or even wasm ones, for better performance). All that would be necessary would be to put the world JSON through the respective compression function and convert the compressed blob to base64 (or, better, Ascii85 for space efficiency) to send it through socket.io's text channel. When receiving the world, the client just needs to do the same thing in reverse: base64-decode, decompress, then JSON-decode; that should be it. Or am I missing something?

(Apparently socket.io has a binary mode that could be used to send the compressed data without base64 or similar, but I haven't investigated that thoroughly, so this might be wrong.)
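A minimal sketch of that pipeline, assuming the pako zlib library and Node-style Buffers for the base64 step (the event and helper names are illustrative, not Foundry code):

```js
import pako from "pako";

// Sender: JSON -> deflate -> base64, so the payload fits socket.io's text channel.
function sendCompressedWorld(socket, world) {
  const deflated = pako.deflate(JSON.stringify(world)); // Uint8Array
  socket.emit("worldCompressed", Buffer.from(deflated).toString("base64"));
}

// Receiver: the same steps in reverse -- base64-decode, inflate, JSON-decode.
function decodeCompressedWorld(base64) {
  const bytes = Buffer.from(base64, "base64");
  return JSON.parse(pako.inflate(bytes, { to: "string" }));
}
```

If socket.io's binary mode were used instead, the Uint8Array could be emitted directly and the base64/Ascii85 step skipped entirely.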

@aaclayton (Contributor Author)

It's a possibility to explore, yes. I don't think we will prioritize it for v9, but maybe if we have some extra bandwidth we can take this up and explore a custom compression/decompression approach.

@aaclayton (Contributor Author)

Originally in GitLab by @stwlam

Might it be viable to send the initial world-data payload over HTTP, with all subsequent data going through a websocket connection? At least then the data could be compressed if piped through a reverse proxy.

@aaclayton (Contributor Author)

Originally in GitLab by @Weissrolf

The latter suggestion might even work if all database files were initially sent over HTTP, because their compressed total size would still be lower than the uncompressed snippet currently sent via websocket.

@aaclayton (Contributor Author)

Comment from @ben_lubar

Feature Summary

My friend is setting up a D&D game on Foundry, and so far the UI seems really responsive, except for a very long initial load time. I opened my browser's developer tools to investigate and found that the client sends ["world"], and then there is a long pause before the server sends a 91,407,939-character response. On a 12 Mbps upload connection, this would take 60 seconds per player, not including TCP overhead and so on.

However, if I take the response and pass it through gzip, I get a file that is only 13,708,208 bytes (under 10 seconds per player). I believe this can be done automatically using the perMessageDeflate option in socket.io, although that may hurt the performance of messages that aren't multiple megabytes long.

Additionally, the packet seemed to contain a lot of unnecessary data, such as objects whose field values are only empty strings, zeroes, or nulls, plus data the client almost certainly doesn't need, like Windows filesystem paths (they're not long individually, but they add up!). Filtering the data that gets sent could save a lot of space (and therefore loading time) without the downsides of perMessageDeflate.

As a third possible solution, the initial state could be presented as a difference from some static file, so that the websocket would only need to describe the differences between the current state and "world state snapshot 1". The static file would be cacheable by a proxy server or a web browser, unlike a websocket message.

User Experience

There does not need to be any user-visible effect of this feature other than the game loading faster.

Priority/Importance

This is firmly in the "would be nice to have" tier of importance.

@aaclayton (Contributor Author)

Comment from @aaclayton:

The socket.io library and the zlib it depends on have some notable issues with memory fragmentation when using the perMessageDeflate option, which is why we have kept it disabled. You are right that this could be beneficial. In my own testing (during v9 development) it did make a difference, but I saw signs of an increasing server-side memory profile the longer the host ran. Overall I don't think this is stable enough (without a viable memory-management solution) to enable perMessageDeflate.

Some context on the issue here: https://serverfault.com/questions/1066025/how-to-fix-ws-and-socket-io-memory-leak

@aaclayton (Contributor Author)

Comment from @mkahvi

Compressing just the initial world-data message would provide a lot of benefit by itself, and if it's limited to just that, it should avoid all the memory-leak problems except for servers that run for months without a restart. Additionally, you could provide a setting for it so people can disable it if it causes problems, since this seems to be a Linux-specific bug?

@aaclayton (Contributor Author)

Comment from @Weissrolf

Compressing the initial database messages could squeeze them to about 15-25% of their original size using "fastest" deflate. Maybe add a single garbage collection after login/reload to get around the memory leak?

But even without GC, I noticed that not all data is transferred initially and that most of it is squeezed into a single large message. So compressing only that one websocket message would already help for a group of 1 GM + 4 players logging in or reloading (F5).

My real-world PF2E level 1-3 adventure has 10.1 MB worth of data files, of which 5.5 MB is fog-of-war data for a single largish map (all other maps reset). Upon login as GM, 5.34 MB of combined JSON data is sent as a single large message to my client, plus other messages adding up to 6.22 MB total. If my actors db were 60 MB worth of data, that single large message would be over 60 MB.

PS: For comparison, all non-websocket data adds up to 19.65 MB / 9.54 MB transferred in that world, so about half of it is compressed. Most of these files show "Cache-Control: no-cache" in their headers, and Firefox doesn't seem to do much caching even for large background map images. Chrome seems to cache them more aggressively, transferring only 256-259 bytes per file (checking file headers, I guess?).

@aaclayton (Contributor Author)

This could possibly become "bonus" territory in V10, but my past foray into per-message deflate was stymied by the aforementioned memory leak, which served as a substantial deterrent.

@aaclayton (Contributor Author)

Originally in GitLab by @zeel02

If the memory issue can't be resolved, would it be feasible to offer this as a configuration option instead? A user with plenty of memory but poor bandwidth might be willing to accept the tradeoff.

@aaclayton (Contributor Author)

Originally in GitLab by @Weissrolf

Does this have to be an all-or-nothing decision? What speaks against compressing only that single large message, which includes over 95% of the total initial websocket data? I would expect the memory leak to result from many/all messages being compressed, but not necessarily a single one.

And the HTTP idea also holds some merit if websockets are incapable of doing this. In my last example, the whole database folder compresses down to 16% of its initial size via "fastest" deflate, from 10.1 MB to 1.76 MB. That's still much smaller than the 5.34 MB of websocket data currently being sent, which doesn't even include all the data.

@aaclayton (Contributor Author)

No decision at all is going to be made here until and unless we accomplish our stated v10 priorities and have some spare bandwidth for bonus work. If and when we take on this issue, we will explore a wider variety of possibilities to try to accomplish the end goal (compressing the payload) in whatever way performs best or carries the least risk of downsides.

@aaclayton (Contributor Author)

Originally in GitLab by @Weissrolf

I thought I had checked that I did not disable the cache in the developer tools, but I mixed that up with the Chrome developer tools. So I had the cache enabled in Chrome but accidentally disabled in Firefox. Sorry for the confusion, and good to know it's all working (cached) properly.

@Weissrolf

For reference: the Intel Celeron J3355 in my Synology DS218+ takes just 1-2 seconds to compress the data folder of a world (while data scrubbing of its other drive runs in the background). So the CPU load/time seems well worth the savings in transfer size.

@aaclayton aaclayton removed this from the Version 10 - Intended milestone Jul 4, 2022
@esb commented Nov 6, 2022

It would be great for something to happen on this issue.

I have a long-running campaign (over 2 years now) that has hit about 70 MB in world data. In Australia we don't have super-fast broadband, with fibre out of reach of most people; indeed, we have players still on ADSL. Even in the best conditions it can take 5 minutes to load a game, and people on very slow connections tend to keep refreshing the page after a few minutes, which just makes things worse.

Adding a compression option for the websocket data would be a massive boost for performance, and help out those of us who don't have the luxury of great internet speeds.

@unsoluble

For what it's worth, there are a number of other ways to optimize load times; we can work through these with you on the Discord.

@esb commented Nov 13, 2022

I modified my server code to add the "permessage-deflate" option on the socket.io server, and we ran a 6-hour session last night with this updated code.

All I can say is that the results were stunning! Everyone commented on how much faster things were running, and even players on the slowest of connections were able to participate in a way that they'd been unable to in the past with the standard Foundry code.

I monitored memory usage over the session and couldn't see any downsides to this change. If there is indeed a memory leak problem, then its impacts appear to be overstated as far as Foundry is concerned.

I'd highly recommend this change for all users.

@aaclayton (Contributor Author)

As mentioned here, #5942 (comment), the permessage-deflate option has some unresolved issues with memory fragmentation/leaks which will gradually degrade the performance of long-running servers. It's not an option we can enable until those issues are fixed.

Our hope is to approach this in a different way by virtue of #4314

@esb commented Nov 13, 2022

My take on this would be that a Foundry server is never going to be used as a long-running server with large numbers of users. If there are memory problems, they're unlikely to be significant enough to become an issue, and isn't it better to solve a real problem now with a relatively simple fix that gives users a significant performance boost?

If there is a real problem with memory, then the compression could be made optional to allow users to make their own choice.

At the moment, all that's on offer is that maybe the performance will be addressed at some stage in the future, because the solution available today "might" cause some unquantifiable problem now.

Is there any evidence that memory fragmentation/leaks in socket.io are going to have an impact on Foundry users? I'd have no problem restarting my server before each game, or doing so on a regular basis if needed just to get some major increases in performance for users on slow connections.

@Weissrolf

@esb Would you publish your approach to enabling compression? Which files have to be edited, and how?

The main benefit will be at login, when the large non-image/audio db packages are transferred. Afterwards, I imagine fog-of-war data will benefit. No idea whether the memory leak accumulates with data or with time, but I can live with a restart from time to time.

A quick fix would be to compress only the initial login package and then switch compression off. Now another long-term solution is being worked on, which I appreciate, but until it is implemented I would like to try the "hack" instead.

@mkahvi commented Jan 10, 2023

@esb

> Is there any evidence that memory fragmentation/leaks in socket.io are going to have an impact on Foundry users?

I've heard of some people running the same world for months with no restarts or even returns to setup, so they seem very likely to encounter it if it is a serious concern.


Besides that, the upstream per-message-deflate leak issue has been closed.

And the other issue mentions that it has been mitigated by other changes. Though the second issue is about making compression synchronous, not about the memory leak directly, so it might not reflect the memory-leak status correctly.

Node.js issue 8871 might still be relevant, however (or not? They don't seem to be sure themselves whether it should be called one).


Also, I'd like an option to enable compression on my end if this is not enabled wholesale. Windows and Mac were unaffected by this issue from the start, as far as I understand.

@esb commented Jan 10, 2023

I've enabled socket compression and the server has been running for months without a restart. This has had zero impact in terms of memory leaks, but there has been a real gain from the reduced load times. Previously we had players on really slow connections who would struggle to participate in a game due to timeouts and data loss, and this has all gone away for them now.

I understand that at this time this is probably not a top priority, but it's a really trivial patch to the code that could be made optional and only enabled for those who want to try it.

@esheyw commented Jan 10, 2023

> I've enabled socket compression and the server has been running for months without a restart. This has had zero impact in terms of memory leaks, but there has been a real gain from the reduced load times. Previously we had players on really slow connections who would struggle to participate in a game due to timeouts and data loss, and this has all gone away for them now.
>
> I understand that at this time this is probably not a top priority, but it's a really trivial patch to the code that could be made optional and only enabled for those who want to try it.

Could you share said patch somewhere? I'd like to try this out; two of my players share a not-very-good connection.

@Nordiii commented Jan 13, 2023

> I've enabled socket compression and the server has been running for months without a restart. This has had zero impact in terms of memory leaks, but there has been a real gain from the reduced load times. Previously we had players on really slow connections who would struggle to participate in a game due to timeouts and data loss, and this has all gone away for them now.
>
> I understand that at this time this is probably not a top priority, but it's a really trivial patch to the code that could be made optional and only enabled for those who want to try it.

I too would be interested in how to enable this.

EDIT:
Sadly, changing a line in this file did not work:

/home/foundry/resources/app/dist/server/express.mjs
When creating the SocketServer, setting perMessageDeflate: true did not help.

Firefox headers suggest that it is deflating (Sec-WebSocket-Extensions: permessage-deflate is set), but it still takes around 7-8 seconds to transfer the ~43 MB of world data, which matches my up-to-50 Mbit/s upload speed. The server restarts every day anyway to take a backup, so I would much rather deal with memory leakage than wait for me and my players to download 43 MB six times. I understand that this would be horrible for the average customer without any technical knowledge, but a hidden yet easy way to enable this would be nice.

EDIT 2: This runs behind nginx, but I don't think nginx decompresses the websocket again.

@mkahvi commented Jan 14, 2023

> /home/foundry/resources/app/dist/server/express.mjs When creating SocketServer, setting perMessageDeflate: true did not help

Replacing .emit() calls with .compress(true).emit() seems to have had some success with perMessageDeflate enabled (at least my player said the world loaded faster afterwards; we didn't check whether it actually was). Though you need to be careful what you replace, since there are some non-socket.io .emit() calls that break with this change (a sketch of the pattern follows).
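A minimal sketch of that per-emit opt-in; .compress() is socket.io's documented modifier, while the event names and payloads here are hypothetical:

```js
// `io` is an existing socket.io Server instance; `worldPayload` stands in for
// the real world data.
io.on("connection", socket => {
  socket.on("world", () => {
    // Compress only the large world payload...
    socket.compress(true).emit("worldData", worldPayload);
    // ...and leave small, frequent events uncompressed.
    socket.compress(false).emit("heartbeat", { t: Date.now() });
  });
});
```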

@gabriellet

Hi, I'm running into a very similar problem with poor load times (over a minute, sometimes more, even on a good network connection) for a fairly large world. I am trying to see if the perMessageDeflate option will improve performance at all and had a couple of questions.

  1. Which emit() calls did you replace with .compress(true).emit()? I see a few, and I only have the deminified source code, so the variable names are tricky to infer from.
  2. Did you provide any additional options when setting perMessageDeflate? I see from the Socket.io documentation (https://socket.io/docs/v3/server-initialization/#permessagedeflate) that you can provide additional options.

@mkahvi commented Jan 14, 2023

Pretty much everything, except the ones that had this.emit.

It'd be smarter to put it only on specific ones, but I didn't look that closely, just enough not to break things.

@Nordiii commented Jan 14, 2023

> /home/foundry/resources/app/dist/server/express.mjs When creating SocketServer, setting perMessageDeflate: true did not help
>
> Replacing .emit() calls with .compress(true).emit() seems to have had some success with perMessageDeflate enabled (at least my player said the world loaded faster afterwards; we didn't check whether it actually was). Though you need to be careful what you replace, since there are some non-socket.io .emit() calls that break with this change.

That makes a lot of sense, that you need to add compress(true).
In which file do I need to add this line? I found some emits in foundry.js and game.js, but adding .compress() in the getData() function does not work, even though the function's comment says it loads world data.

@aaclayton (Contributor Author) commented Jan 14, 2023

Hey folks, we are all here for the same reason: to provide feedback so that Foundry VTT can be better in the future. But our GitHub issue tracker is not a forum for discussion or tutorials about how to modify our server-side code. We don't support such modifications, nor can we support you if they don't work properly.

@Nordiii commented Jan 14, 2023

While I understand the sentiment, especially from a support standpoint, I find it quite sad that discussion of how to do this, between people who are not affiliated with Foundry, is disapproved of.

I bought the AV premium module and self-host; just importing the adventure already adds a whopping ~27 MB of world data, which is a load time of around 5-6 seconds per player on my bandwidth. Using an online tool to deflate the world data reduces it to around 13% of the original size.

So, in my opinion, the topic of modifying it myself is relevant, even though it doesn't help improve things for everyone.
And I totally understand that the Foundry team neither supports modified Foundry versions nor explains how to modify them.

@esb commented Jan 14, 2023

My original intent here was to indicate that this was a viable strategy for improving performance. There is an architectural limitation with how Foundry transfers data to the client, and this does become critical for large worlds, especially with clients who have slow connections.

I have found that perMessageDeflate does work, although I think you need to do a full restart to get it to work. I certainly didn't do anything to modify emit calls, but I can see from my browser traces that the option is working.

I refrained from trying to provide patch instructions as this can lead to messy problems as people can end up nuking their systems. You need a fair degree of experience to perform such a modification, so I figured that if you have that experience, then you should be able to work out what to change anyway.

Going forward, I would hope that this activity encourages the Foundry team to re-examine the idea of adding this option to the code. It's a very simple option to add: literally one line of code to enable, plus a small amount extra to add an interface option to enable/disable it.

Maybe all the Foundry developers have super fast fibre connections, so it doesn't matter to them, but some of us live in poor countries like Australia which is near the bottom of the table of Internet infrastructure after a decade of neglect by the conservative government. This sort of stuff can be really important to our experience in using Foundry, so please reconsider adding it as an option.

@aaclayton (Contributor Author)

The fact that this is an open issue and not something that is closed is confirmation that we would like to improve performance here in the future. It's definitely something we will take a look at for future Foundry VTT updates.

@gabriellet commented Jan 17, 2023

I totally understand not supporting any modifications to server code. I wouldn’t want to support that if I were in your shoes. Like some other folks have said, I was curious if this approach would actually improve performance so that I could chime in here with something that works on the off chance it would be helpful. I wouldn’t recommend anyone inexperienced in programming try any of this, and only undertook it myself as a professional.

For what it’s worth I’m running into this issue with a group based in the US, some of whom have fiber connections. I ended up at this GitHub issue because of how long the load times have gotten as our campaign goes on, and some professional curiosity.

Glad to hear it’s still being considered.

@Nordiii commented Jan 18, 2023

> My original intent here was to indicate that this was a viable strategy for improving performance. There is an architectural limitation with how Foundry transfers data to the client, and this does become critical for large worlds, especially with clients who have slow connections.
>
> I have found that perMessageDeflate does work, although I think you need to do a full restart to get it to work. I certainly didn't do anything to modify emit calls, but I can see from my browser traces that the option is working.
>
> I refrained from trying to provide patch instructions as this can lead to messy problems as people can end up nuking their systems. You need a fair degree of experience to perform such a modification, so I figured that if you have that experience, then you should be able to work out what to change anyway.
>
> Going forward, I would hope that this activity encourages the Foundry team to re-examine the idea of adding this option to the code. It's a very simple option to add: literally one line of code to enable, plus a small amount extra to add an interface option to enable/disable it.
>
> Maybe all the Foundry developers have super fast fibre connections, so it doesn't matter to them, but some of us live in poor countries like Australia which is near the bottom of the table of Internet infrastructure after a decade of neglect by the conservative government. This sort of stuff can be really important to our experience in using Foundry, so please reconsider adding it as an option.

Sadly adding only perMessageDeflate did not help. The headers are now set if you look at the socket but the data does not get compressed. I did fully reboot the Foundry server.

@esb commented Jan 18, 2023

> Sadly adding only perMessageDeflate did not help. The headers are now set if you look at the socket but the data does not get compressed. I did fully reboot the Foundry server.

I can confirm that adding the perMessageDeflate option led to a significant drop in the time to load the initial scene. I timed the same session load with and without the option, and there was a massive improvement in the load times with nothing else changing.

@Nordiii commented Jan 18, 2023

> Sadly adding only perMessageDeflate did not help. The headers are now set if you look at the socket but the data does not get compressed. I did fully reboot the Foundry server.
>
> I can confirm that adding the perMessageDeflate option led to a significant drop in the time to load the initial scene. I timed the same session load with and without the option, and there was a massive improvement in the load times with nothing else changing.

May I ask in which file?

@Weissrolf commented Jan 18, 2023

Instead of using a timer, you can also use your browser's network debugging tools (F12). They can tell you how much data was transferred; Firefox is more useful than Chrome in this regard. Unfortunately, the websocket transfer is never marked as "finished", which makes it a bit harder to analyze.

@aaclayton aaclayton added this to the Version 11 - Prototype 2 milestone Jan 18, 2023
@Weissrolf commented Jan 18, 2023

> aaclayton added this to the Version 11 - Prototype 2 milestone 10 minutes ago

Thanks. ;-)

@LukeAbby commented Jan 31, 2023

I feel like client-side caching is almost more essential than compression. For context, one of my players has really bad internet, so it can take them minutes to load into a world that takes me a couple of seconds. Here's what I mean by that: running gzip worldData.json; gzip -l worldData.json.gz [1] reports shaving off ~30.4% of the size at the default level 6 compression, and level 9 doesn't compress it any further. This would turn 2 minutes of loading the world data [2] into a 1 minute 18 second load time, which certainly is better. Caching would turn it into a one-time 2-minute load and nearly instant loads thereafter.

This may be a second reason to suggest sending world data through HTTP, allowing the server to begin responding with a 304 after the first load. Of course, if caching through the websocket were implemented, I would not mind, as the end result would be the same, but 304s have out-of-the-box support for machinery I presume will be useful, like ETags [3], which should more easily allow updating when required (e.g. enabling/disabling modules/packs).

Unfortunately, if I had to guess, there might be some architectural work involved. Specifically, I think you would have to split out the portions of world data that frequently change from those that are basically never going to change. Packs, modules, and the like seem to take up the majority of the space in my world data, and I don't expect that to change often.

I would just like to note that my own load takes about a dozen seconds despite my having about 400 Mbps download according to speedtest.net and other sites. My server is hosted on Oracle; the instance metrics don't seem to indicate that anything is bottlenecked (e.g. CPU), and I've transferred GBs of files onto and off the server fairly quickly before. This indicates to me that some other factor may be slow, maybe vendoring on the server side, but I don't have a super clear picture of that. I can open this as another issue if that would help tracking, but I just happened to be debugging these issues recently, stumbled upon this issue, and noticed it's getting added to V11 Prototype 2.

[1] I mention gzip, specifically the deflate compression algorithm, as it is by far the most popular option on the web. Brotli shaves off ~31.8% of the size in comparison but may be less supported.
[2] It's accurate to say that the world data takes minutes for this player. World data is ~95.27% of the actually transferred data according to the Firefox console. If the other HTTP requests weren't cached, it would be ~75.2% of the transferred data. 2.95 MB is always transferred over HTTP, though it would be 19.60 MB uncached; the world data is 58.91 MB.
[3] I believe ExpressJS will automatically send ETags for all content, based on documentation and StackOverflow.
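A rough sketch of that HTTP idea under stated assumptions (the route and loader are hypothetical, not Foundry's API; Express does generate weak ETags for res.json() bodies by default, so a matching If-None-Match yields a 304):

```js
import express from "express";

const app = express();

// Hypothetical snapshot endpoint: the rarely-changing bulk of the world goes
// over HTTP, where ETag revalidation works; the websocket then carries deltas.
app.get("/world/:id/snapshot", (req, res) => {
  const snapshot = loadWorldSnapshot(req.params.id); // hypothetical loader
  res.set("Cache-Control", "private, no-cache");     // cache, but always revalidate
  res.json(snapshot); // Express adds a weak ETag; If-None-Match match => 304
});
```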

@kakaroto commented Feb 4, 2023

I'm gonna blame @LukeAbby for nerd-sniping me earlier this week by linking me to this issue (and messing with my sleep and time off), and @aaclayton can thank him instead!

As I hadn't realized that the websocket data was uncompressed, I decided to experiment and see whether we could achieve that for The Forge, since we're already acting as a middleman/proxy between the user and the Foundry server and we have gzip compression enabled for other HTTP requests via our load balancer.
I changed our passthrough-proxy approach to instead be a websocket server that connects to Foundry as a client on the local network (where compression is less useful) and relays messages back and forth between the client and the Foundry server, roughly as sketched below.
Doing that had a small impact on performance (I'm looking at ways to improve it, possibly by using custom parsers to pass events from the server to the client without going through encoding/decoding of the messages themselves), but it allowed us to have the deflate option enabled on our server for the client to use.
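The relay shape described above might look roughly like this (ports, URLs, and options are illustrative; acknowledgement callbacks and authentication are omitted):

```js
import { Server } from "socket.io";
import { io as connect } from "socket.io-client";

// Player-facing server: permessage-deflate enabled here, where bandwidth matters.
const front = new Server(8443, { perMessageDeflate: { threshold: 1024 } });

front.on("connection", client => {
  // Upstream connection to the local Foundry server (compression matters less here).
  const upstream = connect("http://localhost:30000");
  client.onAny((event, ...args) => upstream.emit(event, ...args));  // client -> Foundry
  upstream.onAny((event, ...args) => client.emit(event, ...args));  // Foundry -> client
  client.on("disconnect", () => upstream.disconnect());
});
```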

Loading speeds

I tested a small and a large world, with my normal (fast) internet (500 Mbps fiber, about 10 ms latency) and using Chrome's network throttling set to Fast 3G (which is 1.44 Mbps with 500 ms latency).
I tested 3 reloads on each of the 2 worlds, with and without deflate.

  • small world (3.5MB)

    • fast connection
      • normal socket: 1.1s, 1.0s, 1.0s (Average of 1 seconds to load)
      • using deflate: 1.0s, 1.1s, 1.2s (Average of 1.1 seconds to load)
    • slow connection
      • normal socket: 31.2s, 31.4s, 30.6s (Average of 31 seconds to load)
      • using deflate: 13.9s, 13.5s, 13.5s (Average of 13.6 seconds to load)
  • large world (37MB)

    • fast connection
      • normal socket: 4.4s, 4.1s, 3.6s (Average of 4 seconds to load)
      • using deflate: 6.1s, 5.7s, 5.5s (Average of 5.7 seconds to load)
    • slow connection
      • normal socket: 5:25.1m, 5:25.0m, 5:25.5m (Average of 5 minutes 25 seconds to load)
      • using deflate: 2:06.9m, 2:08.6m, 2:08.1m (Average of 2 minutes 8 seconds to load)

So we can see a small decrease in performance for users with very fast connections, probably because the compression/decompression takes more time than transferring the uncompressed data over the network would have. There's also a not-insignificant performance hit from my own server running both the socket.io server and client stacks instead of being a direct socket passthrough.
To eliminate that discrepancy in the data, I did 4 more tests: with normal passthrough, with the websocket proxy with deflate off, and with the websocket proxy with deflate on:

  • Passthrough: 3.8 - 4.0 - 3.9 - 3.9 (average 3.9 seconds)
  • Proxy deflate false: 4.6 - 5.1 - 4.9 - 4.5 (average 4.77 seconds)
  • Proxy deflate true: 5.1 - 5.2 - 5.4 - 5.5 (average 5.3 seconds)

This does confirm that there is some small slowdown with deflate on for users with a very good connection, but not as bad as it originally seemed. If the option were in Foundry itself, the difference would be very small indeed.

The speed advantage on a slow connection, however, is definitely not insignificant! And I would expect that the larger the world and the slower the internet connection, the more of an impact it will have. For many users, loading time going from 5 minutes to 2 minutes could make a real difference in their ability to play.

Bandwidth usage

I've also tested bandwidth usage (though not on those 2 worlds in production; I only had a 19 MB world to test with in my dev environment, and I could only check packet sizes by capturing network packets in an unencrypted local env):

  • normal world: 19 638 077 bytes
  • deflated world: 2 485 684 bytes

I've also tested replacing the client-side socket.io library with one that uses the msgpack parser for encoding messages instead of JSON. This made all the data over the websocket transfer as binary, using less space, but I ended up not pursuing it further, as it added more overhead and yielded insignificant improvements once the deflate option was enabled.

  • msgpack: 16 841 423 bytes
  • msgpack-deflate: 2 462 442 bytes

So, my 19.6 MB world became 16.8 MB with msgpack, but both were equal at about 2.4 MB with the deflate option enabled. While msgpack did yield a smaller world size, the gain is relatively insignificant compared to the deflate option; msgpack or alternative encodings are therefore likely not worth pursuing, as long as we have deflate instead.

Memory leak concerns

Now, I haven't yet looked into the specific consequences for CPU and RAM usage of enabling the option on my servers, but I was concerned by @aaclayton's report of memory leaks from that option, and I've investigated it thoroughly. Once I have some actual data from real-world testing, I'll be able to report back on the effects of enabling the option; however, I can already guess that it is perfectly safe to use without any negative impact on RAM usage or server performance.

> I have experimented with customizing the compression rules with some custom per-message deflate options, but (1) this did not work in my experimentation, and (2) the feature has several reported memory leaks associated with it, which deterred me from investigating further.
>
> See websockets/ws#1617 and websockets/ws#1369

> Comment from @aaclayton:
>
> The socket.io library and the zlib it depends on have some notable issues with memory fragmentation when using the perMessageDeflate option, which is why we have kept it disabled. You are right that this could be beneficial. In my own testing (during v9 development) it did make a difference, but I saw signs of an increasing server-side memory profile the longer the host ran. Overall I don't think this is stable enough (without a viable memory-management solution) to enable perMessageDeflate.
>
> Some context on the issue here: serverfault.com/questions/1066025/how-to-fix-ws-and-socket-io-memory-leak

I've read through most of those issues, and here's what I found:

  • This isn't a memory-leak issue at all, but rather a memory-fragmentation issue (meaning that of your 1 GB of RAM usage, you may only be using 100 MB, but it's all small chunks here and there, so the next block you want to allocate can't find contiguous space within that 1 GB and is forced to allocate at the end of it, growing it until you run out of RAM)
  • The memory fragmentation is caused by zlib contexts being created, and only seems to happen if you create thousands of zlib contexts at the same time
  • The initial zlib memory-fragmentation issue has already been mitigated in Node: zlib: switch to lazy init for zlib streams nodejs/node#34048 (basically by waiting until zlib processing begins to create the zlib context, so that if you have thousands running in parallel, they don't all fragment the RAM at once). This was merged in Node 14.7.0 and backported to Node 12.19.0
  • The fragmentation issue was also mitigated by socket.io itself by imposing a limit on the number of parallel deflates that can happen at once, so that if you receive 10,000 messages at once that need to be decompressed, socket.io will queue them and only deflate them 5 at a time, which should fix the issue entirely (68 MB of RAM used for 5 concurrent deflates, 200 MB for 1,000 concurrent, and 3 GB with no queuing system, as per RFC: Concurrency limit on zlib websockets/ws#1202). It was fixed in 2017 with the default being 10 concurrent deflates: [feature]: zlib deflate concurrency limit websockets/ws#1204 (see the configuration sketch after this list)
  • A user of websocket in a production environment confirmed that they no longer see the issue with the deflate option enabled and has re-enabled it in their app without noticing memory leaks (Memory leak socketio/socket.io#3477 (comment))
  • Despite this, there will still be some memory fragmentation, though it will be rather small thanks to the queuing of concurrent requests; the fragmentation issue seems to be entirely taken care of by using the jemalloc library to replace glibc's memory allocator. This was confirmed by multiple Node devs throughout that issue. It's probably something we'll use as a replacement for the glibc allocator on our Forge deployment, though I don't think it's worth investigating from Foundry's side, since the FVTT server will never be handling thousands/millions of requests in parallel, which is the real cause of memory fragmentation in a production environment
  • There was another memory leak in socket.io that happened as a consequence of the previously mentioned fix, but it was fixed in August 2019 in websocket 7.1.2 (PerMessageDeflate#cleanup - properly cleanup on close websockets/ws#1618), which became a dependency of engine.io/socket.io as of the 3.0.0 releases (socketio/engine.io@c471e03)
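To make those mitigations concrete, this is the shape of the configuration involved; the values follow the example in the ws README rather than anything Foundry-tested:

```js
import { Server } from "socket.io";

// `httpServer` is assumed to be an existing Node http.Server instance.
const io = new Server(httpServer, {
  perMessageDeflate: {
    threshold: 1024,      // only compress messages larger than 1 KiB
    concurrencyLimit: 10, // cap parallel zlib operations (the ws#1204 mitigation)
    zlibDeflateOptions: { level: 3 } // cheaper compression level, less CPU per message
  }
});
```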

Conclusion

In conclusion: enabling deflate significantly reduces the amount of traffic used and significantly decreases the time to load the world. It may or may not add a small overhead, which wouldn't be particularly noticeable for users with a very fast connection or on localhost.
The concerns about the memory leak seem to be unfounded, as the underlying issues were fixed as far back as 2019, via both a mitigation in the websocket library and one in Node itself. That being said, I don't yet have any hard experimental data comparing RAM/CPU usage with the deflate option enabled vs. disabled.
In light of this, I think it is definitely something worth exploring for an upcoming update of the Foundry core software.
From our side, we'll continue to investigate this for The Forge and hopefully soon have the option working via our proxy setup, improving the experience of all our users regardless of the Foundry version used, basically backporting it to previous releases. It would still be nice if non-Forge users also benefited from the loading-speed improvements once v11 drops, if the option is enabled there.
I'd recommend testing it on your side, and possibly enabling it in the next prototype release for more people to test and report whether they notice any issues due to the option being enabled.

I hope this research was helpful to you.
Thanks for reading!

@Weissrolf commented Feb 4, 2023

Thanks for the hard work and for sharing it with the community. One could argue that transferring data is one of Foundry's main jobs, so this should not be postponed forever. And as far as I understood, your timing tests were done for a single connection, so multiple users connecting (or reloading) at the same time may show even more impact.

Fog-of-war improvements will also play into this, as even a single map's FoW data can be of considerable size.

@aaclayton (Contributor Author)

@kakaroto thanks for sharing your findings. We do have this issue scoped for our current Prototype 2 milestone, so we'll investigate on our end and likely make this change.

@aaclayton aaclayton self-assigned this Feb 13, 2023
@aaclayton (Contributor Author)

I spent some time today working on this and was frustrated by my inability to verify the benefit of these proposed changes.

I'm going to outline my findings. Perhaps @kakaroto or @esb, who implemented changes and measured a difference, might have perspectives on why I'm not observing any benefit.

Test Setup

  • Medium-sized world
  • Canvas disabled so that the only network requests on initial world load are those related to transfer of the world data payload
  • Disable Cache feature of Chrome devtools enabled
  • Node.js 18

On the client side, perMessageDeflate is true by default (https://socket.io/docs/v3/client-api/#io) and does not need to be enabled. This is verified by the headers.

On the server side, the Deflate Off scenario is perMessageDeflate: false (the default value and current V10 configuration); the Deflate On scenario is perMessageDeflate: true with no advanced configuration options, just the defaults (sketched below).
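In other words, the two scenarios differ in a single server option, roughly like this (Foundry's real server wiring differs; httpServer is assumed to exist):

```js
import { Server } from "socket.io";

// Scenario 1, "Deflate Off" (current V10 behaviour):
const io = new Server(httpServer, { perMessageDeflate: false });

// Scenario 2, "Deflate On" (library defaults, including the 1024-byte threshold):
// const io = new Server(httpServer, { perMessageDeflate: true });
```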


Scenario 1 - Deflate Off

The first test is the status-quo scenario of perMessageDeflate: false.

Headers

[screenshot]

You see from the websocket headers that the client requests permessage-deflate but the server does not respond with that extension enabled.

Data Transfer

[screenshot]

This scenario transferred 3.4MB of data. The world data payload was the following message:

[screenshot]

Using Firefox dev tools instead reveals that 9.51MB of data was received:

[screenshot]


Scenario 2 - Deflate On

Now I repeat the same test with server-side perMessageDeflate: true.

Headers

[screenshot]

We can now see from the headers that both the client and server request permessage-deflate as a websocket extension.

Data Transfer

[screenshot]

Chrome reports the exact same amount of data transferred, 3.4 MB, although there is a small difference in speed.

The world data payload was the following message and is the same length as before:

[screenshot]

Using Firefox dev tools also reports the same 9.51MB of data received:

[screenshot]


Open Questions

  1. I don't know what to make of the discrepancy between the 3.4 MB of transfer reported by Chrome vs. the 9.51 MB reported by Firefox.
  2. Why is there no difference in the amount of data transfer with perMessageDeflate on vs. off?
  3. What configurations were @kakaroto or @esb using that did observe a measurable difference in the amount of data transfer due to compression?
  4. What else am I missing?

@mkahvi commented Feb 16, 2023

The only thing I can think of that might be missing from the presented info is .compress(true) on the .emit() calls. A simple mistake, and it seems unlikely, but it needs to be there with perMessageDeflate.

@kakaroto

> Why is there no difference in the amount of data transfer with perMessageDeflate on vs. off?

I think that if you use the devtools to check the size, you won't see any difference, because those are post-deflate sizes: yes, the message is and will always be 9,348,171 bytes, but the amount of data actually transferred over the network will be different.
Basically, you were expecting the size shown in Chrome to be the actual amount transferred, as for HTTP requests:
[screenshot]

But for websockets, Chrome will only show the size of the parsed data, not of what is transferred over the network.
I don't know if that's what accounts for the discrepancy between Chrome and Firefox, but it might be, though I think 3.4 MB would be quite a lot for a compressed 9 MB world (mine went from 19 MB to 2.4 MB).

The only way I found to actually check those sizes is by using Wireshark/tcpdump and analyzing the actual websocket message data. The only time I alluded to that in my long post above was in this line:

> I've also tested bandwidth usage [...] I could only check for packet sizes by capturing network packets in an unencrypted local env

Basically, use non-SSL so you can capture the content unencrypted, and use Wireshark or tcpdump to dump the network packets (then open the dump in Wireshark) as you refresh the page.

  • Find the websocket connection (I filter by the 'http' rule, then sort by the "Info" column to more easily find the "HTTP/1.1 101 Switching Protocols" entry)
    [screenshot]
  • Right-click it, then "Follow TCP Stream".
  • Inside the TCP stream, switch to the Hex Dump view
    [screenshot]

At this point, you can probably just check the size of the conversation as a whole to get an estimate of how much data was transferred:
[screenshot]
This shows a 2.5 MB versus a 16 MB conversation, which is not actually accurate: for some reason a few packets didn't get captured in my tcpdump, and it didn't make sense that a 19 MB uncompressed world was using only 16 MB, so I had to look at the websocket protocol itself to be sure.

So what you can do to see the actual size of the compressed world (which is what I used in my report above) is analyze the packets being sent/received and look for the size in their binary header. That's what Chrome/Firefox do as they decode those packets and show only the content itself; since the deflate extension is part of the websocket protocol itself, it happens below the level at which the application sees messages.

You can use the websocket RFC to understand each field, specifically chapter 5, "Data Framing": https://www.rfc-editor.org/rfc/rfc6455#section-5
Here's the header representation:
[RFC 6455 frame-header diagram]

Here's a good sample of the data, showing 5 messages being sent and received. Red is sent by the client to the server, and blue is what we received from the server. You can see that the client always sends compressed data, even for small packets, but the Foundry server sends a mix of compressed and uncompressed data, likely due to the default 1024-byte threshold for the deflate option on the server.
[screenshot]
For fun (at least, it's my idea of fun 😁), let's analyze that first packet the client sent.
Header is 0xc1 84 67 94 ca 76 where 0xc1 represents the following bits:
FIN: 1 (it's the final fragment of the message)
RSV1: 1 (this is the field to represent whether it's using deflate or not)
RSV2, RSV3: 0
Opcode: 1 (text frame)

The next byte is 0x84, which is:
Mask: 1 (meaning there are 4 bytes after the size that represent a mask for the data)
Payload size: 4
Then we have 0x6794ca76, which is the mask, and finally the 4 bytes of the payload, 0x55a5ca76, which is the masked data.
The next packet, sent by the server, is the session line; it's simpler:
[screenshot]
Header: 0x8120
FIN: 1
RSV1, RSV2, RSV3: 0 (not compressed)
Mask: 0 (not masked)
Payload size: 0x20
You then have 0x20 (32) bytes of uncompressed, unmasked plaintext data. Then there's another uncompressed/plaintext message from the server with header 0x8152, which has a size of 0x52 bytes.
Then another message sent by the client, which is compressed, has a size of 0xe bytes, and includes a mask.
Then we get to the interesting part: the message where the world data is sent by the server, compressed:

[screenshot]
Header: 0xc1 7f 00 00 00 00 00 25 ed b4
FIN: 1
RSV1: 1 (compressed)
Payload size: 0x7f = 127
According to the RFC, a size of 127 means that the frame uses the 64-bit extended size field:
[RFC 6455 extended-length field diagram]
This means that the next 8 bytes (00 00 00 00 00 25 ed b4) are the size of this packet, and since it's big-endian, it's easy to validate that 0x25edb4 = 2,485,684 bytes ≈ 2.4 MB

Compare that now with the packet with deflate disabled (same world):
[screenshot]
Header: 0x81 7f 00 00 00 00 01 2b e8 e8
FIN: 1
RSV: 0 (not compressed)
Payload size: 0x7f
Extended payload size: 0x12be8e8 = 19,654,888 bytes ≈ 19.6 MB

You'll notice the size was indeed 19.6 MB even though Wireshark said "Entire conversation (16MB)"; I remember seeing a [TCP packets missing...] line somewhere in my huge dump, but I can't find it at this time.

> What configurations were @kakaroto or @esb using that did observe a measurable difference in the amount of data transfer due to compression?

Same configuration; just make sure the websocket response has Sec-WebSocket-Extensions: permessage-deflate in the response headers. That will tell you for sure that it was configured correctly.

> What else am I missing?

Likely just the way to verify the size. I also had the same problem initially, before I figured out that Chrome wouldn't show me the numbers I was looking for.

For your purposes, just looking at the size estimate from Wireshark's "Entire conversation" line can be enough to prove that the compression occurred and significantly reduced the transferred size. But if you want an accurate read on the transferred world size, the TL;DR is to check the first byte of the frame: if it's 0x81 it's uncompressed, and if it's 0xc1 it's compressed. The next byte should be 0x7f, since it's a big packet, and the 8 bytes after that are the actual size you're looking for. With that in mind, it should be fairly easy to see how much the world was compressed. (A small parsing sketch follows.)
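That TL;DR as a small Node sketch, assuming buf is a Buffer positioned at a websocket frame boundary (fields per RFC 6455):

```js
function inspectFrame(buf) {
  const compressed = (buf[0] & 0x40) !== 0; // RSV1 bit: 0xc1 = compressed, 0x81 = not
  let payloadLength = buf[1] & 0x7f;        // low 7 bits; 0x80 is the mask flag
  if (payloadLength === 126) {
    payloadLength = buf.readUInt16BE(2);            // 16-bit extended size
  } else if (payloadLength === 127) {
    payloadLength = Number(buf.readBigUInt64BE(2)); // 64-bit extended size, big endian
  }
  return { compressed, payloadLength };
}
```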

I hope this helps, and I hope this was as fun for you to read and follow as it was for me to discover and write down :)

@kakaroto commented Feb 16, 2023

Annnnd just after posting, I realized I'm a total idiot who didn't use his tools properly and overcomplicated things for no good reason (other than it being fun, and me being used to reverse-engineering things the hard way due to proprietary formats).

There's a much, much easier way to check the packet sizes with Wireshark, which I'm sure you'll appreciate 😁🤦‍♂️
Instead of filtering for 'http', filter for 'websocket' instead, then sort by the Length column. Click on the large packets, expand the "WebSocket" line, and you'll have your size pre-parsed for you.
With deflate on:
[screenshots]

With deflate off:
[screenshot]
(Notice how the frame with the world data is only the second-largest packet: the largest is a single TCP frame containing multiple websocket frames sent together, which together ended up larger than the world packet.)
[screenshot]

EDIT:

> The only thing I can think of that might be missing from the presented info is .compress(true) on the .emit() calls. A simple mistake, and it seems unlikely, but it needs to be there with perMessageDeflate.

No, it's not needed; you only need that to force a packet to be compressed, but by default any packet larger than 1024 bytes will be compressed.

@aaclayton (Contributor Author)

Thanks for the in-depth response @kakaroto. It sounds like the TL;DR is that the data is getting compressed as intended with perMessageDeflate set to true, but the effects are not detectable in the Chrome or Firefox dev tools. That's a bit frustrating, but I suppose the important part is that the feature is "working", although, as mentioned before, it likely makes load times slightly slower for small worlds or for users with fast connections.

I'll include this change in V11P2 and see how it fares in the wild.

@kakaroto

No problem, and yep, that's the TL;DR :) But I figured a more in-depth response was needed, as you likely wanted to verify the claim that it had an actual effect.

As for the slightly slower load times, I actually have an update on that. I've implemented a custom parser for the websocket that doesn't unpack/repack the data, avoiding a needless JSON.parse/JSON.stringify. As a result, the large world that loaded in 4 seconds (and 5.7 seconds with the deflate option) dropped to 4.4 seconds, which is not as bad. I would bet the slowdown would be completely insignificant if implemented directly within Foundry itself; most of my added latency comes from the socket.io server+client stack that I have to put in the middle of the connection, which isn't a problem for Foundry's own server.

Glad to see this included in v11p2, I can't wait to test it out natively!

@aaclayton aaclayton removed the blocked Issues which are unable to be worked on until others are resolved label Feb 20, 2023
@Weissrolf

I did a quick test using a PF2 Abomination Vaults server, loading websocket data with and without compression and measuring via a TCP packet sniffer on my local ethernet port.

The data is compressed manyfold now, which should help whenever multiple players connect at the same time or everyone is forced to reload. Thanks for the implementation!

Status: Done · 10 participants