Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change topojson serialization format #1099

Merged
merged 10 commits into from Jan 12, 2022

Conversation

maurizi
Copy link
Contributor

@maurizi maurizi commented Jan 7, 2022

Overview

Upgrading to Node.js 16 in #602 caused a regression in data loading - the maximum size a of string was was reduced from ~1Gb to ~512Mb, which meant we could no longer load our two largest regions by using JSON.parse(buffer.toString())

This PR takes a multi-step approach to tackling this problem:

  • I've switched to serialzing the TopoJSON using the v8 binary serialization format: c2a0dcb, d17738e & dc52da8
  • Added a command to download the existing TopoJSON files (I used streaming JSON so we can load the now too large files w/o needing to downgrade Node temporarily) f8262c2
    • I've already applied this command on all the files used on staging as well as those referenced by load-dev-data, but I still need to update production, and any one-off regions will no longer work unless updated

After testing this branch on staging, I encountered a lot of resource contention and a large number of layers failed to load the first time, so I added 807dc9d & e8ea4f1 to make the TopologyService more robust.

Checklist

  • Description of PR is in an appropriate section of CHANGELOG.md and grouped with similar changes, if possible

Demo

Updated healthcheck, showing which layers are loading / pending.
image

Notes

I was hopeful this would decrease our load times, but it hasn't done so to any meaningful degree. We did buy a lot of headroom in terms of TopoJSON file size - the max file size for v8.deserialize is ~4GB.

Testing Instructions

Try running ./scripts/manage serialize-topojson ... locally, and make sure you can run the server

I tested this against staging, and was able to get staging to load successfully.

Closes #1095

@maurizi maurizi requested a review from ddohler January 7, 2022 20:43
@maurizi maurizi force-pushed the test/mvm/change-topojson-serialization branch from 3ac8ed3 to 16ba9c0 Compare January 7, 2022 20:49
@maurizi maurizi requested review from KlaasH and removed request for ddohler January 10, 2022 16:47
@maurizi maurizi mentioned this pull request Jan 11, 2022
1 task
Copy link
Collaborator

@KlaasH KlaasH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍
I deleted the topo.buf for the NJ config I have in my database, then recreated it and confirmed that it loaded and I could make a map.

Re performance: I found this comparison of Javascript encoding tools, and this PR that adds v8.serialize to the mix. It seems like the tools that are notably faster than v8.serialize require a predefined schema, which is probably more trouble than it would be worth for our purposes.

await asyncLoop(
sortedRegions.map(region => () => {
const promise = this.fetchLayer(region.s3URI, region.archived);
this._layers = { ...this._layers, [region.s3URI]: promise };
Copy link
Collaborator

@KlaasH KlaasH Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish asyncLoop could work as a throttled map rather than a throttled forEach, so we wouldn't have to collect the results as a side effect inside this mapped function. But my attempts to mentally revise it to work that way have not gone well, so this will do just fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I tried / failed at this as well - good to know it wasn't just me 😅

@maurizi maurizi force-pushed the test/mvm/change-topojson-serialization branch from 00380cf to cb22be0 Compare January 12, 2022 14:11
@maurizi maurizi merged commit 523d298 into develop Jan 12, 2022
@maurizi maurizi deleted the test/mvm/change-topojson-serialization branch January 12, 2022 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate issues loading PA LRC region in staging
2 participants