Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change serialization to use topobuf #1142

Closed
maurizi opened this issue Feb 14, 2022 · 1 comment
Closed

Change serialization to use topobuf #1142

maurizi opened this issue Feb 14, 2022 · 1 comment
Assignees

Comments

@maurizi
Copy link
Contributor

maurizi commented Feb 14, 2022

Context

Work on the concurrency ADR showed that we can get significant improvements in file size by using a Protobuffer based file format - I used an outdated version of geobuf to do this - we'll be forking that in #1141 as topobuf

Notes

We changed from JSON -> the v8 serialization format in #1099, which should be able to serve as a template for this PR

AC:

  • Update process-geojson and the backend server code to use topobuf instead of v8.serialize / v8.deserialize
  • Update serialize-topojson to be able to read v8 files and write topobuf files
    • Use the updated serialize-topojson script to make new .pbf files for every region in:
      • Staging
      • Production
      • scripts/load-dev-data
@maurizi maurizi self-assigned this Feb 21, 2022
@maurizi
Copy link
Contributor Author

maurizi commented Feb 23, 2022

Well this was an all-around disappointment.

Testing in staging showed using topobuf instead of v8 deserialization consumes more memory:
(left side is using v8, right is using pbf)

image.png

This is probably something we could fix in the topobuf library at the expense of some CPU cycles... but we're slower than v8 deserialization already 😞

topobuf:

districtbuilder-server-1  | cache-miss San Bernadino: 227.267ms
districtbuilder-server-1  | cache-miss Dane County: 634.601ms
districtbuilder-server-1  | cache-miss Delaware: 826.569ms
districtbuilder-server-1  | cache-miss New Jersey: 3.451s
districtbuilder-server-1  | cache-miss Wisconsin: 5.165s
districtbuilder-server-1  | cache-miss Pennsylvania: 14.393s
districtbuilder-server-1  | cache-miss Illinois: 10.048s

v8:

districtbuilder-server-1  | cache-miss San Bernadino: 137.446ms
districtbuilder-server-1  | cache-miss Dane County: 327.031ms
districtbuilder-server-1  | cache-miss Delaware: 597.973ms
districtbuilder-server-1  | cache-miss New Jersey: 3.232s
districtbuilder-server-1  | cache-miss Wisconsin: 5.475s
districtbuilder-server-1  | cache-miss Pennsylvania: 10.225s
districtbuilder-server-1  | cache-miss Illinois: 8.007s

File size is much much smaller with topobuf, but that is about the only thing it has going for us.
Probably worth looking into topobuf further for potentially sending TopoJSON to the browser due to the much smaller file size, but ultimately I think we should stick with the current serialization method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant