Compressing the discovery cache #2321

Open
jorenham opened this issue Jan 16, 2024 · 1 comment
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

jorenham commented Jan 16, 2024

#2315 shaved a bit more than 20MB off the discovery cache by removing the JSON indentation.
Currently there's still 74.6MB remaining, and it appears to be growing steadily over time (from 81MB on 2023-03-31 to 94MB on 2024-01-15).

So while #2315 definitely helped, I think it's worth reducing the size even further. This could significantly improve build/deploy times of the many Docker images that use this library. For reference, the current latest python-slim Docker image is less than 50MB.

An easy win would be to use compression.
To illustrate: if I manually zip the entire 74.6MB documents directory in v2.114.0 using the default GNOME archive manager on Pop!_OS, I end up with an archive of 11.9MB.
Creating a documents.tar.xz in the same way brings this down to 4.6MB.

It should be possible to achieve similar levels of compression by using Python's standard compression libraries, e.g. zlib, zipfile, or lzma.
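
As a rough way to check this from Python, here is a minimal sketch that compares the raw size of a JSON payload against its size under zlib and lzma. The helper name `compressed_sizes` and the sample payload are made up for illustration; the sample is not a real discovery document, so the ratios will differ from the figures above.

```python
import json
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` raw and under two stdlib codecs."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),          # DEFLATE, highest level
        "lzma": len(lzma.compress(data, preset=9)),   # .xz container, highest preset
    }


# Hypothetical stand-in for a discovery document: repetitive, unindented JSON.
sample = json.dumps(
    {
        "methods": [
            {"id": "example.items.get{}".format(i), "httpMethod": "GET"}
            for i in range(500)
        ]
    }
).encode("utf-8")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print("{:>5}: {} bytes".format(name, size))
```

Running the same function over the real files in the documents directory would show how close the standard library gets to the archive manager results.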

@parthea parthea added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jan 25, 2024
parthea commented Jan 25, 2024

PRs are welcome!

When get_static_doc retrieves a discovery artifact from the compressed file, we should make sure it extracts only the file for the requested service and version.

def get_static_doc(serviceName, version):
    """Retrieves the discovery document from the directory defined in
    DISCOVERY_DOC_DIR corresponding to the serviceName and version provided.
    Args:
        serviceName: string, name of the service.
        version: string, the version of the service.
    Returns:
        A string containing the contents of the JSON discovery document,
        otherwise None if the JSON discovery document was not found.
    """
    content = None
    doc_name = "{}.{}.json".format(serviceName, version)
    try:
        with open(os.path.join(DISCOVERY_DOC_DIR, doc_name), "r") as f:
            content = f.read()
    except FileNotFoundError:
        # File does not exist. Nothing to do here.
        pass
    return content
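
One possible shape for this, as a sketch only: if the documents were stored in a single zip archive, `zipfile.ZipFile.read` decompresses just the named member, so only the requested service and version would be extracted. The function name `get_static_doc_from_zip`, the `archive_path` parameter, and the `documents.zip` layout are assumptions for illustration, not part of the library.

```python
import os
import tempfile
import zipfile


def get_static_doc_from_zip(archive_path, serviceName, version):
    """Hypothetical variant of get_static_doc that reads one member of a zip.

    Returns the JSON document as a string, or None if the archive or the
    requested document does not exist.
    """
    doc_name = "{}.{}.json".format(serviceName, version)
    try:
        with zipfile.ZipFile(archive_path) as archive:
            # Only the named member is decompressed, not the whole archive.
            return archive.read(doc_name).decode("utf-8")
    except (FileNotFoundError, KeyError):
        # Archive missing, or no document for this service and version.
        return None


# Usage with a throwaway archive (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "documents.zip")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("drive.v3.json", '{"name": "drive"}')

    found = get_static_doc_from_zip(archive_path, "drive", "v3")
    missing = get_static_doc_from_zip(archive_path, "drive", "v99")
```

The trade-off is that zip compresses each member independently, which keeps single-file lookups cheap but generally compresses less than a solid tar.xz, where reading one document means decompressing the stream up to that point.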
