Compressing the discovery cache #2321

jorenham · 2024-01-16T21:32:55Z

#2315 shaved a bit more than 20MB off the discovery cache by removing the JSON indentation.
Currently there's still 74.6MB remaining, and it appears to be growing steadily over time (from 81MB on 2023-03-31 to 94MB on 2024-01-15).

So while #2315 definitely helped, I think it's worth reducing the size even further. This could significantly improve build and deploy times for the many Docker images that use this library. For reference, the current latest python-slim Docker image is less than 50MB.

An easy win would be to use compression.
To illustrate: if I manually zip the entire 74.6MB documents directory in v2.114.0 using the default GNOME archive manager on Pop!_OS, I end up with an archive of 11.9MB.
Creating a documents.tar.xz in the same way brings this down to 4.6MB.

It should be possible to achieve similar levels of compression by using Python's standard compression libraries, e.g. zlib, zipfile, or lzma.
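As a rough way to compare the three modules, here is a minimal sketch that compresses one in-memory JSON payload with each of them and reports the sizes. The `compressed_sizes` helper and the `sample` payload are made up for illustration (the payload is a repetitive stand-in, not a real discovery document), so the ratios it prints are not a prediction for the actual documents directory.

```python
import io
import json
import lzma
import zipfile
import zlib


def compressed_sizes(raw: bytes) -> dict:
    """Return the size in bytes of `raw` under each stdlib codec."""
    # zipfile needs a file-like target, so write the archive to memory.
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("doc.json", raw)
    return {
        "raw": len(raw),
        "zlib": len(zlib.compress(raw, 9)),
        "zipfile": len(zip_buffer.getvalue()),
        "lzma": len(lzma.compress(raw)),
    }


# Hypothetical stand-in for a discovery document: repetitive JSON, not real data.
sample = json.dumps(
    {
        "resources": {
            "method{}".format(i): {"httpMethod": "GET", "path": "v1/items/{}".format(i)}
            for i in range(500)
        }
    }
).encode("utf-8")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print("{}: {} bytes".format(name, size))
```

Running the same helper over the real files in the documents directory would give a more meaningful comparison than the synthetic payload used here.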


parthea · 2024-01-25T13:35:43Z

PRs are welcome!

We should ensure that when get_static_doc retrieves a discovery artifact from the compressed file, it only extracts the file for the requested service and version.

google-api-python-client/googleapiclient/discovery_cache/__init__.py

Lines 55 to 78 in c965b05

    
    def get_static_doc(serviceName, version):
        """Retrieves the discovery document from the directory defined in
        DISCOVERY_DOC_DIR corresponding to the serviceName and version provided.

        Args:
            serviceName: string, name of the service.
            version: string, the version of the service.

        Returns:
            A string containing the contents of the JSON discovery document,
            otherwise None if the JSON discovery document was not found.
        """

        content = None
        doc_name = "{}.{}.json".format(serviceName, version)

        try:
            with open(os.path.join(DISCOVERY_DOC_DIR, doc_name), "r") as f:
                content = f.read()
        except FileNotFoundError:
            # File does not exist. Nothing to do here.
            pass

        return content
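One possible shape for a compressed variant of get_static_doc, as a minimal sketch only: it assumes all documents are stored as members of a single zip archive, and the function name `get_static_doc_from_zip`, the `archive_path` parameter, and the `documents.zip` file name are hypothetical, not part of the library. `ZipFile.read` decompresses only the named member, which matches the requirement of extracting a single service and version.

```python
import os
import tempfile
import zipfile


def get_static_doc_from_zip(archive_path, serviceName, version):
    """Return the JSON text for one service/version from a zip archive,
    or None if the archive or the requested member does not exist."""
    doc_name = "{}.{}.json".format(serviceName, version)
    try:
        with zipfile.ZipFile(archive_path) as zf:
            # read() decompresses only the named member, not the whole archive.
            return zf.read(doc_name).decode("utf-8")
    except (FileNotFoundError, KeyError):
        # Missing archive (FileNotFoundError) or missing member (KeyError).
        return None


# Usage example with a throwaway archive holding two placeholder documents.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "documents.zip")  # hypothetical file name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("drive.v3.json", '{"name": "drive"}')
        zf.writestr("gmail.v1.json", '{"name": "gmail"}')

    found = get_static_doc_from_zip(archive, "drive", "v3")
    missing = get_static_doc_from_zip(archive, "drive", "v99")
```

A zip archive compresses each member independently, which is what makes single-file extraction cheap. A tar.xz compresses the whole stream at once, so it tends to be smaller but would require decompressing through the stream to reach one document.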

parthea added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jan 25, 2024
