How to serialize/unserialize the cache in/from a file? #11

Open
deeplook opened this issue Jul 29, 2020 · 4 comments
Labels
  • enhancement: New feature or request
  • high risk: Addressing this issue may introduce security vulnerabilities
  • priority-1-normal: This issue has a normal priority
  • size-large: Much work need to be done

Comments

@deeplook

Having the cache info is very useful, but I'm missing a way to access the cache itself so that I can serialize it and reuse it later. Is there any way to do that already?

>>> f.cache_info()
CacheInfo(hits=8207, misses=1957, current_size=1957, max_size=None,
          algorithm=<CachingAlgorithmFlag.LRU: 2>, ttl=None,
          thread_safe=True, order_independent=False, use_custom_key=False)
@marceloFA

I'm also hoping for some sort of pickling feature in the future!
I may be able to contribute to the project by implementing this feature, if the maintainers are willing to answer questions I may have during development.

@lonelyenvoy lonelyenvoy added enhancement New feature or request high risk Addressing this issue may introduce security vulnerabilities priority-1-normal This issue has a normal priority size-large Much work need to be done duplicate This issue or pull request already exists labels Dec 14, 2020
@lonelyenvoy lonelyenvoy removed the duplicate This issue or pull request already exists label Aug 1, 2021
@lonelyenvoy
Owner

lonelyenvoy commented Aug 1, 2021

After careful consideration, I'm sorry to say that a serialization/deserialization feature cannot be implemented in this library yet. Although it would be a really nice and useful feature, the cons still outweigh the pros, and some challenges must be addressed first.

  • Deserialization is an unsafe operation that has caused a large number of vulnerabilities in many programming languages (for example, Python's pickle can execute arbitrary code when loading untrusted data). So it must be designed very carefully, especially in a library that a lot of software depends on.
  • It is difficult (or even impossible) to keep the serialized cache consistent when the code changes.

For example, given a function foo:

@cached
def foo(x):
    return x

One day, we serialize the cache with foo.serialize(...). After several revisions, we change the logic of foo:

@cached
def foo(x):
    return x + 1

If we then deserialize the old cache with foo.deserialize(...), calls to foo(x) will return stale, wrong results: the cached x instead of the new x + 1.

If anyone:

  • finds a way to keep the serialized cache consistent with the code (for example, by raising an error if the code has changed)
  • has the capability and resources to implement a serialization feature free from vulnerabilities

please comment or submit a PR. Thank you!
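
As a rough illustration of the first point, one possible approach is to store a fingerprint of the function's compiled code next to the cached entries and refuse to load when it no longer matches. This is only a sketch under assumptions, not a feature of this library: code_fingerprint, save_cache, and load_cache are hypothetical helpers, the cache is assumed to be a plain dict with string keys and JSON-compatible values, and func must be the undecorated function. JSON is used instead of pickle so that loading cannot execute code.

```python
import hashlib
import json
import types
from pathlib import Path


def code_fingerprint(func):
    """Hash the compiled body of func so that edits to its logic change the digest."""
    digest = hashlib.sha256()

    def feed(code):
        digest.update(code.co_code)
        digest.update(repr(code.co_names).encode("utf-8"))
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                feed(const)  # nested functions, lambdas, comprehensions
            else:
                digest.update(repr(const).encode("utf-8"))

    feed(func.__code__)
    return digest.hexdigest()


def save_cache(func, entries, path):
    """Write the entries together with the fingerprint of the code that produced them."""
    payload = {"fingerprint": code_fingerprint(func), "entries": entries}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_cache(func, path):
    """Return the stored entries, or raise if func has changed since they were saved."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if payload["fingerprint"] != code_fingerprint(func):
        raise ValueError("function code changed; refusing to load a stale cache")
    return payload["entries"]
```

The fingerprint only covers the function's own bytecode, names, and constants. Changes in helpers it calls go undetected, and a different Python version will usually produce a different digest, so the check errs on the side of rejecting the file.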

@judahrand

@lonelyenvoy Have you looked at how joblib.Memory approaches this? https://github.com/joblib/joblib/blob/55d97abd59dbc703579307f9d359870be436ebd1/joblib/memory.py#L672

Joblib also does a better job of dealing with the input args. It takes a sensibly filtered set of the arguments, assembles them into a list, serializes the list with pickle.dumps, hashes the resulting bytes, and uses the hash as the key.
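
The key derivation described above could look roughly like the following. This is a simplified sketch of the idea, not joblib's actual code: make_key and its ignore parameter are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import inspect
import pickle


def make_key(func, args, kwargs, ignore=()):
    """Build a cache key by hashing the pickled, filtered call arguments."""
    # Normalize positional/keyword usage and fill in defaults,
    # so f(1) and f(a=1) map to the same key.
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    # Drop ignored parameters and flatten to a list in name-sorted order.
    filtered = [
        (name, value)
        for name, value in sorted(bound.arguments.items())
        if name not in ignore
    ]
    return hashlib.sha256(pickle.dumps(filtered)).hexdigest()


def add(a, b=2, verbose=False):
    return a + b


# Same logical call, written two ways, gives the same key.
assert make_key(add, (1,), {}) == make_key(add, (), {"a": 1, "b": 2})
```

Only pickle.dumps is used here, which is safe; the risk discussed in this thread comes from pickle.loads on untrusted data. Note that pickle does not guarantee identical bytes for all equal objects, so some equal arguments may still produce different keys.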

@blazespinnaker

blazespinnaker commented Sep 8, 2022

Sorry, how do you pickle the cache? I get that you might not want to have serde code in the library, but folks should be able to do that themselves via exposed APIs to get and set the cache.
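
To make the request concrete, here is a minimal sketch of what "get and set" access could look like. It is a hand-rolled memoizer, not this library's API: memoize, export_cache, and import_cache are hypothetical names, and only hashable positional arguments are supported.

```python
import functools
import pickle


def memoize(func):
    """Hypothetical memoizer that exposes its cache through two helpers."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.export_cache = lambda: dict(cache)  # snapshot copy of the entries
    wrapper.import_cache = cache.update         # merge saved entries back in
    return wrapper


@memoize
def square(x):
    return x * x


square(3)
blob = pickle.dumps(square.export_cache())  # the caller picks the format
square.import_cache(pickle.loads(blob))     # only unpickle data you trust
```

With this split, the library never deserializes anything itself, so the choice of format and the trust decision stay with the caller. The stale-cache problem described by the owner still applies.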


5 participants