Design and prototype SerialDescriptor-based cache aka "SerialDescriptorLocal" #2240

Open
qwwdfsad opened this issue Mar 17, 2023 · 0 comments

Comments

@qwwdfsad
Member

Current situation

Currently, multiple format implementations require some kind of lazily built cache.
For example, ProtoBuf maintains a mapping from the serial descriptor id to the corresponding proto id, and Json maintains a cache of all types' unique serialNames and the corresponding alternative names. Other caches might be even more heavyweight: for example, Json may want to build a trie for serial names in order to support zero-allocation key decoding.

Per-format cache has numerous downsides:

  • It forces each format to re-invent its own thread-safe data structure and cache convention
  • It is prone to memory leaks and unbounded memory usage in classloading-heavy scenarios: the cache should reference serial descriptors weakly. Addressing this is a non-trivial implementation burden.
  • Managing the concurrency of such a cache is a non-trivial task: format instances may become a contention point, especially at application startup
  • Memory-unfriendly: in our practice, an application may have dozens of format instances that differ in various minor details (pretty printing, polymorphism etc.). Eventually, each of them will effectively have its own copy of the cached value.
  • The most important downside, which is likely to outweigh the previous ones: computationally heavy caching is an API-unfriendly performance time bomb. Users often consider a format's allocation lightweight and do not bother to pre-allocate it (to the extent that we have a dedicated IDEA inspection for that). In such scenarios, the supposedly cached value is re-evaluated each time and can potentially consume more time than the actual serialization process (see Intrinsics for serializer() function #1348 and Implemented serializers caching for lookup #2015).
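
To make the last point concrete, here is a minimal sketch of a format that keeps its cache per instance. `SketchFormat`, `nameIndex` and the `onCompute` hook are hypothetical names invented for this illustration and are not part of kotlinx.serialization; the hook exists only so the number of recomputations can be observed.

```kotlin
// Hypothetical format with a per-instance cache (illustration only).
class SketchFormat(private val onCompute: () -> Unit) {
    // The cache lives and dies with this format instance.
    private val namesCache = HashMap<String, Map<String, Int>>()

    // Builds (or reuses) an element-name -> index mapping for one type.
    fun nameIndex(serialName: String, elementNames: List<String>): Map<String, Int> =
        namesCache.getOrPut(serialName) {
            onCompute() // count how often the "expensive" work actually runs
            elementNames.withIndex().associate { (index, name) -> name to index }
        }
}
```

If a caller creates a fresh `SketchFormat` for every call, the mapping is rebuilt every time, while a single shared instance builds it once. A cache owned by the descriptor would not depend on how the format instance is allocated.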

Proposed solution

Taking into account all the known limitations, I propose a format-agnostic concept -- SerialDescriptorLocal, a counterpart of ThreadLocal (or, with some restrictions, of ClassValue) that shifts the caching responsibility to the core library level and delegates it to the SerialDescriptor, the same way ThreadLocal delegates it to the Thread instance.
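
For reference, this is roughly how the JVM's existing `java.lang.ClassValue` is used: the value is declared once, computed lazily per `Class`, and stored by the runtime rather than by the caller. The sketch is JVM-only, and `simpleNameLength` and `computeCalls` are names made up for the example.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Counts how many times the lazy computation runs (for illustration only).
val computeCalls = AtomicInteger()

// A ClassValue associates a lazily computed value with each Class instance.
val simpleNameLength: ClassValue<Int> = object : ClassValue<Int>() {
    override fun computeValue(type: Class<*>): Int {
        computeCalls.incrementAndGet()
        return type.simpleName.length
    }
}
```

Repeated `simpleNameLength.get(String::class.java)` calls from one thread trigger a single computation; the proposal is to offer the same shape with SerialDescriptor in place of Class.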

The very preliminary API shape might have the following form:

// Format code, in companion

private val myCache = SerialDescriptorLocal { descriptor -> computeFormatSpecificValue(descriptor) }

// Format code, decode*/encode* functions

val cachedValue = myCache.get(currentSerialDescriptor)
...

// Core library, SerialDescriptor implementation

fun <T> getOrCompute(key: SerialDescriptorLocal<T>): T {
    ... implementation shared between all SDs ...
}
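
A self-contained sketch of how these three pieces could fit together is shown below. It is only one possible reading of the shape above, not the proposed implementation: `SketchDescriptor` stands in for SerialDescriptor, `SketchDescriptorLocal` for SerialDescriptorLocal, and the `ConcurrentHashMap` storage is an assumption made to keep the example short. It is JVM-only and does not address class unloading.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for SerialDescriptor; not the kotlinx.serialization interface.
class SketchDescriptor(val serialName: String) {
    // Per-descriptor storage, analogous to the per-Thread map behind ThreadLocal.
    private val locals = ConcurrentHashMap<SketchDescriptorLocal<*>, Any>()

    // Shared implementation: compute the value on first access, then reuse it.
    @Suppress("UNCHECKED_CAST")
    fun <T : Any> getOrCompute(key: SketchDescriptorLocal<T>): T =
        locals.computeIfAbsent(key) { key.compute(this) } as T
}

// Hypothetical stand-in for SerialDescriptorLocal; keyed by identity.
class SketchDescriptorLocal<T : Any>(private val initializer: (SketchDescriptor) -> T) {
    internal fun compute(descriptor: SketchDescriptor): T = initializer(descriptor)

    // Delegates storage to the descriptor, as ThreadLocal delegates to Thread.
    fun get(descriptor: SketchDescriptor): T = descriptor.getOrCompute(this)
}
```

With this shape, a format would declare `SketchDescriptorLocal { d -> ... }` once and call `get(descriptor)` from any instance. One caveat of this particular storage choice: `ConcurrentHashMap.computeIfAbsent` does not allow the initializer to touch the same map, so an initializer that reads another local of the same descriptor would need a different strategy.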

Things that we have to figure out:

  • How to expose it in the SerialDescriptor interface
  • Whether we want to support a scenario where a SerialDescriptor can opt out of such behaviour, and whether it is allowed to throw an exception
  • Whether we can provide support for non-static (e.g. format-dependent) SerialDescriptorLocal instances (e.g. case-insensitive trie) with the help of structural equality of SDL
  • Whether it is possible to implement with all the restrictions applied (thread-safety, class unloading friendliness) while keeping the API lightweight
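
The structural-equality question can be sketched as follows, under the assumption that each local carries an explicit key with value semantics. All names here (`KeyedDescriptor`, `KeyedLocal`, `TrieConfig`, `nameSetLocal`) are hypothetical, and a plain set of names stands in for the trie.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical descriptor stand-in whose slots are addressed by a structural key.
class KeyedDescriptor(val elementNames: List<String>) {
    private val locals = ConcurrentHashMap<Any, Any>()

    @Suppress("UNCHECKED_CAST")
    fun <T : Any> getOrCompute(local: KeyedLocal<T>): T =
        locals.computeIfAbsent(local.key) { local.compute(this) } as T
}

// Data class equality means equal configurations map to the same slot.
data class TrieConfig(val ignoreCase: Boolean)

class KeyedLocal<T : Any>(val key: Any, private val initializer: (KeyedDescriptor) -> T) {
    fun compute(descriptor: KeyedDescriptor): T = initializer(descriptor)
    fun get(descriptor: KeyedDescriptor): T = descriptor.getOrCompute(this)
}

// A non-static, format-dependent local: created per format, shared per configuration.
fun nameSetLocal(ignoreCase: Boolean, onBuild: () -> Unit = {}): KeyedLocal<Set<String>> =
    KeyedLocal(TrieConfig(ignoreCase)) { descriptor ->
        onBuild()
        descriptor.elementNames.map { if (ignoreCase) it.lowercase() else it }.toSet()
    }
```

Two locals created from equal `TrieConfig` values resolve to the same cached object, so formats that differ only in unrelated settings would not duplicate it. The unchecked cast is the weak spot: nothing here prevents two locals with equal keys from producing different types, which a real design would have to rule out.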