Design and prototype SerialDescriptor-based cache aka "SerialDescriptorLocal" #2240

qwwdfsad · 2023-03-17T17:28:42Z

Current situation

Currently, multiple format implementations require some kind of lazily-built cache.
For example, ProtoBuf maintains a mapping from the serial descriptor id to the corresponding proto id, Json maintains the cache of all types' unique serialNames and the corresponding alternative names. Other caches might be even more heavy-weight, for example, Json may want to build a trie for serial names in order to support zero-allocation key decoding.

Per-format cache has numerous downsides:

It forces each format to re-invent its own thread-safe data structure and cache convention
It is prone to memory leaks and unbounded memory usage in classloading-heavy scenarios: the cache should reference serial descriptors weakly. Addressing this is a non-trivial implementation burden.
Managing the concurrency of such a cache is a non-trivial task: format instances may become a contention point, especially during application startup
Memory-unfriendly: in our practice, an application may have dozens of format instances that differ in various minor details (pretty printing, polymorphism, etc.). Eventually, each of them will effectively have its own copy of the cached value.
The most important downside, which is likely to outweigh the previous ones: computation-heavy caching is an API-unfriendly performance timebomb. Users often consider a format's allocation to be lightweight and do not bother to pre-allocate it (to the extent that we have a dedicated IDEA inspection for that). In such scenarios, the supposedly cached value is re-evaluated each time and can potentially consume more time than the actual serialization process (see Intrinsics for serializer() function #1348 and Implemented serializers caching for lookup #2015).
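To make the last point concrete, here is a minimal sketch of a per-instance cache. `SketchFormat`, `nameIndex` and `cacheBuilds` are hypothetical names invented for illustration and are not part of any real format; the "expensive" computation is just a name-to-index map.

```kotlin
// Hypothetical format with a per-instance, lazily built cache (illustrative only).
// Not thread-safe: a real format would also have to solve that, as noted above.
class SketchFormat(val prettyPrint: Boolean = false) {
    private val nameIndexCache = HashMap<String, Map<String, Int>>()

    // Counts how many times the cached value had to be (re)built by this instance.
    var cacheBuilds = 0
        private set

    fun nameIndex(serialName: String, elementNames: List<String>): Map<String, Int> =
        nameIndexCache.getOrPut(serialName) {
            cacheBuilds++
            elementNames.withIndex().associate { (index, name) -> name to index }
        }
}
```

A long-lived instance builds the value once, but code that allocates a fresh `SketchFormat` per call rebuilds it on every call, because the cache dies with the instance.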

Proposed solution

Taking into account all the known limitations, I propose a format-agnostic concept -- SerialDescriptorLocal, a counterpart of ThreadLocal (or, with some restrictions, of ClassValue) that shifts the caching responsibility to the core library level and delegates storage to the SerialDescriptor the same way ThreadLocal delegates it to the Thread instance.

The very preliminary API shape might have the following form:

// Format code, in companion

private val myCache = SerialDescriptorLocal { descriptor -> computeFormatSpecificValue(descriptor) }

// Format code, decode*/encode* functions

val cachedValue = myCache.get(currentSerialDescriptor)
...

// Core library, SerialDescriptor implementation

fun <T> getOrCompute(key: SerialDescriptorLocal<T>): T {
    ... implementation shared between all SDs ...
}
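One possible reading of this shape, as a self-contained sketch rather than a proposed implementation: `SketchDescriptor` and `SketchDescriptorLocal` are hypothetical stand-ins (not the real kotlinx.serialization types), and storing values in a `ConcurrentHashMap` owned by the descriptor is an assumption made here to mirror how ThreadLocal values live on the Thread.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for a SerialDescriptor implementation.
class SketchDescriptor(val serialName: String) {
    // Per-descriptor storage: cached values are owned by the descriptor itself,
    // so they become unreachable together with it.
    private val locals = ConcurrentHashMap<SketchDescriptorLocal<*>, Any>()

    @Suppress("UNCHECKED_CAST")
    fun <T : Any> getOrCompute(key: SketchDescriptorLocal<T>): T =
        locals.computeIfAbsent(key) { key.compute(this) } as T
}

// Hypothetical stand-in for SerialDescriptorLocal: holds only the initializer,
// never the computed values.
class SketchDescriptorLocal<T : Any>(private val initializer: (SketchDescriptor) -> T) {
    internal fun compute(descriptor: SketchDescriptor): T = initializer(descriptor)

    fun get(descriptor: SketchDescriptor): T = descriptor.getOrCompute(this)
}
```

In this sketch the initializer runs at most once per (descriptor, local) pair, regardless of how many format instances call `get`. Note that the descriptor holds a strong reference to each local it has seen, which is harmless for static locals but would need more thought for format-dependent ones.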

Things that we have to figure out:

How to expose it in the SerialDescriptor interface
Whether we want to support a scenario where SerialDescriptor can opt-out from such behaviour and whether it is allowed to throw an exception
Whether we can provide support for non-static (e.g. format-dependent) SerialDescriptorLocal instances (e.g. a case-insensitive trie) with the help of structural equality of SerialDescriptorLocal
Whether it is possible to implement with all the restrictions applied (thread-safety, class unloading friendliness) while keeping the API lightweight
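As an illustration of the structural-equality question only, the sketch below uses a Kotlin data class as the cache key, so two separately created but equally configured locals resolve to the same cached value. `KeyedDescriptor` and `NameIndexLocal` are hypothetical names, and a plain name-to-index map stands in for the trie.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical descriptor stand-in whose storage is keyed by equals()/hashCode().
class KeyedDescriptor(val elementNames: List<String>) {
    private val locals = ConcurrentHashMap<Any, Any>()

    fun getOrCompute(key: Any, compute: (KeyedDescriptor) -> Any): Any =
        locals.computeIfAbsent(key) { compute(this) }
}

// Hypothetical format-dependent local: being a data class, two instances with
// the same ignoreCase flag are equal and therefore share one cache entry.
data class NameIndexLocal(val ignoreCase: Boolean) {
    @Suppress("UNCHECKED_CAST")
    fun get(descriptor: KeyedDescriptor): Map<String, Int> =
        descriptor.getOrCompute(this) { d ->
            d.elementNames.withIndex().associate { (index, name) ->
                (if (ignoreCase) name.lowercase() else name) to index
            }
        } as Map<String, Int>
}
```

Under these assumptions, each distinct configuration adds one entry per descriptor, so the number of cached values is bounded by the number of distinct configurations rather than by the number of format instances.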

qwwdfsad added feature design runtime labels Mar 17, 2023
