Skip to content
This repository has been archived by the owner on Jun 1, 2022. It is now read-only.

Refactor Codec/Serializers to reduce new object contstruction #164

Open
sappenin opened this issue Jul 30, 2018 · 4 comments
Open

Refactor Codec/Serializers to reduce new object contstruction #164

sappenin opened this issue Jul 30, 2018 · 4 comments

Comments

@sappenin
Copy link
Contributor

There are three problems with the codec/serializer framework as it stands now.

First, the codecs rely upon holding "state" until the Serializer can write (or the inverse in the case of a read). The net effect of this is that for every serialization operation, there is at least 1 new object construction operation (to create a new codec), but in general it's much worse that this.

For example, consider the AsnInterledgerPreparePacketDataCodec, which is engaged for every prepare packet. Here, we have at least 5 new object instantiations for every deserialize operation. So, if we process 5m payments (prepare[5] + fulfill[2]), we would expect at least 35 million new object creations, which wastes memory, cpu cycles, etc.

Second, the current system attempts to automatically map from Java objects to ASN.1 OER. For the most part, this is a fine default setting, but creates complexity inside of the framework when we need to create custom mappings. For example, currently the CodecContextFactory maps a BigInteger to AsnUint64Codec, which makes sense for the ILP codecs. However, for some CCP codecs, BigInteger can sometimes be required to map to AsnUintCodec instead. This is not a showstopper -- we might simply try to create a more intelligent Codec perhaps, but this new codec would likely not suit all use-cases. A better system would be one where a developer can have more fine-grained control over the Codec/Serializer that should be used, despite what the default factory has configured.

Third, is clarity of code. There appears to be overlap between the Serializers and the Codecs. For example, it's nearly impossible to create a good unit test of a serializer because it's tightly coupled to its codec (a proper unit test here can be quite complicated, typically because creating a codec-mock is non-trivial). Additionally, it can be unclear which layer is responsible for what.

This ticket proposes to solve both performance and clarity issues by combining the codecs and serializers into a single layer, with an API that allows for simple encoding/decoding functionality (like the current API) but also allows for fine-grained read... and write... methods, similar to how an InputStream/OutputStream would allow by providing direct access to read/write methods.

@sublimator
Copy link
Contributor

A few thoughts to share:

Generally, the high level structs seem to have to declare exactly which codec they require for each field, so I don't really see much problem with the defaults (re: BigInteger <-> Uint64) .
Actually I was looking at using this library in an android background service for a rpc via Bluetooth Low Energy framework. I found declaring the structs to be quite repetitive and cumbersome.

I was messing around a while back with custom annotations, which allowed the use of reflection/Proxy to auto generate codecs for serializing/deserializing Immutable structs. See sketch here: sublimator@b086d2e#diff-e71e8189355a88ef639759191518ecfaR72

Obviously not very performant. I think the auto generated sequence codecs were around 35% slower.
Essentially:

AnInterface.builder().from(DecoderProxy.from(AnInterface.class, ...)).build()

Whatever way it's implemented, it might be nice to have a separate package of annotations one could apply that codec-framework could understand, and auto generate codecs, which could be replaced with hand written codecs as and when performance is required. It would aid in documenting the structs, and the auto generated codecs could be compared against the hand written ones.

I agree the stateful codecs are a bit less than ideal. On the topic of performance, there's a lot of concurrent map usage inside the context which perhaps could be better solved with builders/ immutable maps and copies when registering?

I think I recall @adrianhopebailie at one time expressing that a simpler Reader/Writer pattern could be more performant?

@adrianhopebailie
Copy link
Contributor

Yeh, this framework was definitely designed with "align to ASN.1" in mind.

The goal was a separation of the concepts:

  1. Definition of data structures using ASN.1
  2. Serializing data that conforms to those structures according to a set of encoding rules (in this case OER)

So you could have an ASN.1 modelling framework that allowed someone to write codecs to convert between business objects and an ASN.1 representation and then separately encode and decode these using a specific set of encoding rules.

Technically you should be able to write a different set of serializers for DER, BER etc if you needed an alternative encoding for the same ASN.1 defined objects.

The overhead of creating new objects each time was intentional as it made things simpler to implement (thread-safe without the need to enforce immutability) and I figured the "framework" approach was not the right tool for a high perf system anyway.

If you want performance you probably want use-case specific readers/writers that don't need to translate your business objects into an intermediate ASN.1 form. If we get to that then I think we want to look at a variety of other techniques for improving performance like re-using buffers for ILP packets and doing in place replacement of fixed length data (like amounts and expiries).

In a high perf ILP connector you could read a prepare packet, modify it, and forward it on with a new amount and expiry without ever needing to fully deserialize it. Maybe we can hold off on refactoring the codecs until we want to try something like that and ditch the framework altogether?

@sublimator
Copy link
Contributor

hold off on refactoring the codecs

Yeah, they'll rewrite themselves if something actually uses them and needs the perf right?

@sublimator
Copy link
Contributor

sublimator commented Aug 1, 2018 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

3 participants