[Packed Array] delta compression is almost useless for types smaller than or equal to 8bit #523
-
I am afraid I don't understand the first table. What do the columns "array" and "packed array" mean? And which zserio type did you use? In any case, I am pretty sure that deltas are not always stored as 8 bits and that they do work. Please have a look e.g. here. I agree that delta compression depends heavily on data patterns, but that is not a problem of zserio. Zserio just implements delta compression and cannot influence the data at all.
-
@AntonSulimenkoHarman, could you provide a demo schema that you intend to use with the data patterns you describe, so that we can try out your assumptions? Using general-purpose compressors on zserio-encoded structures is indeed a bit of a science of its own. It is not the packed arrays as such that spoil the result there, but rather any schema that is not byte-aligned. General-purpose compressors usually work best on byte-aligned data patterns. So in case you have lists with e.g.
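The point about byte alignment can be tried out with a small experiment. The following is a minimal sketch, not a zserio benchmark: `pack_bits` and `compressed_sizes` are hypothetical helpers, the test data is an arbitrary 5-bit pattern, and zlib stands in for whatever general-purpose compressor is actually used (the thread uses 7z). The compressed sizes depend on the data and the compressor, so no particular outcome is claimed here.

```python
import zlib


def pack_bits(values, width):
    """Concatenate `width`-bit values MSB-first into bytes, zero-padding the last byte."""
    acc = 0      # bits collected but not yet flushed
    nbits = 0    # number of valid bits in acc
    out = bytearray()
    mask = (1 << width) - 1
    for v in values:
        acc = (acc << width) | (v & mask)
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the unflushed bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def compressed_sizes(values, width):
    """Compare zlib on one-value-per-byte data against the same values bit-packed."""
    aligned = bytes(values)            # byte-aligned: one value per byte
    packed = pack_bits(values, width)  # values straddle byte boundaries
    return {
        "aligned_raw": len(aligned),
        "aligned_zlib": len(zlib.compress(aligned, 9)),
        "packed_raw": len(packed),
        "packed_zlib": len(zlib.compress(packed, 9)),
    }


# Arbitrary illustrative pattern of 5-bit values (an assumption, not data from this thread).
values = [(i * i) % 32 for i in range(4096)]
print(compressed_sizes(values, 5))
```

Swapping in other patterns shows how much of the raw-size advantage of bit packing survives after compression.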
-
I apologize, I seem to have forgotten the schema.
struct Array<T> {
    varsize size;
    T data[size];
};
struct PackedArray<T> {
    varsize size;
    packed T data[size];
};
instantiate Array<uint8> ArrayUint8;
instantiate PackedArray<uint8> PackedArrayUint8;
Table description:
Warning!!! Just my assumption:
instantiate PackedArray<string> PackedArrayString; //<<< warns about [unpackable-array]
instantiate PackedArray<PackedArray<uint8>> PackedArrayPackedArrayUint8; //<<< doesn't warn about [unpackable-array]
So it looks like zserio makes an assumption about string content (unlikely to be packable) but expects that PackedArrayUint8 can be packed. BTW, does zserio effectively pack
-
Yes, zserio uses delta compression for all integer types, including bit:5.
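For readers unfamiliar with the technique, here is a minimal sketch of delta encoding on a few 5-bit values. It only illustrates the general idea (store the first element, then the differences between neighbours); `delta_encode` and `delta_decode` are hypothetical names, and nothing here reproduces zserio's actual bit layout.

```python
def delta_encode(values):
    """Return (first, deltas) with deltas[i] = values[i + 1] - values[i]."""
    if not values:
        raise ValueError("need at least one element")
    return values[0], [b - a for a, b in zip(values, values[1:])]


def delta_decode(first, deltas):
    """Rebuild the original sequence by accumulating the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


# Values that fit in bit:5 (0..31); the sample data is made up for illustration.
samples = [10, 11, 12, 12, 11, 13]
first, deltas = delta_encode(samples)
print(first, deltas)
assert delta_decode(first, deltas) == samples
```

The scheme pays off only when the deltas need fewer bits than the elements themselves, which is exactly what the rest of this thread argues about.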
-
@AntonSulimenkoHarman, @mikir
-
Please check my test pattern extension: fklebert/zserio-experiments#1. The results change for some patterns in ways a user would not expect. As far as I can see (please correct me if I'm wrong), a packed array of uint8 is only guaranteed to be effective, in terms of serialized data size, if the maximum difference between adjacent elements fits in the 5 least significant bits.
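A break-even point like this can be estimated with a rough cost model. The sketch below is an assumption, not zserio's documented format: it supposes a 7-bit descriptor per packed array, the first element stored at full width, and every delta stored with its magnitude bits plus one sign bit (or zero bits when all deltas are zero). `max_profitable_magnitude_bits` is a hypothetical helper.

```python
def max_profitable_magnitude_bits(element_bits, count, descriptor_bits=7):
    """Largest delta magnitude width (in bits) for which the modelled packed
    size is still smaller than the plain size, or None if packing never wins.

    Assumed model: descriptor + first element + (count - 1) deltas, where each
    delta costs magnitude_bits + 1 sign bit, or 0 bits if all deltas are zero.
    """
    plain = count * element_bits
    for magnitude_bits in range(element_bits, -1, -1):
        width = magnitude_bits + 1 if magnitude_bits else 0
        packed = descriptor_bits + element_bits + (count - 1) * width
        if packed < plain:
            return magnitude_bits
    return None


print(max_profitable_magnitude_bits(8, 4096))  # uint8 array of 4096 elements
print(max_profitable_magnitude_bits(5, 4096))  # bit:5 array of 4096 elements
```

The result is only as good as the assumed per-delta overhead, so it is worth checking against sizes measured with zserio itself.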
-
I've prepared a packed array test with some delta compression-friendly patterns:
ALL patterns (except the single case where the whole array is filled with one constant value) are larger because deltas are stored as 8 bits!
One might say that delta compression prepares data for further archiving. So let's check it.
Warning: I archived the files with 7z, so the sizes include archive overhead such as the header and the file names stored inside the archive. Please compare only the differences.
It's important to note that various archiving algorithms may produce varying results, but I am confident that this table accurately represents the concept.
Even with further archiving, most of the patterns compress better as plain arrays.
And the patterns most favorable to delta compression gain only 24 bytes and 15 bytes on 4K of source data.
IMHO, delta compression is highly dependent on data patterns, and it becomes even more demanding for element types of 8 bits or less.
Only data with long runs of identical values gives a small benefit.
This makes it almost useless for real-world data.
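The size argument in this post can be sketched with a rough cost model. Everything below is an assumption for illustration: the 7-bit descriptor, the rule that every delta takes the magnitude bits of the largest delta plus one sign bit, and the three sample patterns, which are not the ones from the tables above. It does not read or write real zserio data.

```python
DESCRIPTOR_BITS = 7  # assumption: 1 "is packed" flag + 6-bit delta width field


def delta_width(values):
    """Bits per delta under the model: magnitude bits of the largest |delta|
    plus one sign bit, or 0 when all deltas are zero."""
    max_abs = max((abs(b - a) for a, b in zip(values, values[1:])), default=0)
    return max_abs.bit_length() + 1 if max_abs else 0


def plain_bits(values, element_bits=8):
    """Size of the array stored without packing."""
    return len(values) * element_bits


def packed_bits(values, element_bits=8):
    """Modelled size: descriptor, first element in full, then fixed-width deltas."""
    return DESCRIPTOR_BITS + element_bits + (len(values) - 1) * delta_width(values)


# Hypothetical patterns of 4096 uint8 values.
patterns = {
    "constant": [42] * 4096,
    "alternating 0/1": [i % 2 for i in range(4096)],
    "sawtooth 0..255": [i % 256 for i in range(4096)],
}
for name, values in patterns.items():
    print(f"{name}: plain={plain_bits(values)} bits, packed={packed_bits(values)} bits")
```

Under these assumptions, the single wrap-around delta of -255 in the sawtooth forces every delta to 9 bits, so one outlier is enough to make the modelled packed size exceed the plain one.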