BitShuffle#shuffle for arrays with blocksize 4 (like integers) seems to be broken #296

Closed
foosmate opened this issue Nov 8, 2021 · 5 comments

@foosmate

foosmate commented Nov 8, 2021

Some of our unit tests revealed that BitShuffle#shuffle() does nothing meaningful for arrays with a word size of 4 bytes (e.g. integers). In contrast, shuffling byte arrays (word size of 1 byte) works well (those unit tests pass), but for integers only the byte order is changed and no bits are rearranged at all.

Since the shuffle and unshuffle methods both reflect this behaviour, the bijective API contract is not broken, and the effect only becomes visible when inspecting the binary results closely.

I attached a short unit test that makes it easy to reproduce the behaviour (one test passes and one fails).

The failing one is based on an array containing 2 integers (4 bytes each), and its output is as follows:

Original Ints    -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 )
Expected Ints    -> ( 00100000 | 00000000 | 00100011 | 00000000 | 00101100 | 00000000 | 00101111 | 00000000 )
Shuffled Ints    -> ( 00000011 | 00000010 | 00000001 | 00000000 | 00000111 | 00000110 | 00000101 | 00000100 ) <--- RESULT
Unshuffled Ints  -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 )
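
For reference, here is a minimal sketch of that failing round trip (assuming snappy-java's BitShuffle.shuffle(int[]) and BitShuffle.unshuffleIntArray(byte[]); the attached test builds the input slightly differently, the constants below are just one way to get the byte pattern shown above):

```java
import org.xerial.snappy.BitShuffle;

import java.util.Arrays;

public class BitShuffleIntRoundTrip {
    public static void main(String[] args) throws Exception {
        // Two 4-byte integers; written big-endian these produce the byte pattern
        // 00 01 02 03 04 05 06 07 from the dump above.
        int[] original = {0x00010203, 0x04050607};

        byte[] shuffled = BitShuffle.shuffle(original);  // should be bit-transposed; in our run
                                                         // only the byte order changed (see dump)
        int[] restored  = BitShuffle.unshuffleIntArray(shuffled);

        System.out.println("shuffled : " + toBits(shuffled));
        System.out.println("restored : " + Arrays.toString(restored));
    }

    // Print each byte as 8 bits, MSB first, separated like the dumps above.
    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(" | ");
            sb.append(String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0'));
        }
        return sb.toString();
    }
}
```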

Since I am not familiar with the internals, I can only guess where the error lies. I suspect something is wrong with the integration, i.e. how the data is passed to the bitshuffle library.

Since we are very dependent on a fast (native) bit-shuffling algorithm, I would really appreciate some help.

Thanks in advance for any support!

BitShuffleTest-java.txt

@kiyo-masui

kiyo-masui commented Nov 8, 2021

I don't know much about the Java interface, but according to the bitshuffle README the function is BitShuffle.bitShuffle, not BitShuffle.shuffle.

That said, I didn't see a function with that name in a quick look at the Java wrapper. I'm pretty sure the C code should be correct, since our unit tests check it against a Python implementation, but I'll need to verify that point.

I also wonder if the apparent byte-shuffling behavior might be fooling you and just be an artifact of the very particular test pattern you chose. Can you try something with a bit more entropy?

I see you specify big endian in your test, whereas most of our testing has been under little endian architectures. On that note, what architecture/instruction set are you using?
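
For what it's worth, here is the kind of randomized round trip I have in mind (a sketch, assuming the snappy-java BitShuffle.shuffle(int[]) / unshuffleIntArray(byte[]) pair):

```java
import org.xerial.snappy.BitShuffle;

import java.util.Arrays;
import java.util.Random;

public class RandomRoundTrip {
    public static void main(String[] args) throws Exception {
        Random rnd = new Random(42);                 // fixed seed for reproducibility
        int[] original = new int[64];
        for (int i = 0; i < original.length; i++) {
            original[i] = rnd.nextInt();             // high-entropy input instead of 0, 1, 2, ...
        }

        byte[] shuffled = BitShuffle.shuffle(original);
        int[] restored  = BitShuffle.unshuffleIntArray(shuffled);

        System.out.println("round trip ok : " + Arrays.equals(original, restored));
        System.out.println("first bytes   : " + Arrays.toString(Arrays.copyOf(shuffled, 16)));
    }
}
```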

@foosmate
Author

foosmate commented Nov 8, 2021

Thanks for the quick reaction. 👍

The mentioned method BitShuffle#shuffle refers to the current wrapper method (snappy-java) that has to be called from the Java side. I specified the byte order only for clarification, to relate it to the Java standard (which forces big endian). Since I use byte buffers as constants in my unit test, the byte order has literally no effect (even though the int array unit test seems to have a byte-order-related issue).

I use this wrapper class:

https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/BitShuffle.java

I narrowed it down to a very simple test and made some further findings:

The algorithm seems to require a minimum number of elements, otherwise nothing is shuffled at all. Could that be the reason? All my unit tests are "handmade" with only a few elements, which apparently do not reach this "threshold".

Have a look at these two very simple tests (obviously only the second one is actually bit-shuffled):

4 words with 2 bytes each -> elementSize = 4 | typeSize = 2

Original Bytes   -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 )
Shuffled Bytes   -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 ) <--- SHUFFLE RESULT
Unshuffled Bytes -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 )

8 words with 2 bytes each -> elementSize = 8 | typeSize = 2

Original Bytes   -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 | 00001000 | 00001001 | 00001010 | 00001011 | 00001100 | 00001101 | 00001110 | 00001111 )
Shuffled Bytes   -> ( 00000000 | 10101010 | 11001100 | 11110000 | 00000000 | 00000000 | 00000000 | 00000000 | 11111111 | 10101010 | 11001100 | 11110000 | 00000000 | 00000000 | 00000000 | 00000000 ) <--- SHUFFLE RESULT
Unshuffled Bytes -> ( 00000000 | 00000001 | 00000010 | 00000011 | 00000100 | 00000101 | 00000110 | 00000111 | 00001000 | 00001001 | 00001010 | 00001011 | 00001100 | 00001101 | 00001110 | 00001111 )
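
For reference, the two cases above can be driven roughly like this (a sketch assuming the short[] overload BitShuffle.shuffle(short[]) of the linked wrapper; the short constants are just one big-endian encoding of the byte pattern above):

```java
import org.xerial.snappy.BitShuffle;

public class ShortArrayThreshold {
    public static void main(String[] args) throws Exception {
        // Short values chosen so that a big-endian encoding yields the byte
        // pattern 00 01 02 03 ... from the dumps above.
        short[] four  = {0x0001, 0x0203, 0x0405, 0x0607};                  // 4 words, 8 bytes
        short[] eight = {0x0001, 0x0203, 0x0405, 0x0607,
                         0x0809, 0x0A0B, 0x0C0D, 0x0E0F};                  // 8 words, 16 bytes

        System.out.println("4 words : " + hex(BitShuffle.shuffle(four)));  // comes back unchanged
        System.out.println("8 words : " + hex(BitShuffle.shuffle(eight))); // actually bit-transposed
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString();
    }
}
```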

Of course I know that bit shuffling makes little sense for very few elements (it only costs computing time). But for the pure algorithm I would have expected it to ignore the number of elements and always shuffle (even if only two elements are given). In other words, a meaningful threshold would have to be applied in the outer context. Or is this threshold a prerequisite of the blockwise implementation?

@foosmate
Author

foosmate commented Nov 8, 2021

Just to be sure that I fully understand how the implemented bitshuffle algorithm works:

Can you please confirm that the following bit sequence is the correct result?

2 words with 2 bytes each -> elementSize = 2 | typeSize = 2

A1 A0 B1 B0

Original Bit Sequence:

A1.7 | A1.6 | A1.5 | A1.4 | A1.3 | A1.2 | A1.1 | A1.0 \
A0.7 | A0.6 | A0.5 | A0.4 | A0.3 | A0.2 | A0.1 | A0.0 \
B1.7 | B1.6 | B1.5 | B1.4 | B1.3 | B1.2 | B1.1 | B1.0 \
B0.7 | B0.6 | B0.5 | B0.4 | B0.3 | B0.2 | B0.1 | B0.0

Shuffled Bit Sequence:

B1.3 | A1.3 | B1.2 | A1.2 | B1.1 | A1.1 | B1.0 | A1.0 \
B1.7 | A1.7 | B1.6 | A1.6 | B1.5 | A1.5 | B1.4 | A1.4 \
B0.3 | A0.3 | B0.2 | A0.2 | B0.1 | A0.1 | B0.0 | A0.0 \
B0.7 | A0.7 | B0.6 | A0.6 | B0.5 | A0.5 | B0.4 | A0.4

@kiyo-masui

A few things:

I think you are interpreting elementSize incorrectly: it means the same thing as typeSize. In the C/Python API, the number of elements (not bytes) in the array is simply size.

Yes, having arrays that are too short could definitely be the issue. You need at least 8 elements (size >= 8). Furthermore, any data in the remainder of size % 8 gets copied instead of shuffled. It's algorithmically super expensive to bitshuffle arrays shorter than 8 elements, because one bit from each element will not fill a byte.

Ah, the bit ordering always breaks my brain. There is a good chance I'll get this wrong, but let me try.
Let's say the input sequence is 8 two-byte words:

A1.7 | A1.6 | A1.5 | A1.4 | A1.3 | A1.2 | A1.1 | A1.0 \
A0.7 | A0.6 | A0.5 | A0.4 | A0.3 | A0.2 | A0.1 | A0.0 \
B1.7 | B1.6 | B1.5 | B1.4 | B1.3 | B1.2 | B1.1 | B1.0 \
B0.7 | B0.6 | B0.5 | B0.4 | B0.3 | B0.2 | B0.1 | B0.0
C1.7 | ...
C0.7 | ...
...
H1.7 | ...
H0.7 | ...

In Bitshuffle, everything is little endian, so A1 is the least significant byte of word A and A1.0 is the least significant bit (LSB) of A1. Bitshuffle puts the LSB first, so the output should be:

A1.0 | B1.0 | C1.0 | D1.0 | E1.0 | F1.0 | G1.0 | H1.0 \
A1.1 | B1.1 | C1.1 | D1.1 | E1.1 | F1.1 | G1.1 | H1.1
...
A0.0 | B0.0 | C0.0 | D0.0 | E0.0 | F0.0 | G0.0 | H0.0 \
...
A0.7 | B0.7 | C0.7 | D0.7 | E0.7 | F0.7 | G0.7 | H0.7 \

This is counterintuitive, because we normally think of the LSB as the last bit in a byte. But that's actually somewhat a matter of interpretation. If you want to see how confused I was about this while writing bitshuffle, there is this thread on SO: https://stackoverflow.com/questions/24045102/how-does-endianness-work-with-simd-registers
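
To make that ordering concrete, here is a plain-Java sketch of the bit transpose as I read it (not the library's actual kernel): for each byte position within the element and each bit position (LSB first), the bits of 8 consecutive elements are packed into one output byte. Whether element i lands in bit i of that byte (assumed here) or in bit 7 - i is exactly the display-order question:

```java
public class BitTransposeSketch {

    // Plain-Java illustration of the layout described above (NOT the library's kernel).
    // Assumption: for byte position j (memory order) and bit position b (LSB first),
    // the bits of 8 consecutive elements are packed into one output byte,
    // with element 8k+i landing in bit i of that byte. size must be a multiple of 8.
    static byte[] bitTranspose(byte[] in, int elemSize) {
        int size = in.length / elemSize;              // number of elements
        byte[] out = new byte[in.length];
        int o = 0;
        for (int j = 0; j < elemSize; j++) {          // byte position within each element
            for (int b = 0; b < 8; b++) {             // bit position, LSB first
                for (int k = 0; k < size; k += 8) {   // groups of 8 elements
                    int packed = 0;
                    for (int i = 0; i < 8; i++) {
                        int bit = (in[(k + i) * elemSize + j] >> b) & 1;
                        packed |= bit << i;           // assumed: element i -> bit i
                    }
                    out[o++] = (byte) packed;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The 16-byte input from the "8 words with 2 bytes each" test above.
        byte[] in = new byte[16];
        for (int i = 0; i < in.length; i++) in[i] = (byte) i;

        StringBuilder sb = new StringBuilder();
        for (byte x : bitTranspose(in, 2)) {
            if (sb.length() > 0) sb.append(" | ");
            sb.append(String.format("%8s", Integer.toBinaryString(x & 0xFF)).replace(' ', '0'));
        }
        System.out.println(sb);
    }
}
```

For the 16-byte input from the earlier test, this appears to reproduce the shuffled dump posted above (00000000 | 10101010 | 11001100 | 11110000 | ...), so the assumed element-to-bit mapping seems consistent with what the wrapper produced there.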

@foosmate
Author

Actually I think it may be the other way round? :-)

B1.0 | C1.0 | D1.0 | E1.0 | F1.0 | G1.0 | H1.0 | A1.0 \
B1.1 | C1.1 | D1.1 | E1.1 | F1.1 | G1.1 | H1.1 | A1.1
...
B0.0 | C0.0 | D0.0 | E0.0 | F0.0 | G0.0 | H0.0 | A0.0 \
...
B0.7 | C0.7 | D0.7 | E0.7 | F0.7 | G0.7 | H0.7 | A0.7 \

My initial problem definitely results from the minimum element threshold (at least 8 elements of the given type size); below that the library acts in copy-only mode.

For the other points (raised during this analysis) I opened further issues:

#297 -> byte-order
#298 -> type-size

Thank you so much for your quick reaction!

By the way, bit-shuffling of small arrays also makes sense when it is not about performance but only about bandwidth. We transfer small self-contained data packets to smartphones, and bit-shuffling can reduce the amount of data considerably (even for data series whose total size is less than 8 x the type size). If bandwidth matters (and not performance), you are willing to go the extra mile. ;-)

Example:

Assume the following data series (each number is stored as a UInt64, so for each number 7 of its 8 bytes are wasted on zeros):

1, 2, 1, 0, 0, 1, 2, 1, 2, 0, 1, 0, 0, 1, 1

The total number of elements is 15, so the total array size is 15 x 8 = 120 bytes.

Here are the numbers and ratios:

Original Size: 120 (bytes) -> Ratio: 100%
LZ4 Compression Size: 46 (bytes) -> Ratio: 38%
Bitshuffle + LZ4 Compression Size: 14 (bytes) -> Ratio: 11%

Obviously, even for small messages (e.g. ones that need to be transported over the wire), bit-shuffling can have a significant positive effect. This is actually how we discovered that the bit-shuffling library does not kick in for these short messages: our LZ4 compression ratio was stuck at 38% in this example.
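
For completeness, a sketch of how we measure this (assuming snappy-java's BitShuffle.shuffle(long[]) and the lz4-java LZ4Factory/LZ4Compressor API; the exact byte counts depend on the compressor and framing used):

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import org.xerial.snappy.BitShuffle;

import java.nio.ByteBuffer;

public class ShuffleThenCompress {
    public static void main(String[] args) throws Exception {
        // The series from the example above: 15 values stored as 64-bit integers = 120 bytes.
        long[] series = {1, 2, 1, 0, 0, 1, 2, 1, 2, 0, 1, 0, 0, 1, 1};

        ByteBuffer raw = ByteBuffer.allocate(series.length * Long.BYTES);
        for (long v : series) raw.putLong(v);

        LZ4Compressor lz4 = LZ4Factory.fastestInstance().fastCompressor();

        byte[] lz4Only        = lz4.compress(raw.array());
        byte[] shuffled       = BitShuffle.shuffle(series); // note: the size % 8 remainder
                                                            // (7 of the 15 values) is copied,
                                                            // not shuffled, per the discussion above
        byte[] shuffleThenLz4 = lz4.compress(shuffled);

        System.out.println("original          : " + raw.array().length + " bytes");
        System.out.println("LZ4 only          : " + lz4Only.length + " bytes");
        System.out.println("bitshuffle + LZ4  : " + shuffleThenLz4.length + " bytes");
    }
}
```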
