Support for concatenated snappy-in-java files #23

Open
gwittel opened this issue Jul 25, 2018 · 1 comment

gwittel commented Jul 25, 2018

When dealing with some legacy format files, I noticed that snzip fails to read snappy-in-java format files that are concatenated together. When it reaches the second file, it reads the 's' (0x73) from that file's header and aborts, since it's not a recognized flag.

The simple workaround is to skip the next 6 bytes (nappy\0), similar to how the framing2 format implicitly skips its header: it reads 0xff 0x06 0x00 0x00 as a chunk of length 6, then skips those 6 bytes (sNaPpY) with the fseek.
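
To make the framing2 arithmetic concrete, here is a minimal sketch of how a 4-byte chunk header (one type byte followed by a 3-byte little-endian length) decodes to a length of 6. The function name chunk_data_length is hypothetical and is not taken from the snzip sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from snzip): decode the 3-byte little-endian
 * length that follows the 1-byte chunk type in a framing-format header. */
static uint32_t chunk_data_length(const unsigned char hdr[4])
{
  assert(hdr != NULL);
  /* hdr[0] is the chunk type (0xff for the stream identifier). */
  return (uint32_t)hdr[1]
       | ((uint32_t)hdr[2] << 8)
       | ((uint32_t)hdr[3] << 16);
}
```

For the header bytes 0xff 0x06 0x00 0x00 this yields 6, which is exactly the length of the sNaPpY identifier that then gets skipped.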

Before I send a real PR, I wanted to get some feedback. My quick-and-dirty workaround does not validate that the second header is actually a valid snappy header. However, framing2 doesn't do this either (it relies on the implicit skipping defined by the header format itself).

Creating test file:

$ echo 'hello' | ./snzip -t snappy-in-java > one.snappy
$ echo 'world' | ./snzip -t snappy-in-java > two.snappy
$ cat one.snappy two.snappy > three.snappy

Original version:

$ ./snzip -d -c three.snappy
hello
Unknown compressed flag 0x73

Patched:

$ ./snzip -d -c three.snappy
hello
world

Thoughts/preferences on patch approach?

Hacky version diff:

diff --git a/snappy-in-java-format.c b/snappy-in-java-format.c
index 0f95e1a..2b2579a 100644
--- a/snappy-in-java-format.c
+++ b/snappy-in-java-format.c
@@ -195,6 +195,16 @@ static int snappy_in_java_uncompress(FILE *infp, FILE *outfp, int skip_magic)
     case UNCOMPRESSED_FLAG:
       /* pass */
       break;
+       case 's':
+         /* s== 0x73 Possible concatenated block.
+          * Note that other framing formats like frame2 see 0xff and just skip
+          * the rest of the header due to the header being: 0xff 0x06 0x00 0x00 snappy
+          * (it reads the 3-byte chunk header length resulting in a block length of
+          * 6 bytes, and skips 6 bytes which happens to be == snappy)
+          */
+         /* Likely concatenated snappy file.  We read first byte, skip rest */
+         fseek(infp, SNAPPY_IN_JAVA_MAGIC_LEN - 1, SEEK_CUR); /* TODO strict check? */
+         continue;
     default:
       print_error("Unknown compressed flag 0x%02x\n", compressed_flag);
       goto cleanup;

kubo (Owner) commented Oct 24, 2018

Thanks for opening the issue, and sorry for not replying for so long.
If you are still willing, could you make a pull request?

Could you validate the file headers? The original implementation does (here).
Could you also fix the indentation width? This file uses two spaces for indentation.
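
For reference, a validating variant of the skip might look roughly like the sketch below. It assumes the magic is the 7 bytes snappy\0 (implied by the 's' plus the 6 skipped bytes nappy\0 described above) and that the leading 's' has already been consumed as the flag byte. The names MAGIC, MAGIC_LEN, and skip_concatenated_magic are hypothetical and are not taken from the snzip sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed magic: 's' 'n' 'a' 'p' 'p' 'y' '\0' (7 bytes). */
#define MAGIC_LEN 7
static const unsigned char MAGIC[MAGIC_LEN] = {
  's', 'n', 'a', 'p', 'p', 'y', '\0'
};

/* Hypothetical helper: call after the first magic byte ('s') has been read
 * as the flag. Reads the remaining 6 bytes and compares them to the magic.
 * Returns 0 if they match, -1 on a short read or a mismatch. */
static int skip_concatenated_magic(FILE *infp)
{
  unsigned char rest[MAGIC_LEN - 1];

  assert(infp != NULL);
  if (fread(rest, 1, sizeof(rest), infp) != sizeof(rest)) {
    return -1; /* truncated header */
  }
  return memcmp(rest, MAGIC + 1, sizeof(rest)) == 0 ? 0 : -1;
}
```

In the case 's' branch, a return value of 0 would lead to continue, while -1 would fall through to the existing error path. Unlike the fseek approach, this also works on non-seekable input such as a pipe.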
