Extract alternating words #1069

Open
adavidzh opened this issue Sep 29, 2023 · 4 comments

@adavidzh

Before heading out and writing an opaque type plugin, I was wondering whether someone knows a direct Kaitai way to decode a data structure that looks like this:

| Higher 32 bits | Lower 32 bits |
| --- | --- |
| Source 1, Word 0 | Source 2, Word 0 |
| Source 1, Word 1 | Source 2, Word 1 |
| ... | ... |
| Source 1, Word $N$ | Source 2, Word $N$ |

The goal is to have the data from Sources 1 and 2 available for further decoding. I know $N$.

I'm not fluent in Kaitai and could not think of a way to use `repeat` and `instances` (with some `_index` arithmetic) to achieve this.

Any help would be appreciated!
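A minimal KSY sketch of the layout as described. It assumes each row is a single 64-bit integer in the file's byte order and that there are $N+1$ rows (words 0 through $N$). Reading a row as `u8` and splitting it with value instances avoids having to know which half comes first on disk, since that depends on endianness. `num_rows`, `row`, `raw`, `source1_word` and `source2_word` are hypothetical names. `endian: be` is a placeholder for the real byte order.

```yaml
meta:
  id: interleaved_words
  endian: be  # assumption: replace with the actual byte order of the 64-bit rows
params:
  - id: num_rows  # hypothetical: N + 1, since words are numbered 0..N
    type: u4
seq:
  - id: rows
    type: row
    repeat: expr
    repeat-expr: num_rows
types:
  row:
    seq:
      - id: raw
        type: u8  # one full row: both sources' words for this index
    instances:
      source1_word:
        value: raw >> 32  # higher 32 bits
      source2_word:
        value: raw & 0xffffffff  # lower 32 bits
```

This exposes `rows[i].source1_word` and `rows[i].source2_word` as plain integers. That may be enough when each source's fields sit at fixed word positions. It does not produce a byte stream per source, though, so a user type cannot be parsed across the de-interleaved words. That gap is what an opaque type or the chains proposal would address.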

@KOLANICH

It strongly depends on the following things:

  1. what the sources are: are they separate files, or separate records within the same stream or what?
  2. what structure you want to parse: does it have a rigid fixed layout (a few fields of fixed length known at spec design time), or does its layout depend on data?

In the very simplest case you can currently just parse the 2 streams separately and then combine them manually, all in KS. Otherwise, I guess, currently an opaque type would be better.
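A minimal sketch of how the opaque-type route is declared. It assumes the length of the interleaved region is known, here through a hypothetical `blob_len` parameter. With `ks-opaque-types: true`, a type name that is not defined in `types` is treated as opaque, and the compiler expects a hand-written class of that name in the target language. `deinterleaver` is a hypothetical name for such a class.

```yaml
meta:
  id: interleaved_container
  ks-opaque-types: true  # allow references to types implemented by hand
params:
  - id: blob_len  # hypothetical: byte length of the interleaved region
    type: u4
seq:
  - id: blob
    size: blob_len  # the opaque type gets a substream limited to this region
    type: deinterleaver  # hypothetical hand-written class, not defined in this spec
```

Giving the field a `size` means the hand-written class only sees the interleaved region as its stream. Splitting the words and exposing one stream per source would be up to that class.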

"chains" proposal #196 is highly relevant for your problem.

@adavidzh
Author

Thanks for the quick reply:

> 1. what the sources are: are they separate files, or separate records within the same stream or what?

I should have been clearer: the layout above is of a single stream. I.e., the words from the two sources are interleaved in one stream.

> 2. what structure you want to parse: does it have a rigid fixed layout (a few fields of fixed length known at spec design time), or does its layout depend on data?

I think I see where you're going. Indeed, the two sources always share the same format as each other, but that format differs from file to file.

> In the very simplest case you can currently just parse the 2 streams separately and then combine them manually, all in KS.

Any chance you can share some (pseudo-)code so that I can try it out?

> "chains" proposal #196 is highly relevant for your problem.

I think I see how it would help. Is it correct that chains cannot be tested?

@KOLANICH

> I.e., the words from the two sources are interleaved in one stream.

Now I understand better: you don't need to combine them, as I presumed; instead, you need to split them. I guess it is some multimedia file format.

> Any chance you can share some (pseudo-)code so that I can try it out?

I misunderstood your requirements. An opaque type that accepts an `_io` of type `KaitaiStream` and provides several instances of a `KaitaiStream` subtype (one per source) could be nice. But I don't think it is a proper solution to the problem. When data is interleaved, it is done on purpose: I suspect that nearby chunks of different streams are somehow related (unless the purpose is obfuscation) and should be consumed in parallel.

I also feel that KS is not yet a good fit for streaming multimedia, because its focus is building an internal representation in the form of a graph of objects. If you write a spec and use it to parse a large multimedia file, the whole file gets parsed into memory. You can optimize this a bit by delaying parsing, since instances are parsed lazily. Create an array of "handle" types whose instances do the actual parsing; your consumer then gets a chunk by accessing an instance and frees the allocated object when it is no longer needed. And even if your spec is not ready for production use, it is still very convenient to have one.
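A minimal sketch of the handle-type idea, assuming fixed-size chunks stored back to back. `num_chunks`, `chunk_size`, `chunk_handle` and `chunk_body` are hypothetical names. Parsing `handles` reads no bytes, because `chunk_handle` has no `seq`. Each `body` instance seeks to its own offset and is parsed only on first access. Passing `_index` as an argument inside `repeat` needs a compiler version that supports `_index`.

```yaml
meta:
  id: lazy_chunks
params:
  - id: num_chunks  # hypothetical: number of chunks in the file
    type: u4
  - id: chunk_size  # hypothetical: size of each chunk in bytes
    type: u4
seq:
  - id: handles
    type: chunk_handle(_index)  # one cheap handle per chunk; consumes no bytes
    repeat: expr
    repeat-expr: num_chunks
types:
  chunk_handle:
    params:
      - id: i
        type: u4
    instances:
      body:  # parsed lazily, only when a consumer accesses it
        pos: i * _root.chunk_size
        size: _root.chunk_size
        type: chunk_body
  chunk_body:
    seq:
      - id: data
        size-eos: true  # placeholder for the real per-chunk structure
```

Instances are cached after the first access. Actually releasing a parsed chunk therefore depends on how the target language lets you drop that cached value.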

I guess something like the following, but it is uncertain and you would have to do some experiments. I don't remember whether `type: io` can be used in `params`; if it can, you can try to bypass the additional `placeholder` type.

```yaml
seq:
  # ... preceding fields ...
  - id: your_interleaved_blob
    type: your_opaque_type(count_of_streams) # accepts _io implicitly
instances:
  audio:
    pos: 0
    type: your_audio_codec_structure
    io: your_interleaved_blob.as<a_type_mocking_your_opaque_type>.audio_io_proxy._io # the custom code in your_opaque_type should create an audio_io_proxy field of type placeholder
    #io: your_interleaved_blob.as<a_type_mocking_your_opaque_type>.audio_io # if you bypass the proxy types
    -affected-by: 500 # this way we document issues in the issue tracker that block optimal specification, see https://github.com/kaitai-io/kaitai_struct_doc/pull/47 for more info
types:
  placeholder: {}
  a_type_mocking_your_opaque_type:
    -affected-by: 500
    #params:
    #  - id: audio_io
    #    type: io
    seq:
      - id: audio_io_proxy
        type: placeholder
      - id: video_io_proxy
        type: placeholder
```

#500 should be relevant too

@KOLANICH

I call #500 the "interfaces proposal".

3 participants