Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes.to_s(encoding) should validate the passed encoding #1090

Open
generalmimon opened this issue Mar 17, 2024 · 0 comments
Open

bytes.to_s(encoding) should validate the passed encoding #1090

generalmimon opened this issue Mar 17, 2024 · 0 comments

Comments

@generalmimon
Copy link
Member

generalmimon commented Mar 17, 2024

Related:

Cc @GreyCat

In kaitai-io/kaitai_struct_compiler#254, a precompile step CanonicalizeEncodingNames was added. It validates every encoding YAML key, issues a warning if it specifies a non-canonical encoding and converts it to a canonical one if possible.

However, this is currently only done for the encoding key, not the argument of the bytes.to_s(encoding) method. For example, this compiles with KSC at commit 6aef6fae without any warnings despite using non-canonical utf8 encoding (canonical is UTF-8):

meta:
  id: bytes_to_s
seq:
  - id: buf
    size: 4
instances:
  v:
    value: buf.to_s('utf8')
$ kaitai-struct-compiler -- --verbose file -t cpp_stl --cpp-standard 11 bytes_to_s.ksy
parsing bytes_to_s.ksy...
reading bytes_to_s.ksy...
... compiling it for cpp_stl...
.... writing bytes_to_s.cpp
.... writing bytes_to_s.h

Checking the argument of bytes.to_s(encoding) method at compile time is always possible, because it was decided in #1051 that it will accept only string literals.

A real problem stemming from the lack of validation and canonicalization of the encoding name is that the unchecked encoding name will be reflected in the generated code exactly as written in the .ksy spec:

    m_v = kaitai::kstream::bytes_to_str(buf(), "utf8");

However, such non-canonical encodings might not work in some target languages. In particular the new Win32 API-based bytes_to_str implementation in the C++/STL runtime (STRING_ENCODING_TYPE = "WIN32API") will not accept such non-canonical spellings, because it specifically expects only the standardized names (kaitai-io/kaitai_struct_cpp_stl_runtime#61):

  • This assumes standardized encoding names, as runtime now does conversion from encoding name in string format into Windows codepage (which is an integer).

So encoding: utf8 would work (though with a warning), but bytes.to_s('utf8') would not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant