String interpolation / formatting #1073

Open
GreyCat opened this issue Oct 12, 2023 · 1 comment

GreyCat commented Oct 12, 2023

As a continuation of the to-string story, we now have a way to convert a structure into a string summary in many target languages. However, what we have so far is pretty basic:

  data_dir:
    to-string: |
      'Data Directory <VirtualAddr: ' + virtual_address.to_s + ', Size: ' + size.to_s + ', PointerToRawData: ' + pointer_to_raw_data.to_s + '>'

Two problems with it:

  • It's ugly (and thus hard to read, hard to write, and generally error-prone). Syntax highlighting helps somewhat, but there are still few options for good syntax highlighting of the KS expression language.
  • It's very limited. It's really hard to get rich formatting capabilities: hex/oct/bin representations, padding, etc.

Most modern languages solve this with interpolated / format strings. WebIDE actually got ahead and implemented a simple version of this with -webide-representation. One can use:

  • head {foo} tail to interpolate value of foo into the string
    • it can do integers; default representation of integers is hex
    • it can do floating points
    • it can do enums (as string values)
    • it can do strings
    • it can do byte arrays (as [1, 2, 128])
    • it can do true arrays (with , separator — e.g. foo, bar, baz)
    • it can do other structs (cascading into other -webide-representation definitions)
  • {foo:dec}, {foo:hex} enforce either decimal or hexadecimal formatting
  • {foo:sep=*} dumps arrays with a custom separator (e.g. as foo*bar*baz)

Other modern languages have something similar:

  • Python has f-strings since v3.6 with both execution of code within the interpolated strings, formatting and conversion capabilities:

    foo = 123
    f"foo={foo}"            # => 'foo=123'
    f"foo={foo:7}"          # => 'foo=    123'
    f"foo={foo:07}"         # => 'foo=0000123'
    f"foo={hex(foo)}"       # => 'foo=0x7b'
    f"foo={foo:x}"          # => 'foo=7b'
    f"foo={foo:8x}"         # => 'foo=      7b'
    f"foo={foo:08x}"        # => 'foo=0000007b'
    f"foo={foo:08b}"        # => 'foo=01111011'
  • Ruby has interpolation in double-quoted strings, with code execution delimited by #{ and }. There are no quick formatting options, but one can chain to_s and rjust to achieve a similar effect:

    foo = 123
    "foo=#{foo}"                        # => 'foo=123'
    "foo=#{foo.to_s.rjust(7)}"          # => 'foo=    123'
    "foo=#{foo.to_s.rjust(7, '0')}"     # => 'foo=0000123'
    "foo=#{foo.to_s(16)}"               # => 'foo=7b'
    "foo=#{foo.to_s(16).rjust(8)}"      # => 'foo=      7b'
    "foo=#{foo.to_s(16).rjust(8, '0')}" # => 'foo=0000007b'
    "foo=#{foo.to_s(2).rjust(8, '0')}"  # => 'foo=01111011'
  • Scala has s-strings and f-strings:

    val foo = 123
    s"foo=$foo"                // => "foo=123"
    s"foo=${foo.toHexString}"  // => "foo=7b"
    f"foo=$foo%7d"             // => "foo=    123"
    f"foo=$foo%07d"            // => "foo=0000123"
    f"foo=$foo%x"              // => "foo=7b"
    f"foo=$foo%8x"             // => "foo=      7b"
    f"foo=$foo%08x"            // => "foo=0000007b"
    s"foo=${foo.toBinaryString.reverse.padTo(8, '0').reverse}" // => "foo=01111011"
  • JavaScript has backtick strings with ${foo} syntax, but only with code execution, no special terse formatting:

    const foo = 123;
    `foo=${foo}`                                   // 'foo=123'
    `foo=${foo.toString().padStart(7)}`            // 'foo=    123'
    `foo=${foo.toString().padStart(7, '0')}`       // 'foo=0000123'
    `foo=${foo.toString(16)}`                      // 'foo=7b'
    `foo=${foo.toString(16).padStart(8)}`          // 'foo=      7b'
    `foo=${foo.toString(16).padStart(8, '0')}`     // 'foo=0000007b'
    `foo=${foo.toString(2).padStart(8, '0')}`      // 'foo=01111011'
  • C# has dollar strings + curly brackets with both code execution and formatting:

    int foo = 123;
    
    $"foo={foo}"        // => "foo=123"
    $"foo={foo,7}"      // => "foo=    123"
    $"foo={foo:D7}"     // => "foo=0000123"
    $"foo=0x{foo:x}"    // => "foo=0x7b"
    $"foo={foo:x}"      // => "foo=7b"
    $"foo={foo,8:x}"    // => "foo=      7b"
    $"foo={foo:x8}"     // => "foo=0000007b"
    $"foo={Convert.ToString(foo, 2).PadLeft(8, '0')}" // "foo=01111011"

So, ultimately, there appear to be two options: either code execution only, or code execution + formatting.

Proposal

Let's implement format strings in the KS expression language that more or less match a subset of Python's (as we're already borrowing quite a lot from Python syntax):

f"foo={foo}"            # => 'foo=123'
f"foo={foo:7}"          # => 'foo=    123'
f"foo={foo:07}"         # => 'foo=0000123'
f"foo={foo:x}"          # => 'foo=7b'
f"foo={foo:8x}"         # => 'foo=      7b'
f"foo={foo:08x}"        # => 'foo=0000007b'
f"foo={foo:08b}"        # => 'foo=01111011'

So, syntax-wise, it will be:

  • f" and " to delimit the new type of string literal with interpolation
  • { and } inside the new string literal to delimit portions of the string which will be interpreted as code and format
  • "code and format" consists of "code" (which is any expression) and an optional : followed by "format"
  • "format" is limited to a sequence of "length" and "format letter" (both optional):
    • length: digits forming a number for the width of the field; if it starts with "0", the field is zero-padded
    • format letter — we'll support:
      • d for integer decimal
      • x for hex
      • o for octal
      • b for binary (and that's it)
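
To make the "format" grammar above concrete, here is a minimal Python sketch of how such a spec could be parsed and applied. It is only an illustration, not the compiler's implementation: the names parse_format_spec and apply_format are hypothetical, and treating a missing format letter as decimal is my assumption (it matches Python's behavior for integers).

```python
import re

# Hypothetical helpers illustrating the proposed "format" grammar:
# optional width digits (a leading "0" means zero-padding),
# followed by an optional format letter out of d, x, o, b.
_SPEC_RE = re.compile(r"^(?P<width>\d+)?(?P<letter>[dxob])?$")


def parse_format_spec(spec):
    """Return (width, zero_pad, letter) for a spec like '08x'."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    width_text = m.group("width") or ""
    zero_pad = width_text.startswith("0")
    width = int(width_text) if width_text else 0
    # Assumption: no letter means decimal.
    letter = m.group("letter") or "d"
    return width, zero_pad, letter


def apply_format(value, spec):
    """Format an integer according to the proposed spec."""
    width, zero_pad, letter = parse_format_spec(spec)
    sign = "-" if value < 0 else ""
    digits = format(abs(value), letter)
    if zero_pad:
        # Zeros go between the sign and the digits, as in Python.
        return sign + digits.rjust(width - len(sign), "0")
    return (sign + digits).rjust(width)


foo = 123
print("foo=" + apply_format(foo, "07"))   # foo=0000123
print("foo=" + apply_format(foo, "8x"))   # foo=      7b
print("foo=" + apply_format(foo, "08b"))  # foo=01111011
```

A target language without native format strings could generate roughly this logic as plain padding calls, which is what the Ruby and JavaScript examples above do by hand.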

Execution

Can be done in 3 steps:

  • Step 1: Interpolation string syntax parser + AST constructs only for f-strings and code in them.
    • The AST will contain a construct that lists the pieces to concatenate, intermixing regular strings and expressions.
    • Rendering: we can start with converting that into a.to_s + b.to_s + c.to_s + ... and render it in target language.
  • Step 2: True interpolated strings support.
    • AST constructs can be rendered more elaborately, directly into interpolated string in target language.
  • Step 3: Adding formatting options (: and subsequent stuff).
    • The AST will contain the same concatenation, plus a container for "expression with formatting".
    • If the target language supports it, this can be reflected in a similar formatted string syntax.
    • If it doesn't, we can still fall back to generating concatenation.
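
As a rough illustration of Step 1, the Python sketch below splits the body of an f-string into literal and expression pieces and renders them as a.to_s + b.to_s + ... concatenation. All names here (Literal, Interpolation, split_fstring, render_concat) are hypothetical and not taken from the compiler. The splitting is deliberately naive: it assumes no escaped or nested braces, no quotes inside literals, and no ":" inside the expression itself, so expressions using "::" or a ternary would need a real parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Literal:
    text: str


@dataclass
class Interpolation:
    expr: str
    spec: Optional[str] = None  # the part after ":", kept for Step 3


def split_fstring(body):
    """Split the text between f" and " into a list of pieces."""
    pieces = []
    literal = []
    i = 0
    while i < len(body):
        if body[i] == "{":
            end = body.find("}", i)
            if end == -1:
                raise ValueError("unterminated '{'")
            if literal:
                pieces.append(Literal("".join(literal)))
                literal = []
            # Naive split: the first ":" separates code from format.
            expr, sep, spec = body[i + 1:end].partition(":")
            pieces.append(Interpolation(expr.strip(), spec if sep else None))
            i = end + 1
        else:
            literal.append(body[i])
            i += 1
    if literal:
        pieces.append(Literal("".join(literal)))
    return pieces


def render_concat(pieces):
    """Step 1 rendering: plain concatenation; format specs are ignored."""
    parts = []
    for piece in pieces:
        if isinstance(piece, Literal):
            parts.append("'" + piece.text + "'")
        else:
            parts.append(f"({piece.expr}).to_s")
    return " + ".join(parts) if parts else "''"


print(render_concat(split_fstring("foo={foo}, bar={bar:08x}")))
# 'foo=' + (foo).to_s + ', bar=' + (bar).to_s
```

Keeping the spec on the Interpolation piece even though Step 1 ignores it means Step 3 would only need to change the rendering, not the piece list.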

GreyCat commented Oct 14, 2023

Given the thumbs up, I went ahead and implemented a basic version: kaitai-io/kaitai_struct_compiler#258. Please take a look.
