Skip to content
龙腾道 edited this page Sep 12, 2022 · 4 revisions

TOML: probably the best configuration file format

Configuration file format is a very basic file format, but far less complex than the requirements of data file formats (eg SQLite), documentation file formats (eg Markdown), programming languages ​​(eg JavaScript), or even binary file formats (eg PNG).

We just need it strict, supporting necessary data types and nesting structure, and easy to read and edit directly by humans.

But there has not been a good enough file format for such long time, for so widely and simple need.


INI (.ini) file is a very primitive form, but every environment defines its own rule, and it can only handle single nesting. Only very simple configuration would be fit, once you need deep nesting or array, it's not enough any more.

; simplest structure

val_1 = val_1;
val_2 = val_2; these values after equal signs are strings (the ending semicolons are not required; actually they means comments)

; a slightly more complex single nested structure

[obj_1]

x = obj_1.x
y = obj_2.y

[obj_2]

x = obj_2.x
y = obj_2.y

JSON (.json) is a very good format for data storage and transmission, but really inconvenient to read and edit. Even though JSON5 (.json5 - ECMAScript 5.1 JSON) extension allows you to write bare key names like JavaScript objects, allows trailing commas, and can have comments, writing multi-line strings is still cumbersome. Even if it adds multi-line template string syntax in the future, it still won't help, because although its syntax is based on bracket nesting, it cannot be read at all without indentation.

{
    "val_1": "val_1"
    ,
    "val_2": "val_2"
    ,
    "obj_1": {
        "x": "obj_1.x"
        ,
        "y": "obj_1.y"
    }
    ,
    "obj_2": {
        "x": "obj_2.x"
        ,
        "y": "obj_2.y"
    }
    ,
    "arr": [
        { "x":"arr[0].x", "y":"arr[0].y" }
        ,
        { "x":"arr[1].x", "y":"arr[1].y" }
    ]
}

YAML (.yaml or .yml) simply removes the JSON brackets which neccesary but not enough, and only remain the indentation. But editing and reading it leads to perpetual panic, for fear of counting the wrong spaces (in fact, for reading, syntax is not the smaller, the better). And in an editing environment that does not support smart indentation, this is really troublesome -- it's not a big deal for programming languages, but unfortunately, the most common usage scenario of configuration files is precisely so.

In addition, YAML has too many syntaxes, and you can't walk before you learn to run, even if you don't need complex features, in order to ensure that your simple features work right, you have to understand and avoid those complex syntaxes (such as what kind of key names you don't have to add quotation marks, what kind of strings can be unquoted; you can't always add quotation marks to avoid ambiguity, because it will be writing JSON). To make matters worse, even if it is so complicated, it's still hard to configure a precise multi-line string -- unless you go back to the single-line JSON string notation, it's exhausting to know whether the newline is preserved, whether the indent is ignored, whether the space is automatically inserted, whether the escape is interpreted, and so on. This, coupled with the confusing interpretation of various ambiguities by various implementations... It seems impossible to say that specification has no responsibility at all.

val_1: abcd # string
val_2: true # boolean
val_3: TRUE # ?
val_4: True # ?
val_5: TrUE # ?
val_6: yes  # ?
val_7: on   # ?
val_8: y    # ?

obj_1:
  x: obj_1.x
  y: obj_1.y
obj_2:
  x: obj_2.x
  y: obj_2.y

arr:
  - x: arr[0].x
    y: arr[0].y
  - x: arr[1].x
    y: arr[1].y

str_1: "a
b"          # ?
str_2: "a
        b"  # ?
str_3: "a
         b" # ?

Finally, TOML (.toml) was born. It completely forgoes the underlying principle of brackets or indentation, and instead, elects explicit key chains.

For convenience (and for clarity -- the fit for both read and write is crucial!), you can specify sections. Even better, section headers can also be chained.

In addition, some data may be more appropriate to use inline arrays or tables to avoid bloat, they are also supported.

val_1 = "val_1"
val_2 = "val_2"

obj_1.x = "obj_1.x"
obj_1.y = "obj_1.y"

[obj_2]

x = "obj_2.x"
y = "obj_2.y"

[[arr]]

x = "arr[0].x"
y = "arr[0].y"

[[arr]]

x = "arr[1].x"
y = "arr[1].y"

[str.x]

y.z = "str.x.y.z"

[str.a]

b.c = """
    str
   .a
  .b
 .c
""" # be equivalent to "    str\n   .a\n  .b\n .c\n"

[inline]

points = [
    { x=1, y=1, z=0 },
    { x=2, y=4, z=0 },
    { x=3, y=9, z=0 },
]

Start your TOML journey now! ​

TOML official documentation

Node.js implementation: @ltd/j-toml