kesh-lang/na

na

data notation for the conveyance of values

Definition

na is a simple yet flexible data notation format.

Its value types represent a minimal set of common data types.

Rationale

Notation matters.

The syntax should be simple, ergonomic, flexible, reliable and secure.

It should be a solid foundation for a wide range of use cases.

Data types

  • #truth – Boolean truth values
  • #number – arbitrary precision numbers
  • #text – a sequence of Unicode scalar values
  • #block – a sequence of linear/associative values

Names

na meets UAX31-R1-2 by using a profile of UAX31-R1-1, adding the optional start character _ (low line) and the optional medial character - (hyphen-minus). In the syntax of UAX31-D1:

<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*

<Start> := XID_Start + U+005F
<Continue> := XID_Continue
<Medial> := U+002D

That is, a name must conform with UAX31-R1-2 and:

  • may start with, contain and end with _ (low line)
  • may contain but cannot start or end with - (hyphen-minus)

Names are case-insensitive, meeting UAX31-R4 with normalization form KC and UAX31-R5 with full case folding. Implementations should ignore default ignorable code points when comparing names.
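
The name rules above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `is_valid_name` and `name_key` are hypothetical, `str.isidentifier()` is used as a stand-in for the XID_Start/XID_Continue checks (its result depends on the Unicode version bundled with the Python runtime), and the removal of default ignorable code points is left out.

```python
import unicodedata


def is_valid_name(name: str) -> bool:
    """Check <Start> <Continue>* (<Medial> <Continue>+)* with Medial = '-'."""
    first, *rest = name.split("-")
    # First segment: XID_Start or '_' followed by XID_Continue characters.
    # Python identifiers follow the same rule, so isidentifier() fits here.
    if not first.isidentifier():
        return False
    # Later segments: one or more XID_Continue characters. Prefixing '_'
    # lets isidentifier() validate a segment that starts with a digit.
    return all(part != "" and ("_" + part).isidentifier() for part in rest)


def name_key(name: str) -> str:
    """Approximate comparison key: NFKC, full case folding, NFKC again."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


# Usage: hyphens may appear inside a name but not at either end.
print(is_valid_name("foo-bar"), is_valid_name("-foo"), is_valid_name("_foo_"))
print(name_key("Foo-Bar") == name_key("foo-bar"))
```

Two names would then be treated as the same name when their `name_key` values are equal.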

When compiling to a target language that does not support kebab-case, names may be transliterated to a compatible case style that maintains the separation of words within a name.
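
As a rough illustration of such a transliteration, the following Python sketch maps a kebab-case name to snake_case or camelCase. The helper names are hypothetical, and the choice of target style is an assumption; na itself does not prescribe one.

```python
def to_snake_case(name: str) -> str:
    """Replace each medial hyphen with a low line."""
    return name.replace("-", "_")


def to_camel_case(name: str) -> str:
    """Drop hyphens and capitalize the first character of each later word."""
    head, *tail = name.split("-")
    return head + "".join(part[:1].upper() + part[1:] for part in tail)


# Usage
print(to_snake_case("first-name"))  # first_name
print(to_camel_case("first-name"))  # firstName
```

Note that such a mapping can collide with existing names: `foo-bar` and `foo_bar` both become `foo_bar` in snake_case, so a compiler would likely need to detect that case.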

Syntax

Comments

-- this is a comment

Primitive values

Truth

Boolean truth values, represented with Unicode symbols and .

For the convenience of end users, implementations should allow the words true and false as aliases.

Number

Arbitrary precision signed numbers.

Base 10

42         -- integer
6.28       -- decimal fraction
1/3        -- rational fraction (ratio)
1.6e-35    -- scientific/exponential notation
1_771_561  -- digit grouping
007        -- leading zeros
48fps      -- suffix
99%        -- percentage (ratio to 100)

Other bases

Bases with a radix from 2 to 36 are supported, using 0…9 + A…Z/a…z as numerals.

2\101010   -- binary
8\755      -- octal
16\decaf   -- hexadecimal
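
To make the number forms above concrete, here is a minimal Python sketch that reads most of them into an exact `fractions.Fraction`. The function name `parse_number` is hypothetical. Suffixes such as `48fps`, signs, and fractional digits in non-decimal bases are not handled, and Python's `int()` is more lenient than na is likely to be (for example, it accepts a `0x` prefix in base 16).

```python
from fractions import Fraction


def parse_number(text: str) -> Fraction:
    """Parse a subset of na number literals into an exact Fraction."""
    text = text.replace("_", "")  # digit grouping: 1_771_561
    if "\\" in text:
        # radix\digits form, e.g. 16\decaf
        radix, digits = text.split("\\", 1)
        base = int(radix)
        if not 2 <= base <= 36:
            raise ValueError(f"unsupported radix: {base}")
        return Fraction(int(digits, base))
    if text.endswith("%"):
        # percentage: ratio to 100
        return Fraction(text[:-1]) / 100
    # Fraction accepts integers, decimals, exponents, and num/den ratios.
    return Fraction(text)


# Usage
print(parse_number("1/3"))         # 1/3
print(parse_number("99%"))         # 99/100
print(parse_number(r"2\101010"))   # 42
print(parse_number(r"16\decaf"))   # 912559
```

Using `Fraction` keeps ratios and decimal fractions exact, which matches the "arbitrary precision" intent better than binary floating point would.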

Text

A sequence of zero or more Unicode scalar values in UTF-8 encoding.

Inline

Single-quoted text is verbatim.

'"verbatim" text'

Double-quoted text supports escape sequences.

"\"escaped\" text"

The following escape sequences are supported:

  • \" – quotation mark U+0022
  • \\ – reverse solidus U+005C
  • \␤ – line continuation
  • \xxxxxx – Unicode code point (6 hexadecimal numerals padded with leading zeros)

Multiline

Multiline texts follow the same rules as Julia's triple-quoted string literals.

'''
this is a "verbatim" text
that's multiline
'''

"""
this is an "escaped" text
that's multiline \01F632
"""
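
The escape sequences listed for double-quoted text, including the `\01F632` form used in the example above, could be decoded along these lines. This Python sketch works on the text between the quotes; the name `unescape_text` is hypothetical, and it assumes that a line continuation removes only the backslash and the newline, and that any other character after a backslash is an error.

```python
import string


def unescape_text(body: str) -> str:
    """Decode na escape sequences in the body of a double-quoted text."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1 : i + 2]
        if nxt == '"' or nxt == "\\":
            out.append(nxt)  # \" and \\ stand for the character itself
            i += 2
        elif nxt == "\n":
            i += 2  # line continuation: drop backslash and newline
        else:
            digits = body[i + 1 : i + 7]
            if len(digits) != 6 or any(d not in string.hexdigits for d in digits):
                raise ValueError(f"invalid escape at offset {i}")
            code = int(digits, 16)
            # Only Unicode scalar values are allowed (no surrogates).
            if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: {digits}")
            out.append(chr(code))
            i += 7
    return "".join(out)


# Usage
print(unescape_text(r"\"escaped\" text"))  # "escaped" text
```

Single-quoted text needs no such step, since it is verbatim.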

Composite values

Block

A versatile data structure able to represent both linear and associative collections.

Blocks are enclosed in square brackets ([]). Inline items are separated by commas (,).

Keys are optional and can be either non-negative integers or names.

Associative items are defined explicitly with a colon (:). Duplicate keys are not allowed.

Linear items are implicitly given 0-indexed integer keys.

[]                      -- empty
[ 1, 2, 3 ]             -- implicit integer keys (list/array/sequence/stack/queue)
[ 7: true, 42: true ]   -- explicit integer keys (sparse array)
[ foo: 42, bar: true ]  -- explicit names (record/object/map/structure/dictionary/hash)

Similar to Lua tables, JavaScript objects and Dart records, a block may contain both linear and associative values.

[ 1, 2, 3, length: 3 ]  -- a mix of implicitly indexed and explicitly named values
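
One way to picture the key rules is to model a block as a single mapping. The Python sketch below is an illustration under assumptions, not the format's defined semantics: `build_block` is a hypothetical helper, items are given as `(key, value)` pairs with `None` marking a linear item, implicit indices count only the linear items, and a clash between an implicit and an explicit key is treated as a duplicate. Case-insensitive matching of names is not applied here.

```python
from typing import Any, Dict, Iterable, Optional, Tuple, Union

Key = Union[int, str]


def build_block(items: Iterable[Tuple[Optional[Key], Any]]) -> Dict[Key, Any]:
    """Build a block from (key, value) pairs; key None means a linear item."""
    block: Dict[Key, Any] = {}
    next_index = 0  # linear items get 0-indexed integer keys
    for key, value in items:
        if key is None:
            key = next_index
            next_index += 1
        elif isinstance(key, int) and key < 0:
            raise ValueError(f"integer keys must be non-negative: {key}")
        if key in block:
            raise ValueError(f"duplicate key: {key!r}")
        block[key] = value
    return block


# Usage: [ 1, 2, 3, length: 3 ]
print(build_block([(None, 1), (None, 2), (None, 3), ("length", 3)]))
# {0: 1, 1: 2, 2: 3, 'length': 3}
```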

More specific data types may be enforced with extensions.

Nested

linear: [
    [1, 2, 3]               -- inline items separated by comma
    [4, 5, 6]               -- multiline items separated by newline
    [7, 8, 9]
]
associative: [
    foo: [                  -- multiline block
        bar: [ baz: true ]  -- inline block
    ]
]

Lightweight syntax

Brackets and commas are required for inline blocks and optional for multiline blocks.

Indentation is significant for multiline blocks.

Multiline items may be prefixed with a bullet point (•) for readability.

person:
    name: 'Alan'
    age: 38
    friends:
        • 'Ada'
        • 'Charles'
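
To show how significant indentation can drive nesting, here is a small Python sketch that reads a bracket-free multiline block into nested dictionaries. It is a toy under stated assumptions: `parse_block` is a hypothetical name, input is assumed to be well-formed and indented with spaces, only single-quoted text, integers, and the words true and false are recognized as values, and comments and inline blocks are not handled.

```python
from typing import Any, Dict, List, Tuple


def _scalar(token: str) -> Any:
    if token in ("true", "false"):
        return token == "true"
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]  # verbatim single-quoted text
    return int(token)


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip(" "))


def parse_block(lines: List[str], indent: int = 0, pos: int = 0) -> Tuple[Dict[Any, Any], int]:
    """Parse lines indented by at least `indent`; return (block, next position)."""
    block: Dict[Any, Any] = {}
    next_index = 0
    while pos < len(lines):
        line = lines[pos]
        if not line.strip():
            pos += 1  # skip blank lines
            continue
        if _indent(line) < indent:
            break  # dedent: this line belongs to an enclosing block
        item = line.strip()
        if item.startswith("•"):
            item = item[1:].strip()  # optional bullet prefix
        pos += 1
        key, colon, rest = item.partition(":")
        if colon and not item.startswith("'"):
            key, rest = key.strip(), rest.strip()
            if key in block:
                raise ValueError(f"duplicate key: {key!r}")
            if rest:
                block[key] = _scalar(rest)
            else:
                # "key:" alone opens a nested block made of deeper lines
                block[key], pos = parse_block(lines, _indent(line) + 1, pos)
        else:
            block[next_index] = _scalar(item)  # linear item, implicit key
            next_index += 1
    return block, pos


source = """person:
    name: 'Alan'
    age: 38
    friends:
        • 'Ada'
        • 'Charles'
"""
block, _ = parse_block(source.splitlines())
print(block)
```

The recursion returns to the enclosing block as soon as a line is indented less than the current block, which is the sense in which indentation is significant here.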

Encoding

Either UTF-8 or a compatible binary format, for example CBOR or a derivative of Nota.

Features

  • Lightweight
  • Human-friendly
  • Line-oriented (newline is significant)
  • Indentation-based (indentation is significant)
  • Extensible

na is the kesh word for river.