tdast

Tabular Data Abstract Syntax Tree format.


tdast is a specification for representing tabular data as an abstract syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents.

Introduction

This document defines a format for representing tabular data as an abstract syntax tree. This specification is written in the style of other syntax-tree specs, for potential incorporation into the ecosystem in the future. Development started in September 2020.

Where this specification fits

tdast extends unist, a format for syntax trees, and benefits from its ecosystem of utilities.

tdast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, tdast is not limited to JavaScript and can be used in other programming languages.

tdast relates to the unified project in that tdast syntax trees follow the unist spec and are compatible with utilities throughout its ecosystem.

Scope

tdast represents tabular data from some source content (e.g. CSV, JSON) as a syntax tree. Because it implements the unist spec, nodes in the tree can track positional information from the source, which keeps the tree interoperable with unist utilities.
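As a rough illustration of positional information, the sketch below shows how a single cell parsed from a small CSV source might carry a unist-style position. The type names `UnistPoint` and `UnistPosition`, the sample CSV, and the choice of index 0 for the header row are assumptions made for this example; the shape of the position (`start` and `end` points with 1-indexed `line` and `column`, and an optional 0-indexed `offset`) follows unist.

```typescript
// Local copies of the unist positional types (the names are illustrative).
interface UnistPoint {
  line: number; // 1-indexed line in the source
  column: number; // 1-indexed column in the source
  offset?: number; // 0-indexed character offset in the source
}

interface UnistPosition {
  start: UnistPoint;
  end: UnistPoint;
}

// Hypothetical source content: a header line followed by one data line.
const source = "name,age\nAda,36";

// Position of the text "36" on line 2 of the source.
const agePosition: UnistPosition = {
  start: { line: 2, column: 5, offset: 13 },
  end: { line: 2, column: 7, offset: 15 },
};

// The same value as a tdast cell node that carries its position.
// Assumption: the header row has index 0, so this cell sits in row 1.
const ageCell = {
  type: "cell" as const,
  value: "36",
  rowIndex: 1,
  columnIndex: 1,
  position: agePosition,
};

// The offsets can be used to slice the original text back out of the source.
const startOffset = ageCell.position.start.offset ?? 0;
const endOffset = ageCell.position.end.offset ?? source.length;
const sliced = source.slice(startOffset, endOffset);
```

Keeping the position on the node means a utility can report errors or apply edits against the original source text without re-parsing it.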

With associated utilities, tdast enables flexible ways to parse, transform, and serialize tabular data in various content formats. CSV, JSON, and HTML tables are immediate candidates that benefit from tdast, and implementors can support further content types.

While it is possible to perform data calculations in tdast with associated utilities, it is recommended that you work with more direct data representations of the source content for these purposes. tdast utilities that convert to and from JS objects exist for this purpose.
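To make the idea of converting between JS objects and tdast concrete, here is a minimal sketch. The helpers `fromObjects` and `toObjects` are hypothetical names written for this example and are not the API of any existing tdast utility. The sketch also assumes that the header row has index 0, that a column's `value` and `label` are both the object key, and it types `value` as `unknown` where the spec uses `any`.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell { type: "cell"; value: unknown; columnIndex: number; rowIndex: number; }
interface Column { type: "column"; value: unknown; label: string; index: number; dataType?: string; }
interface Row { type: "row"; index: number; children: Array<Cell | Column>; }
interface Table { type: "table"; children: Row[]; }

type PlainRecord = Record<string, unknown>;

// Hypothetical helper: build a tdast tree from an array of plain objects.
// The keys of the first object become the columns of the header row.
function fromObjects(records: PlainRecord[]): Table {
  const first = records[0];
  const keys = first ? Object.keys(first) : [];
  const header: Row = {
    type: "row",
    index: 0,
    children: keys.map((key, index): Column => ({ type: "column", value: key, label: key, index })),
  };
  const body = records.map((record, i): Row => ({
    type: "row",
    index: i + 1,
    children: keys.map((key, columnIndex): Cell => ({
      type: "cell",
      value: record[key],
      columnIndex,
      rowIndex: i + 1,
    })),
  }));
  return { type: "table", children: [header, ...body] };
}

// Hypothetical helper: turn a tdast tree back into plain objects,
// using the labels of the first row's columns as object keys.
function toObjects(table: Table): PlainRecord[] {
  const [header, ...body] = table.children;
  if (!header) return [];
  const keys = header.children.map((node) =>
    node.type === "column" ? node.label : String(node.value)
  );
  return body.map((row) => {
    const record: PlainRecord = {};
    row.children.forEach((node, i) => {
      const key = keys[i];
      if (key !== undefined) record[key] = node.value;
    });
    return record;
  });
}

// Usage: calculations happen on the plain objects, not on the tree.
const people: PlainRecord[] = [
  { name: "Ada", age: 36 },
  { name: "Lin", age: 41 },
];
const peopleTree = fromObjects(people);
const roundTripped = toObjects(peopleTree);
```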

While tdast could be used to represent most tabular data, it is best used with simple cases of flat tabular data.

Nodes

Parent

interface Parent extends UnistParent {
  /** Array of child nodes */
  children: Array<Row | Cell | Column>;
  /** Additional data may be attached. */
  data?: Data;
}

Parent (UnistParent) is an abstract node containing other nodes (said to be children).

This node is not used directly in tdast and is defined as an abstract interface.

Literal

interface Literal extends UnistLiteral {
  /** Primary data value of a literal node. Loosely typed to `any` for convenience. */
  value: any;
  /** Additional data may be attached. */
  data?: Data;
}

Literal (UnistLiteral) is an abstract node containing a value.

This node is not used directly in tdast and is defined as an abstract interface.

Table

interface Table extends Parent {
  /** Table node type. */
  type: 'table';
  /** Can only contain Rows for children. */
  children: Row[];
}

Table (Parent) is the node that holds all tabular data in its Row children.

Table can be used as the root of a tree, but never as a child.
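For orientation, a complete tree for a two-line CSV might look like the sketch below. It uses the Row, Cell, and Column nodes defined in the following sections. The sample data is made up, and the sketch assumes that the header row has index 0 and that a column's `value` matches its `label`; `value` is typed as `unknown` here where the spec uses `any`.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell {
  type: "cell";
  value: unknown;
  columnIndex: number;
  rowIndex: number;
}

interface Column {
  type: "column";
  value: unknown;
  label: string;
  index: number;
  dataType?: string;
}

interface Row {
  type: "row";
  index: number;
  children: Array<Cell | Column>;
}

interface Table {
  type: "table";
  children: Row[];
}

// Hypothetical source content: "name,age\nAda,36"
const tree: Table = {
  type: "table",
  children: [
    {
      // First row: holds the columns.
      type: "row",
      index: 0,
      children: [
        { type: "column", value: "name", label: "name", index: 0 },
        { type: "column", value: "age", label: "age", index: 1 },
      ],
    },
    {
      // Second row: holds the cells for one record.
      type: "row",
      index: 1,
      children: [
        { type: "cell", value: "Ada", columnIndex: 0, rowIndex: 1 },
        { type: "cell", value: "36", columnIndex: 1, rowIndex: 1 },
      ],
    },
  ],
};

// The table is the root, and every direct child of it is a row.
const rowCount = tree.children.length;
```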

Row

interface Row extends Parent {
  /** Row node type. */
  type: 'row';
  /** Index of Row in relation to other Rows in a Table. */
  index: number;
  /** Can only contain Cells or Columns for children. */
  children: Array<Cell | Column>;
}

Row (Parent) holds literal nodes that contain data, such as Column or Cell nodes.

Row contains an index field that tracks its position relative to other rows under a Table.
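Because the index is stored on the node, a transform that removes or reorders rows has to bring the indexes back in line. The sketch below shows one way to do that. `reindexRows` is a hypothetical helper written for this example, and it assumes 0-based indexes that follow the order of the rows in the table.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell { type: "cell"; value: unknown; columnIndex: number; rowIndex: number; }
interface Column { type: "column"; value: unknown; label: string; index: number; dataType?: string; }
interface Row { type: "row"; index: number; children: Array<Cell | Column>; }
interface Table { type: "table"; children: Row[]; }

// Hypothetical helper: return a copy of the table in which every row's index,
// and the rowIndex of every cell in it, matches the row's position in the table.
function reindexRows(table: Table): Table {
  return {
    ...table,
    children: table.children.map((row, index): Row => ({
      ...row,
      index,
      children: row.children.map((node): Cell | Column =>
        node.type === "cell" ? { ...node, rowIndex: index } : node
      ),
    })),
  };
}

// Usage: a table whose row with index 1 was filtered out by an earlier transform.
const pruned: Table = {
  type: "table",
  children: [
    {
      type: "row",
      index: 0,
      children: [{ type: "column", value: "name", label: "name", index: 0 }],
    },
    {
      type: "row",
      index: 2,
      children: [{ type: "cell", value: "Lin", columnIndex: 0, rowIndex: 2 }],
    },
  ],
};

const reindexed = reindexRows(pruned);
```

The helper returns a new tree and leaves its input untouched, which keeps it safe to use in a chain of transforms.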

Cell

interface Cell extends Literal {
  /** Cell node type. */
  type: 'cell';
  /** Tracks which Column the Cell belongs to */
  columnIndex: number;
  /** Tracks which Row the Cell belongs to */
  rowIndex: number;
}

Cell (Literal) represents the intersection of a Row and Column of a Table. This intersection information is stored on the columnIndex and rowIndex properties.

Its primary data is stored on the value property. Cell can also carry additional data (not relevant to tdast) under the optional data property.
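Since each cell records its own intersection, a cell can be looked up without counting positions in the children arrays. The following sketch illustrates this. `findCell` is a hypothetical helper written for this example, the `note` key under `data` is made up, and the header row is assumed to have index 0.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell {
  type: "cell";
  value: unknown;
  columnIndex: number;
  rowIndex: number;
  data?: Record<string, unknown>;
}
interface Column { type: "column"; value: unknown; label: string; index: number; dataType?: string; }
interface Row { type: "row"; index: number; children: Array<Cell | Column>; }
interface Table { type: "table"; children: Row[]; }

// Hypothetical helper: find the cell at the intersection of a row and a column
// by comparing the indexes stored on each cell node.
function findCell(table: Table, rowIndex: number, columnIndex: number): Cell | undefined {
  for (const row of table.children) {
    for (const node of row.children) {
      if (node.type === "cell" && node.rowIndex === rowIndex && node.columnIndex === columnIndex) {
        return node;
      }
    }
  }
  return undefined;
}

// Usage: the second cell carries extra, tool-specific information under `data`.
const sample: Table = {
  type: "table",
  children: [
    {
      type: "row",
      index: 0,
      children: [
        { type: "column", value: "name", label: "name", index: 0 },
        { type: "column", value: "age", label: "age", index: 1 },
      ],
    },
    {
      type: "row",
      index: 1,
      children: [
        { type: "cell", value: "Ada", columnIndex: 0, rowIndex: 1 },
        { type: "cell", value: "36", columnIndex: 1, rowIndex: 1, data: { note: "unverified" } },
      ],
    },
  ],
};

const found = findCell(sample, 1, 1);
const missing = findCell(sample, 5, 0);
```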

Column

interface Column extends Literal {
  /** Column node type. */
  type: 'column';
  /** Display label of a column. */
  label: string;
  /** Index of Column in relation to other Columns in a Table. */
  index: number;
  /** Optional data type useful to determine data types of Cells matching the Column. */
  dataType?: string;
}

Column (Literal) is usually reserved for the first Row of a Table.

Column contains an index field that tracks its position relative to other columns under a Table. Its primary data is represented in the value property. It should have a display label specified as a string.

Column can optionally specify the dataType property, which indicates the data type of the Cells that belong to that Column.
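One possible use of dataType is to convert raw cell values after parsing. The sketch below assumes the columns live in the first row and that dataType holds one of the strings `"number"` or `"boolean"`; this document does not define the allowed dataType values, so these strings, along with the helpers `coerceValue` and `applyDataTypes`, are assumptions made for illustration.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell { type: "cell"; value: unknown; columnIndex: number; rowIndex: number; }
interface Column { type: "column"; value: unknown; label: string; index: number; dataType?: string; }
interface Row { type: "row"; index: number; children: Array<Cell | Column>; }
interface Table { type: "table"; children: Row[]; }

// Hypothetical helper: convert a raw value according to an assumed dataType string.
// Unknown or missing data types leave the value unchanged.
function coerceValue(value: unknown, dataType?: string): unknown {
  switch (dataType) {
    case "number":
      return Number(value);
    case "boolean":
      return String(value).toLowerCase() === "true";
    default:
      return value;
  }
}

// Hypothetical helper: return a copy of the table in which every cell value is
// converted using the dataType of the column with the same index.
function applyDataTypes(table: Table): Table {
  const header = table.children[0];
  if (!header) return table;

  // Map each column index to the dataType declared on that column, if any.
  const dataTypes = new Map<number, string | undefined>();
  for (const node of header.children) {
    if (node.type === "column") dataTypes.set(node.index, node.dataType);
  }

  return {
    ...table,
    children: table.children.map((row): Row => ({
      ...row,
      children: row.children.map((node): Cell | Column =>
        node.type === "cell"
          ? { ...node, value: coerceValue(node.value, dataTypes.get(node.columnIndex)) }
          : node
      ),
    })),
  };
}

// Usage: the "age" column declares a dataType, the "name" column does not.
const raw: Table = {
  type: "table",
  children: [
    {
      type: "row",
      index: 0,
      children: [
        { type: "column", value: "name", label: "name", index: 0 },
        { type: "column", value: "age", label: "age", index: 1, dataType: "number" },
      ],
    },
    {
      type: "row",
      index: 1,
      children: [
        { type: "cell", value: "Ada", columnIndex: 0, rowIndex: 1 },
        { type: "cell", value: "36", columnIndex: 1, rowIndex: 1 },
      ],
    },
  ],
};

const typed = applyDataTypes(raw);
```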

Glossary

  • ast: a data structure representing source content as an abstract syntax tree.
  • cell: the intersection of a table row and column. A cell contains data.
  • column: a table column provides definitions for cells in subsequent rows. A column should define the dataType of cells under it. The cardinality of a column is equal to the number of rows in a table.
  • csv: a common tabular data format that uses comma-separated values for delimiting values.
  • dataType: refers to the data type a table cell assumes. This is usually defined by a table column.
  • row: a table row contains cells. The cardinality of a row is equal to the number of columns in a table. Each cell should assume the dataType prescribed by its column.
  • table: an arrangement of data in rows and columns.
  • tdast: a specification for representing tabular data as an abstract syntax tree.
  • unist: universal syntax tree that tdast is based on.
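The cardinality rule for rows in the glossary can be checked mechanically. The sketch below reports rows whose number of children differs from the number of columns. `findRaggedRows` is a hypothetical helper written for this example, and it assumes the columns are the Column nodes in the first row of the table.

```typescript
// Minimal local copies of the tdast node interfaces from this document.
interface Cell { type: "cell"; value: unknown; columnIndex: number; rowIndex: number; }
interface Column { type: "column"; value: unknown; label: string; index: number; dataType?: string; }
interface Row { type: "row"; index: number; children: Array<Cell | Column>; }
interface Table { type: "table"; children: Row[]; }

// Hypothetical helper: return the indexes of all rows whose number of children
// does not equal the number of columns declared in the first row.
function findRaggedRows(table: Table): number[] {
  const header = table.children[0];
  if (!header) return [];
  const columnCount = header.children.filter((node) => node.type === "column").length;
  return table.children
    .filter((row) => row.children.length !== columnCount)
    .map((row) => row.index);
}

// Usage: the last row is missing its second cell.
const ragged: Table = {
  type: "table",
  children: [
    {
      type: "row",
      index: 0,
      children: [
        { type: "column", value: "name", label: "name", index: 0 },
        { type: "column", value: "age", label: "age", index: 1 },
      ],
    },
    {
      type: "row",
      index: 1,
      children: [
        { type: "cell", value: "Ada", columnIndex: 0, rowIndex: 1 },
        { type: "cell", value: "36", columnIndex: 1, rowIndex: 1 },
      ],
    },
    {
      type: "row",
      index: 2,
      children: [{ type: "cell", value: "Lin", columnIndex: 0, rowIndex: 2 }],
    },
  ],
};

const raggedRowIndexes = findRaggedRows(ragged);
```

A check like this is mostly useful right after parsing, before any transform relies on every row having a cell for every column.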

List of utilities

Related

  • unist: Universal Syntax Tree format
  • tdastscript: utility to create tdast trees.

License

CC-BY-4.0 © Chris Zhou