tools: add documentation regarding our api tooling
Introduces a proper imperative description of how the
current API documentation build system works.

Refs: nodejs/next-10#169

PR-URL: #45270
Reviewed-By: Michael Dawson <midawson@redhat.com>
ovflowd authored and RafaelGSS committed Nov 10, 2022
1 parent 76cbc07 commit 9aa305f
Showing 1 changed file with 296 additions and 0 deletions.
296 changes: 296 additions & 0 deletions doc/contributing/api-documentation.md
@@ -0,0 +1,296 @@
# Node.js API Documentation Tooling

The Node.js API documentation is generated by in-house tooling that resides
within the [tools/doc](https://github.com/nodejs/node/tree/main/tools/doc)
directory.

The build process (using `make doc`) uses this tooling to parse the markdown
files in [doc/api](https://github.com/nodejs/node/tree/main/doc/api) and
generate the following:

1. Human-readable HTML in `out/doc/api/*.html`
2. A JSON representation in `out/doc/api/*.json`

These are published to nodejs.org for multiple versions of Node.js. For
example, the latest version of the human-readable HTML is published to
[nodejs.org/en/docs](https://nodejs.org/en/docs/), and the latest version
of the JSON documentation is published to
[nodejs.org/api/all.json](https://nodejs.org/api/all.json).

<!-- TODO: Add docs about how the publishing process happens -->

**The key things to know about the tooling include:**

1. The entry point is `tools/doc/generate.mjs`.
2. The tooling supports the CLI arguments listed in the table below.
3. The tooling processes one file at a time.
4. The tooling uses a set of dependencies as described in the dependencies
section.
5. The tooling parses the input files and does several transformations to the
AST (Abstract Syntax Tree).
6. The tooling generates a JSON output that contains the metadata and content of
the Markdown file.
7. The tooling generates an HTML output that contains a human-readable,
ready-to-view version of the file.

This documentation serves the purpose of explaining the existing tooling
processes, to allow easier maintenance and evolution of the tooling. It is not
meant to be a guide on how to write documentation for Node.js.

#### Vocabulary & Good to Know's

* AST means "Abstract Syntax Tree" and it is a data structure that represents
  the structure of a certain data format. In our case, the AST is a "graph"
  representation of the contents of the Markdown file.
* MDN means [Mozilla Developer Network](https://developer.mozilla.org/en-US/)
  and it is a website that contains documentation for web technologies. We use
  it as a reference for the structure of the documentation.
* The
  [Stability Index](https://nodejs.org/dist/latest/docs/api/documentation.html#stability-index)
  is used to communicate the stability of a given Node.js module. The stability
  levels include:
  * Stability 0: Deprecated. (This module is deprecated)
  * Stability 1: Experimental. (This module is experimental)
  * Stability 2: Stable. (This module is stable)
  * Stability 3: Legacy. (This module is legacy)
* Within Remark, YAML snippets (`<!-- something -->`) are treated as HTML
  nodes, because YAML is not valid Markdown content (it is not part of the
  Markdown spec).
* "New Tooling" refers to the API build tooling (written from scratch)
  introduced in `nodejs/nodejs.dev` that might replace the current one from
  `nodejs/node`.

## CLI Arguments

The tooling requires a `filename` argument and supports extra arguments (some
also required) as shown below:

| Argument | Description | Required | Example |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | -------- | ---------------------------------- |
| `--node-version=` | The version of Node.js that is being documented. It defaults to `process.version` which is supplied by Node.js itself | No | v19.0.0 |
| `--output-directory=` | The directory where the output files will be generated. | Yes | `./out/api/` |
| `--apilinks=` | This file is used as an index to specify the source file for each module | No | `./out/doc/api/apilinks.json` |
| `--versions-file=` | This file is used to specify an index of all previous versions of Node.js. It is used for the Version Navigation on the API docs page. | No | `./out/previous-doc-versions.json` |

**Note:** The files passed to `--apilinks` and `--versions-file` are both
generated by the Node.js build process (Makefile). Each of them contains a JSON
object.
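
To illustrate the `--flag=value` format used in the table above, here is a
minimal sketch of how such arguments could be split into options. The
`parseCliArgs` helper, its option names, and its error messages are hypothetical
and are not taken from the real entry point.

```javascript
// Hypothetical helper: splits CLI arguments into flags and a filename.
// This only illustrates the `--flag=value` format; it is not the real parser.
function parseCliArgs(argv) {
  const options = { filename: null, nodeVersion: process.version };
  const flagNames = {
    '--node-version': 'nodeVersion',
    '--output-directory': 'outputDirectory',
    '--apilinks': 'apilinks',
    '--versions-file': 'versionsFile',
  };

  for (const arg of argv) {
    if (!arg.startsWith('--')) {
      // The first non-flag argument is treated as the input Markdown file.
      options.filename ??= arg;
      continue;
    }
    const separator = arg.indexOf('=');
    const name = separator === -1 ? arg : arg.slice(0, separator);
    const value = separator === -1 ? '' : arg.slice(separator + 1);
    if (name in flagNames) options[flagNames[name]] = value;
  }

  if (!options.filename) throw new Error('No input file specified');
  if (!options.outputDirectory) throw new Error('No output directory specified');
  return options;
}

const parsed = parseCliArgs([
  '--node-version=v19.0.0',
  '--output-directory=./out/api/',
  'doc/api/fs.md',
]);
console.log(parsed.filename, parsed.nodeVersion, parsed.outputDirectory);
```

When `--node-version=` is omitted, the sketch falls back to `process.version`,
matching the default described in the table.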

### Basic Usage

```bash
# cd tools/doc
npm run node-doc-generator ${filename}
```

**OR**

```bash
# nodejs/node root directory
make doc
```

## Dependencies and how the Tooling works internally

The API tooling uses an AST-based library called
[unified](https://github.com/unifiedjs/unified) to process the input file as
a graph whose nodes can be easily modified and updated.

In addition to `unified`, we also use
[Remark](https://github.com/remarkjs/remark) for manipulating the Markdown part,
and [Rehype](https://github.com/rehypejs/rehype) to help convert to and from
Markdown.

### What are the steps of the internal tooling?

The tooling uses the pipeline-like engine of `unified` to chain each part of the
process. (The description below is a simplified version.)

* It starts by reading the front matter section of the Markdown file with
  [remark-frontmatter](https://www.npmjs.com/package/remark-frontmatter).
* It then parses the Markdown using `remark-parse` and adds support for
  [GitHub Flavored Markdown](https://github.github.com/gfm/).
* It proceeds by parsing some of the Markdown nodes and transforming them to
  HTML.
* It then generates the JSON output of the file.
* Finally, it does its final node transformations and generates stringified
  HTML.
* It then stores the output in a JSON file, adds extra styling to the HTML, and
  stores the HTML file.
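
The idea of piping a file through successive steps can be sketched without any
dependencies. The example below is a toy stand-in and does not use the `unified`
API: `createPipeline` and the three step functions are hypothetical, and the
"parsing" is reduced to recognizing headings and paragraphs.

```javascript
// Minimal stand-in for a pipe-like engine. This is NOT the `unified` API;
// `createPipeline` and the step functions below are hypothetical.
function createPipeline() {
  const steps = [];
  return {
    use(step) {
      steps.push(step);
      return this; // allows chaining
    },
    process(input) {
      // Each step receives the result of the previous one.
      return steps.reduce((value, step) => step(value), input);
    },
  };
}

// Step 1: separate the front matter from the Markdown body.
const splitFrontmatter = (source) => {
  const match = /^---\n([\s\S]*?)\n---\n/.exec(source);
  return {
    frontmatter: match ? match[1] : '',
    body: match ? source.slice(match[0].length) : source,
  };
};

// Step 2: turn each non-empty line into a tiny "node".
const parseLines = (file) => ({
  ...file,
  nodes: file.body.split('\n').filter(Boolean).map((line) => {
    const heading = /^(#+) (.*)$/.exec(line);
    return heading
      ? { type: 'heading', depth: heading[1].length, text: heading[2] }
      : { type: 'paragraph', text: line };
  }),
});

// Step 3: produce both outputs, JSON and stringified HTML.
const render = (file) => ({
  json: JSON.stringify(file.nodes),
  html: file.nodes
    .map((n) => (n.type === 'heading'
      ? `<h${n.depth}>${n.text}</h${n.depth}>`
      : `<p>${n.text}</p>`))
    .join('\n'),
});

const output = createPipeline()
  .use(splitFrontmatter)
  .use(parseLines)
  .use(render)
  .process('---\ntitle: demo\n---\n# File system\n\nSome text.\n');

console.log(output.html);
```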

### What is each file responsible for?

The files listed below are the ones referenced and actually used during the
build process of the API docs as we see on <https://nodejs.org/api>. The
remaining files from the directory might be used by other steps of the Node.js
Makefile or might even be deprecated/remnant of old processes and might need to
be revisited/removed.

* **`html.mjs`**: Responsible for transforming nodes by decorating them with
  visual artifacts for the HTML pages.
  * For example, transforming man page or JS doc references into links that
    point to the respective external documentation.
* **`json.mjs`**: Responsible for generating the JSON output of the file.
  * It is mostly responsible for going through the whole Markdown file and
    generating a JSON object that represents the metadata of a specific module.
  * For example, for the FS module, it will generate an object with all its
    methods, events, and classes, using several regular expressions (regex)
    to extract the information needed.
* **`generate.mjs`**: Main entry point of doc generation for a specific file.
  It does end-to-end (e2e) processing of a documentation file.
* **`allhtml.mjs`**: A script executed after all files are generated to create a
  single "all" page containing all the HTML documentation.
* **`alljson.mjs`**: A script executed after all files are generated to create a
  single "all" page containing all the JSON entries.
* **`markdown.mjs`**: Contains a utility to replace Markdown links so that they
  work with the <https://nodejs.org/api/> website.
* **`common.mjs`**: Contains a few utility functions that are used by the other
  files.
* **`type-parser.mjs`**: Used to replace "type references" (e.g. "String" or
  "Buffer") with links to the correct internal/external documentation pages
  (i.e. MDN or other Node.js documentation pages).
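
As a rough illustration of what replacing "type references" means, the sketch
below links types written in curly braces (for example `{string|Buffer}`, the
notation used in the `doc/api` files). The `linkTypes` helper and the
two-entry lookup table are assumptions made for this example; the real mapping
lives in `type-parser.mjs` and is much larger.

```javascript
// Illustrative lookup table. The entries are assumptions for this example.
const typeLinks = new Map([
  ['string', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String'],
  ['Buffer', 'buffer.html#class-buffer'],
]);

// Hypothetical helper: turns `{string|Buffer}` into linked type names.
function linkTypes(text) {
  return text.replace(/\{([^{}]+)\}/g, (whole, types) =>
    types
      .split('|')
      .map((type) => {
        const name = type.trim();
        const url = typeLinks.get(name);
        // Unknown types are left as plain text instead of guessing a URL.
        return url ? `<a href="${url}">&lt;${name}&gt;</a>` : `&lt;${name}&gt;`;
      })
      .join(' | '));
}

console.log(linkTypes('`data` {string|Buffer}'));
```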

**Note:** Other files not mentioned here might be used during the process but
are not relevant to the generation of the API docs themselves. You will notice
that a lot of the logic within the build process is **specific** to the current
<https://nodejs.org/api/> infrastructure, such as adding JavaScript snippets and
styles, transforming certain Markdown elements into HTML, and adding certain
HTML classes.

**Note:** Regarding the previous **Note**, we're currently working on an API
tooling that is generic and independent of the current nodejs.org
infrastructure.
[The new tooling that is functional is available at the nodejs.dev repository](https://github.com/nodejs/nodejs.dev/blob/main/scripts/syncApiDocs.js)
and uses plain regex (no AST) and [MDX](https://mdxjs.com/).

## The Build Process

The build process that happens in `generate.mjs` follows the steps below:

* Links within the Markdown are replaced directly within the source Markdown
  (AST) (`markdown.replaceLinks`).
  * This happens within `markdown.mjs`, which adds suffixes to or otherwise
    modifies link references within the Markdown.
  * This is necessary for the `https://nodejs.org` infrastructure as all pages
    are suffixed with `.html`.
* Text (and some YAML) nodes are transformed/modified through
  `html.preprocessText`.
* JSON output is generated through `json.jsonAPI`.
* The title of the page is inferred through `html.firstHeader`.
* Nodes are transformed into HTML elements through `html.preprocessElements`.
* The HTML Table of Contents (ToC) is generated through `html.buildToc`.
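
The first step, rewriting links so that they point at `.html` pages, can be
approximated with a single regular expression. This is a sketch of the idea
only: `replaceMarkdownLinks` is a hypothetical name, and the real
`markdown.replaceLinks` works on the parsed document and may handle more cases.

```javascript
// Hypothetical sketch of the idea behind `markdown.replaceLinks`:
// relative links to other `.md` files are pointed at the generated `.html`.
function replaceMarkdownLinks(markdown) {
  // Matches `](target.md)` and `](target.md#anchor)`, skipping absolute URLs.
  return markdown.replace(
    /\]\((?![a-z]+:\/\/)([^)#\s]+)\.md(#[^)\s]*)?\)/gi,
    (match, page, hash = '') => `](${page}.html${hash})`,
  );
}

console.log(replaceMarkdownLinks('See [`fs.readFile()`](fs.md#fsreadfilepath).'));
```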

### `html.mjs`

This file is responsible for AST node transformations that either update
Markdown nodes to decorate them with more data or transform them into HTML nodes
that serve a specific visual purpose, for example, to generate the "Added in"
label, the Source Links, the Stability Index, or the History table.

**Note:** Methods not listed below are either not relevant or are utility
methods for string/array/object manipulation (i.e. they are used by the other
methods mentioned below).

#### `preprocessText`

**New Tooling:** Most of the features within this method are available within
the new tooling.

This method does two things:

* Replaces the Source Link YAML entry `<!-- source_link= -->` with a "Source
  Link" HTML anchor element.
* Replaces type references within the Markdown text (e.g. "String", "Buffer")
  with the correct HTML anchor element that links to the correct documentation
  page.
  * The original node then gets mutated from text to HTML.
  * It also updates references to Linux "MAN" pages to Web versions of them.
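
The man page replacement mentioned above could look roughly like the sketch
below. The `linkManPages` helper is hypothetical, and the man7.org URL layout is
used as an assumption for illustration; the site and pattern used by the real
tooling may differ.

```javascript
// Hypothetical helper: links references such as `open(2)` to a web man page.
// The man7.org URL layout is an assumption made for this example.
function linkManPages(text) {
  return text.replace(
    /(^|\s)([a-z.]+)\((\d)([a-z]?)\)/gm,
    (match, before, name, section, suffix) => {
      const url =
        `https://man7.org/linux/man-pages/man${section}/${name}.${section}${suffix}.html`;
      return `${before}<a href="${url}">${name}(${section}${suffix})</a>`;
    },
  );
}

console.log(linkManPages('See open(2) for details'));
```

Calls such as `fs.readFile(path)` are left alone because the parentheses must
contain a section number.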

#### `firstHeader`

**New Tooling:** All features within this method are available within the new
Tooling.

This method attempts to extract the first heading of the page (recursively) to
define the "title" of the page.

**Note:** As all API Markdown files start with a heading, this could possibly be
simplified.
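
A recursive search of this kind can be sketched on a plain object tree shaped
like a Markdown AST (nodes with a `type`, optional `children`, and a `value` on
text nodes). The `findFirstHeading` and `textOf` helpers and the sample tree
are hypothetical and only illustrate the traversal.

```javascript
// Depth-first search for the first node of type `heading`.
function findFirstHeading(node) {
  if (node.type === 'heading') return node;
  for (const child of node.children ?? []) {
    const found = findFirstHeading(child);
    if (found) return found;
  }
  return null;
}

// Collects the text of a node by concatenating its text descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(textOf).join('');
}

// Hypothetical sample tree for a small API document.
const tree = {
  type: 'root',
  children: [
    { type: 'html', value: '<!--introduced_in=v0.10.0-->' },
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'File system' }],
    },
    { type: 'paragraph', children: [{ type: 'text', value: 'Intro.' }] },
  ],
};

const heading = findFirstHeading(tree);
console.log(heading ? textOf(heading) : 'Untitled');
```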

#### `preprocessElements`

**New Tooling:** All features within this method are available within the new
tooling.

This method is responsible for multiple transformations of the AST nodes,
mostly transforming source nodes into their respective HTML elements with
diverse responsibilities, such as:

* Updating Markdown `code` blocks by adding language highlighting.
  * It also adds the "CJS"/"MJS" switch to nodes that are followed by their
    CJS/ESM equivalents.
* Increasing the heading level of each heading.
* Parsing YAML blocks and transforming them into HTML elements (see more at the
  `parseYAML` method).
* Updating blockquotes that are prefixed by the word "Stability" into a
  Stability Index HTML element.
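
Two of these transformations, shifting heading levels and converting
"Stability" blockquotes, are sketched below on simplified nodes. The node shape
(a blockquote carrying a `text` property), the `transformNode` helper, and the
CSS class names are assumptions made for this example rather than a copy of the
real implementation.

```javascript
// Hypothetical transform over simplified, plain-object nodes.
function transformNode(node) {
  if (node.type === 'heading') {
    // Shift headings down one level (h1 becomes h2, and so on), capped at 6.
    return { ...node, depth: Math.min(node.depth + 1, 6) };
  }
  if (node.type === 'blockquote') {
    const match = /^Stability: (\d)\b/.exec(node.text);
    if (match) {
      // Emit an HTML node whose class is derived from the stability level.
      return {
        type: 'html',
        value: `<div class="api_stability api_stability_${match[1]}">${node.text}</div>`,
      };
    }
  }
  // Every other node is passed through unchanged.
  return node;
}

const transformed = [
  { type: 'heading', depth: 2, text: 'fs.readFile()' },
  { type: 'blockquote', text: 'Stability: 2 - Stable' },
  { type: 'blockquote', text: 'An ordinary quote' },
].map(transformNode);

console.log(transformed);
```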

#### `parseYAML`

**New Tooling:** Most of the features within this method are available within
the new tooling.

This method is responsible for parsing the `<!-- YAML snippets -->` and
transforming them into HTML elements.

It follows a "schema" that consists of the following options:

| YAML Key | Description | Example | Example Result | Available on new tooling |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- | --------------------------- | ------------------------ |
| `added` | It's used to reference when a certain "module", "class" or "method" was added to Node.js | `added: v0.1.90` | `Added in: v0.1.90` | Yes |
| `deprecated` | It's used to reference when a certain "module", "class" or "method" was deprecated in Node.js | `deprecated: v0.1.90` | `Deprecated since: v0.1.90` | Yes |
| `removed` | It's used to reference when a certain "module", "class" or "method" was removed from Node.js | `removed: v0.1.90` | `Removed in: v0.1.90` | No |
| `changes` | It's used to describe all the changes (historical ones) that happened within a certain "module", "class" or "method" in Node.js | `[{ version: v0.1.90, pr-url: '', description: '' }]` | -- | Yes |
| `napiVersion` | It's used to describe in which version of the N-API this "module", "class" or "method" is available within Node.js | `napiVersion: 1` | `N-API version: 1` | Yes |

**Note:** The `changes` field gets prepended with the `added`, `deprecated` and
`removed` fields if they exist. The table only gets generated if a `changes`
field exists. In the new tooling only "added" is prepended for now.
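
As a simplified illustration of the mapping in the table above, the sketch
below reads flat `key: value` lines from a comment and renders the matching
labels. It assumes the comment starts with `<!-- YAML`, handles no nested
structures such as `changes`, and uses hypothetical helper names
(`parseYamlComment`, `renderMeta`); the real tooling parses full YAML.

```javascript
// Labels taken from the "Example Result" column of the table above.
const labels = {
  added: 'Added in',
  deprecated: 'Deprecated since',
  removed: 'Removed in',
  napiVersion: 'N-API version',
};

// Hypothetical, simplified parser: handles only flat `key: value` lines.
function parseYamlComment(comment) {
  const body = comment.replace(/^<!--\s*YAML\s*/, '').replace(/\s*-->$/, '');
  const meta = {};
  for (const line of body.split('\n')) {
    const match = /^(\w+):\s*(.+)$/.exec(line.trim());
    if (match) meta[match[1]] = match[2];
  }
  return meta;
}

// Renders one `<span>` per known key, in the order of the `labels` table.
function renderMeta(meta) {
  return Object.keys(labels)
    .filter((key) => key in meta)
    .map((key) => `<span>${labels[key]}: ${meta[key]}</span>`)
    .join('\n');
}

const meta = parseYamlComment('<!-- YAML\nadded: v0.1.90\ndeprecated: v10.0.0\n-->');
console.log(renderMeta(meta));
```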

#### `buildToc`

**New Tooling:** This feature is natively available within the new tooling
through MDX.

This method generates the Table of Contents based on all the Headings of the
Markdown file.
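
One way to derive a Table of Contents from headings is to emit an indented
Markdown list, one entry per heading. The sketch below assumes headings are
given as `{ depth, text }` objects; the `slugify` scheme and the `buildTocList`
name are hypothetical and do not reproduce the anchor IDs used on nodejs.org.

```javascript
// Hypothetical slug scheme: lowercase, non-alphanumerics collapsed to dashes.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Builds a nested Markdown list, indenting two spaces per heading level.
function buildTocList(headings) {
  return headings
    .map(({ depth, text }) =>
      `${'  '.repeat(depth - 1)}* [${text}](#${slugify(text)})`)
    .join('\n');
}

console.log(buildTocList([
  { depth: 1, text: 'File system' },
  { depth: 2, text: 'fs.readFile(path)' },
]));
```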

#### `altDocs`

**New Tooling:** All features within this method are available within the new
tooling.

This method generates a version picker for the current page to be shown in older
versions of the API docs.
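
A version picker of this kind boils down to mapping a list of versions to
links for the same page. In the sketch below, the `{ num, lts }` entry shape,
the URL layout, and the `buildVersionPicker` name are all assumptions made for
illustration; check the versions file and `altDocs` for the actual format.

```javascript
// Hypothetical shape for entries of the versions file: `{ num, lts }`.
function buildVersionPicker(filename, versions) {
  const items = versions.map(({ num, lts = false }) => {
    // Assumed URL layout for the docs of a given release line.
    const href = `https://nodejs.org/docs/latest-v${num}/api/${filename}.html`;
    const label = lts ? `<b>${num}</b> LTS` : num;
    return `<li><a href="${href}">${label}</a></li>`;
  });
  return `<ol>${items.join('')}</ol>`;
}

console.log(buildVersionPicker('fs', [{ num: '19.x' }, { num: '18.x', lts: true }]));
```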

### `json.mjs`

This file is responsible for generating a JSON object that (supposedly) is used
for IDE IntelliSense or for indexing all the "methods", "classes", "modules",
"events", "constants" and "globals" available within a certain Markdown file.

It attempts a best-effort extraction of the data by using several regular
expression patterns (regex).

**Note:** JSON output generation is currently not supported by the new tooling,
but it is in the pipeline for development.

#### `jsonAPI`

This method traverses all the AST nodes by iterating through each one of them
and infers the kind of information each node contains through regex. Then it
mutates the data and appends it to the final JSON object.

For more in-depth information, refer to the `json.mjs` file, as it contains a
lot of comments.
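
To give a feel for regex-based inference, the sketch below guesses what a
heading documents from its text and groups the results. The patterns, the
type names, and the `classifyHeading` and `collect` helpers are hypothetical
and far simpler than what `json.mjs` does.

```javascript
// Hypothetical patterns, tried in order; the first match wins.
const patterns = [
  { type: 'class', re: /^Class: +(\S+)$/ },
  { type: 'event', re: /^Event: +'?([^']+)'?$/ },
  { type: 'method', re: /^([\w.]+)\(([^)]*)\)$/ },
];

// Infers the kind of entry a heading describes, ignoring backticks.
function classifyHeading(text) {
  const clean = text.replace(/`/g, '');
  for (const { type, re } of patterns) {
    const match = re.exec(clean);
    if (match) return { type, name: match[1] };
  }
  return { type: 'misc', name: clean };
}

// Groups heading names by inferred type into one result object.
function collect(headings) {
  const result = {};
  for (const text of headings) {
    const { type, name } = classifyHeading(text);
    (result[type] ??= []).push(name);
  }
  return result;
}

console.log(collect([
  'Class: `fs.Stats`',
  "Event: 'close'",
  '`fs.readFile(path[, options], callback)`',
  'Usage',
]));
```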
