This repository has been archived by the owner on Jul 27, 2020. It is now read-only.

chrisrzhou/unified-doc


This repository is archived and no longer being maintained. Active development will be formally managed in the unified-doc organization.

📜 unified-doc

unified document renderer for content.



Motivations

Content as structured data. -- unified

Knowledge is unified abstractly across humanity. We share common goals of acquiring, storing, and sharing knowledge. Content represents the physical manifestation of storing knowledge, and is stored in various digital formats in the modern computing age. Sharing content seamlessly across formats is a current challenge in unifying human knowledge.

Various software programs act on content types to parse, process, and render the underlying data for human consumption. Many solutions try to be interoperable, but are largely limited by the lack of a common interface across content types and programs. These solutions can largely be described as API interactions between software, not as interactions with the actual content. The unified initiative addresses this problem by representing content as unified syntax trees, so that programs can work closely with the underlying structured content.

unified-doc is a project of unified document renderers and associated utilities that use the unified ecosystem to render any supported content type into HTML-based markup. It represents content as structured data and preserves the fidelity of the original source content in the rendered document, while supporting powerful features that enrich the document (e.g. annotations) and remaining interoperable with standard and evolving web technologies.

Architecture

This section covers how unified-doc renderers and programs are designed and implemented.

Content

At the time of writing, unified-doc supports parsing the following content types into hast trees:

  • text
  • markdown
  • html

This is done through the processor module which provides a single entry point to define how supported content types are parsed into hast trees. processor applies an opinionated (but configurable) sanitization step using the hast-util-sanitize utility.
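To make the idea of "content as a hast tree" concrete, here is a minimal, dependency-free sketch. It is not the unified-doc processor API: `textToHast` and `sanitize` are hypothetical helpers, and the allow-list sanitizer is a deliberately crude stand-in for illustration, not the behavior of hast-util-sanitize.

```javascript
// Hypothetical sketch: NOT the unified-doc `processor` API.

// Wrap plain text in a hast root so that text content has the same tree
// shape that parsed markdown or HTML would have.
function textToHast(text) {
  return {
    type: 'root',
    children: [
      {
        type: 'element',
        tagName: 'pre',
        properties: {},
        children: [{ type: 'text', value: text }],
      },
    ],
  };
}

// Crude allow-list sanitizer (an assumption for illustration only):
// elements whose tagName is not allowed are dropped with their children.
// Returns a new tree and leaves the input untouched.
function sanitize(node, allowedTags) {
  if (!node.children) return node;
  const children = node.children
    .filter((child) => child.type !== 'element' || allowedTags.has(child.tagName))
    .map((child) => sanitize(child, allowedTags));
  return { ...node, children };
}

const tree = textToHast('hello world');

// Simulate unsafe content that a real parser might have produced.
tree.children.push({
  type: 'element',
  tagName: 'script',
  properties: {},
  children: [{ type: 'text', value: 'alert(1)' }],
});

const clean = sanitize(tree, new Set(['pre', 'p', 'strong', 'em']));
console.log(clean.children.map((child) => child.tagName)); // [ 'pre' ]
```

Because every content type ends up in this one tree shape, a step such as sanitization only has to be written once.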

Now that the source content is represented as a unified hast tree, everything downstream can be implemented consistently. Let's talk about compiling and rendering the hast tree into an actual document.

Document

The term document refers abstractly to the output of compiling and rendering the hast tree. This output should be HTML-based markup, so that the document can easily be enriched further with available web technologies. unified-doc supports the following renderers:

  • react-unified-doc

Renderers should use the processor module internally so that they can support all content types that processor supports. They can optionally include rehype plugins, depending on the features to be supported. react-unified-doc uses the hast-util-annotate utility to support annotation features on hast trees processed by processor.
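As a rough illustration of what "compiling a hast tree into HTML-based markup" involves, the sketch below serializes a small tree to an HTML string. It is an assumption-laden toy, not how react-unified-doc renders: `toHtml` is a hypothetical helper, and property names are written out as attribute names unchanged, which is a simplification of how hast properties map to attributes.

```javascript
// Hypothetical sketch: a toy hast-to-HTML serializer, not a unified-doc API.

const VOID_TAGS = new Set(['br', 'hr', 'img']);

// Escape characters that would otherwise be parsed as markup.
function escapeHtml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function escapeAttribute(value) {
  return escapeHtml(String(value)).replace(/"/g, '&quot;');
}

// Recursively serialize a node. Simplification: property names are used
// as attribute names as-is.
function toHtml(node) {
  if (node.type === 'text') return escapeHtml(node.value);
  const inner = (node.children || []).map(toHtml).join('');
  if (node.type !== 'element') return inner; // e.g. the root node
  const attributes = Object.entries(node.properties || {})
    .map(([name, value]) => ` ${name}="${escapeAttribute(value)}"`)
    .join('');
  if (VOID_TAGS.has(node.tagName)) return `<${node.tagName}${attributes}>`;
  return `<${node.tagName}${attributes}>${inner}</${node.tagName}>`;
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: { id: 'intro' },
      children: [
        { type: 'text', value: 'a < b ' },
        {
          type: 'element',
          tagName: 'strong',
          properties: {},
          children: [{ type: 'text', value: 'bold' }],
        },
      ],
    },
  ],
};

const html = toHtml(tree);
console.log(html); // <p id="intro">a &lt; b <strong>bold</strong></p>
```

The renderer never needs to know whether the tree came from text, markdown, or HTML; it only walks the tree.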

Annotations

One of the more important and useful features when rendering documents is supporting annotations. Here are some use cases of annotations in common document workflows:

  • Highlighting: Text content is highlighted in the document with custom styles. This is the broadest domain, and there are many UI/UX implementations tailored to specific document workflows.
  • Bookmarking: Loading a document and clicking on a valid anchor link scrolls to the bookmarked annotation.
  • Commenting: Clicking on an annotation loads associated comments.
  • Redlining: Text content is underlined, showing the difference between two versions of the document.

Definition: An annotation represents text content that is visibly marked to the user and does not disrupt the rest of the document layout.

The definition above is intentionally worded to emphasize the following:

  • text content: Only text content is meaningful to the viewer. For HTML-based markup, this is semantically represented by text nodes.
  • visibly marked: Annotated text nodes should apply visual cues indicating that they are annotated or 'marked'. For HTML-based markup, this is represented semantically by mark nodes, and visual customization of these nodes is important in conveying annotation information.
  • does not disrupt: Annotations should be purely semantic additions to the document that do not affect the layout of the rendered document.

Annotations should support intuitive user interactions (e.g. clicking, hovering). These interactions allow building useful features that enrich the document (e.g. tooltips, permalinks, updating annotations).

Note: As mentioned earlier, it is important to view annotations as a purely additive operation when rendering documents. Annotation implementations should never couple the rendering of documents and annotations, nor affect the document layout. This ensures that downstream applications of plugins and web technologies work seamlessly.
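The definition and the note above can be sketched as a tree transformation that splits text nodes and wraps the matching parts in mark nodes, leaving the text content of the document unchanged. This is a hypothetical illustration, not the hast-util-annotate API: the `annotate` function, its `{ id, start, end }` argument (character offsets into the text content), and the `dataAnnotationId` property are all assumptions made for this example.

```javascript
// Hypothetical sketch: NOT the hast-util-annotate API.

// Collect the text content of a node by walking its children.
function toText(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(toText).join('');
}

// Return a new tree where the text between `start` (inclusive) and `end`
// (exclusive), counted in characters of the text content, is wrapped in
// mark elements. Only text nodes are split; nothing else is touched.
function annotate(tree, { id, start, end }) {
  let offset = 0; // characters of text content visited so far

  function visit(node) {
    if (node.type === 'text') {
      const nodeStart = offset;
      const nodeEnd = offset + node.value.length;
      offset = nodeEnd;
      const from = Math.max(start, nodeStart);
      const to = Math.min(end, nodeEnd);
      if (from >= to) return [node]; // no overlap with the annotation
      const before = node.value.slice(0, from - nodeStart);
      const marked = node.value.slice(from - nodeStart, to - nodeStart);
      const after = node.value.slice(to - nodeStart);
      const parts = [];
      if (before) parts.push({ type: 'text', value: before });
      parts.push({
        type: 'element',
        tagName: 'mark',
        properties: { dataAnnotationId: id },
        children: [{ type: 'text', value: marked }],
      });
      if (after) parts.push({ type: 'text', value: after });
      return parts;
    }
    if (!node.children) return [node];
    return [{ ...node, children: node.children.flatMap(visit) }];
  }

  return visit(tree)[0];
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Hello ' },
        {
          type: 'element',
          tagName: 'strong',
          properties: {},
          children: [{ type: 'text', value: 'big' }],
        },
        { type: 'text', value: ' world' },
      ],
    },
  ],
};

const annotated = annotate(tree, { id: 'a1', start: 3, end: 8 });
console.log(toText(annotated) === toText(tree)); // true
```

An annotation that spans element boundaries, as here, produces one mark node per affected text node, so the surrounding element structure stays as it was.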

The above requirements and design choices are implemented in hast-util-annotate, which is a hast utility that powers annotation features in renderers such as react-unified-doc.

Plugins

Just as all content and programs are interoperable in the unified ecosystem, the unified-doc renderers should be compatible with the rehype plugin ecosystem. See the react-unified-doc plugins docs for an example on how this is achieved.
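For readers unfamiliar with the plugin model, the sketch below mimics its general shape without using unified itself: a plugin is a function that takes options and returns a transformer acting on the tree. The `applyPlugins` runner and the two plugins (`linkTarget`, `headingIds`) are hypothetical and exist only for this example; real rehype plugins are attached through a unified processor.

```javascript
// Hypothetical sketch of the plugin shape; not the unified or rehype API.

// Call `callback` on a node and all of its descendants.
function visit(node, callback) {
  callback(node);
  (node.children || []).forEach((child) => visit(child, callback));
}

function toText(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(toText).join('');
}

// Plugin with options: sets a `target` property on every link.
function linkTarget(options = {}) {
  const target = options.target || '_blank';
  return function transformer(tree) {
    visit(tree, (node) => {
      if (node.type === 'element' && node.tagName === 'a') {
        node.properties = { ...node.properties, target };
      }
    });
    return tree;
  };
}

// Plugin without options: derives an `id` for h1 elements from their text.
function headingIds() {
  return function transformer(tree) {
    visit(tree, (node) => {
      if (node.type === 'element' && node.tagName === 'h1') {
        const id = toText(node).toLowerCase().replace(/\s+/g, '-');
        node.properties = { ...node.properties, id };
      }
    });
    return tree;
  };
}

// Run each [plugin, options] pair in order on the same tree.
// Note: these toy transformers modify the tree in place.
function applyPlugins(tree, plugins) {
  return plugins.reduce(
    (current, [plugin, options]) => plugin(options)(current),
    tree
  );
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{ type: 'text', value: 'Getting Started' }],
    },
    {
      type: 'element',
      tagName: 'a',
      properties: { href: 'https://example.com' },
      children: [{ type: 'text', value: 'a link' }],
    },
  ],
};

const result = applyPlugins(tree, [[linkTarget, { target: '_blank' }], [headingIds]]);
console.log(result.children[0].properties.id); // getting-started
```

Since plugins only see and return a tree, a renderer can accept any number of them without knowing what each one does.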

unified

This project is built on top of the unified ecosystem. Please check out all the inspirational and ambitious projects happening there!

Contribute

Help contribute towards making content and knowledge more accessible for machines and humans.

There are no formal contribution guidelines yet. Be respectful and nice!

Useful information about the project:

  • The project is linted with xo with some custom configuration.
  • While the project uses TypeScript, it is not a TypeScript project; TypeScript is used purely to aid development. This is intentional, to keep the code accessible to the broader JS community.
  • Tests are managed with jest.
  • Docs are managed with docz.