RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement` #138

xiaoxiangmoe · 2019-01-04T06:40:29Z

Markdown table or HTML table element can't be rendered correctly in TSDoc playground

octogonz · 2019-01-07T23:43:04Z

I wanted to enable general HTML tags, however for security reasons it requires a script sanitizer such as sanitize-html or xss. @iclanton could we add that to the loader?

octogonz · 2019-01-08T00:00:44Z

dompurify looks to be more self-contained, and is hosted by cdnjs.com.

octogonz · 2019-01-08T02:36:05Z

@iclanton started a PR #139 to implement this.

However, when the playground tries to render DocHtmlStartTag and DocHtmlEndTag as React virtual DOM, there isn't any way to render half of a tag. There's a proposal for React.Fragment to support dangerouslySetInnerHTML , but it's not implemented yet. (API Extractor never encountered this problem because it renders TSDoc as Markdown instead of React, so we can simply append arbitrary text to the markdown string ignoring the tree structure.)

We considered trying to match up DocHtmlStartTag and DocHtmlEndTag during rendering, so we could create a corresponding React element, and then reattach the appropriate child nodes. But that seems messy. We realized that the TSDoc parser itself can do this fairly easily, if we require the input HTML to be well-formed XML instead of the messy ad hoc structures that CommonMark allows. This appeals to me, because when people need to embed nesting metadata in TSDoc, I generally recommend HTML5 custom elements instead of custom TSDoc inline tags, since XML is by design extensible and interoperable, whereas inline tags use a highly proprietary syntax.

Thus, I'm proposing to replace DocHtmlStartTag and DocHtmlEndTag in the TSDoc API with a new class called DocXmlElement. This will make HTML elements "first class" citizens in the TSDoc spec, and make them easier for documentation tools to work with.

TSDoc parses start and end tags separately and not as a single element, thus it's difficult to correctly render HTML elements with children. Rendering as plaintext increases verbosity and reduces clarity in the rendered documentation. Maybe a future `DocXmlElement` from TSDoc or rendering doc comments as markdown first and then passing to React could solve this issue. See microsoft/tsdoc#138

suneettipirneni · 2022-08-01T13:34:34Z

Hi @octogonz, I would be interested in implementing a DocXmlElement. However, I have few specific questions:

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

~~Would a DocXmlElement extend a DocNodeContainer? From a semantics point-of-view, XML nodes can only contain other xml nodes (or plain text), having any other tsdoc tags within them is invalid.~~
BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

octogonz · 2022-08-02T09:04:10Z

Hey @suneettipirneni, that would be awesome to finally get this implemented! I haven't thought about it in a long time.

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

Hmm... I haven't thought about this in a long time. 🙂 I think the AST is designed with excerpts as the leaf nodes, so it would be something like this:

Sample input:

<a>hi<b /></a>

AST pseudocode:

- DocXmlElement
  * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "hi"
  - DocXmlElement
    * excerpts for: "<b />"
  * excerpts for: "</a>"

Note that "</a>" is an excerpt belonging to the root DocXmlElement.

An AST visitor would be interested in the meaningful nodes DocXmlElement, DocPlainText, etc, whereas the excerpts are extra coordinates used for only for specialized operations such as syntax highlighting, or surgically modifying a document while preserving its original formatting.

BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Yes, it would be a DocNodeContainer with those constraints. A key idea of DocNodeContainer is that it's possible for a user to introduce custom nodes that adjust these constraints. Some examples here: api-documenter/src/nodes/CustomDocNodeKind.ts

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

Yes.

Also keep in mind that the AST is intended to represent syntax errors as well, faithfully enough that it can regenerate the original input. Consider this example:

/**
 * <a>b</c>
 */

Maybe it would get represented as:

- (parent)
  - DocXmlElement
    * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "b"
  - DocXmlElement
    * excerpts for: "</c>"

Here "b" becomes a sibling of <a> instead of a child, and </c> is a degenerate DocXmlElement representing only an orphaned closing tag. (Or maybe </c> is DocErrorText? It depends on how much of the structure is useful, for example maybe we still want to syntax highlight </c> correctly.)

A web HTML parser has quite sophisticated heuristics that would try to render a sensible web page from broken syntax, for example inferring </a> so that b can be a child of <a>. But TSDoc doesn't need to worry about that in my opinion. Our priority is to implement a behavior that is simple and predictable and intuitive. (We don't face the problem of cross-compatibility with Chrome/Firefox/Safari/IE/etc specifically because TSDoc intends to be a strict standard with well-defined grammar rules, more like XML than HTML.)

suneettipirneni · 2022-08-04T16:09:53Z

Hi @octogonz, I made a draft pr for this. If you could give me any feedback on this initial implementation, it would be greatly appreciated 😄.

octogonz · 2022-08-04T21:53:29Z

Much appreciated! I apologize for the slow response -- I've been really busy this week. I will take a look over the weekend. Thanks for patience!

octogonz added the bug Something isn't working as intended label Jan 7, 2019

octogonz added request for comments A proposed addition to the TSDoc spec and removed bug Something isn't working as intended labels Jan 8, 2019

octogonz changed the title ~~Support for table in TSDoc~~ RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement Jan 8, 2019

octogonz mentioned this issue Apr 20, 2019

[playground] Option to share a custom sample #159

Closed

octogonz mentioned this issue Jul 3, 2019

Where are TSDoc syntax specifications? #130

Closed

suneettipirneni linked a pull request Aug 3, 2022 that will close this issue

Add XML Parsing with DocXmlElement #331

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement` #138

RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement` #138

xiaoxiangmoe commented Jan 4, 2019 •

edited

octogonz commented Jan 7, 2019 •

edited

octogonz commented Jan 8, 2019

octogonz commented Jan 8, 2019 •

edited

suneettipirneni commented Aug 1, 2022 •

edited

octogonz commented Aug 2, 2022

suneettipirneni commented Aug 4, 2022 •

edited

octogonz commented Aug 4, 2022

RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement #138

RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement #138

Comments

xiaoxiangmoe commented Jan 4, 2019 • edited

octogonz commented Jan 7, 2019 • edited

octogonz commented Jan 8, 2019

octogonz commented Jan 8, 2019 • edited

suneettipirneni commented Aug 1, 2022 • edited

octogonz commented Aug 2, 2022

suneettipirneni commented Aug 4, 2022 • edited

octogonz commented Aug 4, 2022

RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement` #138

RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement` #138

xiaoxiangmoe commented Jan 4, 2019 •

edited

octogonz commented Jan 7, 2019 •

edited

octogonz commented Jan 8, 2019 •

edited

suneettipirneni commented Aug 1, 2022 •

edited

suneettipirneni commented Aug 4, 2022 •

edited