Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement #138

Open
xiaoxiangmoe opened this issue Jan 4, 2019 · 7 comments · May be fixed by #331
Open

RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement #138

xiaoxiangmoe opened this issue Jan 4, 2019 · 7 comments · May be fixed by #331
Labels
request for comments A proposed addition to the TSDoc spec

Comments

@xiaoxiangmoe
Copy link

xiaoxiangmoe commented Jan 4, 2019

Markdown table or HTML table element can't be rendered correctly in TSDoc playground

@octogonz octogonz added the bug Something isn't working as intended label Jan 7, 2019
@octogonz
Copy link
Collaborator

octogonz commented Jan 7, 2019

I wanted to enable general HTML tags, however for security reasons it requires a script sanitizer such as sanitize-html or xss. @iclanton could we add that to the loader?

@octogonz
Copy link
Collaborator

octogonz commented Jan 8, 2019

dompurify looks to be more self-contained, and is hosted by cdnjs.com.

@octogonz
Copy link
Collaborator

octogonz commented Jan 8, 2019

@iclanton started a PR #139 to implement this.

However, when the playground tries to render DocHtmlStartTag and DocHtmlEndTag as React virtual DOM, there isn't any way to render half of a tag. There's a proposal for React.Fragment to support dangerouslySetInnerHTML , but it's not implemented yet. (API Extractor never encountered this problem because it renders TSDoc as Markdown instead of React, so we can simply append arbitrary text to the markdown string ignoring the tree structure.)

We considered trying to match up DocHtmlStartTag and DocHtmlEndTag during rendering, so we could create a corresponding React element, and then reattach the appropriate child nodes. But that seems messy. We realized that the TSDoc parser itself can do this fairly easily, if we require the input HTML to be well-formed XML instead of the messy ad hoc structures that CommonMark allows. This appeals to me, because when people need to embed nesting metadata in TSDoc, I generally recommend HTML5 custom elements instead of custom TSDoc inline tags, since XML is by design extensible and interoperable, whereas inline tags use a highly proprietary syntax.

Thus, I'm proposing to replace DocHtmlStartTag and DocHtmlEndTag in the TSDoc API with a new class called DocXmlElement. This will make HTML elements "first class" citizens in the TSDoc spec, and make them easier for documentation tools to work with.

@octogonz octogonz added request for comments A proposed addition to the TSDoc spec and removed bug Something isn't working as intended labels Jan 8, 2019
@octogonz octogonz changed the title Support for table in TSDoc RFC: Replace DocHtmlStartTag and DocHtmlEndTag with DocXmlElement Jan 8, 2019
velut added a commit to jsdocs-io/web that referenced this issue Feb 11, 2021
TSDoc parses start and end tags separately and not as a single element,
thus it's difficult to correctly render HTML elements with children.

Rendering as plaintext increases verbosity and
reduces clarity in the rendered documentation.

Maybe a future `DocXmlElement` from TSDoc or
rendering doc comments as markdown first
and then passing to React could solve this issue.

See microsoft/tsdoc#138
@suneettipirneni
Copy link

suneettipirneni commented Aug 1, 2022

Hi @octogonz, I would be interested in implementing a DocXmlElement. However, I have few specific questions:

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

Would a DocXmlElement extend a DocNodeContainer? From a semantics point-of-view, XML nodes can only contain other xml nodes (or plain text), having any other tsdoc tags within them is invalid.
BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

@octogonz
Copy link
Collaborator

octogonz commented Aug 2, 2022

Hey @suneettipirneni, that would be awesome to finally get this implemented! I haven't thought about it in a long time.

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

Hmm... I haven't thought about this in a long time. 🙂 I think the AST is designed with excerpts as the leaf nodes, so it would be something like this:

Sample input:

<a>hi<b /></a>

AST pseudocode:

- DocXmlElement
  * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "hi"
  - DocXmlElement
    * excerpts for: "<b />"
  * excerpts for: "</a>"

Note that "</a>" is an excerpt belonging to the root DocXmlElement.

An AST visitor would be interested in the meaningful nodes DocXmlElement, DocPlainText, etc, whereas the excerpts are extra coordinates used for only for specialized operations such as syntax highlighting, or surgically modifying a document while preserving its original formatting.

BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Yes, it would be a DocNodeContainer with those constraints. A key idea of DocNodeContainer is that it's possible for a user to introduce custom nodes that adjust these constraints. Some examples here: api-documenter/src/nodes/CustomDocNodeKind.ts

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

Yes.

Also keep in mind that the AST is intended to represent syntax errors as well, faithfully enough that it can regenerate the original input. Consider this example:

/**
 * <a>b</c>
 */

Maybe it would get represented as:

- (parent)
  - DocXmlElement
    * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "b"
  - DocXmlElement
    * excerpts for: "</c>"

Here "b" becomes a sibling of <a> instead of a child, and </c> is a degenerate DocXmlElement representing only an orphaned closing tag. (Or maybe </c> is DocErrorText? It depends on how much of the structure is useful, for example maybe we still want to syntax highlight </c> correctly.)

A web HTML parser has quite sophisticated heuristics that would try to render a sensible web page from broken syntax, for example inferring </a> so that b can be a child of <a>. But TSDoc doesn't need to worry about that in my opinion. Our priority is to implement a behavior that is simple and predictable and intuitive. (We don't face the problem of cross-compatibility with Chrome/Firefox/Safari/IE/etc specifically because TSDoc intends to be a strict standard with well-defined grammar rules, more like XML than HTML.)

@suneettipirneni suneettipirneni linked a pull request Aug 3, 2022 that will close this issue
@suneettipirneni
Copy link

suneettipirneni commented Aug 4, 2022

Hi @octogonz, I made a draft pr for this. If you could give me any feedback on this initial implementation, it would be greatly appreciated 😄.

@octogonz
Copy link
Collaborator

octogonz commented Aug 4, 2022

Much appreciated! I apologize for the slow response -- I've been really busy this week. I will take a look over the weekend. Thanks for patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
request for comments A proposed addition to the TSDoc spec
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants