Add setHTMLUnsafe() and parseHTMLUnsafe()
These are modern HTML-parsing methods, replacing the innerHTML setter and (new DOMParser()).parseFromString(). See https://github.com/otherdaniel/purification/blob/explainer-examples/explainer.md#proposed-api for more background. The "unsafe" part of their names comes from the fact that safe versions, which sanitize by default, will be introduced in the future.

Notable differences from the older versions include support for declarative shadow roots (whatwg#5465) by default, no mode-switching between XML and HTML, and (for parseHTMLUnsafe()) no inheritance from the outer document.
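
A minimal sketch of how calling code might adopt the two methods, assuming an environment that may or may not ship them yet. The helper names `parseHTML` and `setHTML` and the `env` parameter are hypothetical and not part of this commit. The fallbacks are not equivalent to the new methods: per the description above, the older APIs differ in areas such as declarative shadow roots and document URL inheritance.

```javascript
// Hypothetical helper: prefer Document.parseHTMLUnsafe(), else fall back
// to DOMParser. `env` is injectable for illustration; it defaults to the
// global object.
function parseHTML(html, env = globalThis) {
  const { Document, DOMParser } = env;
  if (Document && typeof Document.parseHTMLUnsafe === "function") {
    return Document.parseHTMLUnsafe(html);
  }
  if (typeof DOMParser === "function") {
    // Older API: needs a constructed parser and an explicit MIME type.
    return new DOMParser().parseFromString(html, "text/html");
  }
  throw new Error("No HTML parser available in this environment");
}

// Hypothetical helper: prefer setHTMLUnsafe(), else fall back to the
// innerHTML setter. Neither path sanitizes `html`.
function setHTML(target, html) {
  if (typeof target.setHTMLUnsafe === "function") {
    target.setHTMLUnsafe(html);
  } else {
    target.innerHTML = html;
  }
}

// Browser-only usage; skipped where no DOM is present.
if (typeof document !== "undefined") {
  const host = document.createElement("div");
  setHTML(host, "<p>Hello</p>");
  const doc = parseHTML("<p>Hello</p>");
  console.log(host.children.length, doc.body.children.length);
}
```

Because neither method sanitizes its input, helpers like these should only ever receive trusted markup.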
josepharhar committed Oct 11, 2023
1 parent 8fc80bd commit 3ca0811
Showing 1 changed file with 142 additions and 22 deletions.
164 changes: 142 additions & 22 deletions source
@@ -10609,6 +10609,8 @@ typedef (<span>HTMLScriptElement</span> or <span>SVGScriptElement</span>) <dfn t

[<span>LegacyOverrideBuiltIns</span>]
partial interface <dfn id="document" data-lt="">Document</dfn> {
static <code>Document</code> <span data-x="dom-parseHTMLUnsafe">parseHTMLUnsafe</span>(DOMString html);

// <span>resource metadata management</span>
[PutForwards=<span data-x="dom-location-href">href</span>, <span>LegacyUnforgeable</span>] readonly attribute <span>Location</span>? <span data-x="dom-document-location">location</span>;
attribute USVString <span data-x="dom-document-domain">domain</span>;
@@ -109471,6 +109473,8 @@ document.body.appendChild(frame)</code></pre>
also live here? -->
<h3 id="dom-parsing-and-serialization">DOM parsing</h3>

<h4>The <code>DOMParser</code> interface</h4>

<p>The <code>DOMParser</code> interface allows authors to create new <code>Document</code> objects
by parsing strings, as either HTML or XML.</p>

@@ -109491,17 +109495,19 @@ document.body.appendChild(frame)</code></pre>

<p>Note that <code>script</code> elements are not evaluated during parsing, and the resulting
document's <span data-x="document's character encoding">encoding</span> will always be
<span>UTF-8</span>.</p>
<span>UTF-8</span>. The document's <span data-x="concept-document-url">URL</span> will be
inherited from <var>parser</var>'s <span>relevant global object</span>.</p>

<p>Values other than the above for <var>type</var> will cause a <code>TypeError</code> exception
to be thrown.</p>
</dd>
</dl>

<p class="note">The design of <code>DOMParser</code>, as a class that needs to be constructed and
then have its <code data-x="dom-DOMParser-parseFromString">parseFromString()</code> method called,
is an unfortunate historical artifact. If we were designing this functionality today it would be a
standalone function.</p>
then have its <code data-x="dom-DOMParser-parseFromString">parseFromString()</code> method
called, is an unfortunate historical artifact. If we were designing this functionality today it
would be a standalone function. For parsing HTML, the modern alternative is <code
data-x="dom-parseHTMLUnsafe">Document.parseHTMLUnsafe()</code>.</p>

<pre><code class="idl">[Exposed=Window]
interface <dfn interface>DOMParser</dfn> {
@@ -109531,7 +109537,7 @@ enum <dfn enum>DOMParserSupportedType</dfn> {
<li>
<p>Let <var>document</var> be a new <code>Document</code>, whose <span
data-x="concept-document-content-type">content type</span> is <var>type</var> and <span
data-x="concept-document-URL">url</span> is this's <span>relevant global object</span>'s <span
data-x="concept-document-URL">URL</span> is this's <span>relevant global object</span>'s <span
data-x="concept-document-window">associated <code>Document</code></span>'s <span
data-x="concept-document-URL">URL</span>.</p>
<!-- When https://github.com/whatwg/html/issues/4792 gets fixed we need to investigate which of
@@ -109552,23 +109558,8 @@ enum <dfn enum>DOMParserSupportedType</dfn> {
data-x="dom-DOMParserSupportedType-texthtml"><code>text/html</code>"</dfn></dt>
<dd>
<ol>
<li><p>Set <var>document</var>'s <span data-x="concept-document-type">type</span> to "<code
data-x="">html</code>".</p></li>

<li><p>Create an <span>HTML parser</span> <var>parser</var>, associated with
<var>document</var>.</p></li>

<li><p>Place <var>string</var> into the <span>input stream</span> for <var>parser</var>. The
encoding <span data-x="concept-encoding-confidence">confidence</span> is
<i>irrelevant</i>.</p></li>

<li>
<p>Start <var>parser</var> and let it run until it has consumed all the characters just
inserted into the input stream.</p>

<p class="note">This might mutate the document's <span
data-x="concept-document-mode">mode</span>.</p>
</li>
<li><p><span>Parse HTML from a string</span> given <var>document</var> and
<var>string</var>.</p></li>
</ol>

<p class="note">Since <var>document</var> does not have a <span
@@ -109610,6 +109601,135 @@ enum <dfn enum>DOMParserSupportedType</dfn> {
<li><p>Return <var>document</var>.</p>
</ol>

<p>To <dfn>parse HTML from a string</dfn>, given a <var>document</var> <code>Document</code> and a
<span>string</span> <var>html</var>:</p>

<ol>
<li><p>Set <var>document</var>'s <span data-x="concept-document-type">type</span> to "<code
data-x="">html</code>".</p></li>

<li><p>Create an <span>HTML parser</span> <var>parser</var>, associated with
<var>document</var>.</p></li>

<li><p>Place <var>html</var> into the <span>input stream</span> for <var>parser</var>. The
encoding <span data-x="concept-encoding-confidence">confidence</span> is
<i>irrelevant</i>.</p></li>

<li>
<p>Start <var>parser</var> and let it run until it has consumed all the characters just
inserted into the input stream.</p>

<p class="note">This might mutate the document's <span
data-x="concept-document-mode">mode</span>.</p>
</li>
</ol>

</div>

<h4>Unsafe HTML parsing methods</h4>

<dl class="domintro">
<dt><code data-x=""><var>element</var>.<span subdfn
data-x="dom-Element-setHTMLUnsafe">setHTMLUnsafe</span>(<var>html</var>)</code></dt>

<dd>
<p>Parses <var>html</var> using the HTML parser, and replaces the children of <var>element</var>
with the result. <var>element</var> provides context for the HTML parser.</p>
</dd>

<dt><code data-x=""><var>shadowRoot</var>.<span subdfn
data-x="dom-ShadowRoot-setHTMLUnsafe">setHTMLUnsafe</span>(<var>html</var>)</code></dt>

<dd>
<p>Parses <var>html</var> using the HTML parser, and replaces the children of
<var>shadowRoot</var> with the result. <var>shadowRoot</var>'s <span
data-x="concept-DocumentFragment-host">host</span> provides context for the HTML parser.</p>
</dd>

<dt><code data-x=""><var>doc</var> = Document.<span
data-x="dom-parseHTMLUnsafe">parseHTMLUnsafe</span>(<var>html</var>)</code></dt>

<dd>
<p>Parses <var>html</var> using the HTML parser, and returns the resulting
<code>Document</code>.</p>

<p>Note that <code>script</code> elements are not evaluated during parsing, and the resulting
document's <span data-x="document's character encoding">encoding</span> will always be
<span>UTF-8</span>. The document's <span data-x="concept-document-url">URL</span> will be
<code>about:blank</code>.</p>
</dd>
</dl>

<p class="warning">These methods perform no sanitization to remove potentially-dangerous elements
and attributes like <code>script</code> or <span>event handler content attributes</span>.</p>

<pre><code class="idl">partial interface <span id="Element-partial">Element</span> {
undefined <span data-x="dom-Element-setHTMLUnsafe">setHTMLUnsafe</span>(DOMString html);
};

partial interface <span id="ShadowRoot-partial">ShadowRoot</span> {
undefined <span data-x="dom-ShadowRoot-setHTMLUnsafe">setHTMLUnsafe</span>(DOMString html);
};</code></pre>

<div w-nodev>

<p><code>Element</code>'s <dfn method for="Element"><code
data-x="dom-Element-setHTMLUnsafe">setHTMLUnsafe(<var>html</var>)</code></dfn> method steps
are:</p>

<ol>
<li><p>Let <var>target</var> be <span>this</span>'s <span>template contents</span> if
<span>this</span> is a <code>template</code> element; otherwise <span>this</span>.</p></li>

<li><p><span>Unsafely set HTML</span> given <var>target</var>, <span>this</span>, and
<var>html</var>.</p></li>
</ol>

<p><code>ShadowRoot</code>'s <dfn method for="ShadowRoot"><code
data-x="dom-ShadowRoot-setHTMLUnsafe">setHTMLUnsafe(<var>html</var>)</code></dfn> method steps
are to <span>unsafely set HTML</span> given <span>this</span>, <span>this</span>'s <span
data-x="concept-DocumentFragment-host">shadow host</span>, and <var>html</var>.</p>

<p>To <dfn>unsafely set HTML</dfn>, given an <code>Element</code> or <code>DocumentFragment</code>
<var>target</var>, an <code>Element</code> <var>contextElement</var>, and a <span>string</span>
<var>html</var>:</p>

<ol>
<li><p>Let <var>newChildren</var> be the result of the <span>HTML fragment parsing algorithm</span>
given <var>contextElement</var> and <var>html</var>.</p></li>

<li><p>Let <var>fragment</var> be a new <code>DocumentFragment</code> whose <span>node
document</span> is <var>contextElement</var>'s <span>node document</span>.</p></li>

<li><p>For each <var>node</var> in <var>newChildren</var>, <span
data-x="concept-node-append">append</span> <var>node</var> to <var>fragment</var>.</p></li>

<li><p><span data-x="concept-node-replace-all">Replace all</span> with <var>fragment</var> within
<var>target</var>.</p></li>
</ol>

<hr>

<p>The static <dfn method for="Document"><code
data-x="dom-parseHTMLUnsafe">parseHTMLUnsafe(<var>html</var>)</code></dfn> method steps are:</p>

<ol>
<li>
<p>Let <var>document</var> be a new <code>Document</code>, whose <span
data-x="concept-document-content-type">content type</span> is "<code
data-x="">text/html</code>".</p>

<p class="note">Since <var>document</var> does not have a <span
data-x="concept-document-bc">browsing context</span>, <span data-x="concept-n-script">scripting
is disabled</span>.</p>
</li>

<li><p><span>Parse HTML from a string</span> given <var>document</var> and
<var>html</var>.</p></li>

<li><p>Return <var>document</var>.</p></li>
</ol>

</div>

