Commit

Editorial: Remove special-casing of U+200C and U+200D (#3074)
Unicode v15.1.0 makes both U+200C and U+200D `ID_Continue` characters, so they no longer need explicit special-casing to match `IdentifierPart`.

Issue: #3073
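
As a rough illustration of the change described above, the sketch below models `IdentifierPartChar` before and after this commit. Both helper names are hypothetical and are not spec text. Whether the two helpers agree for U+200C and U+200D depends on the Unicode version behind the engine's `\p{ID_Continue}` data (expected to include them only from 15.1 onward), so that result is logged rather than assumed.

```javascript
// Sketch only: helper names are hypothetical, not part of the specification.
const ID_CONTINUE = /^\p{ID_Continue}$/u;

// IdentifierPartChar after this commit: UnicodeIDContinue or `$`.
function isIdentifierPartCharNew(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return ch === "$" || ID_CONTINUE.test(ch);
}

// IdentifierPartChar before this commit: the same, plus explicit <ZWNJ> and <ZWJ>.
function isIdentifierPartCharOld(codePoint) {
  return (
    isIdentifierPartCharNew(codePoint) ||
    codePoint === 0x200c ||
    codePoint === 0x200d
  );
}

// Identifiers containing ZWNJ / ZWJ after the first code point parse under
// either grammar, so observable behavior is unchanged by the commit.
const viaZwnj = new Function("var a\u200Cb = 1; return a\u200Cb;")();
const viaZwj = new Function("var a\u200Db = 2; return a\u200Db;")();
console.log(viaZwnj, viaZwj);

// Engine-dependent: true only if the bundled Unicode data already lists
// U+200C and U+200D as ID_Continue.
console.log(isIdentifierPartCharNew(0x200c), isIdentifierPartCharNew(0x200d));
```

If the last line logs `false`, the engine's Unicode data predates the change and still relies on the explicit grammar alternatives that this commit removes.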
mathiasbynens authored and ljharb committed Feb 21, 2024
1 parent fdde1c9 commit 467819a
Showing 1 changed file with 2 additions and 66 deletions.
68 changes: 2 additions & 66 deletions spec.html
@@ -16257,69 +16257,7 @@ <h2>Syntax</h2>
<h1>Unicode Format-Control Characters</h1>
<p>The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).</p>
<p>It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.</p>
<p>U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text these code points may also be used in an |IdentifierName| after the first character.</p>
<p>U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. &lt;ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text &lt;ZWNBSP> code points are treated as white space characters (see <emu-xref href="#sec-white-space"></emu-xref>).</p>
<p>The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in <emu-xref href="#table-format-control-code-point-usage"></emu-xref>.</p>
<emu-table id="table-format-control-code-point-usage" caption="Format-Control Code Point Usage" oldids="table-31">
<table>
<tr>
<th>
Code Point
</th>
<th>
Name
</th>
<th>
Abbreviation
</th>
<th>
Usage
</th>
</tr>
<tr>
<td>
`U+200C`
</td>
<td>
ZERO WIDTH NON-JOINER
</td>
<td>
&lt;ZWNJ>
</td>
<td>
|IdentifierPart|
</td>
</tr>
<tr>
<td>
`U+200D`
</td>
<td>
ZERO WIDTH JOINER
</td>
<td>
&lt;ZWJ>
</td>
<td>
|IdentifierPart|
</td>
</tr>
<tr>
<td>
`U+FEFF`
</td>
<td>
ZERO WIDTH NO-BREAK SPACE
</td>
<td>
&lt;ZWNBSP>
</td>
<td>
|WhiteSpace|
</td>
</tr>
</table>
</emu-table>
<p>U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. &lt;ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text &lt;ZWNBSP> code points are treated as white space characters (see <emu-xref href="#sec-white-space"></emu-xref>) outside of comments, string literals, template literals, and regular expression literals.</p>
</emu-clause>
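
To illustrate the U+FEFF rule stated in the paragraph above, here is a small sketch (variable names are illustrative only). Between tokens the code point is skipped like any other white space, whereas inside a string literal it is kept as data.

```javascript
// U+FEFF between tokens is treated as white space, so this parses as `1 + 1`.
const sum = new Function("return 1\uFEFF+\uFEFF1;")();

// Inside a string literal the code point is preserved in the string value.
const kept = new Function("return 'a\uFEFFb';")();

console.log(sum); // 2
console.log(kept.length); // 3

// String.prototype.trim removes WhiteSpace code points, which include U+FEFF.
const trimmed = "\uFEFFx\uFEFF".trim();
console.log(trimmed === "x"); // true
```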

<emu-clause id="sec-white-space">
@@ -16568,7 +16506,7 @@ <h2>Syntax</h2>
<h1>Names and Keywords</h1>
<p>|IdentifierName| and |ReservedWord| are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. |ReservedWord| is an enumerated subset of |IdentifierName|. The syntactic grammar defines |Identifier| as an |IdentifierName| that is not a |ReservedWord|. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode Standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.</p>
<emu-note>
<p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an |IdentifierName|.</p>
<p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|.</p>
</emu-note>
<h2>Syntax</h2>
<emu-grammar type="definition">
@@ -16595,8 +16533,6 @@ <h2>Syntax</h2>
IdentifierPartChar ::
UnicodeIDContinue
`$`
&lt;ZWNJ&gt;
&lt;ZWJ&gt;

// emu-format ignore
AsciiLetter :: one of