Sucrase Optional Chaining and Nullish Coalescing Technical Plan

Alan Pierce edited this page Jan 2, 2020 · 22 revisions

Background

Optional Chaining (?.) and Nullish Coalescing (??) are highly-anticipated JavaScript features that reached stage 3 in July 2019 and stage 4 in December 2019. Since the features reached stage 3, TypeScript and other tools have started officially supporting the syntax. This document outlines a plan for transpiling the syntax within Sucrase.

Currently, Sucrase's behavior is to pass through optional chaining syntax, so that the underlying JS engine (and other tools) needs to support it. Here's which JS engines/tools support it as of December 2019:

  • Chrome: Chrome stable (79) supports the syntax behind a flag and Chrome beta (80) supports the syntax by default.
  • Other browsers: No support yet except Safari, which doesn't seem to have fully shipped it. Details at the Chrome status page.
  • Webpack: No support yet, so projects using Webpack and Sucrase will crash in the build step even if the browser supports the syntax. Webpack uses Acorn to parse the loader output, and Acorn doesn't support the syntax yet, though there is an open PR to add it.
  • Node: Node 13 supports it behind the --harmony-optional-chaining and --harmony-nullish flags. The upcoming Node 14 (scheduled for release in April 2020) will likely be the first version to support it by default. Node 12 may support it behind a flag if https://github.com/nodejs/node/pull/30109 is merged.

Sucrase's primary use case is to transpile typical modern source code to JavaScript that can be run in a typical professional dev environment. (Support for older browsers is out of scope and better handled by other tools.) For optional chaining, it looks like the limiting factor is Node: the feature will be officially available in April 2020, then hit LTS in October 2020, and dev environments may often lag behind that. Getting it working in Sucrase now means that we can start using it now without the awkward tradeoff between compile speed and nice syntax, rather than having to wait a year.

Implementation ideas

Existing implementations

Overview of constraints

Sucrase intentionally doesn't have a syntax tree and keeps the transformation restricted to a (mostly) left-to-right scan, so it's much more limited than other transpilers. Some amount of lookahead is possible, but it's challenging to safely combine that with other transforms. The token format is flexible and it may be useful to add additional custom information to tokens (like "start of optional chain"), but ideally this would be kept to a minimum.

Challenges

  • In an expression like a()?.b, the a function must be called exactly once, so repeating the left-hand side won't be allowed. Babel and TS solve this by extracting computed expressions into variables (block-scoped for Babel and function-scoped for TS). decaffeinate solves this by using a helper function that reuses its parameters.
  • In an expression like a.b?.(), we need to make sure that a is used as the proper this. Babel and TS solve this by sometimes injecting a .call. decaffeinate solves this by making a separate __guardMethod__ helper.
  • Both of these operators are infix operators that may have an arbitrary number of tokens on their left-hand side. Under normal token processing, by the time we see the operator, it's too late to change earlier code. This means we'll need lookahead of some sort. (Other transpilers do AST-based transforms, so don't run into this problem.)
  • Operator nesting is a bit unintuitive: a?.b.c is a single chain that short-circuits on failure rather than a?.b being evaluated as its own step, like (a?.b).c. In the Babel AST, the entire a.b?.c.d node is an OptionalMemberExpression to indicate that it's a chain, and some nodes have optional=false because they are required components of the chain.
  • The spec also includes optional deletion: delete a?.b, so that will need a special implementation. The spec disallows optional new, optional templates, and optional assignment, so delete is the only real additional case to think about. delete may have an arbitrary chain as its operand, with any failure in the chain causing the delete to not happen and the expression to evaluate to true. Given that optional deletion is relatively obscure, it may be fine to ignore in Sucrase.
  • As with any feature based around code edits, the implementation will need to be careful to respect operator precedence and not change code in a way that changes the behavior of automatic semicolon insertion.
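
The first two challenges can be seen in a small experiment. The sketch below hand-writes a few naive and corrected rewrites; none of them is the actual output of Babel, TypeScript, or decaffeinate, and the names callCount, _tmp, and obj are made up for illustration.

```javascript
// Challenge 1: in a()?.b, the function a must run exactly once.
let callCount = 0;
function a() {
  callCount += 1;
  return {b: 42};
}

// Naive rewrite that repeats the left-hand side.
const naiveResult = a() == null ? undefined : a().b;
const naiveCallCount = callCount; // a ran twice

// Rewrite that stores the left-hand side in a temporary variable first.
callCount = 0;
let _tmp;
const tempResult = (_tmp = a()) == null ? undefined : _tmp.b;
const tempCallCount = callCount; // a ran once

// Challenge 2: in obj.method?.(), method must be called with obj as `this`.
const obj = {
  method() {
    return this === obj ? 'this kept' : 'this lost';
  },
};

// Naive rewrite that pulls the method out and calls it bare.
const extracted = obj.method;
const bareCall = extracted == null ? undefined : extracted();

// Rewrite that restores the receiver with .call.
const boundCall = extracted == null ? undefined : extracted.call(obj);
```

Both rewrites return 42 for the first case, but only the temporary-variable version keeps the side effects of a to a single call. In the second case, only the .call version sees obj as this.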

Techniques available

  • The parser at parseSubscripts and parseSubscript should be able to collect information about access chains, e.g. a token marker for the beginning and end of a chain.
  • We can find unique names that are not found anywhere else in the file using NameManager.claimFreeName. At the moment, there isn't a way to declare that variable in the nearest block or function scope. Mutable global variables may work, but there are probably cases of reentrant functions where it would break. Free names could also be used for lambda parameters.
  • There's a HelperManager system for generating code snippets at the top of the file, so any helper function could be included. Generally it may be preferable to pull logic into a helper function if it simplifies the syntax transform.
  • It may be possible to insert a fake token like nullishCoalescingStart, though tokens today generally represent non-empty ranges of code. It's also fairly difficult for the parser to insert tokens earlier because things like token indices may have already been established, like in scopes that have been pushed within the LHS.
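
To make the free-name idea concrete, here is a minimal sketch of how claiming a file-unique name could work. This is not Sucrase's NameManager: createNameClaimer is a hypothetical function, and it finds used names with a regex over the source text, where a real implementation would presumably walk identifier tokens.

```javascript
// Hypothetical sketch: collect every identifier-like word in the file, then
// hand out names that collide with none of them.
function createNameClaimer(code) {
  const usedNames = new Set(code.match(/[A-Za-z_$][\w$]*/g) || []);
  return function claimFreeName(baseName) {
    let name = baseName;
    let suffix = 2;
    // Try baseName, baseName2, baseName3, ... until one is unused.
    while (usedNames.has(name)) {
      name = baseName + String(suffix);
      suffix += 1;
    }
    // Reserve the name so later claims don't return it again.
    usedNames.add(name);
    return name;
  };
}

const claimFreeName = createNameClaimer('const _ = 1; a?.b');
const firstParam = claimFreeName('_');
const secondParam = claimFreeName('_');
const unrelated = claimFreeName('tmp');
```

Since the input already uses _, the two lambda parameters come out as _2 and _3, while tmp is free and is returned unchanged.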

Nullish Coalescing ideas

Example input:

E1 ?? E2 ?? E3

Idea 1: Nested calls to a helper

nullishCoalesce(nullishCoalesce(E1, () => E2), () => E3)

The operator is implemented as a function and we rewrite the code to call that function.

Thoughts:

  • Pro: Matches the AST structure.
  • Pro: Operator transform is just ?? to , () => .
  • Con: If we use token tagging rather than lookahead, the token needs a way of recording how many nullishCoalesce( snippets to insert, since there may be multiple on the same token.

Idea 2: Send call chain to a helper

nullishCoalesce([E1, () => E2, () => E3])

Nullish coalescing is associative, so either grouping of the operands can be transformed to this flat form.

Thoughts:

  • Pro: Operator transform is just ?? to , () => .
  • Pro: The code transformation is hopefully more straightforward than option 1. ?? becomes , () => and we tag the start and end tokens to have nullishCoalesce([ inserted before the first and ]) inserted after the last.
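
For reference, a helper for this array form could look roughly like the sketch below. It is named nullishCoalesceChain here (a made-up name) only to keep it distinct from the two-argument helper used elsewhere in this document.

```javascript
// Sketch of an Idea 2 helper: the first element is a plain value and every
// later element is a thunk that is only invoked if everything before it was
// null or undefined.
function nullishCoalesceChain(ops) {
  let value = ops[0];
  for (let i = 1; i < ops.length; i++) {
    if (value != null) {
      return value;
    }
    value = ops[i]();
  }
  return value;
}

// E1 ?? E2 ?? E3 with E1 = null, E2 = undefined, E3 = 'E3'
const fallsThrough = nullishCoalesceChain([null, () => undefined, () => 'E3']);

// Falsy but non-nullish values stop the chain, and later thunks never run.
const keepsZero = nullishCoalesceChain([0, () => {
  throw new Error('should not be evaluated');
}]);
```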

Idea 3: Babel's approach

(_ref = (_E = E1) !== null && _E !== void 0 ? _E : E2) !== null && _ref !== void 0 ? _ref : E3;

Uses an assignment expression to capture each operand in a temporary variable before testing it.

Thoughts:

  • Con: This requires implementing a way of extracting variables.
  • Con: Code snippet inserted requires knowing the name of the extracted variable.

Idea 4: TypeScript's approach

(_b = (_a = E1, (_a !== null && _a !== void 0 ? _a : E2)), (_b !== null && _b !== void 0 ? _b : E3))

Uses comma operator to evaluate each expression before using it in comparison and the ternary consequent.

Thoughts:

  • Con: This requires implementing a way of extracting variables.
  • Con: Code snippet inserted requires knowing the name of the extracted variable.
  • Con: Comma operators are tricky when used inside function calls, since we don't want it to expand to two arguments. This may be solvable by just wrapping the expression in parens.
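
The comma concern can be checked directly. In this sketch, countArgs is a made-up function and E1 and E2 are stand-in values; the expression itself follows the shape of the TypeScript output above.

```javascript
function countArgs(...args) {
  return args.length;
}

const E1 = null;
const E2 = 'fallback';
let _a;

// Unparenthesized: the comma is parsed as an argument separator, so
// countArgs receives two arguments.
const unwrappedCount = countArgs(_a = E1, _a !== null && _a !== void 0 ? _a : E2);

// Parenthesized: the comma is the comma operator, so countArgs receives
// one argument.
const wrappedCount = countArgs((_a = E1, _a !== null && _a !== void 0 ? _a : E2));
```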

Decision

It seems like either Idea 1 or Idea 2 will be most straightforward. The main goal with Idea 2 is to avoid the need to think about nesting and to reduce the number of paren inserts, since each one has a chance of being misplaced in subtle edge cases. This leads to some follow-up questions:

  • Is it actually true that the Idea 2 strategy avoids the need to insert multiple start and end fragments in the same position?
  • Is it actually difficult to insert multiple start or multiple end fragments in the same position?
  • Are there other challenges differentiating the two strategies?

From some code investigation, it looks like one advantage of Idea 1 is that it may be easier for the parser, since the parser can make one change for each ?? operator it observes. The structure of parseExprOp does have some opportunity to recognize chains of the same operator, but it's not trivial. So the current plan is to try out Idea 1 and switch to Idea 2 if necessary.

Optional Chaining ideas

Example input:

a(b)?.c(d).e?.(f)?.[g(h)]

Idea 1: Nested calls

optionalChain(optionalChainMethod(optionalChain(a(b), _1 => _1.c(d)), 'e', _2 => _2(f)), _3 => _3[g(h)])

Thoughts:

  • Pro: Avoids the need to extract variables, follows the nested structure of evaluation order.
  • Pro: This is how the transform works in decaffeinate, so it's at least a little "battle tested".
  • Con: The prefix to add to the first token is complex to express in the token system, since it's a sequence of invocation starts of different names.
  • Con: The optionalChainMethod detail surfaces quite a bit of additional complexity and may not handle all cases correctly (noted in a decaffeinate comment). It also means needing to transform the property .e to the string 'e'.

Idea 2: One call per chain

optionalChain([a, 'call', _1 => _1(b), 'optionalAccess', _2 => _2.c, 'call', _3 => _3(d), 'access', _4 => _4.e, 'optionalCall', _5 => _5(f), 'optionalAccess', _6 => _6[g(h)]])

Thoughts:

  • Pro: Avoids the need to extract variables.
  • Pro: Logic around specifying the proper this can happen in the helper function rather than being a concern of the transpiler.
  • Pro: The code transformation is hopefully relatively straightforward. For example, ?. becomes , 'optionalAccess', _1 => _1., with _1 as a name that's globally unique in the file.
  • Con: There's a certain ugliness to the string literals and using the generic function even for simple cases.

Idea 3: Babel/TypeScript's approach

(_a = a(b)) === null || _a === void 0 ? void 0 : (_a$c$e = (_a$c = _a.c(d)).e) === null || _a$c$e === void 0 ? void 0 : (_a$c$e$call = _a$c$e.call(_a$c, f)) === null || _a$c$e$call === void 0 ? void 0 : _a$c$e$call[g(h)];

For optional chaining, the TS output is very similar to the Babel output, so I'm considering them the same case here.

Thoughts:

  • Pro: Widely tested, so the transform would be safe if I can implement it.
  • Con: Sucrase doesn't currently have a way to extract variables into a function/block scope.
  • Con: The necessary code edits (e.g. the code between c(d) and f) are likely very complex to generate. Rather than the typical strategy of inserting code snippets, it may be possible to replace the code block with newly-generated code, but this is complex because sub-expressions need to go through the general Sucrase transform.

Decision

Idea 2 seems the most promising. It will require tagging which operators (., ?., (, etc) are part of a chain, and tagging the start and end positions of each chain, but those should all hopefully be doable. It also gives the helper method the flexibility to implement proper this semantics.

Optional Chaining deletion ideas

Ideally, any implementation of delete a?.b will use the same strategy as the other optional chaining implementation, i.e. idea 2: one call per chain.

Example input:

delete a?.b.c

Idea 1

optionalChainDelete([a, 'optionalAccess', _1 => _1.b, 'access', _2 => delete _2.c])

Idea 2

optionalChainDelete([a, 'optionalAccess', _1 => _1.b], 'c')

Thoughts:

  • Con: When a is {b: null}, we need a way to crash on delete a?.b.c but not on delete a?.b?.c.

Idea 3

optionalChainDelete([a, 'optionalAccess', _1 => _1.b, 'delete', 'c'])

Idea 4

nullishCoalesce(optionalChain([a, 'optionalAccess', _1 => _1.b, 'access', _2 => delete _2.c]), true)

Idea 5

delete optionalChainDelete([a, 'optionalAccess', _1 => _1.b]).c

Here, optionalChainDelete would default to {} rather than undefined so that the delete always works.

Decision

A common theme with all of these is that we need a way to identify the very last access in the chain, so that seems unavoidable. Beyond that, there are a few tradeoffs and considerations:

  • Should "identify the last access in the chain" happen in the parser or the transformer?
    • The transformer could walk the tokens, bookkeeping optional chain starts and ends and finding the last operator at depth 1. This feels a bit inefficient, but would be for an obscure case, and it may be nice to avoid adding more complexity to the parser.
  • Do we want to use the same optionalChain function or a new one, and if a new one, what should it be?
    • We need some way of defaulting to true as the return value. This could be done like in idea 4 by wrapping it in nullishCoalesce as well, but it seems probably easiest from a transpile standpoint to just make a new function. We can conditionally emit optionalChain or optionalChainDelete based on whether there's an immediately-preceding delete token, and then the closing code snippet doesn't need to distinguish them.
  • Do we add deletion support to the optionalChain function or transpile the delete operator in the right place?
    • Emitting a 'delete' string is about as difficult as emitting , 'access', _ => delete _., and the custom operator has its own complexity because we need to pass the property as a string, so keeping optionalChain as-is and emitting delete at transpile time seems best.

All of these point to idea 1 as the most promising.

Proposed transform

Nullish Coalescing

E1 ?? E2 ?? E3

becomes

function nullishCoalesce(lhs, rhsFn) {
  if (lhs != null) {
    return lhs;
  } else {
    return rhsFn();
  }
}

nullishCoalesce(nullishCoalesce(E1, () => E2), () => E3)
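
As a quick sanity check of the helper's behavior, the sketch below repeats it in condensed form and exercises the cases that distinguish ?? from ||. The fallback function and the rhsEvaluations counter are made up for illustration.

```javascript
// Condensed copy of the helper above.
function nullishCoalesce(lhs, rhsFn) {
  return lhs != null ? lhs : rhsFn();
}

let rhsEvaluations = 0;
const fallback = () => {
  rhsEvaluations += 1;
  return 'default';
};

// Falsy but non-nullish values are kept, and the right-hand side never runs.
const keepsZero = nullishCoalesce(0, fallback);
const keepsEmptyString = nullishCoalesce('', fallback);

// null and undefined fall through to the right-hand side.
const replacesNull = nullishCoalesce(null, fallback);

// E1 ?? E2 ?? E3 with E1 = undefined, E2 = null, E3 = 'E3'
const chained = nullishCoalesce(nullishCoalesce(undefined, () => null), () => 'E3');
```

Wrapping the right-hand side in an arrow function is what preserves short-circuiting: fallback only runs once here, for the null case.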

Optional Chaining

a(b)?.c(d).e?.(f)?.[g(h)]

becomes

function optionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  let i = 1;
  while (i < ops.length) {
    const op = ops[i];
    const fn = ops[i + 1];
    i += 2;
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      value = fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

optionalChain([a, 'call', _1 => _1(b), 'optionalAccess', _2 => _2.c, 'call', _3 => _3(d), 'access', _4 => _4.e, 'optionalCall', _5 => _5(f), 'optionalAccess', _6 => _6[g(h)]])
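
To see the helper run end to end, here is a sketch with concrete stand-ins for a, b, d, f, g, and h (all invented for this example). The helper is a condensed copy of the one above, and the final ?.[g(h)] step is written as 'optionalAccess' since it uses the ?. form.

```javascript
// Condensed copy of the optionalChain helper.
function optionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  for (let i = 1; i < ops.length; i += 2) {
    const op = ops[i];
    const fn = ops[i + 1];
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      // Call with the object that the function was most recently read from.
      value = fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

const b = 'b', d = 'd', f = 'f', h = 'h';
const g = (x) => 'key_' + x;
function a() {
  return {
    c() {
      const holder = {
        e() {
          // Report whether `this` is the object that e was read from.
          return {key_h: this === holder ? 'this preserved' : 'this lost'};
        },
      };
      return holder;
    },
  };
}

// a(b)?.c(d).e?.(f)?.[g(h)]
const fullChain = optionalChain([a, 'call', _1 => _1(b), 'optionalAccess', _2 => _2.c, 'call', _3 => _3(d), 'access', _4 => _4.e, 'optionalCall', _5 => _5(f), 'optionalAccess', _6 => _6[g(h)]]);

// When the first call returns null, the whole chain short-circuits and the
// later steps never run.
const returnsNull = () => null;
const shortCircuited = optionalChain([returnsNull, 'call', _1 => _1(b), 'optionalAccess', _2 => _2.c, 'call', _3 => _3(d)]);
```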

Optional Chain delete

delete a?.b.c

becomes:

// Same optionalChain as above.

function optionalChainDelete(ops) {
  const result = optionalChain(ops);
  return result == null ? true : result;
}

optionalChainDelete([a, 'optionalAccess', _1 => _1.b, 'access', _2 => delete _2.c])
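
A short sketch of the delete helper in use, with both helpers repeated in condensed form so the example stands alone. The target object is made up for illustration.

```javascript
// Condensed copy of the optionalChain helper.
function optionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  for (let i = 1; i < ops.length; i += 2) {
    const op = ops[i];
    const fn = ops[i + 1];
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      value = fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

function optionalChainDelete(ops) {
  const result = optionalChain(ops);
  // A short-circuited chain yields undefined, which delete reports as true.
  return result == null ? true : result;
}

// delete target?.b.c where the chain succeeds: the property is removed.
const target = {b: {c: 1, keep: 2}};
const deleteResult = optionalChainDelete([target, 'optionalAccess', _1 => _1.b, 'access', _2 => delete _2.c]);

// delete missing?.b.c where the chain short-circuits: nothing is deleted and
// the expression still evaluates to true.
const missing = undefined;
const skippedResult = optionalChainDelete([missing, 'optionalAccess', _1 => _1.b, 'access', _2 => delete _2.c]);
```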

Test Plan

A worry is that there isn't much code in the wild using these new syntax features, so it's hard to get confidence that it's working correctly. Before releasing, the plan is to get tests working in a number of scenarios:

  • The Babel project already uses the syntax, so I'll update it to latest and make sure everything works there.
  • As always, there will be regular tests with the input and output code.
  • There will also be some behavioral tests (assertResult and related functions in the test helpers).
    • Since the syntax is available behind a flag in Node 13, I'm hoping that all behavioral tests can be validated with the V8 behavior as well.
  • The test262 suite has a number of tests that I could make use of. Integrating them may have some technical challenges, but I'll try to integrate them (or at least run them) as part of this project.

Bugs from test262 suite

[Fixed] Can't use await syntax within an optional chain expression or nullish coalescing RHS

Example:

a?.b(await f())

becomes

_optionalChain([a, 'optionalAccess', _ => _.b, 'call', _2 => _2(await f())])

which fails because await can't be used inside a regular arrow function.

Approach

We can support this syntax by making an async alternative to the helper functions:

await _asyncOptionalChain([a, 'optionalAccess', async _ => _.b, 'call', async _2 => _2(await f())])

This gets nearly the correct behavior, with one difference: because the helper has a few more async exit points, the transpiled code may allow race conditions that wouldn't be possible in the ideal code (for example, when different sections of the optional chain expect no interruptions between them). Microtasks complicate the reasoning here and probably make the issue less likely to cause problems. Overall, the race condition concern seems minor/obscure enough that the _asyncOptionalChain transform seems fine.

To make this transform work, we need a way of detecting whether the optional chain uses the await keyword (except within a nested async function). This can be done by scanning the tokens between the optional chain/nullish coalesce start/end and finding await keywords with the same scope depth.
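
A sketch of what the async helper could look like, assuming it mirrors the synchronous optionalChain helper with an await on each step. The name asyncOptionalChain and the objects in demo are illustrative, not necessarily what Sucrase emits.

```javascript
// Assumed shape: same loop as optionalChain, but every step function is
// async and its result is awaited.
async function asyncOptionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  for (let i = 1; i < ops.length; i += 2) {
    const op = ops[i];
    const fn = ops[i + 1];
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = await fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      value = await fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

// a?.b(await f()) inside an async function
async function demo(a) {
  const f = async () => 21;
  return await asyncOptionalChain([a, 'optionalAccess', async _ => _.b, 'call', async _2 => _2(await f())]);
}

const doubler = {
  factor: 2,
  b(x) {
    return x * this.factor;
  },
};
const presentPromise = demo(doubler);
const absentPromise = demo(null);
```

Here presentPromise resolves to 42 with doubler as this, and absentPromise resolves to undefined without ever calling f.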

[Fixed] Optional chains starting with super cause problems

Example:

super.a()?.b;

This breaks because the transformed code treats super as a standalone expression, even though it needs to be used as part of . syntax.

Approach

In most cases, this can be fixed by keeping a leading super token together with the operation that follows it, rather than splitting super off as its own array element. For example:

super()?.a;

becomes:

_optionalChain([super(), 'optionalAccess', _ => _.a]);

However, this won't quite work for this example:

super.a()?.b;

since we need the function to be called with the proper this. One approach that we can take here is to bind this after the function:

_optionalChain([super.a.bind(this), 'call', _ => _(), 'optionalAccess', _2 => _2.b]);
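
The effect of the bind can be checked with a small class hierarchy. Base, Derived, and the label field below are invented for this sketch, and the helper is a condensed copy of optionalChain.

```javascript
// Condensed copy of the optionalChain helper.
function optionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  for (let i = 1; i < ops.length; i += 2) {
    const op = ops[i];
    const fn = ops[i + 1];
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      value = fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

class Base {
  constructor() {
    this.label = 'instance';
  }
  a() {
    return {b: this.label};
  }
}

class Derived extends Base {
  // Overridden so that super.a and this.a are different functions.
  a() {
    return null;
  }
  // super.a()?.b with the bind in place
  readBound() {
    return optionalChain([super.a.bind(this), 'call', _ => _(), 'optionalAccess', _2 => _2.b]);
  }
  // The same chain without the bind: super.a is called with no receiver.
  readUnbound() {
    return optionalChain([super.a, 'call', _ => _(), 'optionalAccess', _2 => _2.b]);
  }
}

const boundResult = new Derived().readBound();

let unboundThrew = false;
try {
  new Derived().readUnbound();
} catch (e) {
  unboundThrew = e instanceof TypeError;
}
```

With the bind, Base's a runs against the instance and the chain returns 'instance'. Without it, this is undefined inside the method (class bodies are strict mode code), so reading this.label throws a TypeError.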

Optional chaining can result in this not being passed with particular placement of parens

Example:

(a?.b)();

The parens around a?.b correctly stop the short-circuiting behavior. However, when a is non-null, it should still be passed as the this of the function call, and in the transpiled code it isn't.

Approach

Getting this right seems very difficult, and would need a smarter form of parsing than Sucrase has right now. As noted in the proposal README, there's no practical reason to put parens around an optional chain expression, so the current plan is to call this detail out of scope for Sucrase.
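
For completeness, this is roughly how the limitation shows up with the helper-based output. The object a is made up for this sketch, the helper is a condensed copy of optionalChain, and the parenthesized form is written the way I'd expect the transform to emit it.

```javascript
// Condensed copy of the optionalChain helper.
function optionalChain(ops) {
  let lastAccessLHS = undefined;
  let value = ops[0];
  for (let i = 1; i < ops.length; i += 2) {
    const op = ops[i];
    const fn = ops[i + 1];
    if ((op === 'optionalAccess' || op === 'optionalCall') && value == null) {
      return undefined;
    }
    if (op === 'access' || op === 'optionalAccess') {
      lastAccessLHS = value;
      value = fn(value);
    } else if (op === 'call' || op === 'optionalCall') {
      value = fn((...args) => value.call(lastAccessLHS, ...args));
      lastAccessLHS = undefined;
    }
  }
  return value;
}

const a = {
  b() {
    return this === a ? 'this is a' : 'this lost';
  },
};

// a?.b() : the call is part of the chain, so the helper supplies `this`.
const insideChain = optionalChain([a, 'optionalAccess', _ => _.b, 'call', _2 => _2()]);

// (a?.b)() : the chain ends at the closing paren, so the call happens on the
// helper's bare return value and the receiver is gone.
const outsideChain = (optionalChain([a, 'optionalAccess', _ => _.b]))();
```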