
Prepare @babel/core for asynchronicity #10507

Merged
merged 3 commits into from Jan 10, 2020

Conversation

nicolo-ribaudo
Member

@nicolo-ribaudo nicolo-ribaudo commented Sep 27, 2019

Q                         A
Fixed Issues?             First step for #9888
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?       Yes
Documentation PR Link
Any Dependency Changes?
License                   MIT

In this PR, I am refactoring @babel/core to use gensync. It's a tool by @loganfsmyth which allows defining functions as generators that can then be run either synchronously or asynchronously, pushing control to the edges of the program.

For example, it allows doing this:

const gensync = require("gensync");
const fs = require("fs-for-gensync"); // hypothetical gensync-aware fs wrapper

const readJSON = gensync(function* (path) {
  const contents = yield* fs.readFile(path, "utf8"); // Sync or async? It doesn't matter
  return JSON.parse(contents);
});

readJSON.sync("file.json"); // internally calls fs.readFileSync(path, encoding)
readJSON.async("file.json"); // internally calls fs.readFile(path, encoding, cb)

By doing so, we will be able to easily allow some parts of Babel to be asynchronous. I identified 4 possible parts:

  • .pre functions of plugins
  • .post functions of plugins
  • Configuration functions (i.e. in babel.config.js)
  • Configuration file loading

The latter is necessary because, starting from Node v12.11.0, if a user has "type": "module" in their package.json we won't be able to load babel.config.js using require, but will need to use import(). (nodejs/modules#389)

Since JavaScript doesn't have native algebraic effects[1], which would have allowed me to use yield only in those 4 places, we have to use yield* through the whole call stack. The last two parts are deep inside @babel/core, and that's why yield* spreads almost everywhere.

This PR already makes I/O actions asynchronous when using the async functions.

This PR aligns the API of all the functions which could interact with the disk: now parse, transform, transformFile, transformFromAst, loadPartialConfig and loadOptions all have three versions:

  • fooSync, which synchronously returns the result;
  • fooAsync, which returns a promise;
  • foo, which accepts a callback. For compatibility reasons, some of these behave synchronously when the callback is not provided.
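The relationship between the three flavors, including the compatibility behaviour of foo without a callback, can be sketched with a toy factory (makeApi and the fooSync/fooAsync/foo names here are illustrative, not @babel/core internals):

```javascript
// Sketch: given a sync and an async implementation, derive the three
// public flavors described above.
function makeApi({ sync, async: asyncImpl }) {
  return {
    fooSync: sync,       // returns the result directly
    fooAsync: asyncImpl, // returns a promise
    foo(input, callback) {
      if (callback === undefined) {
        // Compatibility: behave synchronously when no callback is given.
        return sync(input);
      }
      asyncImpl(input).then(
        result => process.nextTick(callback, null, result),
        err => process.nextTick(callback, err)
      );
    },
  };
}

const api = makeApi({
  sync: n => n + 1,
  async: n => Promise.resolve(n + 1),
});
```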

TODO:

[1] I learned exactly what they are while preparing this PR


Performance differences: Not significant.
In order to test this PR, I compiled every file inside packages/*/src (405 files) both synchronously and asynchronously, using the following script. It's always run using node demo.js --sync && node demo.js --sync && node demo.js --sync && node demo.js --sync && node demo.js --async && node demo.js --async && node demo.js --async && node demo.js --async.

"use strict";

const load = name => require(`./packages/babel-${name}`);

const babel = load("core");
const fs = require("fs");

function walkSync(dir, filelist) {
  const files = fs.readdirSync(dir);
  filelist = filelist || [];
  files.forEach(file => {
    if (fs.statSync(dir + "/" + file).isDirectory()) {
      filelist = walkSync(dir + "/" + file, filelist);
    } else {
      filelist.push(dir + "/" + file);
    }
  });
  return filelist;
}

const files = [];

const packages = fs.readdirSync("packages");
for (const name of packages) {
  if (name === "README.md" || name.startsWith("babel-runtime")) continue;
  files.push.apply(files, walkSync("packages/" + name + "/src"));
}


const start = process.hrtime();
global.results = {};

if (process.argv.includes("--sync")) {
  process.env.SYNC_OR_ASYNC = "sync";
  for (const file of files) {
    global.results[file] = babel.transformFileSync(file);
  }
  console.log("Sync: " + process.hrtime(start));
} else {
  process.env.SYNC_OR_ASYNC = "async";
  let i = 0;
  for (const file of files) {
    i++;
    babel.transformFile(file, (err, out) => {
      if (err) {
        console.log(err);
        process.exit(1);
      }
      global.results[file] = out;
  
      if (--i === 0) console.log("Async: " + process.hrtime(start));
    });
  }
}
Results

NODE_ENV is always "test"

  • master branch, with api.cache.never():

    Sync: 5,886560734
    Sync: 5,885043757
    Sync: 5,867874524
    Sync: 5,803315282
    Async: 5,949791895
    Async: 5,789683786
    Async: 5,806937770
    Async: 5,853401118
    
  • core-async branch, with api.cache.never():

    Sync: 5,960720491
    Sync: 5,894450713
    Sync: 5,891051295
    Sync: 5,946484727
    Async: 5,913818398
    Async: 5,929368030
    Async: 6,11599499
    Async: 6,58814871
    
  • master branch, with api.cache.using(() => process.env.SYNC_OR_ASYNC):

    Sync: 5,883534231
    Sync: 4,840278891
    Sync: 4,994791415
    Sync: 4,897095093
    Async: 4,874467892
    Async: 4,824938870
    Async: 4,826639093
    Async: 4,851304427
    
  • core-async branch, with api.cache.using(() => process.env.SYNC_OR_ASYNC):

    Sync: 4,866956272
    Sync: 4,820806372
    Sync: 4,857302419
    Sync: 4,783584388
    Async: 5,869067030
    Async: 5,931280575
    Async: 6,17525481
    Async: 5,993042102
    

Conclusion: When running synchronously, the performance is almost the same. On the other hand, when running asynchronously this branch is consistently a bit slower than master. I think it's because gensync waits one tick for each yielded value, just as await does in async functions.
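That tick-per-value cost can be observed in isolation: awaiting even a plain, already-available value defers the continuation to a later microtask.

```javascript
// Each `await` defers the rest of the function by (at least) one
// microtask tick, even when the awaited value is not a promise.
const order = [];

(async function () {
  order.push("before await");
  await 0; // not a promise, but the continuation is still deferred
  order.push("after await");
})();

order.push("sync code after the call");

// Timers run after all microtasks have flushed:
setTimeout(() => {
  console.log(order.join(" | "));
  // before await | sync code after the call | after await
}, 0);
```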

@nicolo-ribaudo nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: core labels Sep 27, 2019
@buildsize

buildsize bot commented Sep 27, 2019

File name                Previous Size  New Size  Change
babel-preset-env.js      2.77 MB        2.83 MB   59.72 KB (2%)
babel-preset-env.min.js  1.67 MB        1.69 MB   22.47 KB (1%)
babel.js                 2.95 MB        3.01 MB   59.73 KB (2%)
babel.min.js             1.63 MB        1.65 MB   22.48 KB (1%)

@nicolo-ribaudo
Member Author

nicolo-ribaudo@3f640b2 (not included in this PR) shows how with the new architecture it is straightforward to allow some parts of Babel to be async.

@@ -45,10 +47,10 @@ type SimpleContext = {
   caller: CallerMetadata | void,
 };

-export default function loadFullConfig(
+export default gensync<[any], ResolvedConfig | null>(function* loadFullConfig(
Member

Why the any instead of mixed?

Member Author

It's because I get errors about objects not being compatible with mixed:

const config = loadConfig.sync({
  babelrc: false,
  configFile: false,
  plugins: [blockHoistPlugin],
});

export function loadOptions(opts: {}): Object | null {
  const config = loadFullConfig.sync(opts);

Member

Huh, seems like a Flow bug, but fair enough.


const cache = new CacheConfigurator(data);

const value = yield* handler(arg, cache);
Member

One big complexity here is that, if this is async, you can end up with multiple calls to handler running in parallel. In my opinion, we should only ever allow one call to handler at a time for a given arg.

Member Author

I'm not sure about how they could run in parallel. If it is async, handler's effects will be yielded and no other code will be run until they are completed, including other handler calls.

Member

For instance, if you do

var doThing = makeWeakCache(function*(arg) {
  yield* delay(100);
  return {};
});

var obj = {};
const [r1, r2] = yield* gensync.all([doThing(obj), doThing(obj)]);

console.log(r1 === r2);

would be true in sync code and false in async code with your current implementation, because the first call would run and populate the cache with the r1 object, but then the second one would finish and replace the callCache data with r2. So really the second call needs to see that there is already a call in progress and wait for it to finish before checking whether the cache entry matches.

Member

Was this explanation enough to make my concern clearer?

Member Author

Yes, sorry, I just didn't have time to update it! There is a reason this PR is marked as "draft" 😛

Member Author

TODO

Member Author

@loganfsmyth I was implementing this, and I'm not sure that it is something that we want, since it can cause a few problems:

  1. If a function is called recursively (like a plugin loading another plugin via inherits), it will cause a deadlock: the outer function needs the inner one's return value, but the inner one won't run until the outer one is finished.

  2. It will make async calls to Babel slower. Suppose that we have this code:

    var maybeDelay = gensync({
      sync: () => {},
      async: t => delay(t),
    });
    
    var doThing = gensync(makeWeakCache(function*(arg) {
      yield* maybeDelay(100);
      return {};
    }));
    
    doThing.async();
    await delay(50);
    doThing.sync();

    here doThing.sync() runs while doThing.async is paused by the delay call. If we can't call doThing.sync until doThing.async is finished, the only option we have is to throw an error, because it can't wait.

    maybeDelay might seem like a function crafted only to create this example, without real-world applications, but it is exactly like any fs.* operation.

I think that randomly throwing errors caused by race conditions is too high a cost to introduce only for performance reasons. I think that we have two options:

  1. Leave it as-is, allowing two different async calls to happen in parallel
  2. Always invalidate the cache when switching between sync/async. I already had to do this for cache.invalidate calls, which can be either sync or async, and it shouldn't be observable.

Member

The big issue for me is that I could really see this being a noticeable perf hit for something that starts compiling a bunch of files in parallel, like @babel/cli. If you do something like

var results = Promise.all(files.map(filename => babel.transformFileAsync(filename)));

then you'll likely end up calling every plugin and preset function for every single file, because config resolving will all be kicked off in parallel, and each file's process will kick off an operation that won't get the cached value of the previous file's compilation, because the operation will have started before the previous one finished. I haven't actually tested that though, so if you want to verify that, it could help the discussion. Maybe I'm missing something.

If a function is called recursively (like a plugin loading another plugin via inherits), it will cause a deadlock: the outer function needs the inner one's return value, but the inner one won't run until the outer one is finished.

We should probably disallow that anyway, if we don't already, since it would be an unresolvable problem in sync code too. I agree that recursion could be a concern, but I'm not sure it's one that wouldn't already be an issue in sync cases in general too.

If we can't call doThing.sync until doThing.async is finished, the only option we have is to throw an error, because it can't wait.

I think it's probably fine to have entirely separate caching for sync and async calls to avoid this conflict. In fact we likely explicitly want that, because plugins could return different results depending on whether the system is sync or async.
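Entirely separate per-mode caches could be as small as this (a sketch with assumed names, not the PR's code):

```javascript
// Sketch: one WeakMap per execution mode, so a sync call can never
// observe or clobber an entry produced by an async call.
function makeModeKeyedCaches() {
  const caches = { sync: new WeakMap(), async: new WeakMap() };
  return mode => caches[mode];
}

const getCache = makeModeKeyedCaches();
```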

Member

Sounds like you need a mutex.


if (cachedValue) {
for (const { value, valid } of cachedValue) {
if (valid(data)) return value;
Member

Do you think we should consider allowing this to be async too? We're currently stuck with the one statSync because it is sync-only. We shouldn't expose gensync to consumers, but we could repeat the .then check here too and allow returning a promise, if we wanted.

Not needed now and easy to expand later, but curious for your thoughts.

Member Author

I didn't change it because I still have to finish refactoring the caching logic, but I will do it 👍

Member

Cool!

@loganfsmyth
Member

This is exactly the usage I had in mind when designing gensync, so I'm happy to hear that it seems to have worked pretty smoothly overall.

packages/babel-core/package.json (resolved)
@@ -12,7 +15,8 @@ import {
} from "./config-chain";
import type { UnloadedDescriptor } from "./config-descriptors";
import traverse from "@babel/traverse";
import { makeWeakCache, type CacheConfigurator } from "./caching";
import { makeWeakCache, type CacheConfigurator } from "./caching-a";
import { makeWeakCache as makeWeakCacheSyncSync } from "./caching";
Member

makeWeakCacheSyncSync => makeWeakCacheSync?

Member Author

I still have to refactor it, but there is a difference: Sync allows an async cache invalidation function with a sync cached function, while with SyncSync both are synchronous.

});
}

export const isAsync = gensync<[any], any>({
Member

Is there a way we can avoid any? I tried really hard to never use any in the core codebase, which is why I keep mentioning it.

Member Author

This can definitely be removed, I don't know why I used it


type MaybePromise<T> = T | Promise<T>;

export default function forward<ActionArgs: mixed[], ActionReturn, Return>(
Member

It would be good to have some comments in these, since I don't think it's very clear what their expected usecases would be.


export function isThenable(val: mixed): boolean %checks {
return (
/*:: val instanceof Promise && */
Member

Tricky!



Comment on lines 165 to 180
* 1. If there is a valid cache associated to the current "arg" parameter,
* a. RETURN the cached value
* 2. If there is a ConfigLock associated to the current "arg" parameter,
* a. Wait for that lock to be released
* b. GOTO step 1
* 3. If there is a FinishLock associated to the current "arg" parameter representing a valid cache,
* a. Wait for that lock to be released
* b. RETURN the value associated with that lock
* 4. Acquire two locks: ConfigLock and FinishLock
* 5. Start executing the cached function
* 6. When the cache is configured, release ConfigLock
* 7. Wait for the function to finish executing
* 8. Release FinishLock
* 9. Send the function result to anyone waiting on FinishLock
* 10. Store the result in the cache
* 11. RETURN the result
Member Author

@loganfsmyth What do you think about this algorithm, to solve the concurrency issue?
It is suboptimal in cases like the following one (it will take 2s instead of 1s), but I don't think that they are common:

const fn = makeStrongCache(function* (arg) {
  yield* wait(1000);
  cache.never();
}).async;

fn(); fn();

I initially tried to force cache.* to be called synchronously (before any yield), but then found out that it doesn't even work for our internal code: https://github.com/babel/babel/pull/10507/files#diff-83f25d98662c0f8e8ad7cc7d1ed06f8eR32
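Restricted to the async side, and ignoring cache validation and the ConfigLock, the waiting behaviour of the numbered steps above could be sketched like this (makeAsyncWeakCache is a made-up name, not the PR's implementation):

```javascript
// Sketch: make a second async call for the same `arg` wait on the
// in-flight one instead of re-running the handler (FinishLock only;
// the ConfigLock / cache-validation parts are omitted).
function makeAsyncWeakCache(handler) {
  const results = new WeakMap();  // arg -> settled value   (steps 1, 10)
  const inFlight = new WeakMap(); // arg -> pending promise (steps 3-4)

  return async function cached(arg) {
    if (results.has(arg)) return results.get(arg);   // step 1: cache hit
    if (inFlight.has(arg)) return inFlight.get(arg); // step 3: wait on lock

    const promise = Promise.resolve()
      .then(() => handler(arg)) // step 5: run the cached function
      .then(value => {
        results.set(arg, value); // step 10: store the result
        inFlight.delete(arg);    // step 8: release the lock
        return value;            // steps 9, 11
      });
    inFlight.set(arg, promise);  // step 4: acquire the lock
    return promise;
  };
}
```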

Member Author

I tested the performance of this change, and there is a problem.
I have compiled every src file of this repo four times (405 × 4 files), and here are the results:

        cache.forever()   cache.never()
async   15.372567052 s    470.434076412 s
sync    15.862861386 s    19.743430078 s

As you can see, there is a big problem with cache.never() + async.
It is related to re-compiling the same file. If I compile every file once instead of four times, it doesn't have this big difference:

cache.never()
async 10.391877456 s
sync 6.149023702 s

If I compile the same file multiple times with cache.never(), the time initially grows linearly (as expected), and then it slows down:

#       async             sync
100     2.308958374 s     2.207794832 s
200     4.242482485 s     3.747946110 s
400     10.356814312 s    6.574130764 s
800     50.817692998 s    11.663716923 s
1600    466.833293529 s   22.983728867 s

As you can see, the numbers for 1 × 1600 files are very similar to the numbers for 405 × 4 files (first table), and the numbers for 1 × 400 files are very similar to 405 × 1 (second table).
This shows that it doesn't matter whether we are always transpiling the same file or not: when compiling more than 400 files, the async version becomes super slow. My guess is that it is caused by all the promises getting created, but I know neither how to verify it nor whether it can be fixed.

Member Author

@loganfsmyth We can solve this big performance problem if we disallow the cache to be configured asynchronously. Synchronously configuring the cache with an async validator would still be allowed:

// waitFor transforms a "Promise<T> | T" into a "Gensync<T>"

makeWeakCache(function* (arg) {
  // Allowed, because cache.using is called synchronously

  const val = yield* waitFor(
    cache.using(async () => {
      await wait(100);
      return process.env.FOO;
    })
  );
});


makeWeakCache(function* (arg) {
  // Disallowed, because cache.using is called asynchronously

  yield* waitFor(wait(100));
  const val = cache.using(() => process.env.FOO);
});
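The waitFor helper assumed above could be shaped roughly like this, written here as a plain sync/async pair (the real code would wrap such a pair, e.g. with gensync's { sync, async } form): in async mode the value is awaited; in sync mode a promise is a hard error.

```javascript
// Sketch of the assumed waitFor helper (not the PR's actual code).
function isThenable(value) {
  return (
    !!value &&
    (typeof value === "object" || typeof value === "function") &&
    typeof value.then === "function"
  );
}

const waitFor = {
  sync(value) {
    if (isThenable(value)) {
      throw new Error("Cannot await a promise while running synchronously.");
    }
    return value;
  },
  async(value) {
    return Promise.resolve(value);
  },
};
```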

In the future, this change will be observable to our users because they won't be able to use await before configuring the cache.

In order to implement this behaviour I had to modify gensync's internals, because I couldn't find a way, from the outside, to know whether a generator has been suspended. I will open a PR soon, because I edited it in my node_modules and I fear losing those changes, but if you have a better solution feel free to propose it!

Member Author

These are the new numbers with cache.never():

#       async            sync             sync (master)
100     2.219376366 s    2.256803449 s    2.143184428 s
200     3.714403922 s    3.641993385 s    3.591130842 s
400     6.338499966 s    6.638144645 s    6.299714871 s
800     11.772867713 s   12.198001927 s   11.539516664 s
1600    21.811404623 s   22.632490371 s   21.436522896 s

@nicolo-ribaudo
Member Author

Ready for review!

@nicolo-ribaudo nicolo-ribaudo marked this pull request as ready for review October 7, 2019 13:40
@nicolo-ribaudo
Member Author

nicolo-ribaudo commented Oct 16, 2019

Note that node 13.0.0 will be released soon (nodejs/node#29504), and v13.1.0 will have unflagged modules (https://twitter.com/jasnell/status/1184265267951493120?s=20), around the beginning of November.

As soon as this PR is merged, I will open another PR to make it possible to use js config files in packages using es modules.

Contributor

@JLHwung JLHwung left a comment

Nicely crafted!

packages/babel-core/src/gensync-utils/fs.js (resolved)
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Apr 11, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 11, 2020