Prepare @babel/core for asynchronicity #10507
Conversation
nicolo-ribaudo@3f640b2 (not included in this PR) shows how with the new architecture it is straightforward to allow some parts of Babel to be async.
```diff
@@ -45,10 +47,10 @@ type SimpleContext = {
   caller: CallerMetadata | void,
 };

-export default function loadFullConfig(
+export default gensync<[any], ResolvedConfig | null>(function* loadFullConfig(
```
Why the `any` instead of `mixed`?
It's because I get errors about objects not being compatible with `mixed`:
babel/packages/babel-core/src/transformation/block-hoist-plugin.js (lines 14 to 18 in 0da4f7e):

```js
const config = loadConfig.sync({
  babelrc: false,
  configFile: false,
  plugins: [blockHoistPlugin],
});
```

babel/packages/babel-core/src/config/index.js (lines 18 to 19 in 0da4f7e):

```js
export function loadOptions(opts: {}): Object | null {
  const config = loadFullConfig.sync(opts);
```
Huh, seems like a Flow bug, but fair enough.
```js
const cache = new CacheConfigurator(data);

const value = yield* handler(arg, cache);
```
One big complexity here is that, if this is async, you can end up with multiple calls to `handler` running in parallel. In my opinion, we should only ever allow one call to `handler` at a time for a given `arg`.
I'm not sure about how they could run in parallel. If it is async, `handler`'s effects will be yielded and no other code will be run until they are completed, including other `handler` calls.
For instance, if you do
```js
var doThing = makeWeakCache(function* (arg) {
  yield* delay(100);
  return {};
});

var obj = {};
const [r1, r2] = yield* gensync.all([doThing(obj), doThing(obj)]);
console.log(r1 === r2);
```
The logged value would be `true` in sync code and `false` in async code with your current implementation, because the first call would run and populate the cache with the `r1` object, but then the second one would finish and replace the `callCache` data with `r2`. So really the second call should see that there is already a call in progress and wait for it to finish before checking if the cache entry matches.
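One way to get that behaviour is to cache the in-flight Promise per argument, so a second concurrent call waits on the first instead of running the handler again. A minimal sketch under that assumption; `makeWeakAsyncCache` is a hypothetical name, not the actual implementation in this PR:

```javascript
// Hypothetical sketch: per-argument deduplication for async calls.
// A second concurrent call for the same arg awaits the first call's
// in-flight Promise instead of invoking the handler again.
function makeWeakAsyncCache(handler) {
  const settled = new WeakMap(); // arg -> finished value
  const inFlight = new WeakMap(); // arg -> pending Promise

  return function cached(arg) {
    if (settled.has(arg)) return Promise.resolve(settled.get(arg));
    if (inFlight.has(arg)) return inFlight.get(arg);

    const promise = Promise.resolve(handler(arg)).then((value) => {
      settled.set(arg, value);
      inFlight.delete(arg);
      return value;
    });
    inFlight.set(arg, promise);
    return promise;
  };
}
```

With this shape, two parallel calls for the same `obj` resolve to the same object, so the `r1 === r2` comparison stays `true` in async code as well.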
Was this explanation enough to make my concern clearer?
Yes, sorry, I just didn't have time to update it! There is a reason why this PR is marked as "draft" 😛
TODO
@loganfsmyth I was implementing this, and I'm not sure that it is something that we want, since it can cause a few problems:
- If a function is called recursively (like a plugin loading another plugin via `inherits`), it will cause a deadlock: the outer function needs the inner one's return value, but the inner one won't run until the outer one is finished.
- It will make async calls to Babel slower. Suppose that we have this code:

  ```js
  var maybeDelay = gensync({
    sync: () => {},
    async: t => delay(t),
  });

  var doThing = gensync(makeWeakCache(function* (arg) {
    yield* maybeDelay(100);
    return {};
  }));

  doThing.async();
  await delay(50);
  doThing.sync();
  ```
Here `doThing.sync()` runs while `doThing.async` is paused by the `delay` call. If we can't call `doThing.sync` until `doThing.async` is finished, the only option we have is to throw an error, because it can't wait. `maybeDelay` might seem like a function crafted only to create this example, without real-world applications, but it is exactly like any `fs.*` operation.
I think that randomly throwing errors caused by race conditions is too high a cost to introduce only for performance reasons. I think that we have two options:
- Leave it as-is, allowing two different async calls to happen in parallel
- Always invalidate the cache when switching between sync/async. I already had to do it when interacting with an either-sync-or-async `cache.invalidate` call, but it shouldn't be observable.
The big issue for me is that I could really see this being a noticeable perf hit for something that starts compiling a bunch of files in parallel, like `@babel/cli`. If you do something like
```js
var results = Promise.all(files.map(filename => babel.transformFileAsync(filename)));
```
then you'll likely end up calling every plugin and preset function for every single file, because config resolving will all be kicked off in parallel, and each file's process will kick off an operation that won't get the cached value of the previous file's compilation, because the operation will have started before the previous one finished. I haven't actually tested that though, so verifying it could help the discussion. Maybe I'm missing something.
> If a function is called recursively (like a plugin loading another plugin via `inherits`), it will cause a deadlock: the outer function needs the inner one's return value, but the inner one won't run until the outer one is finished.
We should probably disallow that anyway, if we don't already, since it would be an unresolvable problem in sync code too. I agree that recursion could be a concern, but I'm not sure it's one that wouldn't already be an issue in sync cases in general too.
> If we can't call `doThing.sync` until `doThing.async` is finished, the only option we have is to throw an error, because it can't wait.
I think it's probably fine to have entirely separate caching for sync and async calls to avoid this conflict. In fact, we likely explicitly want that, because plugins could return different results depending on whether the system is sync or async.
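A sketch of what that separation could look like; `makeDualCache` and its `mode` parameter are hypothetical names for illustration, not the PR's actual API:

```javascript
// Hypothetical sketch: entirely separate caches for sync and async
// execution, so a pending async computation never conflicts with a
// sync call for the same argument.
function makeDualCache(handler) {
  const caches = {
    sync: new WeakMap(),
    async: new WeakMap(),
  };

  return function cached(arg, mode /* "sync" | "async" */) {
    const cache = caches[mode];
    if (!cache.has(arg)) {
      // The handler can observe the mode and return different results.
      cache.set(arg, handler(arg, mode));
    }
    return cache.get(arg);
  };
}
```

The trade-off is that a plugin evaluated in both modes runs (and is cached) twice, which is exactly what makes differing sync/async results safe.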
Sounds like you need a mutex.
```js
if (cachedValue) {
  for (const { value, valid } of cachedValue) {
    if (valid(data)) return value;
```
Do you think we should consider allowing this to be async too? We're currently stuck with the one `statSync` because it is sync-only. We shouldn't expose gensync to consumers, but we could repeat the `.then` check here too and allow returning a promise, if we wanted.
Not needed now and easy to expand later, but curious for your thoughts.
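The `.then` check could look roughly like this; a hypothetical sketch (`runValidator` and its `allowAsync` flag are illustrative names, not the actual code):

```javascript
// Hypothetical sketch: accept a cache validator that returns either a
// boolean or a Promise<boolean>. Async callers await the result; sync
// callers reject thenables instead of blocking.
function runValidator(valid, data, { allowAsync }) {
  const result = valid(data);
  const isThenable = result && typeof result.then === "function";
  if (isThenable && !allowAsync) {
    throw new Error("Async cache validators are only usable in async mode");
  }
  // Normalize to a Promise so the async caller has one code path.
  return isThenable ? result : Promise.resolve(result);
}
```

Throwing in the sync path keeps the contract explicit: a validator that needs I/O simply cannot be satisfied synchronously.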
I didn't change it because I still have to finish refactoring the caching logic, but I will do it 👍
Cool!
This is exactly the usage I had in mind when designing it.
```diff
@@ -12,7 +15,8 @@ import {
 } from "./config-chain";
 import type { UnloadedDescriptor } from "./config-descriptors";
 import traverse from "@babel/traverse";
-import { makeWeakCache, type CacheConfigurator } from "./caching";
+import { makeWeakCache, type CacheConfigurator } from "./caching-a";
+import { makeWeakCache as makeWeakCacheSyncSync } from "./caching";
```
`makeWeakCacheSyncSync` => `makeWeakCacheSync`?
I still have to refactor it, but there is a difference: `Sync` allows an async cache invalidation function and a sync cached function, while with `SyncSync` both are synchronous.
```js
  });
}

export const isAsync = gensync<[any], any>({
```
Is there a way we can avoid `any`? I tried really hard to never use `any` in the core codebase, which is why I keep mentioning it.
This can definitely be removed, I don't know why I used it
```js
type MaybePromise<T> = T | Promise<T>;

export default function forward<ActionArgs: mixed[], ActionReturn, Return>(
```
It would be good to have some comments in these, since I don't think it's very clear what their expected use cases would be.
```js
export function isThenable(val: mixed): boolean %checks {
  return (
    /*:: val instanceof Promise && */
```
Tricky!
```js
const cache = new CacheConfigurator(data);

const value = yield* handler(arg, cache);
```
Was this explanation enough to make my concern clearer?
(force-pushed from 60d7a7c to 963a2bd)
```js
 * 1. If there is a valid cache associated to the current "arg" parameter,
 *    a. RETURN the cached value
 * 2. If there is a ConfigLock associated to the current "arg" parameter,
 *    a. Wait for that lock to be released
 *    b. GOTO step 1
 * 3. If there is a FinishLock associated to the current "arg" parameter
 *    representing a valid cache,
 *    a. Wait for that lock to be released
 *    b. RETURN the value associated with that lock
 * 4. Acquire two locks: ConfigLock and FinishLock
 * 5. Start executing the cached function
 * 6. When the cache is configured, release ConfigLock
 * 7. Wait for the function to finish executing
 * 8. Release FinishLock
 * 9. Send the function result to anyone waiting on FinishLock
 * 10. Store the result in the cache
 * 11. RETURN the result
```
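The two locks in the steps above could be backed by a minimal one-shot lock whose waiters receive the released value. A hypothetical sketch (this `Lock` class is illustrative, not the PR's implementation):

```javascript
// Hypothetical sketch of a one-shot lock: waiters get a Promise that
// resolves with the value passed to release(). ConfigLock and FinishLock
// in the algorithm above could both be instances of this.
class Lock {
  constructor() {
    this._released = new Promise((resolve) => {
      this._resolve = resolve;
    });
  }

  // Resolve every current and future waiter with the given value.
  release(value) {
    this._resolve(value);
  }

  // Wait until release() is called (steps 2a and 3a above).
  wait() {
    return this._released;
  }
}
```

For example, step 9 ("send the function result to anyone waiting on FinishLock") would just be `finishLock.release(result)`.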
@loganfsmyth What do you think about this algorithm, to solve the concurrency issue?
It is sub-optimal in cases like the following one (it will take 2s instead of 1s), but I don't think they are common:
```js
const fn = makeStrongCache(function* (arg) {
  yield* wait(1000);
  cache.never();
}).async;

fn();
fn();
```
I initially tried to force `cache.*` to be called synchronously (before any `yield`), but then found out that it doesn't even work for our internal code: https://github.com/babel/babel/pull/10507/files#diff-83f25d98662c0f8e8ad7cc7d1ed06f8eR32
I tested the performance of this change, and there is a problem.
I have compiled every src file of this repo four times (405 x 4 files), and these are the results:
| | `cache.forever()` | `cache.never()` |
|---|---|---|
| async | 15.372567052 s | 470.434076412 s |
| sync | 15.862861386 s | 19.743430078 s |
As you can see, there is a big problem with `cache.never()` + async.
It is related to re-compiling the same file. If I compile every file once instead of four times, it doesn't have this big difference:
| | `cache.never()` |
|---|---|
| async | 10.391877456 s |
| sync | 6.149023702 s |
If I compile the same file multiple times with `cache.never()`, the time initially grows linearly (as expected), and then it slows down:
# | async | sync |
---|---|---|
100 | 2.308958374 s | 2.207794832 s |
200 | 4.242482485 s | 3.747946110 s |
400 | 10.356814312 s | 6.574130764 s |
800 | 50.817692998 s | 11.663716923 s |
1600 | 466.833293529 s | 22.983728867 s |
As you can see, the numbers for 1x1600 files are very similar to the numbers for 405x4 files (first table), and the numbers for 1x400 files are very similar to 405x1 (second table).
This shows that it doesn't matter if we are always transpiling the same file or not: when compiling more than 400 files, the async version becomes super slow. My guess is that it is caused by all the promises getting created, but I don't know how to verify it, nor whether it can be fixed.
@loganfsmyth We can solve this big performance problem if we disallow the cache to be configured asynchronously. Synchronously configuring the cache with an async validator would still be allowed:
```js
// waitFor transforms a "Promise<T> | T" into a "Gensync<T>"

makeWeakCache(function* (arg) {
  // Allowed, because cache.using is called synchronously
  const val = yield* waitFor(
    cache.using(async () => {
      await wait(100);
      return process.env.FOO;
    })
  );
});

makeWeakCache(function* (arg) {
  // Disallowed, because cache.using is called asynchronously
  yield* waitFor(wait(100));
  const val = cache.using(() => process.env.FOO);
});
```
In the future, this change will be observable to our users because they won't be able to use `await` before configuring the cache.
In order to implement this behaviour I had to modify `gensync`'s internals, because I couldn't find a way "from the outside" to know if a generator has been suspended. I will open a PR soon because I edited it in my `node_modules` and I fear losing those changes, but if you have a better solution feel free to propose it!
These are the new numbers with `cache.never()`:
# | async | sync | sync master |
---|---|---|---|
100 | 2.219376366 s | 2.256803449 s | 2.143184428 s |
200 | 3.714403922 s | 3.641993385 s | 3.591130842 s |
400 | 6.338499966 s | 6.638144645 s | 6.299714871 s |
800 | 11.772867713 s | 12.198001927 s | 11.539516664 s |
1600 | 21.811404623 s | 22.632490371 s | 21.436522896 s |
(force-pushed from cfe206d to f83fc29)
Ready for review!
Note that node 13.0.0 will be released soon (nodejs/node#29504), and v13.1.0 will have unflagged modules (https://twitter.com/jasnell/status/1184265267951493120?s=20), around the beginning of November. As soon as this PR is merged, I will open another PR to make it possible to use js config files in packages using es modules.
(force-pushed from f83fc29 to 494ae83)
Nicely crafted!
(force-pushed from 43616c6 to c762cdd)
(force-pushed from 9be12dc to 313a5a7)
In this PR, I am refactoring `@babel/core` to use `gensync`. It's a tool by @loganfsmyth which allows defining functions as generators which can then be run either synchronously or asynchronously, pushing control to the edges of the program. For example, it allows doing this:
By doing so, we will be able to easily allow some parts of Babel to be asynchronous. I identified 4 possible parts:

- `.pre` functions of plugins
- `.post` functions of plugins
- loading of config files (e.g. `babel.config.json`)

The latter is necessary because, starting from node v12.11.0, if a user has `"type": "module"` in their `package.json` we won't be able to load `babel.config.js` using `require`, but we'll need to use `import()`. (nodejs/modules#389)

Since JavaScript doesn't have native algebraic effects[1], which would have allowed me to use `yield` only in those 4 places, we have to use `yield*` in the whole call stack. The last two parts are deep inside `@babel/core`, and that's why `yield*` spread almost everywhere.

This PR already makes I/O actions asynchronous when using the async functions.
This PR aligns the API of all the functions which could interact with the disk: now `parse`, `transform`, `transformFile`, `transformFromAst`, `loadPartialConfig` and `loadOptions` all have three versions: `fooSync`, which synchronously returns the result; `fooAsync`, which returns a promise; and `foo`, which accepts a callback. For compatibility reasons, some of these behave synchronously when the callback is not provided.

TODO:
[1] I learned exactly what they are while preparing this PR
Performance differences: Not significant.
In order to test this PR, I compiled every file inside `packages/*/src` (405 files) both synchronously and asynchronously, using the following script. It's always run using `node demo.js --sync && node demo.js --sync && node demo.js --sync && node demo.js --sync && node demo.js --async && node demo.js --async && node demo.js --async && node demo.js --async`.
.Results
`NODE_ENV` is always `"test"`.

- `master` branch, with `api.cache.never()`
- `core-async` branch, with `api.cache.never()`
- `master` branch, with `api.cache.using(() => process.env.SYNC_OR_ASYNC)`
- `core-async` branch, with `api.cache.using(() => process.env.SYNC_OR_ASYNC)`
Conclusion: When running synchronously, the performance is almost the same. On the other hand, when running asynchronously this branch is consistently a bit slower than `master`. I think that it's because `gensync` waits one tick for each yielded value, the same way `await` behaves in async functions.