Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolates scaling problem #55713

Open
jensjoha opened this issue May 14, 2024 · 2 comments
Open

Isolates scaling problem #55713

jensjoha opened this issue May 14, 2024 · 2 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-isolate P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size

Comments

@jensjoha
Copy link
Contributor

On the CFE team we have quite a bit of "suites", to speed things up (and integrate with the approval system and whatnot) we have a "unit test suites" runner which launches the suites in isolates (using #cores-1 isolates).

My machine has 12 cores (with hyperthreading) and so it launches 11 isolates. According to htop though it only uses 4-500% CPU though. Prepending a run with time gives this:

(Runs on f774ec5)

$ time out/ReleaseX64/dart pkg/front_end/test/unit_test_suites.dart --skipTestsThatRequireGit
[...]
real    10m34.861s
user    41m52.658s
sys     10m22.388s

Using (user+sys)/real this becomes ~494% CPU.

If instead launching two processes, asking each to do half and only use 5 isolates each:

time out/ReleaseX64/dart pkg/front_end/test/unit_test_suites.dart --skipTestsThatRequireGit -j5 --shards 2 --shard 1
time out/ReleaseX64/dart pkg/front_end/test/unit_test_suites.dart --skipTestsThatRequireGit -j5 --shards 2 --shard 2

(run in two different terminals at the same time)

I get these numbers:

[...]
real    5m31.369s
user    18m8.244s
sys     2m56.053s

Using (user+sys)/real this becomes ~381% CPU.

and

[...]
real    5m39.273s
user    18m16.937s
sys     2m54.928s

Using (user+sys)/real this becomes ~374% CPU.

Adding up these numbers it becomes ~755% CPU (or ~747% if doing (user1+sys1+user2+sys2)/max(real1, real2)) --- in both cases quite a bit more than the <500% (more than 50% more actually).
Looking at the (real) runtime it's almost twice as fast (~87% faster, taking ~53.5% of the time).

So: Are isolates expected to be slower, is it being held wrong or is something else up?

@mraleph suggested that "the answer is probably our GC does not scale well for this usecase" and suggested I file a bug, so here we are.

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-isolate type-performance Issue relates to performance or code size labels May 14, 2024
@aam aam added the triaged Issue has been triaged by sub team label May 15, 2024
@aam aam added the P2 A bug or feature request we're likely to work on label May 15, 2024
@rmacnak-google
Copy link
Contributor

While the GC could do better here, I think this is mostly from uneven work distribution across the isolates. I noticed the timeline often had fewer open HandleMessage events than the number of workers, especially toward the end. Here I've grouped the events by isolate instead of thread. (-j12, also --shards 4 --shard 1 because otherwise the trace is too big for Catapult to handle).

Screenshot from 2024-05-28 14-34-14
Screenshot from 2024-05-28 14-34-28

--- a/pkg/front_end/test/unit_test_suites.dart
+++ b/pkg/front_end/test/unit_test_suites.dart
@@ -6,6 +6,7 @@ import 'dart:async' show Timer;
 import 'dart:convert' show jsonEncode;
 import 'dart:io' show File, Platform, exitCode;
 import 'dart:isolate' show Isolate, ReceivePort, SendPort;
+import 'dart:developer';
 
 import 'package:args/args.dart' show ArgParser;
 import 'package:testing/src/chain.dart' show CreateContext, Result, Step;
@@ -502,9 +503,11 @@ class SuiteConfiguration {
 }
 
 Future<void> runSuite(SuiteConfiguration configuration) async {
+  var t = new TimelineTask();
   Suite suite = configuration.suite;
   String name = suite.prefix;
   String fullSuiteName = "$suiteNamePrefix/$name";
+  t.start(fullSuiteName);
   Uri suiteUri = Platform.script.resolve(suite.path ?? "${name}_suite.dart");
   if (!new File.fromUri(suiteUri).existsSync()) {
     throw "File doesn't exist: $suiteUri";
@@ -528,6 +531,7 @@ Future<void> runSuite(SuiteConfiguration configuration) async {
     shards: suite.shardCount,
     shard: configuration.shard,
   );
+  t.finish();
   if (logger.gotFrameworkError) {
     throw "Got framework error!";
   }

@jensjoha
Copy link
Contributor Author

Let's try something slightly different to make it more apples to apples:

import "dart:isolate";

import "unit_test_suites.dart" as suite;

void entry(List<String> args) {
  suite.main(args);
}

Future<void> main() async {
  ReceivePort exitPort1 = new ReceivePort();
  await Isolate.spawn(entry,
      ["--skipTestsThatRequireGit", "-j5", "--shards", "2", "--shard", "1"],
      onExit: exitPort1.sendPort);
  ReceivePort exitPort2 = new ReceivePort();
  await Isolate.spawn(entry,
      ["--skipTestsThatRequireGit", "-j5", "--shards", "2", "--shard", "2"],
      onExit: exitPort2.sendPort);
  await exitPort1.first;
  await exitPort2.first;
  print("done");
}

vs

time out/ReleaseX64/dart pkg/front_end/test/unit_test_suites.dart --skipTestsThatRequireGit -j5 --shards 2 --shard 1
time out/ReleaseX64/dart pkg/front_end/test/unit_test_suites.dart --skipTestsThatRequireGit -j5 --shards 2 --shard 2

(run in two different terminals at the same time)

This should do the exact same thing, with the exact same work distribution, just using either isolates or processes.

Using the isolates approach I get

[...]
Entire run took 0:07:25.244488.
[...]
Entire run took 0:07:25.573479.
done

real    7m38.368s
user    27m46.395s
sys     8m43.946s

So it finished in 7 minutes 38 seconds real runtime, using ~478% CPU.

Running in processes I get

[...]
Entire run took 0:03:50.552587.

real    4m4.012s
user    12m31.571s
sys     2m36.091s

and

[...]
Entire run took 0:04:09.658705.

real    4m23.415s
user    13m25.618s
sys     2m41.306s

So it finished in 4 minutes 23 seconds real runtime, using ~712% CPU.
That's ~48% more CPU usage and finishing in <58% of the time (being more than 74% faster.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-isolate P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

4 participants