[DYOD] NUMA - Node stealing and Grouping changes #2608

Tratori · 2023-08-21T10:08:00Z

The scheduler now queries the OS for the distances between its NUMA node IDs and derives a prioritized ordering of queue IDs from them. This way, workers now steal from "close" queues/nodes first.
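A minimal sketch of how a distance-based steal order could be derived and used. It assumes the OS-reported distances are available as a matrix (on Linux, libnuma's `numa_distance(node1, node2)` reports them). `compute_steal_order` and `pull_task` are hypothetical names, not Hyrise's API, and the queues are modeled as plain single-threaded `std::deque<int>`s of task IDs rather than the scheduler's concurrent task queues.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <optional>
#include <vector>

// For every NUMA node, returns all node IDs ordered by distance from that node,
// closest first. distance_matrix[a][b] is assumed to hold the OS-reported
// distance from node a to node b (e.g., libnuma's numa_distance(a, b) on Linux).
std::vector<std::vector<std::size_t>> compute_steal_order(
    const std::vector<std::vector<int>>& distance_matrix) {
  const auto node_count = distance_matrix.size();
  auto steal_order = std::vector<std::vector<std::size_t>>(node_count);
  for (auto node = std::size_t{0}; node < node_count; ++node) {
    assert(distance_matrix[node].size() == node_count);
    auto& order = steal_order[node];
    order.resize(node_count);
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps equally distant nodes in ascending ID order.
    std::stable_sort(order.begin(), order.end(), [&](std::size_t lhs, std::size_t rhs) {
      return distance_matrix[node][lhs] < distance_matrix[node][rhs];
    });
  }
  return steal_order;
}

// Single-threaded model of a worker on worker_node: it checks the queues in steal
// order. Assuming the local distance is the smallest (as in ACPI SLIT, where local
// is 10), it drains its own node's queue first, then steals from the closest
// non-empty queue.
std::optional<int> pull_task(std::size_t worker_node,
                             const std::vector<std::vector<std::size_t>>& steal_order,
                             std::vector<std::deque<int>>& queues) {
  for (const auto queue_id : steal_order[worker_node]) {
    auto& queue = queues[queue_id];
    if (!queue.empty()) {
      const auto task_id = queue.front();
      queue.pop_front();
      return task_id;
    }
  }
  return std::nullopt;
}
```

For a hypothetical four-node machine with distances `{{10, 16, 32, 32}, {16, 10, 32, 32}, {32, 32, 10, 16}, {32, 32, 16, 10}}`, a worker on node 2 would check the queues in the order 2, 3, 0, 1. Using `std::stable_sort` keeps the ordering deterministic when several nodes are equally distant.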

We increased the number of groups in the scheduler from 10 to 30, as we observed better performance with that setting.
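The description does not define what a group is. As a simplified model, and purely as an assumption, the sketch below treats grouping as distributing a batch of independent tasks round-robin over a fixed number of groups whose tasks are chained to run one after another. Under this model, the number of groups caps how many tasks of one batch can run concurrently. `group_tasks` and `longest_chain` are hypothetical names, not Hyrise's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of task grouping: independent tasks are distributed round-robin
// over group_count groups, and the tasks within one group are chained so that they
// run one after another. Returns the task IDs of each group in execution order.
std::vector<std::vector<std::size_t>> group_tasks(std::size_t task_count, std::size_t group_count) {
  assert(group_count > 0);
  auto groups = std::vector<std::vector<std::size_t>>(std::min(task_count, group_count));
  for (auto task_id = std::size_t{0}; task_id < task_count; ++task_id) {
    groups[task_id % group_count].push_back(task_id);
  }
  return groups;
}

// With unit-time tasks and at least group_count idle workers, a batch finishes
// after as many steps as its longest chain has tasks.
std::size_t longest_chain(std::size_t task_count, std::size_t group_count) {
  assert(group_count > 0);
  return (task_count + group_count - 1) / group_count;
}
```

Under these assumptions, a batch of 240 tasks (one per core with 8 NUMA nodes of 30 cores each) forms chains of 24 tasks with 10 groups, but only 8 tasks with 30 groups.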

To verify this, we ran a series of benchmarks on RAPA, comparing master with all changes included in this PR.

When executing TPC-H with multiple clients (60), increasing the number of groups from 10 to 30 did not seem to increase performance: the summed latency rose by 5%, and the geomean throughput changed by -2%.

TPC-H Multiple Clients SF100


+Configuration Overview---------+------------------------------------------+--------------------------------------------------------+
| Parameter                     | hyrise_main_100_shuffled.json            | hyrise_numa-scheduler-changes_100_shuffled_actual.json |
+-------------------------------+------------------------------------------+--------------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d | 4add1a2af43690f743391daef713f3c836ae8aec               |
|  benchmark_mode               | Shuffled                                 | Shuffled                                               |
|  build_type                   | release                                  | release                                                |
|  chunk_indexes                | False                                    | False                                                  |
|  chunk_size                   | 65535                                    | 65535                                                  |
|  clients                      | 60                                       | 60                                                     |
|  clustering                   | None                                     | None                                                   |
|  compiler                     | gcc 9.2                                  | gcc 9.2                                                |
|  cores                        | 0                                        | 0                                                      |
|  data_preparation_cores       | 0                                        | 0                                                      |
|  date                         | 2023-08-23 11:29:32                      | 2023-08-24 11:24:03                                    |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}  | {'default': {'encoding': 'Dictionary'}}                |
|  max_duration                 | 2400000000000                            | 2400000000000                                          |
|  max_runs                     | -1                                       | -1                                                     |
|  scale_factor                 | 100.0                                    | 100.0                                                  |
|  time_unit                    | ns                                       | ns                                                     |
|  use_prepared_statements      | False                                    | False                                                  |
|  using_scheduler              | True                                     | True                                                   |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]         | [30, 30, 30, 30, 30, 30, 30, 30]                       |
|  verify                       | False                                    | False                                                  |
|  warmup_duration              | 0                                        | 0                                                      |
+-------------------------------+------------------------------------------+--------------------------------------------------------+

+----------++------------+------------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)       | Change || Throughput (iter/s) | Change | p-value |
|          ||        old |        new |        ||      old |      new |        |         |
+----------++------------+------------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  156546.26 |  176553.72 |  +13%  ||     0.03 |     0.03 |   -3%  |  0.1801 |
| TPC-H 02 ||    5298.59 |   12016.36 | +127%  ||     0.03 |     0.03 |   -1%  |  0.1241 |
| TPC-H 03 ||  107776.55 |   77831.18 |  -28%  ||     0.03 |     0.03 |   -3%  |  0.1512 |
| TPC-H 04 ||   68373.22 |   52222.92 |  -24%  ||     0.03 |     0.03 |   +1%  |  0.2792 |
| TPC-H 05 ||  109252.25 |  121460.35 |  +11%  ||     0.03 |     0.03 |   -4%  |  0.3382 |
| TPC-H 06 ||    9519.80 |   11671.80 |  +23%  ||     0.03 |     0.03 |   +0%  |  0.3511 |
| TPC-H 07 ||   60634.40 |   91285.35 |  +51%  ||     0.03 |     0.03 |   -3%  |  0.1504 |
| TPC-H 08 ||   79717.51 |   44371.47 |  -44%  ||     0.03 |     0.03 |   -1%  |  0.0162 |
| TPC-H 09 ||  197933.08 |  162228.31 |  -18%  ||     0.03 |     0.03 |   -1%  |  0.0314 |
| TPC-H 10 ||  114584.12 |  161357.41 |  +41%  ||     0.03 |     0.03 |   -3%  |  0.0258 |
| TPC-H 11 ||   10749.21 |   12594.78 |  +17%  ||     0.03 |     0.03 |   -1%  |  0.6394 |
| TPC-H 12 ||   89155.66 |   63091.19 |  -29%  ||     0.03 |     0.03 |   -2%  |  0.1969 |
| TPC-H 13 ||  315728.94 |  414530.24 |  +31%  ||     0.03 |     0.02 |   -5%  |  0.0101 |
| TPC-H 14 ||   33363.36 |   27656.48 |  -17%  ||     0.03 |     0.03 |   +1%  |  0.5385 |
| TPC-H 15 ||   37367.25 |   36618.65 |   -2%  ||     0.03 |     0.03 |   -2%  |  0.9462 |
| TPC-H 16 ||   51007.80 |   55534.31 |   +9%  ||     0.03 |     0.03 |   +0%  |  0.6093 |
| TPC-H 17 ||   21529.45 |   34906.64 |  +62%  ||     0.03 |     0.03 |   +0%  |  0.4221 |
| TPC-H 18 ||  105816.19 |  128844.52 |  +22%  ||     0.03 |     0.03 |   -1%  |  0.1180 |
| TPC-H 19 ||   25147.21 |   19066.15 |  -24%  ||     0.03 |     0.03 |   -4%  |  0.2917 |
| TPC-H 20 ||   28826.88 |   44621.99 |  +55%  ||     0.03 |     0.03 |   -4%  |  0.0013 |
| TPC-H 21 ||  151217.67 |  114081.86 |  -25%  ||     0.03 |     0.03 |   +1%  |  0.0545 |
| TPC-H 22 ||   20178.14 |   19254.20 |   -5%  ||     0.03 |     0.03 |   +0%  |  0.7888 |
+----------++------------+------------+--------++----------+----------+--------+---------+
| Sum      || 1799723.54 | 1881799.90 |   +5%  ||          |          |        |         |
| Geomean  ||            |            |        ||          |          |   -2%  |         |
+----------++------------+------------+--------++----------+----------+--------+---------+

With a single client, the changes in this PR (including the increase of the number of groups from 10 to 30) led to a significant performance increase:

TPC-H Single Client SF10

+Configuration Overview---------+------------------------------------------+------------------------------------------+
| Parameter                     | hyrise_main__scale_10_single_client.json | numa_merged_scale_10_single_client.json  |
+-------------------------------+------------------------------------------+------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d | 0bf7270dda8f07660b3a16e712a789cc5be1b2a8 |
|  benchmark_mode               | Ordered                                  | Ordered                                  |
|  build_type                   | release                                  | release                                  |
|  chunk_indexes                | False                                    | False                                    |
|  chunk_size                   | 65535                                    | 65535                                    |
|  clients                      | 1                                        | 1                                        |
|  clustering                   | None                                     | None                                     |
|  compiler                     | gcc 9.2                                  | gcc 9.2                                  |
|  cores                        | 0                                        | 0                                        |
|  data_preparation_cores       | 0                                        | 0                                        |
|  date                         | 2023-09-05 14:56:55                      | 2023-09-05 12:29:19                      |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}  | {'default': {'encoding': 'Dictionary'}}  |
|  max_duration                 | 90000000000                              | 90000000000                              |
|  max_runs                     | -1                                       | -1                                       |
|  scale_factor                 | 10.0                                     | 10.0                                     |
|  time_unit                    | ns                                       | ns                                       |
|  use_prepared_statements      | False                                    | False                                    |
|  using_scheduler              | True                                     | True                                     |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]         | [30, 30, 30, 30, 30, 30, 30, 30]         |
|  verify                       | False                                    | False                                    |
|  warmup_duration              | 0                                        | 0                                        |
+-------------------------------+------------------------------------------+------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5706.79 |  5786.96 |   +1%  ||     0.17 |     0.17 |   +0%  |  0.0509 |
| TPC-H 02 ||    36.38 |    43.65 |  +20%  ||    24.42 |    19.91 |  -18%  |  0.0000 |
| TPC-H 03 ||  1107.88 |   973.94 |  -12%  ||     0.89 |     1.01 |  +14%  |  0.0000 |
| TPC-H 04 ||   681.98 |   581.63 |  -15%  ||     1.44 |     1.70 |  +18%  |  0.0000 |
| TPC-H 05 ||  1019.06 |   873.92 |  -14%  ||     0.97 |     1.13 |  +17%  |  0.0000 |
| TPC-H 06 ||    67.10 |    68.52 |   +2%  ||    13.69 |    13.62 |   -0%  |  0.0000 |
| TPC-H 07 ||   353.83 |   350.59 |   -1%  ||     2.78 |     2.81 |   +1%  |  0.0433 |
| TPC-H 08 ||   389.17 |   321.71 |  -17%  ||     2.53 |     3.06 |  +21%  |  0.0000 |
| TPC-H 09 ||  2900.34 |  2608.08 |  -10%  ||     0.33 |     0.38 |  +13%  |  0.0000 |
| TPC-H 10 ||  1882.20 |  1379.68 |  -27%  ||     0.52 |     0.72 |  +38%  |  0.0000 |
| TPC-H 11 ||    91.82 |    95.40 |   +4%  ||    10.29 |     9.98 |   -3%  |  0.0000 |
| TPC-H 12 ||   593.50 |   522.10 |  -12%  ||     1.67 |     1.89 |  +13%  |  0.0000 |
| TPC-H 13 ||  7859.61 |  6378.12 |  -19%  ||     0.12 |     0.16 |  +27%  |  0.0000 |
| TPC-H 14 ||   160.59 |   156.24 |   -3%  ||     6.02 |     6.19 |   +3%  |  0.0000 |
| TPC-H 15 ||   291.67 |   228.18 |  -22%  ||     3.37 |     4.28 |  +27%  |  0.0000 |
| TPC-H 16 ||   751.56 |   754.17 |   +0%  ||     1.31 |     1.31 |   +0%  |  0.4523 |
| TPC-H 17 ||   125.54 |    69.99 |  -44%  ||     7.64 |    13.33 |  +74%  |  0.0000 |
| TPC-H 18 ||  3546.77 |  3411.68 |   -4%  ||     0.28 |     0.29 |   +4%  |  0.0530 |
| TPC-H 19 ||   159.77 |   106.72 |  -33%  ||     6.07 |     8.92 |  +47%  |  0.0000 |
| TPC-H 20 ||   245.93 |   194.11 |  -21%  ||     3.98 |     5.01 |  +26%  |  0.0000 |
| TPC-H 21 ||  1362.45 |  1254.41 |   -8%  ||     0.72 |     0.79 |   +9%  |  0.0000 |
| TPC-H 22 ||   209.45 |   143.21 |  -32%  ||     4.66 |     6.72 |  +44%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 29543.39 | 26303.00 |  -11%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |  +15%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+

To determine the origin of this performance improvement (grouping or stealing changes), we benchmarked both changes in isolation:

TPC-H Single Client SF10 GROUPING CHANGES

+Configuration Overview---------+---------------------------------------------+------------------------------------------------+
| Parameter                     | hyrise_main_scale_10_single_client_new.json | grouping_30_scale_10_single_client_new.json    |
+-------------------------------+---------------------------------------------+------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d    | a23585c4360efda5413d7dbce86b7b6f2292ed3d-dirty |
|  benchmark_mode               | Ordered                                     | Ordered                                        |
|  build_type                   | release                                     | release                                        |
|  chunk_indexes                | False                                       | False                                          |
|  chunk_size                   | 65535                                       | 65535                                          |
|  clients                      | 1                                           | 1                                              |
|  clustering                   | None                                        | None                                           |
|  compiler                     | gcc 9.2                                     | gcc 9.2                                        |
|  cores                        | 0                                           | 0                                              |
|  data_preparation_cores       | 0                                           | 0                                              |
|  date                         | 2023-09-11 16:18:38                         | 2023-09-11 15:41:26                            |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}     | {'default': {'encoding': 'Dictionary'}}        |
|  max_duration                 | 90000000000                                 | 90000000000                                    |
|  max_runs                     | -1                                          | -1                                             |
|  scale_factor                 | 10.0                                        | 10.0                                           |
|  time_unit                    | ns                                          | ns                                             |
|  use_prepared_statements      | False                                       | False                                          |
|  using_scheduler              | True                                        | True                                           |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]            | [30, 30, 30, 30, 30, 30, 30, 30]               |
|  verify                       | False                                       | False                                          |
|  warmup_duration              | 0                                           | 0                                              |
+-------------------------------+---------------------------------------------+------------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5749.97 |  5607.62 |   -2%  ||     0.17 |     0.17 |   -0%  |  0.0007 |
| TPC-H 02 ||    37.04 |    44.97 |  +21%  ||    24.14 |    19.44 |  -19%  |  0.0000 |
| TPC-H 03 ||  1167.85 |  1096.28 |   -6%  ||     0.84 |     0.90 |   +7%  |  0.0000 |
| TPC-H 04 ||   720.16 |   650.70 |  -10%  ||     1.38 |     1.52 |  +10%  |  0.0000 |
| TPC-H 05 ||  1130.17 |   968.87 |  -14%  ||     0.88 |     1.02 |  +16%  |  0.0000 |
| TPC-H 06 ||    78.96 |    45.65 |  -42%  ||    11.90 |    19.63 |  +65%  |  0.0000 |
| TPC-H 07 ||   402.88 |   387.25 |   -4%  ||     2.44 |     2.54 |   +4%  |  0.0000 |
| TPC-H 08 ||   434.95 |   342.42 |  -21%  ||     2.27 |     2.87 |  +26%  |  0.0000 |
| TPC-H 09 ||  3505.42 |  3461.48 |   -1%  ||     0.28 |     0.28 |   -0%  |  0.3708 |
| TPC-H 10 ||  2293.74 |  2253.41 |   -2%  ||     0.43 |     0.43 |   -0%  |  0.2016 |
| TPC-H 11 ||    98.18 |   115.94 |  +18%  ||     9.68 |     8.26 |  -15%  |  0.0000 |
| TPC-H 12 ||   707.09 |   640.87 |   -9%  ||     1.40 |     1.54 |  +10%  |  0.0000 |
| TPC-H 13 ||  7954.08 |  8071.47 |   +1%  ||     0.12 |     0.12 |   +0%  |  0.0000 |
| TPC-H 14 ||   202.82 |   175.20 |  -14%  ||     4.80 |     5.54 |  +16%  |  0.0000 |
| TPC-H 15 ||   437.42 |   341.53 |  -22%  ||     2.26 |     2.88 |  +28%  |  0.0000 |
| TPC-H 16 ||   837.68 |  1047.12 |  +25%  ||     1.18 |     0.94 |  -20%  |  0.0000 |
| TPC-H 17 ||   166.42 |    95.89 |  -42%  ||     5.82 |     9.89 |  +70%  |  0.0000 |
| TPC-H 18 ||  3418.65 |  3360.40 |   -2%  ||     0.29 |     0.29 |   +0%  |  0.2120 |
| TPC-H 19 ||   190.81 |   119.61 |  -37%  ||     5.11 |     8.09 |  +58%  |  0.0000 |
| TPC-H 20 ||   315.13 |   224.49 |  -29%  ||     3.11 |     4.34 |  +40%  |  0.0000 |
| TPC-H 21 ||  1610.10 |  1202.77 |  -25%  ||     0.61 |     0.82 |  +35%  |  0.0000 |
| TPC-H 22 ||   248.82 |   163.62 |  -34%  ||     3.93 |     5.90 |  +50%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 31708.34 | 30417.55 |   -4%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |  +15%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
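The Sum and Geomean rows can diverge strongly, as in the table above (-4% summed latency vs. +15% geomean throughput): the Sum is dominated by long-running queries such as TPC-H 01, 09, and 13, whose latencies barely change, whereas a geometric mean weighs every query equally. The sketch below computes such a geomean change, assuming the Geomean row is the geometric mean of the per-query ratios new/old; `geomean_change` is a hypothetical helper, not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the per-query ratios new/old, reported as a relative change
// (0.15 means +15%). Summing logarithms avoids overflow for long input lists.
double geomean_change(const std::vector<double>& old_values, const std::vector<double>& new_values) {
  assert(!old_values.empty() && old_values.size() == new_values.size());
  auto log_sum = 0.0;
  for (auto index = std::size_t{0}; index < old_values.size(); ++index) {
    log_sum += std::log(new_values[index] / old_values[index]);
  }
  return std::exp(log_sum / static_cast<double>(old_values.size())) - 1.0;
}
```

For example, one query that doubles its throughput and one that halves it yield a geomean change of 0%, even though their summed latencies may differ a lot.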

TPC-H Single Client SF10 STEALING CHANGES

+Configuration Overview---------+---------------------------------------------+------------------------------------------------------+
| Parameter                     | hyrise_main_scale_10_single_client_new.json | prioritized_stealing_scale_10_single_client_new.json |
+-------------------------------+---------------------------------------------+------------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d    | 770b8bfaa1465f40ede35e8f626e8d39f77659d0-dirty       |
|  benchmark_mode               | Ordered                                     | Ordered                                              |
|  build_type                   | release                                     | release                                              |
|  chunk_indexes                | False                                       | False                                                |
|  chunk_size                   | 65535                                       | 65535                                                |
|  clients                      | 1                                           | 1                                                    |
|  clustering                   | None                                        | None                                                 |
|  compiler                     | gcc 9.2                                     | gcc 9.2                                              |
|  cores                        | 0                                           | 0                                                    |
|  data_preparation_cores       | 0                                           | 0                                                    |
|  date                         | 2023-09-11 16:18:38                         | 2023-09-11 15:01:49                                  |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}     | {'default': {'encoding': 'Dictionary'}}              |
|  max_duration                 | 90000000000                                 | 90000000000                                          |
|  max_runs                     | -1                                          | -1                                                   |
|  scale_factor                 | 10.0                                        | 10.0                                                 |
|  time_unit                    | ns                                          | ns                                                   |
|  use_prepared_statements      | False                                       | False                                                |
|  using_scheduler              | True                                        | True                                                 |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]            | [30, 30, 30, 30, 30, 30, 30, 30]                     |
|  verify                       | False                                       | False                                                |
|  warmup_duration              | 0                                           | 0                                                    |
+-------------------------------+---------------------------------------------+------------------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5749.97 |  5791.88 |   +1%  ||     0.17 |     0.17 |   -0%  |  0.6112 |
| TPC-H 02 ||    37.04 |    38.41 |   +4%  ||    24.14 |    23.29 |   -4%  |  0.0000 |
| TPC-H 03 ||  1167.85 |  1195.16 |   +2%  ||     0.84 |     0.83 |   -1%  |  0.0004 |
| TPC-H 04 ||   720.16 |   737.17 |   +2%  ||     1.38 |     1.34 |   -2%  |  0.0150 |
| TPC-H 05 ||  1130.17 |  1151.45 |   +2%  ||     0.88 |     0.86 |   -3%  |  0.0414 |
| TPC-H 06 ||    78.96 |    79.55 |   +1%  ||    11.90 |    11.89 |   -0%  |  0.0046 |
| TPC-H 07 ||   402.88 |   409.63 |   +2%  ||     2.44 |     2.41 |   -1%  |  0.0003 |
| TPC-H 08 ||   434.95 |   444.51 |   +2%  ||     2.27 |     2.22 |   -2%  |  0.0000 |
| TPC-H 09 ||  3505.42 |  3478.86 |   -1%  ||     0.28 |     0.28 |   -0%  |  0.5473 |
| TPC-H 10 ||  2293.74 |  2933.05 |  +28%  ||     0.43 |     0.33 |  -23%  |  0.0000 |
| TPC-H 11 ||    98.18 |    95.37 |   -3%  ||     9.68 |     9.93 |   +3%  |  0.0000 |
| TPC-H 12 ||   707.09 |   738.83 |   +4%  ||     1.40 |     1.34 |   -4%  |  0.0000 |
| TPC-H 13 ||  7954.08 |  7821.55 |   -2%  ||     0.12 |     0.12 |   +0%  |  0.0001 |
| TPC-H 14 ||   202.82 |   203.32 |   +0%  ||     4.80 |     4.79 |   -0%  |  0.3841 |
| TPC-H 15 ||   437.42 |   441.17 |   +1%  ||     2.26 |     2.23 |   -1%  |  0.3888 |
| TPC-H 16 ||   837.68 |   729.15 |  -13%  ||     1.18 |     1.36 |  +15%  |  0.0000 |
| TPC-H 17 ||   166.42 |   172.14 |   +3%  ||     5.82 |     5.62 |   -3%  |  0.0000 |
| TPC-H 18 ||  3418.65 |  3441.60 |   +1%  ||     0.29 |     0.29 |   -0%  |  0.7076 |
| TPC-H 19 ||   190.81 |   196.40 |   +3%  ||     5.11 |     4.96 |   -3%  |  0.0000 |
| TPC-H 20 ||   315.13 |   319.82 |   +1%  ||     3.11 |     3.07 |   -1%  |  0.0000 |
| TPC-H 21 ||  1610.10 |  1629.73 |   +1%  ||     0.61 |     0.61 |   +0%  |  0.1320 |
| TPC-H 22 ||   248.82 |   300.17 |  +21%  ||     3.93 |     3.27 |  -17%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 31708.34 | 32348.93 |   +2%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |   -2%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+

As you can see, the impact of stealing tasks from closer nodes first is negligible. This is likely because no NUMA-aware scheduling is currently applied to the tasks, so they run on random nodes. Once this changes, we might see a performance increase, though.

Thus, increasing the number of groups from 10 to 30 seems to be the sole source of the performance increase.

Executing TPC-H SF1 on RAPA with all new changes on a single NUMA node showed a slight performance increase:

TPC-H SF1 single NUMA node


+Configuration Overview---------+----------------------------------------------------+----------------------------------------------------+
| Parameter                     | hyrise_main_scale_1_single_client_single_node.json | numa_merged_scale_1_single_client_single_node.json |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d           | 770b8bfaa1465f40ede35e8f626e8d39f77659d0           |
|  benchmark_mode               | Ordered                                            | Ordered                                            |
|  build_type                   | release                                            | release                                            |
|  chunk_indexes                | False                                              | False                                              |
|  chunk_size                   | 65535                                              | 65535                                              |
|  clients                      | 1                                                  | 1                                                  |
|  clustering                   | None                                               | None                                               |
|  compiler                     | gcc 9.2                                            | gcc 9.2                                            |
|  cores                        | 0                                                  | 0                                                  |
|  data_preparation_cores       | 0                                                  | 0                                                  |
|  date                         | 2023-09-07 14:10:13                                | 2023-09-07 16:26:36                                |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}            | {'default': {'encoding': 'Dictionary'}}            |
|  max_duration                 | 60000000000                                        | 60000000000                                        |
|  max_runs                     | -1                                                 | -1                                                 |
|  scale_factor                 | 1.0                                                | 1.0                                                |
|  time_unit                    | ns                                                 | ns                                                 |
|  use_prepared_statements      | False                                              | False                                              |
|  using_scheduler              | True                                               | True                                               |
|  utilized_cores_per_numa_node | [30, 0, 0, 0, 0, 0, 0, 0]                          | [30, 0, 0, 0, 0, 0, 0, 0]                          |
|  verify                       | False                                              | False                                              |
|  warmup_duration              | 0                                                  | 0                                                  |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+

+----------++----------+---------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)  | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |     new |        ||      old |      new |        |         |
+----------++----------+---------+--------++----------+----------+--------+---------+
| TPC-H 01 ||   557.84 |  554.21 |   -1%  ||     1.77 |     1.78 |   +1%  |  0.2282 |
| TPC-H 02 ||     7.46 |    7.58 |   +2%  ||    77.78 |    77.43 |   -0%  |  0.3126 |
| TPC-H 03 ||    50.64 |   45.73 |  -10%  ||    18.31 |    19.48 |   +6%  |  0.0000 |
| TPC-H 04 ||    27.99 |   26.88 |   -4%  ||    32.78 |    33.00 |   +1%  |  0.0000 |
| TPC-H 05 ||   116.53 |  115.62 |   -1%  ||     8.20 |     8.28 |   +1%  |  0.1659 |
| TPC-H 06 ||     6.84 |    4.66 |  -32%  ||    98.46 |    98.44 |   -0%  |  0.0000 |
| TPC-H 07 ||    47.36 |   49.15 |   +4%  ||    18.40 |    18.13 |   -1%  |  0.0097 |
| TPC-H 08 ||    53.05 |   51.24 |   -3%  ||    17.60 |    17.56 |   -0%  |  0.0001 |
| TPC-H 09 ||   197.47 |  194.79 |   -1%  ||     4.93 |     5.00 |   +1%  |  0.0681 |
| TPC-H 10 ||    97.73 |   99.96 |   +2%  ||     9.72 |     9.50 |   -2%  |  0.0000 |
| TPC-H 11 ||     7.49 |    7.23 |   -4%  ||    97.26 |    96.96 |   -0%  |  0.0000 |
| TPC-H 12 ||    39.07 |   33.46 |  -14%  ||    23.83 |    24.77 |   +4%  |  0.0000 |
| TPC-H 13 ||   202.91 |  188.27 |   -7%  ||     4.80 |     5.18 |   +8%  |  0.0000 |
| TPC-H 14 ||    26.66 |   26.10 |   -2%  ||    32.34 |    32.58 |   +1%  |  0.0000 |
| TPC-H 15 ||    17.30 |   15.88 |   -8%  ||    49.39 |    49.45 |   +0%  |  0.0000 |
| TPC-H 16 ||    79.49 |   77.71 |   -2%  ||    11.95 |    12.13 |   +2%  |  0.0000 |
| TPC-H 17 ||    10.89 |    9.72 |  -11%  ||    69.34 |    80.04 |  +15%  |  0.0000 |
| TPC-H 18 ||   280.66 |  281.24 |   +0%  ||     3.48 |     3.48 |   +0%  |  0.7321 |
| TPC-H 19 ||    25.21 |   24.38 |   -3%  ||    33.02 |    33.00 |   -0%  |  0.0000 |
| TPC-H 20 ||    27.59 |   25.27 |   -8%  ||    32.75 |    34.03 |   +4%  |  0.0000 |
| TPC-H 21 ||   112.49 |  107.86 |   -4%  ||     8.52 |     8.85 |   +4%  |  0.0174 |
| TPC-H 22 ||    28.83 |   28.29 |   -2%  ||    30.62 |    30.98 |   +1%  |  0.0000 |
+----------++----------+---------+--------++----------+----------+--------+---------+
| Sum      ||  2021.51 | 1975.23 |   -2%  ||          |          |        |         |
| Geomean  ||          |         |        ||          |          |   +2%  |         |
+----------++----------+---------+--------++----------+----------+--------+---------+


Bouncner · 2023-08-31T22:18:01Z

> The number of groups in the scheduler increased from 10 to 30, as we noticed increased performance from that.

> Contrary to what we noticed during development, increasing the number of groups from 10 to 30 did not seem to increase performance, as we could only witness a small increase in throughput at the cost of higher latency.

So why change it?

What did you actually benchmark there? What about single-client? Did you measure "both main changes" together?
Can you also run a SF 1 run with a few cores on a single NUMA node, please?

Tratori · 2023-09-07T14:56:37Z

> The number of groups in the scheduler increased from 10 to 30, as we noticed increased performance from that.
> Contrary to what we noticed during development, increasing the number of groups from 10 to 30 did not seem to increase performance, as we could only witness a small increase in throughput at the cost of higher latency.
>
> So why change it?
>
> What did you actually benchmark there? What about single-client? Did you measure "both main changes" together? Can you also run a SF 1 run with a few cores on a single NUMA node, please?

Updated the PR description, including the requested benchmarks.
When using a single client, performance improvements are visible.

Bouncner · 2023-09-07T20:42:36Z

> So why change it?

Coming back to this question. If the benchmarks did not show a positive effect of 30, why still change it from 10 to 30?

Tratori · 2023-09-08T09:38:08Z

> So why change it?
>
> Coming back to this question. If the benchmarks did not show a positive effect of 30, why still change it from 10 to 30?

In single-client mode, we see an improvement of 10-15% in latency and throughput.
If that is not enough, I would suggest dropping the grouping changes (especially since we plan to adjust that parameter dynamically).

Bouncner · 2023-09-08T09:42:53Z

But there are no benchmarks for this change, right? I only see the benchmark for all three changes together.

Tratori · 2023-09-08T09:44:27Z

> But there are no benchmarks for this change, right? I only see the benchmark for all three changes together.

Fair, I will benchmark this in isolation.

Tratori · 2023-09-12T08:41:10Z

> But there are no benchmarks for this change, right? I only see the benchmark for all three changes together.

Added new benchmarks measuring the changes in isolation.

Bouncner · 2023-09-13T14:01:43Z

Why did you use a scale factor of 10? The working set almost fits into the caches.

pick changes from numa work (4add1a2)

rrcomtech added the FullCI label (Run all CI tests; slow, but required for merge) on Aug 24, 2023

dey4ss and others added 5 commits on August 24, 2023, 12:15

08b9cc0 · Switch off plan caching for benchmark runs with `--pipeline_metrics` (hyrise#2606): Fixes performance and optimizer breakdown plots for cached queries.

0e66578 · linting

802c614 · Revert "Switch off plan caching for benchmark runs with `--pipeline_metrics` (hyrise#2606)": This reverts commit 08b9cc0.

1a4b260 · guard numa include with HYRISE_NUMA_SUPPORT

770b8bf · Trigger CI
