[DYOD] NUMA - Node stealing and Grouping changes #2608

Open: wants to merge 6 commits into master


Tratori (Contributor) commented Aug 21, 2023

The scheduler now queries the OS for the distances between the NUMA nodes of its queues and derives a prioritized ordering of queue IDs from them. This way, workers can steal from "close" queues/nodes first.
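A minimal sketch of the closest-first ordering, assuming the node distances are already available as a square matrix (on Linux, libnuma's `numa_distance(node1, node2)` from `<numa.h>` reports such values, with 10 conventionally meaning local). The function `prioritized_steal_order` is hypothetical and not the scheduler's actual interface; it only illustrates sorting the other nodes by distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not Hyrise's API): given a square matrix of NUMA
// distances, where distances[a][b] is the relative cost for node a to access
// memory on node b (e.g., as reported by libnuma's numa_distance()), return,
// for each node, the IDs of all other nodes ordered from closest to farthest.
std::vector<std::vector<std::size_t>> prioritized_steal_order(const std::vector<std::vector<int>>& distances) {
  const auto node_count = distances.size();
  auto orders = std::vector<std::vector<std::size_t>>(node_count);

  for (auto node = std::size_t{0}; node < node_count; ++node) {
    assert(distances[node].size() == node_count && "distance matrix must be square");

    auto& order = orders[node];
    order.resize(node_count);
    std::iota(order.begin(), order.end(), std::size_t{0});
    // The worker polls its own queue first anyway, so it is excluded here.
    order.erase(order.begin() + static_cast<std::ptrdiff_t>(node));

    // stable_sort keeps equally distant nodes in ascending ID order.
    std::stable_sort(order.begin(), order.end(), [&](std::size_t lhs, std::size_t rhs) {
      return distances[node][lhs] < distances[node][rhs];
    });
  }
  return orders;
}
```

For a hypothetical two-socket layout in which node 0 sees distances `{10, 16, 32, 32}`, node 0 would try node 1 before nodes 2 and 3. Because equally distant nodes keep their ascending ID order, the result is deterministic.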

The number of groups in the scheduler was increased from 10 to 30, as we observed better performance with it.
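To illustrate what the number of groups controls, here is a simplified model, assuming grouping assigns tasks to groups round-robin and chains the tasks within one group so that they execute one after another. `group_tasks_round_robin` is a hypothetical helper, not the scheduler's code; in this model, the number of groups caps how many of the grouped tasks can run concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model (an assumption, not Hyrise's implementation): task i is
// assigned to group i % num_groups, and the tasks of one group form a chain
// that runs sequentially. Each inner vector is one chain, in execution order,
// so at most `num_groups` tasks can be in flight at the same time.
std::vector<std::vector<std::size_t>> group_tasks_round_robin(std::size_t task_count, std::size_t num_groups) {
  assert(num_groups > 0);
  auto groups = std::vector<std::vector<std::size_t>>(std::min(task_count, num_groups));
  for (auto task_id = std::size_t{0}; task_id < task_count; ++task_id) {
    groups[task_id % num_groups].push_back(task_id);
  }
  return groups;
}
```

With 100 tasks, 10 groups produce 10 chains of 10 tasks each, while 30 groups allow up to 30 tasks in flight, with chains of 3 or 4 tasks.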

To verify this, we ran a series of benchmarks on RAPA, comparing master against the changes in this PR.

When executing TPC-H with multiple clients (60), increasing the number of groups from 10 to 30 did not seem to improve performance: throughput stayed roughly the same (geomean -2%), while the summed latency rose by 5%.

TPC-H Multiple Clients SF100

+Configuration Overview---------+------------------------------------------+--------------------------------------------------------+
| Parameter                     | hyrise_main_100_shuffled.json            | hyrise_numa-scheduler-changes_100_shuffled_actual.json |
+-------------------------------+------------------------------------------+--------------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d | 4add1a2af43690f743391daef713f3c836ae8aec               |
|  benchmark_mode               | Shuffled                                 | Shuffled                                               |
|  build_type                   | release                                  | release                                                |
|  chunk_indexes                | False                                    | False                                                  |
|  chunk_size                   | 65535                                    | 65535                                                  |
|  clients                      | 60                                       | 60                                                     |
|  clustering                   | None                                     | None                                                   |
|  compiler                     | gcc 9.2                                  | gcc 9.2                                                |
|  cores                        | 0                                        | 0                                                      |
|  data_preparation_cores       | 0                                        | 0                                                      |
|  date                         | 2023-08-23 11:29:32                      | 2023-08-24 11:24:03                                    |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}  | {'default': {'encoding': 'Dictionary'}}                |
|  max_duration                 | 2400000000000                            | 2400000000000                                          |
|  max_runs                     | -1                                       | -1                                                     |
|  scale_factor                 | 100.0                                    | 100.0                                                  |
|  time_unit                    | ns                                       | ns                                                     |
|  use_prepared_statements      | False                                    | False                                                  |
|  using_scheduler              | True                                     | True                                                   |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]         | [30, 30, 30, 30, 30, 30, 30, 30]                       |
|  verify                       | False                                    | False                                                  |
|  warmup_duration              | 0                                        | 0                                                      |
+-------------------------------+------------------------------------------+--------------------------------------------------------+

+----------++------------+------------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)       | Change || Throughput (iter/s) | Change | p-value |
|          ||        old |        new |        ||      old |      new |        |         |
+----------++------------+------------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  156546.26 |  176553.72 |  +13%  ||     0.03 |     0.03 |   -3%  |  0.1801 |
| TPC-H 02 ||    5298.59 |   12016.36 | +127%  ||     0.03 |     0.03 |   -1%  |  0.1241 |
| TPC-H 03 ||  107776.55 |   77831.18 |  -28%  ||     0.03 |     0.03 |   -3%  |  0.1512 |
| TPC-H 04 ||   68373.22 |   52222.92 |  -24%  ||     0.03 |     0.03 |   +1%  |  0.2792 |
| TPC-H 05 ||  109252.25 |  121460.35 |  +11%  ||     0.03 |     0.03 |   -4%  |  0.3382 |
| TPC-H 06 ||    9519.80 |   11671.80 |  +23%  ||     0.03 |     0.03 |   +0%  |  0.3511 |
| TPC-H 07 ||   60634.40 |   91285.35 |  +51%  ||     0.03 |     0.03 |   -3%  |  0.1504 |
| TPC-H 08 ||   79717.51 |   44371.47 |  -44%  ||     0.03 |     0.03 |   -1%  |  0.0162 |
| TPC-H 09 ||  197933.08 |  162228.31 |  -18%  ||     0.03 |     0.03 |   -1%  |  0.0314 |
| TPC-H 10 ||  114584.12 |  161357.41 |  +41%  ||     0.03 |     0.03 |   -3%  |  0.0258 |
| TPC-H 11 ||   10749.21 |   12594.78 |  +17%  ||     0.03 |     0.03 |   -1%  |  0.6394 |
| TPC-H 12 ||   89155.66 |   63091.19 |  -29%  ||     0.03 |     0.03 |   -2%  |  0.1969 |
| TPC-H 13 ||  315728.94 |  414530.24 |  +31%  ||     0.03 |     0.02 |   -5%  |  0.0101 |
| TPC-H 14 ||   33363.36 |   27656.48 |  -17%  ||     0.03 |     0.03 |   +1%  |  0.5385 |
| TPC-H 15 ||   37367.25 |   36618.65 |   -2%  ||     0.03 |     0.03 |   -2%  |  0.9462 |
| TPC-H 16 ||   51007.80 |   55534.31 |   +9%  ||     0.03 |     0.03 |   +0%  |  0.6093 |
| TPC-H 17 ||   21529.45 |   34906.64 |  +62%  ||     0.03 |     0.03 |   +0%  |  0.4221 |
| TPC-H 18 ||  105816.19 |  128844.52 |  +22%  ||     0.03 |     0.03 |   -1%  |  0.1180 |
| TPC-H 19 ||   25147.21 |   19066.15 |  -24%  ||     0.03 |     0.03 |   -4%  |  0.2917 |
| TPC-H 20 ||   28826.88 |   44621.99 |  +55%  ||     0.03 |     0.03 |   -4%  |  0.0013 |
| TPC-H 21 ||  151217.67 |  114081.86 |  -25%  ||     0.03 |     0.03 |   +1%  |  0.0545 |
| TPC-H 22 ||   20178.14 |   19254.20 |   -5%  ||     0.03 |     0.03 |   +0%  |  0.7888 |
+----------++------------+------------+--------++----------+----------+--------+---------+
| Sum      || 1799723.54 | 1881799.90 |   +5%  ||          |          |        |         |
| Geomean  ||            |            |        ||          |          |   -2%  |         |
+----------++------------+------------+--------++----------+----------+--------+---------+
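The Sum and Geomean rows condense the per-query numbers. A small sketch of how such summary values can be computed, assuming that Sum compares the totals of the latency column and that Geomean is the geometric mean of the per-query new/old throughput ratios, minus one (an assumption about the comparison script, not taken from it):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Relative change of the summed values, e.g. -0.11 for an 11% reduction.
double sum_change(const std::vector<double>& old_values, const std::vector<double>& new_values) {
  assert(old_values.size() == new_values.size() && !old_values.empty());
  const auto old_sum = std::accumulate(old_values.begin(), old_values.end(), 0.0);
  const auto new_sum = std::accumulate(new_values.begin(), new_values.end(), 0.0);
  return new_sum / old_sum - 1.0;
}

// Geometric mean of the per-item ratios new/old, minus one. Averaging in
// log space makes a 2x speedup and a 2x slowdown cancel out exactly.
double geomean_change(const std::vector<double>& old_values, const std::vector<double>& new_values) {
  assert(old_values.size() == new_values.size() && !old_values.empty());
  auto log_sum = 0.0;
  for (auto index = std::size_t{0}; index < old_values.size(); ++index) {
    log_sum += std::log(new_values[index] / old_values[index]);
  }
  return std::exp(log_sum / static_cast<double>(old_values.size())) - 1.0;
}
```

Plugging in the Sum row of the table above (1799723.54 vs. 1881799.90) gives about +0.046, matching the reported +5%. The geometric mean suits ratios because an arithmetic mean of +100% and -50% would report +25% instead of no change.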

With a single client, increasing the number of groups from 10 to 30 led to a significant performance improvement:

TPC-H Single Client SF10
+Configuration Overview---------+------------------------------------------+------------------------------------------+
| Parameter                     | hyrise_main__scale_10_single_client.json | numa_merged_scale_10_single_client.json  |
+-------------------------------+------------------------------------------+------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d | 0bf7270dda8f07660b3a16e712a789cc5be1b2a8 |
|  benchmark_mode               | Ordered                                  | Ordered                                  |
|  build_type                   | release                                  | release                                  |
|  chunk_indexes                | False                                    | False                                    |
|  chunk_size                   | 65535                                    | 65535                                    |
|  clients                      | 1                                        | 1                                        |
|  clustering                   | None                                     | None                                     |
|  compiler                     | gcc 9.2                                  | gcc 9.2                                  |
|  cores                        | 0                                        | 0                                        |
|  data_preparation_cores       | 0                                        | 0                                        |
|  date                         | 2023-09-05 14:56:55                      | 2023-09-05 12:29:19                      |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}  | {'default': {'encoding': 'Dictionary'}}  |
|  max_duration                 | 90000000000                              | 90000000000                              |
|  max_runs                     | -1                                       | -1                                       |
|  scale_factor                 | 10.0                                     | 10.0                                     |
|  time_unit                    | ns                                       | ns                                       |
|  use_prepared_statements      | False                                    | False                                    |
|  using_scheduler              | True                                     | True                                     |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]         | [30, 30, 30, 30, 30, 30, 30, 30]         |
|  verify                       | False                                    | False                                    |
|  warmup_duration              | 0                                        | 0                                        |
+-------------------------------+------------------------------------------+------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5706.79 |  5786.96 |   +1%  ||     0.17 |     0.17 |   +0%  |  0.0509 |
| TPC-H 02 ||    36.38 |    43.65 |  +20%  ||    24.42 |    19.91 |  -18%  |  0.0000 |
| TPC-H 03 ||  1107.88 |   973.94 |  -12%  ||     0.89 |     1.01 |  +14%  |  0.0000 |
| TPC-H 04 ||   681.98 |   581.63 |  -15%  ||     1.44 |     1.70 |  +18%  |  0.0000 |
| TPC-H 05 ||  1019.06 |   873.92 |  -14%  ||     0.97 |     1.13 |  +17%  |  0.0000 |
| TPC-H 06 ||    67.10 |    68.52 |   +2%  ||    13.69 |    13.62 |   -0%  |  0.0000 |
| TPC-H 07 ||   353.83 |   350.59 |   -1%  ||     2.78 |     2.81 |   +1%  |  0.0433 |
| TPC-H 08 ||   389.17 |   321.71 |  -17%  ||     2.53 |     3.06 |  +21%  |  0.0000 |
| TPC-H 09 ||  2900.34 |  2608.08 |  -10%  ||     0.33 |     0.38 |  +13%  |  0.0000 |
| TPC-H 10 ||  1882.20 |  1379.68 |  -27%  ||     0.52 |     0.72 |  +38%  |  0.0000 |
| TPC-H 11 ||    91.82 |    95.40 |   +4%  ||    10.29 |     9.98 |   -3%  |  0.0000 |
| TPC-H 12 ||   593.50 |   522.10 |  -12%  ||     1.67 |     1.89 |  +13%  |  0.0000 |
| TPC-H 13 ||  7859.61 |  6378.12 |  -19%  ||     0.12 |     0.16 |  +27%  |  0.0000 |
| TPC-H 14 ||   160.59 |   156.24 |   -3%  ||     6.02 |     6.19 |   +3%  |  0.0000 |
| TPC-H 15 ||   291.67 |   228.18 |  -22%  ||     3.37 |     4.28 |  +27%  |  0.0000 |
| TPC-H 16 ||   751.56 |   754.17 |   +0%  ||     1.31 |     1.31 |   +0%  |  0.4523 |
| TPC-H 17 ||   125.54 |    69.99 |  -44%  ||     7.64 |    13.33 |  +74%  |  0.0000 |
| TPC-H 18 ||  3546.77 |  3411.68 |   -4%  ||     0.28 |     0.29 |   +4%  |  0.0530 |
| TPC-H 19 ||   159.77 |   106.72 |  -33%  ||     6.07 |     8.92 |  +47%  |  0.0000 |
| TPC-H 20 ||   245.93 |   194.11 |  -21%  ||     3.98 |     5.01 |  +26%  |  0.0000 |
| TPC-H 21 ||  1362.45 |  1254.41 |   -8%  ||     0.72 |     0.79 |   +9%  |  0.0000 |
| TPC-H 22 ||   209.45 |   143.21 |  -32%  ||     4.66 |     6.72 |  +44%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 29543.39 | 26303.00 |  -11%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |  +15%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+

To determine the origin of this improvement (grouping or stealing changes), we benchmarked both changes in isolation:

TPC-H Single Client SF10 GROUPING CHANGES
+Configuration Overview---------+---------------------------------------------+------------------------------------------------+
| Parameter                     | hyrise_main_scale_10_single_client_new.json | grouping_30_scale_10_single_client_new.json    |
+-------------------------------+---------------------------------------------+------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d    | a23585c4360efda5413d7dbce86b7b6f2292ed3d-dirty |
|  benchmark_mode               | Ordered                                     | Ordered                                        |
|  build_type                   | release                                     | release                                        |
|  chunk_indexes                | False                                       | False                                          |
|  chunk_size                   | 65535                                       | 65535                                          |
|  clients                      | 1                                           | 1                                              |
|  clustering                   | None                                        | None                                           |
|  compiler                     | gcc 9.2                                     | gcc 9.2                                        |
|  cores                        | 0                                           | 0                                              |
|  data_preparation_cores       | 0                                           | 0                                              |
|  date                         | 2023-09-11 16:18:38                         | 2023-09-11 15:41:26                            |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}     | {'default': {'encoding': 'Dictionary'}}        |
|  max_duration                 | 90000000000                                 | 90000000000                                    |
|  max_runs                     | -1                                          | -1                                             |
|  scale_factor                 | 10.0                                        | 10.0                                           |
|  time_unit                    | ns                                          | ns                                             |
|  use_prepared_statements      | False                                       | False                                          |
|  using_scheduler              | True                                        | True                                           |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]            | [30, 30, 30, 30, 30, 30, 30, 30]               |
|  verify                       | False                                       | False                                          |
|  warmup_duration              | 0                                           | 0                                              |
+-------------------------------+---------------------------------------------+------------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5749.97 |  5607.62 |   -2%  ||     0.17 |     0.17 |   -0%  |  0.0007 |
| TPC-H 02 ||    37.04 |    44.97 |  +21%  ||    24.14 |    19.44 |  -19%  |  0.0000 |
| TPC-H 03 ||  1167.85 |  1096.28 |   -6%  ||     0.84 |     0.90 |   +7%  |  0.0000 |
| TPC-H 04 ||   720.16 |   650.70 |  -10%  ||     1.38 |     1.52 |  +10%  |  0.0000 |
| TPC-H 05 ||  1130.17 |   968.87 |  -14%  ||     0.88 |     1.02 |  +16%  |  0.0000 |
| TPC-H 06 ||    78.96 |    45.65 |  -42%  ||    11.90 |    19.63 |  +65%  |  0.0000 |
| TPC-H 07 ||   402.88 |   387.25 |   -4%  ||     2.44 |     2.54 |   +4%  |  0.0000 |
| TPC-H 08 ||   434.95 |   342.42 |  -21%  ||     2.27 |     2.87 |  +26%  |  0.0000 |
| TPC-H 09 ||  3505.42 |  3461.48 |   -1%  ||     0.28 |     0.28 |   -0%  |  0.3708 |
| TPC-H 10 ||  2293.74 |  2253.41 |   -2%  ||     0.43 |     0.43 |   -0%  |  0.2016 |
| TPC-H 11 ||    98.18 |   115.94 |  +18%  ||     9.68 |     8.26 |  -15%  |  0.0000 |
| TPC-H 12 ||   707.09 |   640.87 |   -9%  ||     1.40 |     1.54 |  +10%  |  0.0000 |
| TPC-H 13 ||  7954.08 |  8071.47 |   +1%  ||     0.12 |     0.12 |   +0%  |  0.0000 |
| TPC-H 14 ||   202.82 |   175.20 |  -14%  ||     4.80 |     5.54 |  +16%  |  0.0000 |
| TPC-H 15 ||   437.42 |   341.53 |  -22%  ||     2.26 |     2.88 |  +28%  |  0.0000 |
| TPC-H 16 ||   837.68 |  1047.12 |  +25%  ||     1.18 |     0.94 |  -20%  |  0.0000 |
| TPC-H 17 ||   166.42 |    95.89 |  -42%  ||     5.82 |     9.89 |  +70%  |  0.0000 |
| TPC-H 18 ||  3418.65 |  3360.40 |   -2%  ||     0.29 |     0.29 |   +0%  |  0.2120 |
| TPC-H 19 ||   190.81 |   119.61 |  -37%  ||     5.11 |     8.09 |  +58%  |  0.0000 |
| TPC-H 20 ||   315.13 |   224.49 |  -29%  ||     3.11 |     4.34 |  +40%  |  0.0000 |
| TPC-H 21 ||  1610.10 |  1202.77 |  -25%  ||     0.61 |     0.82 |  +35%  |  0.0000 |
| TPC-H 22 ||   248.82 |   163.62 |  -34%  ||     3.93 |     5.90 |  +50%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 31708.34 | 30417.55 |   -4%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |  +15%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+


TPC-H Single Client SF10 STEALING CHANGES
+Configuration Overview---------+---------------------------------------------+------------------------------------------------------+
| Parameter                     | hyrise_main_scale_10_single_client_new.json | prioritized_stealing_scale_10_single_client_new.json |
+-------------------------------+---------------------------------------------+------------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d    | 770b8bfaa1465f40ede35e8f626e8d39f77659d0-dirty       |
|  benchmark_mode               | Ordered                                     | Ordered                                              |
|  build_type                   | release                                     | release                                              |
|  chunk_indexes                | False                                       | False                                                |
|  chunk_size                   | 65535                                       | 65535                                                |
|  clients                      | 1                                           | 1                                                    |
|  clustering                   | None                                        | None                                                 |
|  compiler                     | gcc 9.2                                     | gcc 9.2                                              |
|  cores                        | 0                                           | 0                                                    |
|  data_preparation_cores       | 0                                           | 0                                                    |
|  date                         | 2023-09-11 16:18:38                         | 2023-09-11 15:01:49                                  |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}     | {'default': {'encoding': 'Dictionary'}}              |
|  max_duration                 | 90000000000                                 | 90000000000                                          |
|  max_runs                     | -1                                          | -1                                                   |
|  scale_factor                 | 10.0                                        | 10.0                                                 |
|  time_unit                    | ns                                          | ns                                                   |
|  use_prepared_statements      | False                                       | False                                                |
|  using_scheduler              | True                                        | True                                                 |
|  utilized_cores_per_numa_node | [30, 30, 30, 30, 30, 30, 30, 30]            | [30, 30, 30, 30, 30, 30, 30, 30]                     |
|  verify                       | False                                       | False                                                |
|  warmup_duration              | 0                                           | 0                                                    |
+-------------------------------+---------------------------------------------+------------------------------------------------------+

+----------++----------+----------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)   | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |      new |        ||      old |      new |        |         |
+----------++----------+----------+--------++----------+----------+--------+---------+
| TPC-H 01 ||  5749.97 |  5791.88 |   +1%  ||     0.17 |     0.17 |   -0%  |  0.6112 |
| TPC-H 02 ||    37.04 |    38.41 |   +4%  ||    24.14 |    23.29 |   -4%  |  0.0000 |
| TPC-H 03 ||  1167.85 |  1195.16 |   +2%  ||     0.84 |     0.83 |   -1%  |  0.0004 |
| TPC-H 04 ||   720.16 |   737.17 |   +2%  ||     1.38 |     1.34 |   -2%  |  0.0150 |
| TPC-H 05 ||  1130.17 |  1151.45 |   +2%  ||     0.88 |     0.86 |   -3%  |  0.0414 |
| TPC-H 06 ||    78.96 |    79.55 |   +1%  ||    11.90 |    11.89 |   -0%  |  0.0046 |
| TPC-H 07 ||   402.88 |   409.63 |   +2%  ||     2.44 |     2.41 |   -1%  |  0.0003 |
| TPC-H 08 ||   434.95 |   444.51 |   +2%  ||     2.27 |     2.22 |   -2%  |  0.0000 |
| TPC-H 09 ||  3505.42 |  3478.86 |   -1%  ||     0.28 |     0.28 |   -0%  |  0.5473 |
| TPC-H 10 ||  2293.74 |  2933.05 |  +28%  ||     0.43 |     0.33 |  -23%  |  0.0000 |
| TPC-H 11 ||    98.18 |    95.37 |   -3%  ||     9.68 |     9.93 |   +3%  |  0.0000 |
| TPC-H 12 ||   707.09 |   738.83 |   +4%  ||     1.40 |     1.34 |   -4%  |  0.0000 |
| TPC-H 13 ||  7954.08 |  7821.55 |   -2%  ||     0.12 |     0.12 |   +0%  |  0.0001 |
| TPC-H 14 ||   202.82 |   203.32 |   +0%  ||     4.80 |     4.79 |   -0%  |  0.3841 |
| TPC-H 15 ||   437.42 |   441.17 |   +1%  ||     2.26 |     2.23 |   -1%  |  0.3888 |
| TPC-H 16 ||   837.68 |   729.15 |  -13%  ||     1.18 |     1.36 |  +15%  |  0.0000 |
| TPC-H 17 ||   166.42 |   172.14 |   +3%  ||     5.82 |     5.62 |   -3%  |  0.0000 |
| TPC-H 18 ||  3418.65 |  3441.60 |   +1%  ||     0.29 |     0.29 |   -0%  |  0.7076 |
| TPC-H 19 ||   190.81 |   196.40 |   +3%  ||     5.11 |     4.96 |   -3%  |  0.0000 |
| TPC-H 20 ||   315.13 |   319.82 |   +1%  ||     3.11 |     3.07 |   -1%  |  0.0000 |
| TPC-H 21 ||  1610.10 |  1629.73 |   +1%  ||     0.61 |     0.61 |   +0%  |  0.1320 |
| TPC-H 22 ||   248.82 |   300.17 |  +21%  ||     3.93 |     3.27 |  -17%  |  0.0000 |
+----------++----------+----------+--------++----------+----------+--------+---------+
| Sum      || 31708.34 | 32348.93 |   +2%  ||          |          |        |         |
| Geomean  ||          |          |        ||          |          |   -2%  |         |
+----------++----------+----------+--------++----------+----------+--------+---------+


As you can see, the impact of stealing tasks from closer nodes first is negligible. This is likely because no smart scheduling is applied to the tasks right now, so they run on random nodes. Once this changes, closer-first stealing might improve performance.

Thus, increasing the number of groups from 10 to 30 seems to be the sole source of the performance improvement.
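To see why the closest-first order has little effect at the moment, consider a simplified, single-threaded model of how a worker picks its next task: it drains its own node's queue first and only then steals, visiting the other queues in a precomputed closest-first order. The queue layout and `next_task` are hypothetical, not the scheduler's API. The order only decides which non-empty remote queue is tried first; while tasks are placed on random nodes, the closest queue is no more likely to hold tasks whose data is nearby, so the ordering has little to gain.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical model without synchronization: one FIFO queue of task IDs per
// NUMA node. A worker on `own_node` takes local work first and otherwise
// steals from the first non-empty queue in `steal_order` (closest first).
std::optional<int> next_task(std::vector<std::deque<int>>& queues, std::size_t own_node,
                             const std::vector<std::size_t>& steal_order) {
  assert(own_node < queues.size());
  if (!queues[own_node].empty()) {
    const auto task = queues[own_node].front();
    queues[own_node].pop_front();
    return task;
  }
  for (const auto victim : steal_order) {
    assert(victim < queues.size());
    if (!queues[victim].empty()) {
      const auto task = queues[victim].front();
      queues[victim].pop_front();
      return task;
    }
  }
  return std::nullopt;  // No work anywhere.
}
```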

Executing TPC-H SF1 on Rapa with all changes on a single NUMA node showed slight performance improvements (geomean throughput +2%):

TPC-H SF1 single NUMA node

+Configuration Overview---------+----------------------------------------------------+----------------------------------------------------+
| Parameter                     | hyrise_main_scale_1_single_client_single_node.json | numa_merged_scale_1_single_client_single_node.json |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
|  GIT-HASH                     | a23585c4360efda5413d7dbce86b7b6f2292ed3d           | 770b8bfaa1465f40ede35e8f626e8d39f77659d0           |
|  benchmark_mode               | Ordered                                            | Ordered                                            |
|  build_type                   | release                                            | release                                            |
|  chunk_indexes                | False                                              | False                                              |
|  chunk_size                   | 65535                                              | 65535                                              |
|  clients                      | 1                                                  | 1                                                  |
|  clustering                   | None                                               | None                                               |
|  compiler                     | gcc 9.2                                            | gcc 9.2                                            |
|  cores                        | 0                                                  | 0                                                  |
|  data_preparation_cores       | 0                                                  | 0                                                  |
|  date                         | 2023-09-07 14:10:13                                | 2023-09-07 16:26:36                                |
|  encoding                     | {'default': {'encoding': 'Dictionary'}}            | {'default': {'encoding': 'Dictionary'}}            |
|  max_duration                 | 60000000000                                        | 60000000000                                        |
|  max_runs                     | -1                                                 | -1                                                 |
|  scale_factor                 | 1.0                                                | 1.0                                                |
|  time_unit                    | ns                                                 | ns                                                 |
|  use_prepared_statements      | False                                              | False                                              |
|  using_scheduler              | True                                               | True                                               |
|  utilized_cores_per_numa_node | [30, 0, 0, 0, 0, 0, 0, 0]                          | [30, 0, 0, 0, 0, 0, 0, 0]                          |
|  verify                       | False                                              | False                                              |
|  warmup_duration              | 0                                                  | 0                                                  |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+

+----------++----------+---------+--------++----------+----------+--------+---------+
| Item     || Latency (ms/iter)  | Change || Throughput (iter/s) | Change | p-value |
|          ||      old |     new |        ||      old |      new |        |         |
+----------++----------+---------+--------++----------+----------+--------+---------+
| TPC-H 01 ||   557.84 |  554.21 |   -1%  ||     1.77 |     1.78 |   +1%  |  0.2282 |
| TPC-H 02 ||     7.46 |    7.58 |   +2%  ||    77.78 |    77.43 |   -0%  |  0.3126 |
| TPC-H 03 ||    50.64 |   45.73 |  -10%  ||    18.31 |    19.48 |   +6%  |  0.0000 |
| TPC-H 04 ||    27.99 |   26.88 |   -4%  ||    32.78 |    33.00 |   +1%  |  0.0000 |
| TPC-H 05 ||   116.53 |  115.62 |   -1%  ||     8.20 |     8.28 |   +1%  |  0.1659 |
| TPC-H 06 ||     6.84 |    4.66 |  -32%  ||    98.46 |    98.44 |   -0%  |  0.0000 |
| TPC-H 07 ||    47.36 |   49.15 |   +4%  ||    18.40 |    18.13 |   -1%  |  0.0097 |
| TPC-H 08 ||    53.05 |   51.24 |   -3%  ||    17.60 |    17.56 |   -0%  |  0.0001 |
| TPC-H 09 ||   197.47 |  194.79 |   -1%  ||     4.93 |     5.00 |   +1%  |  0.0681 |
| TPC-H 10 ||    97.73 |   99.96 |   +2%  ||     9.72 |     9.50 |   -2%  |  0.0000 |
| TPC-H 11 ||     7.49 |    7.23 |   -4%  ||    97.26 |    96.96 |   -0%  |  0.0000 |
| TPC-H 12 ||    39.07 |   33.46 |  -14%  ||    23.83 |    24.77 |   +4%  |  0.0000 |
| TPC-H 13 ||   202.91 |  188.27 |   -7%  ||     4.80 |     5.18 |   +8%  |  0.0000 |
| TPC-H 14 ||    26.66 |   26.10 |   -2%  ||    32.34 |    32.58 |   +1%  |  0.0000 |
| TPC-H 15 ||    17.30 |   15.88 |   -8%  ||    49.39 |    49.45 |   +0%  |  0.0000 |
| TPC-H 16 ||    79.49 |   77.71 |   -2%  ||    11.95 |    12.13 |   +2%  |  0.0000 |
| TPC-H 17 ||    10.89 |    9.72 |  -11%  ||    69.34 |    80.04 |  +15%  |  0.0000 |
| TPC-H 18 ||   280.66 |  281.24 |   +0%  ||     3.48 |     3.48 |   +0%  |  0.7321 |
| TPC-H 19 ||    25.21 |   24.38 |   -3%  ||    33.02 |    33.00 |   -0%  |  0.0000 |
| TPC-H 20 ||    27.59 |   25.27 |   -8%  ||    32.75 |    34.03 |   +4%  |  0.0000 |
| TPC-H 21 ||   112.49 |  107.86 |   -4%  ||     8.52 |     8.85 |   +4%  |  0.0174 |
| TPC-H 22 ||    28.83 |   28.29 |   -2%  ||    30.62 |    30.98 |   +1%  |  0.0000 |
+----------++----------+---------+--------++----------+----------+--------+---------+
| Sum      ||  2021.51 | 1975.23 |   -2%  ||          |          |        |         |
| Geomean  ||          |         |        ||          |          |   +2%  |         |
+----------++----------+---------+--------++----------+----------+--------+---------+

rrcomtech added the FullCI label ("Run all CI tests (slow, but required for merge)") on Aug 24, 2023
Bouncner (Collaborator) commented

> The number of groups in the scheduler increased from 10 to 30, as we noticed increased performance from that.

> Contrary to what we noticed during development, increasing the number of groups from 10 to 30 did not seem to increase performance, as we could only witness a small increase in throughput at the cost of higher latency.

So why change it?

What did you actually benchmark there? What about single-client? Did you measure "both main changes" together?
Can you also run a SF 1 run with a few cores on a single NUMA node, please?

Tratori (Contributor, Author) commented Sep 7, 2023

Updated the PR description, including the requested benchmarks.
With a single client, performance improvements are visible.

Bouncner (Collaborator) commented Sep 7, 2023

> So why change it?

Coming back to this question. If the benchmarks did not show a positive effect of 30, why still change it from 10 to 30?

Tratori (Contributor, Author) commented Sep 8, 2023

In single-client mode, we see an improvement of 10 to 15% in both latency and throughput.
If that is not enough, I would suggest dropping the grouping changes (especially considering the planned dynamic adjustment of that parameter).

Bouncner (Collaborator) commented Sep 8, 2023

But there are no benchmarks for this change, right? I only see the benchmark for all three changes together.

Tratori (Contributor, Author) commented Sep 8, 2023

Fair point, I will benchmark this in isolation.

Tratori (Contributor, Author) commented Sep 12, 2023

Added new benchmarks that measure the changes in isolation.

Bouncner (Collaborator) commented

Why did you use a scale factor of 10? The working set almost fits into the caches.

Labels: FullCI ("Run all CI tests (slow, but required for merge)")
4 participants