[DYOD] NUMA - Node stealing and Grouping changes #2608
base: master
Conversation
…yrise#2606) Fixes performance and optimizer breakdown plots for cached queries.
…etrics` (hyrise#2606)" This reverts commit 08b9cc0.
So why change it? What did you actually benchmark there? What about single-client? Did you measure "both main changes" together?
Updated the PR description, including the requested benchmarks.
Coming back to this question. If the benchmarks did not show a positive effect of 30, why still change it from 10 to 30?
In the single-client mode, we can see an increase of 10 - 15 % in latency and throughput. |
But there are no benchmarks for this change, right? I only see the benchmark for all three changes together.
Fair, will benchmark this isolated. |
Added new benchmarks, measuring the changes in isolation.
Why did you use a scale factor of 10? The working set almost fits into the caches.
The Scheduler now queries the OS for the distances between its queues' node IDs and derives a prioritized queue ordering from them. This way, workers steal from "close" queues/nodes first.
The number of groups in the scheduler increased from 10 to 30, as we noticed increased performance from that.
To verify, a series of benchmarks were executed on RAPA, comparing the master with all changes included in this PR.
When executing TPC-H with multiple clients (60), increasing the number of groups from 10 to 30 did not seem to increase performance, as we could only witness a small increase in throughput at the cost of higher latency.
TPC-H Multiple Clients SF100
Executing with a single client, increasing the number of groups from 10 to 30 led to a significant increase in performance:
TPC-H Single Client SF10
To determine the origin of this performance improvement (either the grouping or the stealing changes), we benchmarked each change in isolation:
TPC-H Single Client SF10 GROUPING CHANGES
TPC-H Single Client SF10 STEALING CHANGES
As you can see, the impact of stealing tasks from closer nodes first is negligible. This is likely because no node-aware scheduling is applied to the tasks yet, so they run on random nodes. Once that changes, we might see a performance increase from it.
Thus, increasing the number of groups from 10 to 30 seems to be the sole source of the performance increase.
Executing TPC-H SF1 on RAPA, with all new changes on a single NUMA node, showed slight performance increases:
TPC-H SF1 single NUMA node