PERF-5441 Adds the `time_series_lastpoint_high_cardinality` workload. #1218

yun-soo · 2024-05-10T23:25:28Z

Jira Ticket: PERF-5441

Whats Changed

Adds the time_series_lastpoint_high_cardinality workload.

Patch Testing Results

https://spruce.mongodb.com/version/663f01f365c3b80007df9990/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

src/phases/query/TimeSeriesLastpoint.yml

src/workloads/query/TimeSeriesLastpoint.yml

alicedoherty · 2024-05-13T12:33:24Z

Thanks @yun-soo! Would you be able to fill out the New Workload Request form? Also just want to confirm that for this PR only the TimeSeriesLastpointHighCardinality workload file needs to be reviewed, and the changes to the TimeSeriesLastpoint workload/phase file will be dealt with in #1219?

Do you have an idea of how noisy this workload is? We'd generally recommend running x5 patches to get a sense of this. You can use the performance analyzer to automate this (see the third method here).

yun-soo · 2024-05-13T15:16:18Z

Thanks a lot for the review, @alicedoherty!

Would you be able to fill out the New Workload Request form?

Filed the request. I noticed that build variant list is not consistent with the up-to-date one.

Also just want to confirm that for this PR only the TimeSeriesLastpointHighCardinality workload file needs to be reviewed, and the changes to the TimeSeriesLastpoint workload/phase file will be dealt with in #1219?

Yes, only TimeSeriesLastpointHighCardinality needs to be reviewed and they will in #1219

Do you have an idea of how noisy this workload is? We'd generally recommend running x5 patches to get a sense of this. You can use the performance analyzer to automate this (see the third method here).

Just scheduled one here. Will update this as soon as I will get results.
Update:
Got perf analyzer results.
OperationThroughput, Latency50thPercentile, Latency95Percentil's min/max values against mean fall into [-4.86%, 7.41%] except two outliers. Two outliers are:

MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate's min/max on perf-3-node-replSet-intel.intel.aws.2023-11: [-6.50%, 25.38%]
MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate's min/max on perf-3-node-replSet.arm.aws.2023-11: [-30.92%, 148.77%]

alicedoherty

Thanks @yun-soo! Sorry for the delay, was focusing on skunkworks. Thanks for calling out that the variants in the form are outdated, I've opened PERF-5488.

Do you have any idea why the Latency95thPercentile for MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate is so noisy? Since this seems like one of the key metrics you care about this would make triaging changepoints pretty hard for this measurement.

Also as an FYI in the perf analyzer if you select to show the "CoV (%)" it will show the calculated coefficient of variation (the main measure of noise our team uses) for each measurement in your patches to save you some time when doing a noise analysis.

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

alicedoherty · 2024-05-20T10:51:51Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+        $gte: v7.0
+      mongodb_setup:
+        $eq:
+          - standalone


We've gotten rid of our regular standalone variants. Having this line will just mean it will only be running on the ARM and Intel 3-Node ReplSet 2023-11 variants.

Apologies but I don't understand what you're suggesting here. Could you clarify on it so that I know what should be changed? I could not exactly correlate these names with the updated build variants.

These lines are exactly same as in the companion workload TimeSeriesLastpoint.yml. And I was able to see that this workload is enabled on perf-3-node-replSet-intel.intel.aws.2023-11 and perf-3-node-replSet.arm.aws.2023-11 and perf-3-node-replSet-all-feature-flags.arm.2023-11. I think these would be enough.

Sorry the way I worded that was confusing 😅 We've removed our regular standalone variants (SERVER-89499). Having $eq: standalone, standalone-XXX won't cause any issues, it just means it won't match an existing variant, and therefore won't run on any standalone variants. I just wanted to call this out so you weren't surprised that the test isn't running on standalone variants despite having this line in your code.

For clarity it's probably best to remove the standalone lines here.

(JFYI WRT the correlation between these lines and our variants - our variants are defined in perf.yml and perf_branching.yml. Each variant has a mongodb_setup: XXX line. In your Genny file if you write AutoRun: When: mongodb_setup: $eq: replica this means it will run on any variant that has mongodb_setup: replica)

Ah~ makes sense now!! Thanks for detailed explanation. Really appreciate it! Removed standalone(-*) for clarity just like you suggested.

yun-soo

Thanks for review! Addressed comments or asked follow up questions.

Sorry for the delay, was focusing on skunkworks. Thanks for calling out that the variants in the form are outdated, I've opened PERF-5488.

No worries and I was also focusing on skunkworks as many others were. Thanks for filing the ticket.

Do you have any idea why the Latency95thPercentile for MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate is so noisy? Since this seems like one of the key metrics you care about this would make triaging changepoints pretty hard for this measurement.

I don't have a good idea on why yet. I'll spend some time to figure out why but it'll take some time. I'll get back to this PR when I have a good idea. Do you happen to know what could cause such noisiness in other workloads?

Also as an FYI in the perf analyzer if you select to show the "CoV (%)" it will show the calculated coefficient of variation (the main measure of noise our team uses) for each measurement in your patches to save you some time when doing a noise analysis.

Thanks for letting me know. What's the threshold which the team decides a metric is too noisy on? I noticed that CoVs were there, but I didn't know how to interpret those values.

yun-soo · 2024-05-20T16:26:43Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+        $gte: v7.0
+      mongodb_setup:
+        $eq:
+          - standalone


Apologies but I don't understand what you're suggesting here. Could you clarify on it so that I know what should be changed? I could not exactly correlate these names with the updated build variants.

These lines are exactly same as in the companion workload TimeSeriesLastpoint.yml. And I was able to see that this workload is enabled on perf-3-node-replSet-intel.intel.aws.2023-11 and perf-3-node-replSet.arm.aws.2023-11 and perf-3-node-replSet-all-feature-flags.arm.2023-11. I think these would be enough.

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

alicedoherty

Do you happen to know what could cause such noisiness in other workloads?

Looking at the existing time_series_lastpoint test it seems much of the underlying test design/queries are noisy. If you're getting useful signal from that existing test I think it would be okay to merge this test with similar levels, however, I definitely think it's worth looking into the noise of Latency95thPercentile for MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate as a CV of 73% (!) and 12% seems like an indication that something is not testing what you'd expect. Since the really bad noise is limited to this specific phase and (potentially) ARM it might be a good idea to do some profiling, t2 analysis, etc. on that specific part of the test (looking at one of the really good vs bad runs). There are some general resources documented here, but feel free to ask in #ask-devprod-performance if you need more guidance.

What's the threshold which the team decides a metric is too noisy on?

We've struggled to define a specific threshold. Ideally we would like it <1% but at the moment with our current testing infrastructure this seems a little unrealistic. Personally, for tests like this I'd be concerned about metrics with CV >5%.

If this is a time-sensitive feature you want to test - it could also be an option to tell the build barons to ignore the really noisy metric (since it's not the whole task) and open a follow-on ticket to investigate this further.

alicedoherty · 2024-05-21T10:40:50Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+        $gte: v7.0
+      mongodb_setup:
+        $eq:
+          - standalone


Sorry the way I worded that was confusing 😅 We've removed our regular standalone variants (SERVER-89499). Having $eq: standalone, standalone-XXX won't cause any issues, it just means it won't match an existing variant, and therefore won't run on any standalone variants. I just wanted to call this out so you weren't surprised that the test isn't running on standalone variants despite having this line in your code.

For clarity it's probably best to remove the standalone lines here.

(JFYI WRT the correlation between these lines and our variants - our variants are defined in perf.yml and perf_branching.yml. Each variant has a mongodb_setup: XXX line. In your Genny file if you write AutoRun: When: mongodb_setup: $eq: replica this means it will run on any variant that has mongodb_setup: replica)

yun-soo

I definitely think it's worth looking into the noise of Latency95thPercentile for MetaDescTimeDesc.RunLastPointQueryWithSmallPredicate as a CV of 73% (!) and 12% seems like an indication that something is not testing what you'd expect. Since the really bad noise is limited to this specific phase and (potentially) ARM it might be a good idea to do some profiling, t2 analysis, etc. on that specific part of the test (looking at one of the really good vs bad runs). There are some general resources documented here, but feel free to ask in #ask-devprod-performance if you need more guidance.

Thanks for detailed guidance and really appreciate it! I'll definitely look into those two metrics. I wanted to measure improved performance for the higher cardinality workload thanks to a new optimization because TimeSeriesLastpoint.yml had a bug (so a noise) and also didn't give my team a good signal about the new optimization but don't want to get too much noise from this new workload either.

If this is a time-sensitive feature you want to test - it could also be an option to tell the build barons to ignore the really noisy metric (since it's not the whole task) and open a follow-on ticket to investigate this further.

Actually, that's not the case. I confirmed that the feature actually improves the performance in other workloads and this workload is for future performance regression testing, which is not urgent. I wanted to give a closure to the feature. I'm in the middle of something for the other project and so will get back to you with my analysis and any changes necessary to this workload.

yun-soo · 2024-05-21T17:12:55Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+        $gte: v7.0
+      mongodb_setup:
+        $eq:
+          - standalone


Ah~ makes sense now!! Thanks for detailed explanation. Really appreciate it! Removed standalone(-*) for clarity just like you suggested.

jawwadasghar · 2024-05-21T19:02:31Z

src/workloads/query/TimeSeriesLastpoint.yml


  - LoadConfig:
      Path: *phasePath
      Key: QuiesceActor
      Parameters:
-        MaxPhases: 23
+        MaxPhases: *MaxPhases


nit: It might be clearer if the active phases for the quiesce actor are defined here rather than inheriting the defaults.
That way all the phase info is in this file, and it lines up with what is done for TimeSeriesLastpointHighCardinality.yml

This is done in #1219.

jawwadasghar · 2024-05-21T19:16:21Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+
+  ##################################################################################################
+  # Following 4 actors including the hidden QuiesceActor in phase 20 run a lastpoint query with a
+  # sort and group on meta subfield ascending and time descending.


Should this say meta subfield descending?

Good catch! done.

jawwadasghar · 2024-05-21T19:19:51Z

src/workloads/query/TimeSeriesLastpoint.yml


  - LoadConfig:
      Path: *phasePath
      Key: QuiesceActor
      Parameters:
-        MaxPhases: 23
+        MaxPhases: *MaxPhases

 AutoRun:


We are deprecating support for standalone variants.
Can you remove the AutoRun conditions for the standalone setups?
This would also then match when the HighCardinality workload gets scheduled.

Also should the branch conditional be "$gte: v7.0" like it is for the HighCardinality one?

Can you remove the AutoRun conditions for the standalone setups?

This is done in #1219.

should the branch conditional be "$gte: v7.0" like it is for the HighCardinality one?

This is answered in #1219.

yun-soo

@jawwadasghar, addressed your comments

yun-soo · 2024-05-22T14:37:07Z

src/workloads/query/TimeSeriesLastpointHighCardinality.yml

+
+  ##################################################################################################
+  # Following 4 actors including the hidden QuiesceActor in phase 20 run a lastpoint query with a
+  # sort and group on meta subfield ascending and time descending.


Good catch! done.

yun-soo · 2024-05-22T15:10:24Z

src/workloads/query/TimeSeriesLastpoint.yml


  - LoadConfig:
      Path: *phasePath
      Key: QuiesceActor
      Parameters:
-        MaxPhases: 23
+        MaxPhases: *MaxPhases


This is done in #1219.

yun-soo · 2024-05-22T15:12:34Z

src/workloads/query/TimeSeriesLastpoint.yml


  - LoadConfig:
      Path: *phasePath
      Key: QuiesceActor
      Parameters:
-        MaxPhases: 23
+        MaxPhases: *MaxPhases

 AutoRun:


Can you remove the AutoRun conditions for the standalone setups?

This is done in #1219.

should the branch conditional be "$gte: v7.0" like it is for the HighCardinality one?

This is answered in #1219.

jawwadasghar

LGTM, thanks!

yun-soo added 3 commits May 10, 2024 22:39

PERF-5441 Create time_series_lastpoint_high_cardinality workload

5b73f3c

Fix lint

22c3acc

Enable on over v7.0

30f88ea

yun-soo requested a review from a team as a code owner May 10, 2024 23:25

yun-soo requested a review from alicedoherty May 10, 2024 23:25

yun-soo added 2 commits May 10, 2024 23:57

Multiple thread loader

8566758

Fix failure

39770be

yun-soo changed the title ~~PERF-5441 Creates the time_series_lastpoint_high_cardinality workload.~~ PERF-5441 Adds the time_series_lastpoint_high_cardinality workload. May 11, 2024

yun-soo added 5 commits May 11, 2024 02:08

Fix lint failures

5f3c408

Change metric names

4d75119

Lower coll cardinality

df13288

cleanup

df4c562

Fix document

edc7116

yun-soo commented May 11, 2024

View reviewed changes

src/phases/query/TimeSeriesLastpoint.yml Show resolved Hide resolved

yun-soo commented May 11, 2024

View reviewed changes

src/workloads/query/TimeSeriesLastpoint.yml Show resolved Hide resolved

yun-soo mentioned this pull request May 13, 2024

PERF-5440 Fix created indices for ascending sort key queries in TimeSeriesLastpoint.yml workload #1219

Merged

alicedoherty reviewed May 20, 2024

View reviewed changes

Address 2nd round of comments

dbb3df2

yun-soo commented May 20, 2024

View reviewed changes

alicedoherty reviewed May 21, 2024

View reviewed changes

Fix build variants to run

6afbd31

yun-soo commented May 21, 2024

View reviewed changes

jawwadasghar reviewed May 21, 2024

View reviewed changes

Address PR comment

89918e2

yun-soo commented May 22, 2024

View reviewed changes

jawwadasghar approved these changes May 22, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF-5441 Adds the `time_series_lastpoint_high_cardinality` workload. #1218

PERF-5441 Adds the `time_series_lastpoint_high_cardinality` workload. #1218

yun-soo commented May 10, 2024 •

edited

alicedoherty commented May 13, 2024

yun-soo commented May 13, 2024 •

edited

alicedoherty left a comment •

edited

alicedoherty May 20, 2024

yun-soo May 20, 2024

alicedoherty May 21, 2024

yun-soo May 21, 2024

yun-soo left a comment

yun-soo May 20, 2024

alicedoherty left a comment •

edited

alicedoherty May 21, 2024

yun-soo left a comment

yun-soo May 21, 2024

jawwadasghar May 21, 2024

yun-soo May 22, 2024

jawwadasghar May 21, 2024

yun-soo May 22, 2024

jawwadasghar May 21, 2024

yun-soo May 22, 2024

yun-soo left a comment

yun-soo May 22, 2024

yun-soo May 22, 2024

yun-soo May 22, 2024

jawwadasghar left a comment

PERF-5441 Adds the time_series_lastpoint_high_cardinality workload. #1218

Are you sure you want to change the base?

PERF-5441 Adds the time_series_lastpoint_high_cardinality workload. #1218

Conversation

yun-soo commented May 10, 2024 • edited

Whats Changed

Patch Testing Results

alicedoherty commented May 13, 2024

yun-soo commented May 13, 2024 • edited

alicedoherty left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yun-soo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alicedoherty left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yun-soo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yun-soo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jawwadasghar left a comment

Choose a reason for hiding this comment

PERF-5441 Adds the `time_series_lastpoint_high_cardinality` workload. #1218

PERF-5441 Adds the `time_series_lastpoint_high_cardinality` workload. #1218

yun-soo commented May 10, 2024 •

edited

yun-soo commented May 13, 2024 •

edited

alicedoherty left a comment •

edited

alicedoherty left a comment •

edited