Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Timeline::get_vectored for doing timestamp lookups in SLRUs #7755

Open
VladLazar opened this issue May 14, 2024 · 1 comment
Open

Use Timeline::get_vectored for doing timestamp lookups in SLRUs #7755

VladLazar opened this issue May 14, 2024 · 1 comment
Labels
c/storage/pageserver Component: storage: pageserver m/good_first_issue Moment: when doing your first Neon contributions t/bug Issue Type: Bug

Comments

@VladLazar
Copy link
Contributor

VladLazar commented May 14, 2024

Issue

Mapping timestamps to LSN is slow for tenats with large SLRU counts.

The slowness comes from repeatedly querying SLRUs without using vectored get. On a tenant with 190 slru
blocks this took like 4 seconds for each round of the binary search.

Timeline::map_all_timestamps should use Timeline::get_vectored instead of fanning out one Timeline::get for each block. Make sure to batch the calls into get_vectored up to the limit (32 or 64 I don't recall).

Steps to reproduce

Import tenant 751897498d4ed3c54a6a28fcbce888b8 on a throaway ec2 instance (i used i3en.3xlarge) and do:

$ time curl 127.0.0.1:9898/v1/tenant/751897498d4ed3c54a6a28fcbce888b8/timeline/4e51c228ffc33537db5cb42c7c3ebbcb/get_lsn_by_timestamp?timestamp=2024-05-10T00:00:00Z

real    1m24.795s

Now apply this diff in order to see how much time is being spent in looking up slrus:

diff --git a/pageserver/src/pgdatadir_mapping.rs b/pageserver/src/pgdatadir_mapping.rs
index 1092d64d3..e8e0a28fb 100644
--- a/pageserver/src/pgdatadir_mapping.rs
+++ b/pageserver/src/pgdatadir_mapping.rs
@@ -33,6 +33,7 @@ use serde::{Deserialize, Serialize};
 use std::collections::{hash_map, HashMap, HashSet};
 use std::ops::ControlFlow;
 use std::ops::Range;
+use std::time::Instant;
 use strum::IntoEnumIterator;
 use tokio_util::sync::CancellationToken;
 use tracing::{debug, trace, warn};
@@ -422,6 +423,8 @@ impl Timeline {
             // cannot overflow, high and low are both smaller than u64::MAX / 2
             let mid = (high + low) / 2;

+            let timer = Instant::now();
+
             let cmp = self
                 .is_latest_commit_timestamp_ge_than(
                     search_timestamp,
@@ -432,6 +435,15 @@ impl Timeline {
                 )
                 .await?;

+            let elapsed = timer.elapsed();
+            tracing::info!(
+                "FIND_LSN: high={} low={} mid={} slru_search_duration={}ms",
+                high,
+                low,
+                mid,
+                elapsed.as_millis()
+            );
+
             if cmp {
                 high = mid;
             } else {

In my repro i had:

...
2024-05-14T13:31:09.649748Z  INFO request{method=GET path=/v1/tenant/751897498d4ed3c54a6a28fcbce888b8/timeline/4e51c228ffc33537db5cb42c7c3ebbcb/get_lsn_by_timestamp request_id=7bb1d9c9-b743-4035-a7d8-e5e043f2890a}: FIND_LSN: high=261830080 low=261829569 mid=261829824 slru_search_duration=2871ms
2024-05-14T13:31:13.825731Z  INFO request{method=GET path=/v1/tenant/751897498d4ed3c54a6a28fcbce888b8/timeline/4e51c228ffc33537db5cb42c7c3ebbcb/get_lsn_by_timestamp request_id=7bb1d9c9-b743-4035-a7d8-e5e043f2890a}: FIND_LSN: high=261829824 low=261829569 mid=261829696 slru_search_duration=4175ms
2024-05-14T13:31:17.967303Z  INFO request{method=GET path=/v1/tenant/751897498d4ed3c54a6a28fcbce888b8/timeline/4e51c228ffc33537db5cb42c7c3ebbcb/get_lsn_by_timestamp request_id=7bb1d9c9-b743-4035-a7d8-e5e043f2890a}: FIND_LSN: high=261829824 low=261829697 mid=261829760 slru_search_duration=4141ms
...

Note: this used the non-vectored read path. If the get_impl="vectored" config is used, it comes out to about
13s.

Related:

@VladLazar VladLazar added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver m/good_first_issue Moment: when doing your first Neon contributions labels May 14, 2024
@jcsp
Copy link
Contributor

jcsp commented May 23, 2024

@VladLazar is this issue still relevant now that we're using vectored get by default everywhere, or is the LSN->timestamp lookup still doing its own thing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver m/good_first_issue Moment: when doing your first Neon contributions t/bug Issue Type: Bug
Projects
None yet
Development

No branches or pull requests

2 participants