Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add concurrency limit to QueryRunner and DataSourceSettings #2513

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Kevin-Chant
Copy link
Contributor

@Kevin-Chant Kevin-Chant commented May 13, 2024

Features and Changes

Creates a new field in DataSourceSettings: maxConcurrentQueries. This field is respected by QueryRunner's startQuery and startReadyQueries entrypoints, which now check the number of running queries before committing to starting a new query.

Added a helper to QueryModel to grab the count of running queries & a new section to the ConnectionSettings modal to allow for editing of this setting (and create a place for other future shared settings to live).

Note: the limit is only respected across multiple experiments (i.e. one experiment which runs N queries where N is over the concurrency limit will not trigger the protections) due to race conditions with queries starting simultaneously.

Testing

  1. Set up at least one experiment which makes multiple queries
  2. Edit the settings for the datastore and provide a concurrency limit (probably 1)
  3. Open the datastore queries list page in a new tab
  4. Force update the results for the experiment(s) and check the queries list page to see that some have been queued.
    a. (If the delay isn't noticeable, you can change the default currentTimeout: number = 500 to 8000.)

Screenshots

image

image

Here's a screen recording of the UX with a concurrency limit of 1

queryconcurrency_ux

And here's a corresponding recording of what's happening on the backend with a few extra console.logs added for clarity.
queryconcurrency_backend

Copy link

github-actions bot commented May 13, 2024

Your preview environment pr-2513-bttf has been deployed.

Preview environment endpoints are available at:

Copy link
Member

@jdorn jdorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In startExperimentResultQueries, we currently start all queries at once and do a Promise.all. This is causing the race condition you mention in your PR description.

We can fix this fairly easily by changing it to await each promise before starting the next query.

It's really only the experiment query that has this issue since all of the other ones fire 1 (or maybe 2) queries total.

packages/back-end/src/queryRunners/QueryRunner.ts Outdated Show resolved Hide resolved
@jdorn
Copy link
Member

jdorn commented May 14, 2024

One thing I can see happening is someone's experiment being stuck in a queued state because all of the slots are being hogged by a few super long running queries.

What would be really cool is to pair this with #2323 so you can easily see a list of all of your running queries and where they are coming from and maybe even kill them directly from that list.

@Kevin-Chant
Copy link
Contributor Author

One thing I can see happening is someone's experiment being stuck in a queued state because all of the slots are being hogged by a few super long running queries.

What would be really cool is to pair this with #2323 so you can easily see a list of all of your running queries and where they are coming from and maybe even kill them directly from that list.

This is one thing I was thinking about; didn't know that we have a page for that now! Do you think showing a link to that page while data is updating would do a good enough job at explaining what's happening?

@jdorn
Copy link
Member

jdorn commented May 14, 2024

Yeah, in AsyncQueriesModal, if there are any queries that are "queued", we can show an alert banner that links to the datasource queries list page.

The page isn't super useful right now (it doesn't link to the experiment that spawned the query and there's no way to cancel), but it's still helpful and we can improve it over time.

@Kevin-Chant Kevin-Chant requested a review from jdorn May 14, 2024 18:17
@Kevin-Chant
Copy link
Contributor Author

In startExperimentResultQueries, we currently start all queries at once and do a Promise.all. This is causing the race condition you mention in your PR description.

We can fix this fairly easily by changing it to await each promise before starting the next query.

It's really only the experiment query that has this issue since all of the other ones fire 1 (or maybe 2) queries total.

Makes sense. I tried rewriting the promise logic in ExperimentResultsQueryRunner but ran into some problems where the queries weren't being set up properly (erroring out with Run callbacks not found). I'm leaning towards just shipping it as-is since it seems unlikely that a single experiment will overload the db so we should get most of the benefit even with this edge case exceeding the concurrency limit. Does that sound good to you?

@Kevin-Chant Kevin-Chant force-pushed the kc/query-runner-concurrency-limit branch from ba84649 to 0651db1 Compare May 21, 2024 18:16
@@ -279,7 +279,7 @@ export const startExperimentResultQueries = async (
organization
);

const singlePromises = singles.map(async (m) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ES6 for loop format allows for awaiting within each iteration of the loop without going to the next, so we can drop the concurrency granularity to 1

Comment on lines +165 to +173
<div className="col-auto mb-2">
<em>Time queued</em>:{" "}
<strong>
{formatDistanceStrict(
getValidDate(query.createdAt),
getValidDate(query.startedAt)
)}
</strong>
</div>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely an optional addition but I found myself wanting to check the queued time after re-running the queries so I thought I'd toss it in here. Let me know whether this seems useful to keep or if I should get rid of it.

@Kevin-Chant
Copy link
Contributor Author

@jdorn I fixed the issue I had before & the queries now go one at a time. I updated the gifs with the new functionality and dropped the wait time by half because it's much more impactful now with a low concurrency limit. Should be ready for a review again once you're done with the 3.0 stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants