
[Bug] High subgraph query throughput causes excessive traffic on the database primary node #5336

Open
1 of 3 tasks
TPXP opened this issue Apr 11, 2024 · 0 comments

Labels
bug Something isn't working
TPXP commented Apr 11, 2024

Bug report

Hello,

We have deployed The Graph internally and use it as the blockchain data source for some of our services. It works great, thanks a lot for building such a great product!

Lately, we've been scaling up our infrastructure to serve more queries. Part of this effort included adding a read replica to lower the load on our primary database instance: the goal is to have all query nodes read from the replicas, so that the primary instance can focus on indexing. Configuration went smoothly after reading the docs, and we are indeed seeing traffic reach the read replicas. However, we are still seeing surprisingly high load on the primary instance, and it increases with query volume, even though serving queries performs no writes and should only touch the replicas.
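For reference, our replicas are declared along these lines in graph-node's `config.toml` store section (connection strings, pool sizes, and weights below are placeholders, not our actual values):

```toml
[store]
[store.primary]
connection = "postgresql://graph:$PASS@db-primary.internal/graph"
pool_size = 10
weight = 0   # keep as little query traffic as possible on the primary

[store.primary.replicas.repl1]
connection = "postgresql://graph:$PASS@db-replica-1.internal/graph"
pool_size = 10
weight = 1   # replicas should take the read traffic
```

With `weight = 0` on the primary we would expect essentially all read-only query traffic to land on `repl1`.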

Analyzing the issue with Performance Insights reveals an elevated rate of queries on the primary instance whose throughput correlates with incoming GraphQL queries. We think these should be routed to the read replicas for the setup to scale properly. Here are the top queries, along with their call rate and aggregated CPU usage (10% means the primary instance spends 10% of all its CPU time processing queries of that shape):

| CPU % | Rate (calls/s) | SQL |
| --- | --- | --- |
| 42 | 268.64 | `SELECT "chain1"."blocks"."number", coalesce(data->'block'->>'timestamp', data->>'timestamp') FROM "chain1"."blocks" WHERE "chain1"."blocks"."hash" = $1 LIMIT $2` |
| 11 | 1348.71 | `SELECT 1` |
| 6 | 268.64 | `SELECT EXISTS (SELECT "subgraphs"."subgraph_error"."vid", "subgraphs"."subgraph_error"."id", "subgraphs"."subgraph_error"."subgraph_id", "subgraphs"."subgraph_error"."message", "subgraphs"."subgraph_error"."block_hash", "subgraphs"."subgraph_error"."handler", "subgraphs"."subgraph_error"."deterministic", "subgraphs"."subgraph_error"."block_range" FROM "subgraphs"."subgraph_error" WHERE "subgraphs"."subgraph_error"."subgraph_id" = $1 AND "subgraphs"."subgraph_error"."deterministic" AND block_range @> $2)` |
| 4 | 537.27 | `SELECT "subgraphs"."subgraph_deployment"."deployment", "subgraphs"."subgraph_deployment"."reorg_count", "subgraphs"."subgraph_deployment"."max_reorg_depth", "subgraphs"."subgraph_deployment"."latest_ethereum_block_number", "subgraphs"."subgraph_deployment"."latest_ethereum_block_hash", "subgraphs"."subgraph_deployment"."earliest_block_number" FROM "subgraphs"."subgraph_deployment" WHERE "subgraphs"."subgraph_deployment"."deployment" = $1 LIMIT $2` |
| 2 | 268.65 | `SELECT "subgraphs"."subgraph_version"."deployment" FROM ("subgraphs"."subgraph_version" INNER JOIN "subgraphs"."subgraph" ON "subgraphs"."subgraph"."current_version" = "subgraphs"."subgraph_version"."id") WHERE "subgraphs"."subgraph"."name" = $1 LIMIT $2` |

For reference, at the time of the measurement we were processing approximately 270 requests/s, which matches the 268.64 calls/s rate of the per-request queries above.

Besides routing these queries to the database read replicas, caching some of their results for a few seconds also seems sensible and should not noticeably change graph-node's behaviour.
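To illustrate the kind of caching we have in mind (a hypothetical sketch, not graph-node code): a short TTL on the block-hash-to-timestamp lookup would collapse the hottest query in the table above into at most one database round trip per hash every few seconds.

```python
import time


class TTLCache:
    """Minimal time-based cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def block_timestamp(cache, db_lookup, block_hash):
    """Return the timestamp for `block_hash`, hitting the DB only on a cache miss.

    `db_lookup` stands in for the expensive SELECT against the primary.
    """
    ts = cache.get(block_hash)
    if ts is None:
        ts = db_lookup(block_hash)
        cache.put(block_hash, ts)
    return ts
```

Since block timestamps for a given hash never change, even a very short TTL is safe here; the TTL only bounds memory growth.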

Thanks in advance for your help

Relevant log output

No response

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

- [ ] Tick this box if this bug is caused by a regression found in the latest release.
- [ ] Tick this box if this bug is specific to the hosted service.
- [x] I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

Deployed version: 0.33.0
Linux

@TPXP TPXP added the bug Something isn't working label Apr 11, 2024