[Bug] Rewinding a subgraph causes a constraint violation in graph-node that in turn causes indexer-agent to crashloop #5316

Open
1 of 3 tasks
cryptovestor21 opened this issue Apr 4, 2024 · 2 comments
Labels
bug Something isn't working

@cryptovestor21

cryptovestor21 commented Apr 4, 2024

Bug report

graph-node:v0.34.1
indexer-agent:v0.20.22

Activities that were undertaken before observing this bug:

  1. Cleared call_cache for Arbitrum via psql, as part of troubleshooting subgraph sync performance
  2. Rewound a specific problematic subgraph, Silo Finance Arbitrum (QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW), to block 1 using graphman
  3. Observed the above subgraph sync to ~130m blocks, then stall
  4. Checked the graph-node logs and found a related error (see log output)
  5. Observed indexer-agent complaining about the same issue and crash looping; we cannot use the agent at all right now to manage subgraphs (see log output)

IMPACT: Production indexer at risk. We cannot manage our online and offline allocations while this issue persists, so ideally we need a temporary fix for these specific symptoms. Would graphman drop resolve the issue? Would graph-node and indexer-agent be able to handle that and start syncing the subgraph again from scratch, given that this is an in-flight subgraph with live allocations?

Relevant log output

----- GRAPH-NODE
Apr 04 13:24:34.037 ERRO Subgraph instance failed to run: internal constraint violated: Subgraph writer for QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW[sgd622] is not running, sgd: 622, subgraph_id: QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW, component: SubgraphInstanceManager
Apr 04 13:48:18.741 WARN Price provider Removed: 0x8dca64a43865454f41aa1a3cf0140eb89f2c08aa53871235ecbe46b6a309a1e3, data_source: PriceProvidersRepository, sgd: 622, subgraph_id: QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW, component: SubgraphInstanceManager > UserMapping
Apr 04 13:48:18.742 ERRO Oracle was not found when trying to remove it at txn: 0x8dca64a43865454f41aa1a3cf0140eb89f2c08aa53871235ecbe46b6a309a1e3, data_source: PriceProvidersRepository, sgd: 622, subgraph_id: QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW, component: SubgraphInstanceManager > UserMapping

----- INDEXER-AGENT
{"level":50,"time":1712241706593,"pid":1,"hostname":"268ad9e1400b","name":"IndexerAgent","component":"GraphNode","err":{"type":"IndexerError","message":"Failed to query indexing status API","stack":"IndexerError: Failed to query indexing status API\n    at indexerError (/opt/indexer/packages/indexer-common/dist/errors.js:173:12)\n    at GraphNode.<anonymous> (/opt/indexer/packages/indexer-common/dist/graph-node.js:146:55)\n    at Generator.next (<anonymous>)\n    at fulfilled (/opt/indexer/packages/indexer-common/dist/graph-node.js:5:58)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)","code":"IE018","explanation":"https://github.com/graphprotocol/indexer/blob/main/docs/errors.md#ie018","cause":{"type":"CombinedError","message":"[GraphQL] Store error: internal constraint violated: the entityCount for QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW is not representable as a u64","name":"CombinedError","graphQLErrors":[{"message":"Store error: internal constraint violated: the entityCount for QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW is not representable as a u64"}],"response":{"size":0,"timeout":0}}},"msg":"Failed to query indexing status API"}
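The indexer-agent line above is a single JSON object, and the actual cause is nested under err.cause.graphQLErrors. As a minimal sketch (not part of indexer-agent), the snippet below pulls out the error code and the underlying store error. The log line is an abridged copy of the one in this report, and the helper name summarize_agent_error is hypothetical.

```python
import json

# Abridged copy of the indexer-agent log line from this report
# (stack trace and several fields dropped for brevity).
LOG_LINE = (
    '{"level":50,"name":"IndexerAgent","component":"GraphNode",'
    '"err":{"type":"IndexerError","message":"Failed to query indexing status API",'
    '"code":"IE018","cause":{"type":"CombinedError","graphQLErrors":[{"message":'
    '"Store error: internal constraint violated: the entityCount for '
    'QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW is not representable as a u64"}]}},'
    '"msg":"Failed to query indexing status API"}'
)


def summarize_agent_error(line: str) -> dict:
    """Hypothetical helper: extract the error code and nested GraphQL errors."""
    record = json.loads(line)
    err = record.get("err", {})
    cause = err.get("cause", {})
    graphql_errors = [e.get("message", "") for e in cause.get("graphQLErrors", [])]
    return {
        "code": err.get("code"),
        "message": err.get("message"),
        "graphql_errors": graphql_errors,
    }


summary = summarize_agent_error(LOG_LINE)
print(summary["code"], "-", summary["message"])
for message in summary["graphql_errors"]:
    print("cause:", message)
```

Read this way, the agent's IE018 failure is a symptom: the store error it wraps comes from graph-node's indexing status API.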

IPFS hash

No response

Subgraph name or link to explorer

https://thegraph.com/explorer/subgraphs/2ufoztRpybsgogPVW6j9NTn1JmBWFYPKbP7pAabizADU?view=Overview&chain=arbitrum-one

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

Linux

cryptovestor21 added the bug label Apr 4, 2024
@leoyvens
Collaborator

leoyvens commented Apr 4, 2024

the entityCount for QmTMKqty5yZvZtB3SwzXUG92aZUH1YQw3VjByGw4wgaMhW is not representable as a u64

Maybe the rewind somehow turned the entity count negative, which is of course a bug.
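To illustrate why a negative count would produce exactly this message, here is a minimal sketch. It assumes a stored signed counter that is range-checked when read back as an unsigned 64-bit value. The numbers and the function names to_u64 and apply_rewind are hypothetical, and this is not graph-node's actual bookkeeping.

```python
U64_MAX = 2**64 - 1


def to_u64(value: int) -> int:
    """Return value unchanged if it fits in an unsigned 64-bit integer."""
    if not 0 <= value <= U64_MAX:
        raise ValueError(f"{value} is not representable as a u64")
    return value


def apply_rewind(entity_count: int, entities_removed: int) -> int:
    """Hypothetical counter update: subtract what the rewind removed."""
    return entity_count - entities_removed


# Hypothetical numbers: the counter says 100 entities, but the rewind
# accounts for 130 removals, so the stored count goes negative.
count_after_rewind = apply_rewind(100, 130)

try:
    to_u64(count_after_rewind)
except ValueError as exc:
    print("status query would fail:", exc)
```

If that is what happened, any query that reads the count would fail until the stored value is corrected, which would match the agent crash looping on every status query.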

@trader-payne

@leoyvens I think the problem came from that rewind to block 1 when the subgraph's startBlock was actually 51880000.
That means graph-node doesn't handle that scenario, and it created all that chaos.
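Until that scenario is handled, an operator-side pre-check before a rewind could look like the sketch below. It is only an illustration of the idea: validate_rewind_target is a hypothetical helper, not something graphman or graph-node provides, and the start block has to be supplied by the caller (for example, read from the subgraph manifest).

```python
def validate_rewind_target(requested_block: int, start_block: int) -> int:
    """Hypothetical guard: refuse rewind targets below the subgraph's startBlock."""
    if requested_block < start_block:
        raise ValueError(
            f"rewind target {requested_block} is below startBlock {start_block}"
        )
    return requested_block


# Values from this issue: rewind to block 1, startBlock 51880000.
try:
    validate_rewind_target(1, 51_880_000)
except ValueError as exc:
    print("refusing rewind:", exc)
```

Rejecting the request, rather than silently clamping it to the start block, keeps the operator aware that the requested target was never indexed in the first place.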
