fix: table-not-found issue with executeSelect while running long queries #2222

prash-mi · 2022-08-12T09:47:26Z

Internal bug ref: b/241134681.

A table-not-found error has been observed with executeSelect while running long queries. It occurs when the storage read session is initialised before the query job is complete.

This is a short-term fix: we poll the job's status using jobs.getQueryResults and initialise the Read API session only once the job is complete, thus avoiding table-not-found.

Capturing this FR for the long-term fix/redesign: #2240

Ref: go/executeselect-re-design
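
For illustration only, a minimal sketch of the ordering described above: keep asking for the job status, and create the read session only after the job reports completion. `QueryResultsPoller` and `ReadSessionFactory` are hypothetical stand-ins for jobs.getQueryResults and the Read API session creation; they are not the real client interfaces, and this is not the code in this PR.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Sketch: open a read session only after the query job reports completion. */
final class WaitThenReadSketch {

  /** Hypothetical stand-in for a jobs.getQueryResults call; not the real client API. */
  interface QueryResultsPoller {
    /** Waits for up to timeoutMs and reports whether the job is complete. */
    boolean isJobComplete(String jobId, long timeoutMs);
  }

  /** Hypothetical stand-in for creating a Storage Read API session. */
  interface ReadSessionFactory {
    String createReadSession(String destinationTable);
  }

  static String waitThenCreateSession(
      QueryResultsPoller poller,
      ReadSessionFactory sessions,
      String jobId,
      String destinationTable,
      long timeoutMs) {
    // Re-issue the status call until the job is complete.
    while (!poller.isJobComplete(jobId, timeoutMs)) {
      // Intentionally empty: any waiting happens inside the status call.
    }
    // Creating the session before this point is what led to table-not-found in the report.
    return sessions.createReadSession(destinationTable);
  }

  public static void main(String[] args) {
    // Fake poller: the job is reported complete on the third call.
    Queue<Boolean> responses = new ArrayDeque<>(List.of(false, false, true));
    String session =
        waitThenCreateSession(
            (jobId, timeoutMs) -> responses.remove(),
            table -> "session-for-" + table,
            "job-1",
            "dataset.tmp_table",
            60_000L);
    System.out.println(session);
  }
}
```

The fake poller in `main` only shows the control flow; a real implementation would map the interface onto whatever the client library exposes.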


shollyman · 2022-08-19T17:09:50Z

google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/ConnectionImpl.java

+    // for getQueryResults) per iteration of the loop
+    long startTimeMs = System.currentTimeMillis();
+    long totalTimeOutMs = 18 * 60 * 60 * 1000; // 18 hours, which is the max timeout for the job
+    long poolingIntervalMs = 60000; // 1 min


nit: s/pooling/polling/

Thanks for pointing it out; I have updated it.

shollyman · 2022-08-19T17:11:25Z

google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/ConnectionImpl.java

+    // This logic will wait for approx (poolingIntervalMs + 10 seconds which is the default timeout
+    // for getQueryResults) per iteration of the loop
+    long startTimeMs = System.currentTimeMillis();
+    long totalTimeOutMs = 18 * 60 * 60 * 1000; // 18 hours, which is the max timeout for the job


I think it's reasonable to leave totalTimeOutMs and the associated logic out of this PR. You may want to add a user-configurable timeout in the future, but per-RPC timeouts are sufficient.

BQ guarantees that jobs make forward progress (a job won't get stuck in pending forever). This interval is so long you may as well just trust the system.

Sure. I have updated the logic and added a timeoutMs param at the RPC layer, which is currently hardcoded to 60 seconds. Quick question: in the corner case where the job runs for 18 hours and is still not complete, will the backend throw an error? Otherwise it will be an infinite loop, as jobComplete will never be true.
Also, I have captured a task to make timeoutMs user-configurable here: #2240 (point 6).

shollyman · 2022-08-19T17:17:13Z

google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/ConnectionImpl.java

+
+      } else { // wait for the defined poolingIntervalMs and the loop will retry
+        try {
+          Thread.sleep(poolingIntervalMs);


The behavior here seems wrong. getQueryResults will poll from the server side for up to 10 seconds, and then you sleep for 60 in the client? Effective latency for an 11 sec query becomes more than a minute.

You should not wait on the client side at all, just re-issue a subsequent getQueryResults call. Allow the poll and wait to happen within BQ.

Done, updated the logic to use RPC-based poll and wait.
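
To make the latency point in this thread concrete, here is a small simulation (not code from this PR). It assumes a status call that is issued when the job starts, blocks for up to `serverWaitMs`, and returns early if the job finishes, followed by an optional client-side sleep. The class and method names are hypothetical.

```java
/** Simulates when a polling client notices that a query job has finished. */
final class PollLatencySketch {

  /**
   * Returns the time in ms (measured from job start) at which the client observes completion.
   *
   * @param jobDurationMs how long the job actually runs
   * @param serverWaitMs how long one status call blocks before returning "not complete"
   * @param clientSleepMs extra client-side sleep between calls (0 means re-issue immediately)
   */
  static long observedLatencyMs(long jobDurationMs, long serverWaitMs, long clientSleepMs) {
    if (serverWaitMs <= 0 && clientSleepMs <= 0) {
      throw new IllegalArgumentException("either the server wait or the client sleep must be positive");
    }
    long now = 0;
    while (true) {
      // The call returns when the job finishes or when the server-side wait expires,
      // whichever comes first. If the job is already done, it returns immediately.
      long callEnd = Math.min(Math.max(jobDurationMs, now), now + serverWaitMs);
      boolean complete = jobDurationMs <= callEnd;
      now = callEnd;
      if (complete) {
        return now;
      }
      // Client-side sleep before the next call; this is the part the review asks to drop.
      now += clientSleepMs;
    }
  }

  public static void main(String[] args) {
    // 11 s query, 10 s server-side wait, 60 s client sleep.
    System.out.println(observedLatencyMs(11_000, 10_000, 60_000)); // prints 70000
    // Same query with no client sleep.
    System.out.println(observedLatencyMs(11_000, 10_000, 0)); // prints 11000
  }
}
```

Under these assumptions, an 11-second query is observed at 70 seconds with the 60-second client sleep and at 11 seconds when the call is simply re-issued.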

shollyman

Thanks for getting all this together!

prash-mi added 2 commits August 12, 2022 13:13

Added exponential-back-off to create read session to avoid table-not-found error

bbadcdb

Added testForTableNotFound IT

a965bc3

prash-mi requested review from a team and loferris August 12, 2022 09:47

product-auto-label bot added the size: m and api: bigquery labels Aug 12, 2022

prash-mi added the owlbot:run label Aug 12, 2022

gcf-owl-bot bot removed the owlbot:run label Aug 12, 2022

prash-mi assigned shollyman Aug 12, 2022

🦉 Updates from OwlBot post-processor

b16e9f9

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

prash-mi requested a review from a team as a code owner August 12, 2022 10:56

prash-mi added 4 commits August 19, 2022 14:09

Set recordCnt to 5Mil

6816db8

Add polling logic @ getQueryResultsFirstPage, Removed retrial logic on table_not_found

2a15e2b

Merge remote-tracking branch 'origin/executeselect-fix' into executeselect-fix

604f496

Merge branch 'master' into executeselect-fix

9c30d4e

# Conflicts: # README.md

prash-mi added the kokoro:force-run label Aug 19, 2022

yoshi-kokoro removed the kokoro:force-run label Aug 19, 2022

prash-mi mentioned this pull request Aug 19, 2022

Long term fix & redesign of executeSelect #2240

Closed

prash-mi added the owlbot:run label Aug 19, 2022

gcf-owl-bot bot removed the owlbot:run label Aug 19, 2022

Removed getTableNotFoundRetrySettings

1ba5a0d

prash-mi requested review from Neenu1995 and shollyman August 19, 2022 11:14

Neenu1995 approved these changes Aug 19, 2022


shollyman reviewed Aug 19, 2022


prash-mi added 5 commits August 22, 2022 10:14

Updated getQueryResultsWithRowLimit - Added timeoutMs param

9226bf4

Updated testGetQueryResultsFirstPage

c09ad6a

Updated getQueryResultsWithRowLimit - Add timeoutMs

e3f9d1f

Updated getQueryResultsFirstPage - Modified polling logic and refactor

ac13c99

Removed prev differences. Add getQueryResultsWithRowLimit

7398531

Removed prev differences. Add getQueryResultsWithRowLimit

fec3c59

prash-mi added the owlbot:run label Aug 22, 2022

gcf-owl-bot bot removed the owlbot:run label Aug 22, 2022

shollyman approved these changes Aug 22, 2022


prash-mi merged commit 4876569 into googleapis:main Aug 23, 2022

prash-mi deleted the executeselect-fix branch August 23, 2022 02:14

release-please bot mentioned this pull request Aug 23, 2022

chore(main): release 2.14.7 #2245

Merged

This was referenced Aug 24, 2022

bigquery.it.ITNightlyBigQueryTest: testPositionalParams failed #2200

Closed

bigquery.it.ITNightlyBigQueryTest: testForTableNotFound failed #2248

Closed

This was referenced Aug 30, 2022

[BQ] August 29, 2022 kitta65/bq-extension-vscode#88

Closed

[BQ] August 29, 2022 kitta65/prettier-plugin-bq#93

Closed

[BQ] August 29, 2022 kitta65/bq2cst#100

Closed
