fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) #2509

jonathanswenson · 2023-02-03T07:13:20Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #2508 ☕️

conventional-commit-lint-gcf · 2023-02-03T07:13:25Z

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

…with setUseReadAPI(false) If a query takes longer than 10+ seconds, the returned job will not have results / totalRows thus totalRows will be null, However, if useReadAPI is false the moved line would throw a null pointer exception trying to unbox the (nullable -- and actually null) Long for division.

Neenu1995 · 2023-02-03T16:17:32Z

@prash-mi PTAL. Thank you .

prash-mi

Thanks for raising the PR.
NPE shouldn't ideally come as we had made getQueryResultFirstPage(...) a blocking call(Ref: https://github.com/googleapis/java-bigquery/blob/main/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/ConnectionImpl.java#L243), such that it should not return unless the job is complete, which implies that we must be getting totalRows and pageRows back.
Still we can use a null check IMO
Do we want to replace

if ((totalRows == null || pageRows == null)
        && Boolean.TRUE.equals(
            connectionSettings
                .getUseReadAPI())) { // totalRows and pageRows are returned null when the job is not
      // complete
      return true;
    }

with

if (totalRows == null || pageRows == null) { 
      return false;
    }

as, we can't check the ration if any one of these attributes are null . WDYT?

@prash-mi

To ensure that NPE is not thrown. Based on requested changes from @prash-mi

jonathanswenson · 2023-02-07T15:17:22Z

@prash-mi That sounds reasonable to me, I have made that additional change. Let me know what you think.

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

prash-mi

thanks for making the changes, LGTM

gcf-merge-on-green · 2023-03-13T13:00:20Z

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

obada-ab · 2023-04-18T15:17:09Z

@jonathanswenson thanks for making the changes, it's better to add unit tests to the changes to ensure that the NPE would be avoided

obada-ab · 2023-04-28T11:57:07Z

@prash-mi there's a behaviour change as a side effect of this change that I'm worried about. Specifically, when either totalRows or pageRows are null (due to the job taking a long time) and setUseReadAPI(true) is used in the connection settings.

The current behaviour would be to use the Read API, but after introducing this fix such queries would always use tabledata.list.

If you believe that this change makes sense, then we can unit test and merge it. Otherwise maybe the following change might be reasonable:

    // Read API does not yet support Interval Type or QueryParameters
    if (containsIntervalType(schema) || hasQueryParameters) {
      logger.log(Level.INFO, "\n Schema has IntervalType, or QueryParameters. Disabling ReadAPI");
      return false;
    }

    if (totalRows == null || pageRows == null) {
      return connectionSettings.getUseReadAPI();
    }
    
    if (Boolean.TRUE.equals(connectionSettings.getUseReadAPI())) {
      long resultRatio = totalRows / pageRows;
      return resultRatio >= connectionSettings.getTotalToPageRowCountRatio()
          && totalRows > connectionSettings.getMinResultSize();
    } else {
      return false;
    }

This would avoid the NPE while maintaining the current behaviour.

Change useReadAPI to default to connectionSettings.getUseReadAPI() when either totalRows or pageRows are null (for long running queries). Also, move the Interval type/Queryparameters check earlier. Add unit testing.

obada-ab · 2023-05-02T17:16:20Z

Added mentioned change alongside unit testing. I don't think new integration tests are useful, but it might be worth it to run the benchmark queries to ensure long-running jobs are working fine, and that their execution time hasn't gotten worse. Will work on that.

prash-mi · 2023-05-03T13:57:23Z

@obada-ab I think returning connectionSettings.getUseReadAPI() if totalRows or pageRows is null is the right thing to do.
Thanks for making changes, LGTM!

obada-ab · 2023-05-04T10:04:35Z

I ran the benchmark queries locally against my test project. The third run of the queries in main gave these results:

query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 10000": read 10000 rows, 19 cols, first byte 3.907000 sec, total 3.910000 sec
query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 100000": read 100000 rows, 19 cols, first byte 5.695000 sec, total 25.436000 sec
query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 1000000": read 1000000 rows, 19 cols, first byte 6.617000 sec, total 239.012000 sec
query "SELECT title FROM `bigquery-public-data.samples.wikipedia` ORDER BY title LIMIT 1000": read 1000 rows, 1 cols, first byte 0.664000 sec, total 0.664000 sec
query "SELECT title, id, timestamp, contributor_ip FROM `bigquery-public-data.samples.wikipedia` WHERE title like 'Blo%' ORDER BY id": read 193029 rows, 4 cols, first byte 7.252000 sec, total 13.599000 sec
query "SELECT * FROM `bigquery-public-data.baseball.games_post_wide` ORDER BY gameId": read 8676 rows, 145 cols, first byte 4.839000 sec, total 11.641000 sec
query "SELECT * FROM `bigquery-public-data.samples.github_nested` WHERE repository.has_downloads ORDER BY repository.created_at LIMIT 10000": read 10000 rows, 8 cols, first byte 7.865000 sec, total 12.966000 sec
query "SELECT repo_name, path FROM `bigquery-public-data.github_repos.files` WHERE path LIKE '%.java' ORDER BY id LIMIT 1000000": read 1000000 rows, 2 cols, first byte 12.123000 sec, total 110.380000 sec

The third run in fix-read-query-npe gave these results:

query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 10000": read 10000 rows, 19 cols, first byte 4.089000 sec, total 4.094000 sec
query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 100000": read 100000 rows, 19 cols, first byte 5.584000 sec, total 25.566000 sec
query "SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 1000000": read 1000000 rows, 19 cols, first byte 5.847000 sec, total 238.485000 sec
query "SELECT title FROM `bigquery-public-data.samples.wikipedia` ORDER BY title LIMIT 1000": read 1000 rows, 1 cols, first byte 0.602000 sec, total 0.603000 sec
query "SELECT title, id, timestamp, contributor_ip FROM `bigquery-public-data.samples.wikipedia` WHERE title like 'Blo%' ORDER BY id": read 193029 rows, 4 cols, first byte 6.885000 sec, total 13.119000 sec
query "SELECT * FROM `bigquery-public-data.baseball.games_post_wide` ORDER BY gameId": read 8676 rows, 145 cols, first byte 4.057000 sec, total 10.648000 sec
query "SELECT * FROM `bigquery-public-data.samples.github_nested` WHERE repository.has_downloads ORDER BY repository.created_at LIMIT 10000": read 10000 rows, 8 cols, first byte 8.350000 sec, total 13.840000 sec
query "SELECT repo_name, path FROM `bigquery-public-data.github_repos.files` WHERE path LIKE '%.java' ORDER BY id LIMIT 1000000": read 1000000 rows, 2 cols, first byte 11.803000 sec, total 107.445000 sec

Nothing concerning IMO, the differences seem minor and are probably just due to variance between runs.

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

jonathanswenson requested review from a team and prash-mi February 3, 2023 07:13

product-auto-label bot added size: xs Pull request size is extra small. api: bigquery Issues related to the googleapis/java-bigquery API. labels Feb 3, 2023

jonathanswenson force-pushed the fix-read-query-npe branch from 8e11d07 to 88f6910 Compare February 3, 2023 07:16

jonathanswenson changed the title ~~Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false)~~ fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) Feb 3, 2023

prash-mi suggested changes Feb 6, 2023

View reviewed changes

Always exit early if either total rows or page rows is null

0e821ad

To ensure that NPE is not thrown. Based on requested changes from @prash-mi

product-auto-label bot added size: s Pull request size is small. and removed size: xs Pull request size is extra small. labels Feb 7, 2023

jonathanswenson requested a review from prash-mi February 13, 2023 21:45

prash-mi added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 13, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 13, 2023

prash-mi added the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 13, 2023

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 13, 2023

🦉 Updates from OwlBot post-processor

d6a16ed

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

gcf-owl-bot bot requested a review from a team as a code owner March 13, 2023 05:08

prash-mi approved these changes Mar 13, 2023

View reviewed changes

prash-mi added the automerge Merge the pull request once unit tests and other checks pass. label Mar 13, 2023

gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 13, 2023

Neenu1995 added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 17, 2023

fix: use connectionSettings.getUseReadAPI() when row values are null

4d67a8f

Change useReadAPI to default to connectionSettings.getUseReadAPI() when either totalRows or pageRows are null (for long running queries). Also, move the Interval type/Queryparameters check earlier. Add unit testing.

obada-ab requested a review from a team as a code owner May 2, 2023 17:11

product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels May 2, 2023

Merge branch 'main' into fix-read-query-npe

95f6c28

product-auto-label bot added size: s Pull request size is small. and removed size: m Pull request size is medium. labels May 3, 2023

refactor: fix formatting for ConnectionImplTest.java

73e683d

obada-ab approved these changes May 4, 2023

View reviewed changes

obada-ab added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2023

obada-ab added the owlbot:run Add this label to trigger the Owlbot post processor. label May 4, 2023

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 4, 2023

🦉 Updates from OwlBot post-processor

8dee8e8

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

obada-ab added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2023

obada-ab added the automerge Merge the pull request once unit tests and other checks pass. label May 4, 2023

gcf-merge-on-green bot merged commit e1326c8 into googleapis:main May 4, 2023
17 checks passed

gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label May 4, 2023

release-please bot mentioned this pull request May 4, 2023

chore(main): release 2.26.0 #2680

Merged

This was referenced May 23, 2023

May 22, 2023 kitta65/bq-extension-vscode#190

Closed

May 22, 2023 kitta65/prettier-plugin-bq#197

Closed

May 22, 2023 kitta65/bq2cst#209

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) #2509

fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) #2509

jonathanswenson commented Feb 3, 2023

conventional-commit-lint-gcf bot commented Feb 3, 2023 •

edited

Neenu1995 commented Feb 3, 2023

prash-mi left a comment •

edited

jonathanswenson commented Feb 7, 2023

prash-mi left a comment

gcf-merge-on-green bot commented Mar 13, 2023

obada-ab commented Apr 18, 2023

obada-ab commented Apr 28, 2023

obada-ab commented May 2, 2023

prash-mi commented May 3, 2023

obada-ab commented May 4, 2023

fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) #2509

fix: Move ratio calculation for whether to use read API to avoid NPE with setUseReadAPI(false) #2509

Conversation

jonathanswenson commented Feb 3, 2023

conventional-commit-lint-gcf bot commented Feb 3, 2023 • edited

Neenu1995 commented Feb 3, 2023

prash-mi left a comment • edited

Choose a reason for hiding this comment

jonathanswenson commented Feb 7, 2023

prash-mi left a comment

Choose a reason for hiding this comment

gcf-merge-on-green bot commented Mar 13, 2023

obada-ab commented Apr 18, 2023

obada-ab commented Apr 28, 2023

obada-ab commented May 2, 2023

prash-mi commented May 3, 2023

obada-ab commented May 4, 2023

conventional-commit-lint-gcf bot commented Feb 3, 2023 •

edited

prash-mi left a comment •

edited