docs: set `index_cols` in `read_gbq` as a best practice #624

tswast · 2024-04-19T21:38:16Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- Ensure the tests and linter pass
- Code coverage does not decrease (if any source code was changed)
- Appropriate docs were updated (if necessary)

Towards internal issue 335727141
🦕

shobsi · 2024-04-22T18:02:44Z

third_party/bigframes_vendored/pandas/io/gbq.py

+        the default index uses an analytic windowed operation that prevents
+        many filtering push down operations. As a best practice, set the
+        ``index_col`` argument to one or more columns, especially on large
+        tables.


Should we make it clearer that setting an arbitrary column as the index is not necessarily helpful, and that the best practice is to use a column with unique values? (In other words, the sentence that follows is the actual best practice.)

tswast · 2024-04-22

Added some caveats for a non-unique index. Thanks!
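To make the point of this thread concrete, here is a minimal sketch of the practice being documented: pass `index_col` to `read_gbq`, and prefer a column whose values are unique. The helper names (`duplicated_keys`, `read_table_with_index`), the sample values, and the table and column names in the usage comment are hypothetical and not part of this PR; the uniqueness check runs on a plain Python sample rather than on BigQuery.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicated_keys(values: Iterable[Hashable]) -> List[Hashable]:
    """Return the values that occur more than once in a candidate index column."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


def read_table_with_index(table_id: str, index_col: str):
    """Load a table with an explicit index instead of the default one.

    Assumption: bigframes is installed and BigQuery credentials are configured.
    """
    # Imported lazily so the helper above works without BigQuery access.
    import bigframes.pandas as bpd

    return bpd.read_gbq(table_id, index_col=index_col)


# Hypothetical sample drawn from a candidate key column.
sample_ids = [101, 102, 103, 103]
print(duplicated_keys(sample_ids))  # → [103]

# Hypothetical usage (not executed here):
# df = read_table_with_index("my-project.my_dataset.orders", "order_id")
```

A non-empty result from `duplicated_keys` suggests the column is not a good index candidate, which is the caveat raised in the review comment above.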

docs: set index_cols in read_gbq as a best practice

0ddd86b

tswast requested review from a team as code owners April 19, 2024 21:38

tswast requested a review from shobsi April 19, 2024 21:38

blunderbuss-gcf bot assigned shobsi Apr 19, 2024

product-auto-label bot added the `size: s` and `api: bigquery` labels Apr 19, 2024

shobsi approved these changes Apr 22, 2024

shobsi reviewed Apr 22, 2024

tswast added 2 commits April 22, 2024 18:52

document behaviors

b96cba3

Merge branch 'main' into b335727141-docs

Verified: this commit was created on GitHub.com and signed with GitHub's verified signature (GPG key ID: B5690EEEBB952194).

3d70f76

tswast enabled auto-merge (squash) April 22, 2024 18:54

tswast merged commit 70015b7 into main Apr 22, 2024
15 of 16 checks passed

tswast deleted the b335727141-docs branch April 22, 2024 20:05

release-please bot mentioned this pull request Apr 22, 2024

chore(main): release 1.3.0 #617

Merged
