
chore: Release v2.0.0 #64

Merged
merged 10 commits into from Sep 29, 2020
Conversation

@plamut (Contributor) commented Sep 24, 2020

This pull request was generated using releasetool.

@plamut plamut requested a review from a team September 24, 2020 15:23
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Sep 24, 2020
Review comment on CHANGELOG.md (resolved)
@plamut (Contributor, Author) commented Sep 24, 2020

Three issues are causing the samples to fail:

  • The PROJECT_ID env variable is not set on Kokoro.
  • As with some other samples, the IPython tests fail because BigQuery itself needs to be updated: it contains compatibility code for the v1beta1 client (see Update 1 below), which no longer exists. We also cannot fix and release BigQuery first, because such a breaking change would have to be released as v2.0, and that would require us to first release BQ Storage v2.0.0 (this very PR).
  • Some weird namespace behavior (see Update 2 below). A workaround is easy, but it would be preferable not to need one.

Update 1:
Commenting out the references to v1beta1 in BigQuery and using that modified local version helps: the to_dataframe sample tests pass.

Update 2:
As for the "cell magic not found" failure, we first need to import google.cloud.bigquery.storage in the tests that use IPython magic, which somehow triggers populating the google.cloud.bigquery namespace (the latter contains the cell magic code).
Additionally, the channel closing snippet in BigQuery's magics.py needs to be adjusted to bqstorage_client._transport.grpc_channel.close().
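
Put together, the workaround in the tests looks roughly like this (a sketch assuming IPython's standard extension manager API, not the repo's actual test code):

import IPython

# Importing the storage subpackage somehow triggers population of the
# google.cloud.bigquery namespace, which hosts the %%bigquery cell magic.
import google.cloud.bigquery.storage  # noqa: F401

ip = IPython.get_ipython()  # returns None outside an IPython session
ip.extension_manager.load_extension("google.cloud.bigquery")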

I don't yet know what the deal with that namespace import is - google.cloud.bigquery is a namespace, but after importing google.cloud.bigquery.storage, google.cloud.bigquery becomes a normal module populated with all the stuff one would expect from the BigQuery library.

>>> from google.cloud import bigquery
>>> bigquery
<module 'google.cloud.bigquery' (namespace)>
>>> from google.cloud.bigquery import storage
>>> bigquery
<module 'google.cloud.bigquery' from '/home/peter/workspace/bqs-3.8/lib/python3.8/site-packages/google/cloud/bigquery/__init__.py'>
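
For what it's worth, the switch can also be seen by inspecting module attributes (a minimal sketch, assuming PEP 420-style implicit namespace semantics, which the "(namespace)" repr above suggests):

import importlib

mod = importlib.import_module("google.cloud.bigquery")
# A namespace package has no __file__; a regular package points at its __init__.py.
print(getattr(mod, "__file__", None))
# __path__ is a _NamespacePath for a namespace package, a plain list otherwise.
print(type(mod.__path__).__name__)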

@busunkim96 (Contributor) commented Sep 24, 2020

Re namespace:

The theory you raised in the chat thread seems to be correct. It looks like this is a result of google.cloud.bigquery being declared as a namespace package in setup.py.

For comparison, google-cloud-bigquery-connection is microgen'd but only declares google and google.cloud as namespace packages. I don't see the issue with that library.

https://github.com/googleapis/python-bigquery-connection/blob/1a5e7b868a239a92727b590dcd458f771c34b33a/setup.py#L39

python3 -m pip install google-cloud-bigquery google-cloud-bigquery-connection
(env) busunkim@busunkim:~/github/python-workflows$ python3
Python 3.8.3 (default, Jun 15 2020, 16:29:21)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import bigquery
>>> bigquery
<module 'google.cloud.bigquery' from '/usr/local/google/home/busunkim/github/python-workflows/env/lib/python3.8/site-packages/google/cloud/bigquery/__init__.py'>
>>> from google.cloud.bigquery import connection
>>> bigquery
<module 'google.cloud.bigquery' from '/usr/local/google/home/busunkim/github/python-workflows/env/lib/python3.8/site-packages/google/cloud/bigquery/__init__.py'>
>>> connection
<module 'google.cloud.bigquery.connection' from '/usr/local/google/home/busunkim/github/python-workflows/env/lib/python3.8/site-packages/google/cloud/bigquery/connection/__init__.py'>

@plamut (Contributor, Author) commented Sep 24, 2020

Now I remember: the same question popped up when migrating google-cloud-bigquery-datatransfer.

If we want to use the google.cloud.bigquery.storage namespace, we need to add google.cloud.bigquery to the list of namespace packages, but is there a good way to do this without breaking google.cloud.bigquery? (or BigQuery breaking google.cloud.bigquery.* packages)

@tswast came up with the idea of moving BigQuery 2.0 code into google.cloud.bigquery.core namespace - thoughts?

@tswast (Contributor) commented Sep 24, 2020

@plamut Thanks for digging through and reproducing the issue.

Ways forward:

  1. Make google.cloud.bigquery a namespace package everywhere. This would involve deprecating google.cloud.bigquery in google-cloud-bigquery and moving that code to an inner package (e.g. google.cloud.bigquery.core)

Pro (1):

  • Consistent with how the BigQuery team is using the BigQuery namespace in protos.
  • A cursory check doesn't reproduce the issue, so whenever new BigQuery services are released it will be hard to push for manual changes that rewrite the packages; without the namespace, every update to a BigQuery service would require rewriting package names.

Con (1):

  • There are way more direct users of google-cloud-bigquery than of the other packages. A package rename there would be much more disruptive than renames in the other packages.

or

  2. Deprecate google.cloud.bigquery.storage, google.cloud.bigquery.datatransfer, google.cloud.bigquery.connection, and google.cloud.bigquery.reservation

Pro (2):

  • Avoid package rename in google-cloud-bigquery library.

Con (2):

  • May require some v3.0 major version bumps, which is pretty fast churn for anyone that adopted 2.0 (esp. on BQ Data Transfer)

@tswast (Contributor) commented Sep 24, 2020

Edit: I meant that I lean towards (2) -- rename the less-used packages.

@busunkim96 (Contributor) commented

I'd also lean towards changing the namespace of the less-used packages. The two problematic packages, connection and reservation, are both marked as beta and are fairly new. datatransfer and storage are older and don't use bigquery as a namespace: google.cloud.bigquery_datatransfer and google.cloud.bigquery_storage.

@plamut (Contributor, Author) commented Sep 24, 2020

Namespace packages can only contain modules and subpackages, but no content of their own (setuptools docs). A namespace package's __init__.py should not contain any code besides the namespace declaration.

If we want google.cloud.bigquery to be a namespace, then we cannot use its own contents (imported in the corresponding __init__.py). If we want to use its contents, google.cloud.bigquery cannot be a namespace, but then importing from google.cloud.bigquery.storage no longer works when BigQuery and BigQuery Storage are installed side by side (not sure why, as google.cloud.bigquery.connection works just fine 😕).
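
For reference, the only code a pkg_resources-style namespace __init__.py may contain (per the setuptools docs) is the declaration itself:

__import__("pkg_resources").declare_namespace(__name__)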

I can adjust the import paths in BQ Storage (yet again) to google.cloud.bigquery_storage and ditch the google.cloud.bigquery namespace, but I probably won't have time to do the same in the other bigquery.* libraries.

Update:
The only notable difference from google-cloud-bigquery-connection that I could find was that I was testing the imports with local library versions installed in editable mode. When I tried installing the same local package versions in normal (non-editable) mode, the imports worked just fine, and that was without declaring the google.cloud.bigquery namespace.
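
To illustrate the difference (paths are placeholders):

pip install -e /path/to/local/python-bigquery-storage    # editable install: imports from google.cloud.bigquery.* break
pip install /path/to/local/python-bigquery-storage       # regular install: imports work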

BTW, is INSTALL_LIBRARY_FROM_SOURCE enabled by default on Kokoro for samples tests? (I could not see it in the build logs)

@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Sep 25, 2020
@plamut (Contributor, Author) commented Sep 25, 2020

TL;DR - I managed to get the samples tests to pass locally without restructuring the imports and without dubious hacks. If somebody can verify these fixes locally, we can probably release BQ Storage 2.0 with broken samples and then immediately follow it with a BigQuery 2.0 release that fixes them.

The steps that had to be taken:

  1. Do not add google.cloud.bigquery to namespace packages in setup.py (to avoid problems mentioned above):

    diff --git setup.py setup.py
    index 4167081..128b824 100644
    --- setup.py
    +++ setup.py
    @@ -50,9 +50,6 @@ namespaces = ["google"]
     if "google.cloud" in packages:
         namespaces.append("google.cloud")
    
    -if "google.cloud.bigquery" in packages:
    -    namespaces.append("google.cloud.bigquery")
    -
     setuptools.setup(
         name=name,
         version=version,
  2. Change the quickstart sample test to expect the "standard" GOOGLE_CLOUD_PROJECT env variable instead of PROJECT_ID (it seems that for the samples tests, the CI environment provides the GOOGLE_CLOUD_PROJECT variable; see the sketch after this list).

  3. Install samples dependencies as non-editable packages. This means unsetting the INSTALL_LIBRARY_FROM_SOURCE env variable (the default is False anyway).

  4. Remove v1beta1 compatibility code from BigQuery (it is removed in BQ Storage 2.0). Temporarily change requirements.txt to use the cutting edge versions of google-cloud-bigquery and google-cloud-bigquery-storage in non-editable mode:

    diff --git samples/to_dataframe/requirements.txt samples/to_dataframe/requirements.txt
    index b5ac617..d35d645 100644
    --- samples/to_dataframe/requirements.txt
    +++ samples/to_dataframe/requirements.txt
    @@ -1,6 +1,6 @@
     google-auth==1.21.0
    -google-cloud-bigquery-storage==1.0.0
    -google-cloud-bigquery==1.25.0
    +/path/to/local/python-bigquery-storage
    +/path/to/local/python-bigquery
     pyarrow==1.0.1
     ipython==7.10.2; python_version > '3.0'
     ipython==5.9.0; python_version < '3.0'

    Of course, I had to use local paths, because google-cloud-bigquery==2.0.0 and google-cloud-bigquery-storage==2.0.0 are not yet released. We can change this in requirements.txt once both libraries are released. If we release them at approximately the same time, the BQ Storage samples will not be broken for long.
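
A hypothetical sketch of the quickstart test change from step 2 (the fixture name and shape are illustrative, not the repo's actual test code):

import os

import pytest


@pytest.fixture()
def project_id():
    # The samples CI environment provides GOOGLE_CLOUD_PROJECT, not PROJECT_ID.
    return os.environ["GOOGLE_CLOUD_PROJECT"]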

I added the changes 1) and 2) to this release PR. The fix for 4) in the python-bigquery repo can be fetched from the BigQuery 2.0 PR branch.


UPDATE
Trying to make the imports from google.cloud.bigquery.* work in all cases turned out to be too problematic in the BigQuery library; I ran into lots of import errors in tests no matter what I tried. We should not really be dealing with such problems, thus I agree that changing the other BQ libraries' namespaces to google.cloud.bigquery_libname is the most straightforward way to proceed.

storage has not even been released yet, and datatransfer v2.0 has only been out for a week or so; if we make a change, it's better to do it sooner rather than later. I'll update this PR accordingly.

@plamut plamut requested a review from a team as a code owner September 25, 2020 09:56
This avoids import errors from google.cloud.bigquery.* namespace.
@plamut plamut added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Sep 25, 2020
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 27, 2020
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 27, 2020
@plamut (Contributor, Author) commented Sep 27, 2020

Namespacing:
Considering all the problems with the google.cloud.bigquery.storage namespace, I re-generated the library to use the google.cloud.bigquery_storage namespace and adjusted the code.

The Bazel config will have to be updated; py_gapic_library() should get an additional argument:

opt_args = ["python-gapic-namespace=google.cloud", "python-gapic-name=bigquery_storage"]
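
In the BUILD file, that would look something like this (a hypothetical excerpt; the target and source names are made up):

py_gapic_library(
    name = "bigquerystorage_py_gapic",
    srcs = [":storage_proto_with_info"],
    opt_args = [
        "python-gapic-namespace=google.cloud",
        "python-gapic-name=bigquery_storage",
    ],
)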

@busunkim96 (or someone else), can you do that internally? Thanks!

Samples:
The samples tests still fail on Kokoro, but that's because they do not use the very latest (development) versions of BQ and BQ Storage. If run with these versions (see the previous comment), they pass locally. Once we make both 2.0 releases, they should pass again.

FWIW, the BigQuery adjustments that are needed for the BQ Storage samples to pass can be found in the BigQuery 2.0 preview PR (first commit).

@plamut plamut removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Sep 27, 2020
@tswast (Contributor) commented Sep 29, 2020

> BTW, is INSTALL_LIBRARY_FROM_SOURCE enabled by default on Kokoro for samples tests? (I could not see it in the build logs)

It's in the "allowed" environment variables: the value must match the regex (True)|(False).

I see that it's set here: https://github.com/googleapis/python-bigquery-storage/blob/master/.kokoro/samples/python3.8/presubmit.cfg

but from the build failures, I'm not sure that it's actually getting set.
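
If it is set, the declaration in that cfg would be a text-proto block along these lines (a sketch; the exact schema is an assumption):

env_vars: {
    key: "INSTALL_LIBRARY_FROM_SOURCE"
    value: "True"
}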

@busunkim96 (Contributor) commented

The logs seem to show BigQuery Storage being installed from source (see the last line):

nox > Running session py-3.8
nox > Creating virtual environment (virtualenv) using python3.8 in .nox/py-3-8
nox > pip install -r requirements.txt
nox > pip install -r requirements-test.txt
nox > pip install -e /tmpfs/src/github/python-bigquery-storage.

@plamut (Contributor, Author) commented Sep 29, 2020

Merging. The samples tests pass locally, but they require the upcoming version of BigQuery, which has not yet been merged and released.

@plamut plamut merged commit 6254bf2 into googleapis:master Sep 29, 2020
@yoshi-automation (Contributor) commented

@release-tool-publish-reporter

The release build has started, the log can be viewed here. 🌻

@release-tool-publish-reporter

🥚 You hatched a release! The release build finished successfully! 💜
