
Comparing changes

base repository: googleapis/python-bigquery-dataframes
base: v0.12.0
head repository: googleapis/python-bigquery-dataframes
compare: v0.13.0
  • 10 commits
  • 26 files changed
  • 7 contributors

Commits on Nov 1, 2023

  1. test: add code snippets for using bigframes.ml (#159)

    * test: add code snippets for using bigframes.ml
    ashleyxuu authored Nov 1, 2023

    3d7a0d6

Commits on Nov 2, 2023

  1. feat: add interpolate() to series and dataframe (#157)

    b9cb55c
  2. feat: to_gbq without a destination table writes to a temporary table (#158)
    
    * feat: `to_gbq` without a destination table writes to a temporary table
    
    * add unit test covering happy path for to_gbq
    
    * update to_gbq docs
    tswast authored Nov 2, 2023
    e1817c9
  3. feat: support 32k text-generation and multilingual embedding models (#161)
    
    * feat: support 32k text-generation and embedding multilingual models
    ashleyxuu authored Nov 2, 2023
    5f0ea37

Commits on Nov 3, 2023

  1. chore: update docfx minimum Python version (#167)

    * chore: update docfx minimum Python version
    
    Source-Link: googleapis/synthtool@bc07fd4
    Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:30470597773378105e239b59fce8eb27cc97375580d592699206d17d117143d0
    
    * chore: remove restriction on noxfile.py
    
    ---------
    
    Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
    Co-authored-by: Dan Lee <71398022+dandhlee@users.noreply.github.com>
    3 people authored Nov 3, 2023
    6d1953b
  2. fix: use table clone instead of system time for read_gbq_table (#109)

    * fix: use table clone instead of system time for `read_gbq_table`
    
    * accept expiration datetime instead of timedelta for easier testing
    
    * don't use table clone on _session tables
    
    * remove unnecessary assert
    
    * add docstrings
    tswast authored Nov 3, 2023
    031f253
  3. feat: add __iter__, iterrows, itertuples, keys methods (#164)

    TrevorBergeron authored Nov 3, 2023
    c065071

Commits on Nov 4, 2023

  1. Revert "fix: use table clone instead of system time for `read_gbq_table` (#109)" (#171)
    
    This reverts commit 031f253.
    tswast authored Nov 4, 2023
    dfcc2d3

Commits on Nov 6, 2023

  1. fix: update default temp table expiration to 7 days (#174)

    4ff26cd

Commits on Nov 7, 2023

  1. chore(main): release 0.13.0 (#165)

    🤖 I have created a release *beep* *boop*
    ---
    
    
    ## [0.13.0](https://togithub.com/googleapis/python-bigquery-dataframes/compare/v0.12.0...v0.13.0) (2023-11-07)
    
    
    ### Features
    
    * `to_gbq` without a destination table writes to a temporary table ([#158](https://togithub.com/googleapis/python-bigquery-dataframes/issues/158)) ([e1817c9](https://togithub.com/googleapis/python-bigquery-dataframes/commit/e1817c9201ba4ea7fd2f8b6f4a667b010a6fec1b))
    * Add `DataFrame.__iter__`, `DataFrame.iterrows`, `DataFrame.itertuples`, and `DataFrame.keys` methods ([#164](https://togithub.com/googleapis/python-bigquery-dataframes/issues/164)) ([c065071](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c065071028c2f4ac80ee7f84dbeb1df385c2a512))
    * Add `Series.__iter__` method ([#164](https://togithub.com/googleapis/python-bigquery-dataframes/issues/164)) ([c065071](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c065071028c2f4ac80ee7f84dbeb1df385c2a512))
    * Add interpolate() to series and dataframe ([#157](https://togithub.com/googleapis/python-bigquery-dataframes/issues/157)) ([b9cb55c](https://togithub.com/googleapis/python-bigquery-dataframes/commit/b9cb55c5b9354f9ff60de0aad66fe60049876055))
    * Support 32k text-generation and multilingual embedding models ([#161](https://togithub.com/googleapis/python-bigquery-dataframes/issues/161)) ([5f0ea37](https://togithub.com/googleapis/python-bigquery-dataframes/commit/5f0ea37fffff792fc3fbed65e6ace846d8ef6a06))
    
    
    ### Bug Fixes
    
    * Update default temp table expiration to 7 days ([#174](https://togithub.com/googleapis/python-bigquery-dataframes/issues/174)) ([4ff26cd](https://togithub.com/googleapis/python-bigquery-dataframes/commit/4ff26cdf862e9f9b91a3a1d2abfa7fbdf0af9c5b))
    
    ---
    This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
    release-please[bot] authored Nov 7, 2023
    8b6b1c6
4 changes: 2 additions & 2 deletions .github/.OwlBot.lock.yaml
@@ -13,5 +13,5 @@
# limitations under the License.
docker:
image: gcr.io/cloud-devrel-public-resources/owlbot-python:latest
digest: sha256:4f9b3b106ad0beafc2c8a415e3f62c1a0cc23cabea115dbe841b848f581cfe99
# created: 2023-10-18T20:26:37.410353675Z
digest: sha256:30470597773378105e239b59fce8eb27cc97375580d592699206d17d117143d0
# created: 2023-11-03T00:57:07.335914631Z
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -28,7 +28,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
python-version: "3.10"
- name: Install nox
run: |
python -m pip install --upgrade setuptools pip wheel
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,22 @@

[1]: https://pypi.org/project/bigframes/#history

## [0.13.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.12.0...v0.13.0) (2023-11-07)


### Features

* `to_gbq` without a destination table writes to a temporary table ([#158](https://github.com/googleapis/python-bigquery-dataframes/issues/158)) ([e1817c9](https://github.com/googleapis/python-bigquery-dataframes/commit/e1817c9201ba4ea7fd2f8b6f4a667b010a6fec1b))
* Add `DataFrame.__iter__`, `DataFrame.iterrows`, `DataFrame.itertuples`, and `DataFrame.keys` methods ([#164](https://github.com/googleapis/python-bigquery-dataframes/issues/164)) ([c065071](https://github.com/googleapis/python-bigquery-dataframes/commit/c065071028c2f4ac80ee7f84dbeb1df385c2a512))
* Add `Series.__iter__` method ([#164](https://github.com/googleapis/python-bigquery-dataframes/issues/164)) ([c065071](https://github.com/googleapis/python-bigquery-dataframes/commit/c065071028c2f4ac80ee7f84dbeb1df385c2a512))
* Add interpolate() to series and dataframe ([#157](https://github.com/googleapis/python-bigquery-dataframes/issues/157)) ([b9cb55c](https://github.com/googleapis/python-bigquery-dataframes/commit/b9cb55c5b9354f9ff60de0aad66fe60049876055))
* Support 32k text-generation and multilingual embedding models ([#161](https://github.com/googleapis/python-bigquery-dataframes/issues/161)) ([5f0ea37](https://github.com/googleapis/python-bigquery-dataframes/commit/5f0ea37fffff792fc3fbed65e6ace846d8ef6a06))


### Bug Fixes

* Update default temp table expiration to 7 days ([#174](https://github.com/googleapis/python-bigquery-dataframes/issues/174)) ([4ff26cd](https://github.com/googleapis/python-bigquery-dataframes/commit/4ff26cdf862e9f9b91a3a1d2abfa7fbdf0af9c5b))

## [0.12.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.11.0...v0.12.0) (2023-11-01)


4 changes: 4 additions & 0 deletions bigframes/constants.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import datetime

"""Constants used across BigQuery DataFrames.
This module should not depend on any others in the package.
@@ -23,3 +25,5 @@
)

ABSTRACT_METHOD_ERROR_MESSAGE = f"Abstract method. You have likely encountered a bug. Please share this stacktrace and how you reached it with the BigQuery DataFrames team. {FEEDBACK_LINK}"

DEFAULT_EXPIRATION = datetime.timedelta(days=7)
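The new constant is a plain `datetime.timedelta`. A minimal sketch of how `to_gbq` combines it with the current UTC time to stamp an expiration on the session's temporary table:

```python
import datetime

# Mirrors the DEFAULT_EXPIRATION constant added in this release.
DEFAULT_EXPIRATION = datetime.timedelta(days=7)

# to_gbq adds the constant to the current UTC time to get the
# expiration timestamp used when creating the temp table.
expiration = datetime.datetime.now(datetime.timezone.utc) + DEFAULT_EXPIRATION
print(DEFAULT_EXPIRATION.total_seconds())  # 604800.0 seconds in 7 days
```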
91 changes: 91 additions & 0 deletions bigframes/core/block_transforms.py
@@ -105,6 +105,97 @@ def indicate_duplicates(
)


def interpolate(block: blocks.Block, method: str = "linear") -> blocks.Block:
if method != "linear":
raise NotImplementedError(
f"Only 'linear' interpolate method supported. {constants.FEEDBACK_LINK}"
)
backwards_window = windows.WindowSpec(following=0)
forwards_window = windows.WindowSpec(preceding=0)

output_column_ids = []

original_columns = block.value_columns
original_labels = block.column_labels
block, offsets = block.promote_offsets()
for column in original_columns:
# null in same places column is null
should_interpolate = block._column_type(column) in [
pd.Float64Dtype(),
pd.Int64Dtype(),
]
if should_interpolate:
block, notnull = block.apply_unary_op(column, ops.notnull_op)
block, masked_offsets = block.apply_binary_op(
offsets, notnull, ops.partial_arg3(ops.where_op, None)
)

block, previous_value = block.apply_window_op(
column, agg_ops.LastNonNullOp(), backwards_window
)
block, next_value = block.apply_window_op(
column, agg_ops.FirstNonNullOp(), forwards_window
)
block, previous_value_offset = block.apply_window_op(
masked_offsets,
agg_ops.LastNonNullOp(),
backwards_window,
skip_reproject_unsafe=True,
)
block, next_value_offset = block.apply_window_op(
masked_offsets,
agg_ops.FirstNonNullOp(),
forwards_window,
skip_reproject_unsafe=True,
)

block, prediction_id = _interpolate(
block,
previous_value_offset,
previous_value,
next_value_offset,
next_value,
offsets,
)

block, interpolated_column = block.apply_binary_op(
column, prediction_id, ops.fillna_op
)
# Pandas performs ffill-like behavior to extrapolate forwards
block, interpolated_and_ffilled = block.apply_binary_op(
interpolated_column, previous_value, ops.fillna_op
)

output_column_ids.append(interpolated_and_ffilled)
else:
output_column_ids.append(column)

# Force reproject since `skip_reproject_unsafe` was used previously
block = block.select_columns(output_column_ids)._force_reproject()
return block.with_column_labels(original_labels)


def _interpolate(
block: blocks.Block,
x0_id: str,
y0_id: str,
x1_id: str,
y1_id: str,
xpredict_id: str,
) -> typing.Tuple[blocks.Block, str]:
"""Applies linear interpolation equation to predict y values for xpredict."""
block, x1x0diff = block.apply_binary_op(x1_id, x0_id, ops.sub_op)
block, y1y0diff = block.apply_binary_op(y1_id, y0_id, ops.sub_op)
block, xpredictx0diff = block.apply_binary_op(xpredict_id, x0_id, ops.sub_op)

block, y1_weight = block.apply_binary_op(y1y0diff, x1x0diff, ops.div_op)
block, y1_part = block.apply_binary_op(xpredictx0diff, y1_weight, ops.mul_op)

block, prediction_id = block.apply_binary_op(y0_id, y1_part, ops.add_op)
block = block.drop_columns([x1x0diff, y1y0diff, xpredictx0diff, y1_weight, y1_part])
return block, prediction_id


def drop_duplicates(
block: blocks.Block, columns: typing.Sequence[str], keep: str = "first"
) -> blocks.Block:
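The window-op pipeline above ultimately evaluates the two-point line equation y = y0 + (x - x0) * (y1 - y0) / (x1 - x0), using row offsets as the x values. A standalone sketch of that arithmetic (the helper name `predict_linear` is hypothetical, not part of bigframes):

```python
def predict_linear(x0: float, y0: float, x1: float, y1: float, x_predict: float) -> float:
    """Predict y at x_predict on the line through (x0, y0) and (x1, y1)."""
    y1_weight = (y1 - y0) / (x1 - x0)  # slope; matches y1y0diff / x1x0diff above
    return y0 + (x_predict - x0) * y1_weight  # matches y0_id + y1_part above

# A null at offset 2 between known points (1, 10.0) and (3, 30.0)
# would be filled with 20.0.
print(predict_linear(1, 10.0, 3, 30.0, 2))  # 20.0
```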
71 changes: 61 additions & 10 deletions bigframes/dataframe.py
@@ -16,6 +16,7 @@

from __future__ import annotations

import datetime
import re
import textwrap
import typing
@@ -303,6 +304,9 @@ def __len__(self):
rows, _ = self.shape
return rows

def __iter__(self):
return iter(self.columns)

def astype(
self,
dtype: Union[bigframes.dtypes.DtypeString, bigframes.dtypes.Dtype],
@@ -1434,6 +1438,10 @@ def _reindex_columns(self, columns):
def reindex_like(self, other: DataFrame, *, validate: typing.Optional[bool] = None):
return self.reindex(index=other.index, columns=other.columns, validate=validate)

def interpolate(self, method: str = "linear") -> DataFrame:
result = block_ops.interpolate(self._block, method)
return DataFrame(result)

def fillna(self, value=None) -> DataFrame:
return self._apply_binop(value, ops.fillna_op, how="left")

@@ -1472,12 +1480,27 @@ def isin(self, values) -> DataFrame:
f"isin(), you passed a [{type(values).__name__}]"
)

def keys(self) -> pandas.Index:
return self.columns

def items(self):
column_ids = self._block.value_columns
column_labels = self._block.column_labels
for col_id, col_label in zip(column_ids, column_labels):
yield col_label, bigframes.series.Series(self._block.select_column(col_id))

def iterrows(self) -> Iterable[tuple[typing.Any, pandas.Series]]:
for df in self.to_pandas_batches():
for item in df.iterrows():
yield item

def itertuples(
self, index: bool = True, name: typing.Optional[str] = "Pandas"
) -> Iterable[tuple[typing.Any, ...]]:
for df in self.to_pandas_batches():
for item in df.itertuples(index=index, name=name):
yield item
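The new iteration methods fetch the frame in pandas batches and then defer to pandas, so their contract matches pandas' own. A plain-pandas sketch of that contract (bigframes itself is not imported here):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# __iter__ and keys() both walk the column labels.
assert list(df) == ["a", "b"]
assert list(df.keys()) == ["a", "b"]

# iterrows yields (index, row-as-Series); itertuples yields namedtuples.
rows = [(idx, row["a"]) for idx, row in df.iterrows()]
first = next(df.itertuples(index=True, name="Pandas"))
print(rows, first.a)  # [(0, 1), (1, 2)] 1
```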

def dropna(
self,
*,
@@ -2285,25 +2308,52 @@ def to_json(

def to_gbq(
self,
destination_table: str,
destination_table: Optional[str] = None,
*,
if_exists: Optional[Literal["fail", "replace", "append"]] = "fail",
if_exists: Optional[Literal["fail", "replace", "append"]] = None,
index: bool = True,
ordering_id: Optional[str] = None,
) -> None:
if "." not in destination_table:
raise ValueError(
"Invalid Table Name. Should be of the form 'datasetId.tableId' or "
"'projectId.datasetId.tableId'"
)

) -> str:
dispositions = {
"fail": bigquery.WriteDisposition.WRITE_EMPTY,
"replace": bigquery.WriteDisposition.WRITE_TRUNCATE,
"append": bigquery.WriteDisposition.WRITE_APPEND,
}

if destination_table is None:
# TODO(swast): If there have been no modifications to the DataFrame
# since the last time it was written (cached), then return that.
# For `read_gbq` nodes, return the underlying table clone.
destination_table = bigframes.session._io.bigquery.create_temp_table(
self._session.bqclient,
self._session._anonymous_dataset,
# TODO(swast): allow custom expiration times, probably via session configuration.
datetime.datetime.now(datetime.timezone.utc)
+ constants.DEFAULT_EXPIRATION,
)

if if_exists is not None and if_exists != "replace":
raise ValueError(
f"Got invalid value {repr(if_exists)} for if_exists. "
"When no destination table is specified, a new table is always created. "
"None or 'replace' are the only valid options in this case."
)
if_exists = "replace"

if "." not in destination_table:
raise ValueError(
f"Got invalid value for destination_table {repr(destination_table)}. "
"Should be of the form 'datasetId.tableId' or 'projectId.datasetId.tableId'."
)

if if_exists is None:
if_exists = "fail"

if if_exists not in dispositions:
raise ValueError("'{0}' is not valid for if_exists".format(if_exists))
raise ValueError(
f"Got invalid value {repr(if_exists)} for if_exists. "
f"Valid options include None or one of {dispositions.keys()}."
)

job_config = bigquery.QueryJobConfig(
write_disposition=dispositions[if_exists],
@@ -2314,6 +2364,7 @@ def to_gbq(
)

self._run_io_query(index=index, ordering_id=ordering_id, job_config=job_config)
return destination_table
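The new argument handling can be exercised without touching BigQuery; a minimal sketch of the same validation rules (the function name `resolve_to_gbq_args` and the placeholder temp-table name are hypothetical; the real code asks the session to create a temporary table):

```python
from typing import Optional, Tuple

DISPOSITIONS = ("fail", "replace", "append")

def resolve_to_gbq_args(
    destination_table: Optional[str], if_exists: Optional[str]
) -> Tuple[str, str]:
    if destination_table is None:
        # A new temp table is always created, so only None/'replace' make sense.
        if if_exists not in (None, "replace"):
            raise ValueError(f"Got invalid value {if_exists!r} for if_exists.")
        destination_table = "_anonymous_dataset.temp_table"  # placeholder name
        if_exists = "replace"
    if "." not in destination_table:
        raise ValueError(
            f"Got invalid value for destination_table {destination_table!r}."
        )
    if if_exists is None:
        if_exists = "fail"
    if if_exists not in DISPOSITIONS:
        raise ValueError(f"Got invalid value {if_exists!r} for if_exists.")
    return destination_table, if_exists

print(resolve_to_gbq_args(None, None))  # ('_anonymous_dataset.temp_table', 'replace')
print(resolve_to_gbq_args("dataset.table", "append"))  # ('dataset.table', 'append')
```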

def to_numpy(
self, dtype=None, copy=False, na_value=None, **kwargs