Comparing changes

base repository: aws/aws-sdk-pandas
base: 3.7.2
head repository: aws/aws-sdk-pandas
compare: 3.7.3
  • 19 commits
  • 33 files changed
  • 6 contributors

Commits on Apr 1, 2024

  1. chore(deps-dev): bump the development-dependencies group with 6 updates (#2753)
    
    Bumps the development-dependencies group with 6 updates:
    
    | Package | From | To |
    | --- | --- | --- |
    | [boto3-stubs](https://github.com/youtype/mypy_boto3_builder) | `1.34.64` | `1.34.74` |
    | [ruff](https://github.com/astral-sh/ruff) | `0.3.3` | `0.3.4` |
    | [moto](https://github.com/getmoto/moto) | `5.0.3` | `5.0.4` |
    | [pytest-cov](https://github.com/pytest-dev/pytest-cov) | `4.1.0` | `5.0.0` |
    | [tox](https://github.com/tox-dev/tox) | `4.14.1` | `4.14.2` |
    | [bump-my-version](https://github.com/callowayproject/bump-my-version) | `0.19.0` | `0.20.0` |
    
    
    Updates `boto3-stubs` from 1.34.64 to 1.34.74
    - [Release notes](https://github.com/youtype/mypy_boto3_builder/releases)
    - [Commits](https://github.com/youtype/mypy_boto3_builder/commits)
    
    Updates `ruff` from 0.3.3 to 0.3.4
    - [Release notes](https://github.com/astral-sh/ruff/releases)
    - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
    - [Commits](astral-sh/ruff@v0.3.3...v0.3.4)
    
    Updates `moto` from 5.0.3 to 5.0.4
    - [Release notes](https://github.com/getmoto/moto/releases)
    - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md)
    - [Commits](getmoto/moto@5.0.3...5.0.4)
    
    Updates `pytest-cov` from 4.1.0 to 5.0.0
    - [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
    - [Commits](pytest-dev/pytest-cov@v4.1.0...v5.0.0)
    
    Updates `tox` from 4.14.1 to 4.14.2
    - [Release notes](https://github.com/tox-dev/tox/releases)
    - [Changelog](https://github.com/tox-dev/tox/blob/main/docs/changelog.rst)
    - [Commits](tox-dev/tox@4.14.1...4.14.2)
    
    Updates `bump-my-version` from 0.19.0 to 0.20.0
    - [Release notes](https://github.com/callowayproject/bump-my-version/releases)
    - [Changelog](https://github.com/callowayproject/bump-my-version/blob/master/CHANGELOG.md)
    - [Commits](callowayproject/bump-my-version@0.19.0...0.20.0)
    
    ---
    updated-dependencies:
    - dependency-name: boto3-stubs
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: ruff
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: moto
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: pytest-cov
      dependency-type: direct:development
      update-type: version-update:semver-major
      dependency-group: development-dependencies
    - dependency-name: tox
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: bump-my-version
      dependency-type: direct:development
      update-type: version-update:semver-minor
      dependency-group: development-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 1, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    7bd0362
  2. chore(deps): bump the production-dependencies group with 5 updates (#2754)
    
    Bumps the production-dependencies group with 5 updates:
    
    | Package | From | To |
    | --- | --- | --- |
    | [boto3](https://github.com/boto/boto3) | `1.34.64` | `1.34.69` |
    | [botocore](https://github.com/boto/botocore) | `1.34.69` | `1.34.74` |
    | [pg8000](https://github.com/tlocke/pg8000) | `1.30.5` | `1.31.1` |
    | [opensearch-py](https://github.com/opensearch-project/opensearch-py) | `2.4.2` | `2.5.0` |
    | [deltalake](https://github.com/delta-io/delta-rs) | `0.16.2` | `0.16.3` |
    
    
    Updates `boto3` from 1.34.64 to 1.34.69
    - [Release notes](https://github.com/boto/boto3/releases)
    - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
    - [Commits](boto/boto3@1.34.64...1.34.69)
    
    Updates `botocore` from 1.34.69 to 1.34.74
    - [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
    - [Commits](boto/botocore@1.34.69...1.34.74)
    
    Updates `pg8000` from 1.30.5 to 1.31.1
    - [Commits](tlocke/pg8000@1.30.5...1.31.1)
    
    Updates `opensearch-py` from 2.4.2 to 2.5.0
    - [Release notes](https://github.com/opensearch-project/opensearch-py/releases)
    - [Changelog](https://github.com/opensearch-project/opensearch-py/blob/main/CHANGELOG.md)
    - [Commits](opensearch-project/opensearch-py@v2.4.2...v2.5.0)
    
    Updates `deltalake` from 0.16.2 to 0.16.3
    - [Release notes](https://github.com/delta-io/delta-rs/releases)
    - [Changelog](https://github.com/delta-io/delta-rs/blob/main/CHANGELOG.md)
    - [Commits](delta-io/delta-rs@rust-v0.16.2...rust-v0.16.3)
    
    ---
    updated-dependencies:
    - dependency-name: boto3
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: botocore
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: pg8000
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: production-dependencies
    - dependency-name: opensearch-py
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: production-dependencies
    - dependency-name: deltalake
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 1, 2024

    Verified

    d574220
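
The `update-type` labels in the Dependabot metadata above (`version-update:semver-patch`, `semver-minor`, `semver-major`) follow from which version component changes first. Below is a minimal Python sketch of that classification, assuming plain `MAJOR.MINOR.PATCH` version strings. The helper names `parse_version` and `classify_update` are hypothetical and are not part of Dependabot or this repository.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(old: str, new: str) -> str:
    """Name the first semver component that differs between old and new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "semver-major"
    if new_v[1] != old_v[1]:
        return "semver-minor"
    if new_v[2] != old_v[2]:
        return "semver-patch"
    return "no-change"


# Version pairs taken from the development-dependencies table above.
updates = {
    "ruff": ("0.3.3", "0.3.4"),
    "pytest-cov": ("4.1.0", "5.0.0"),
    "bump-my-version": ("0.19.0", "0.20.0"),
}
for name, (old, new) in updates.items():
    print(f"{name}: {classify_update(old, new)}")
```

For the three pairs shown, the sketch reproduces the labels recorded in the commit metadata (patch, major, and minor respectively). Pre-release or build suffixes are not handled.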

Commits on Apr 2, 2024

  1. Verified

    ace17a3
  2. fix: simplify README, remove AWS Glue for Ray references (#2750)

    * fix: simplify README, remove AWS Glue for Ray references
    
    * fix: PR feedback
    jaidisido authored Apr 2, 2024

    Verified

    946ac33

Commits on Apr 8, 2024

  1. chore(deps): bump the production-dependencies group with 4 updates (#2763)
    
    Bumps the production-dependencies group with 4 updates: [boto3](https://github.com/boto/boto3), [botocore](https://github.com/boto/botocore), [typing-extensions](https://github.com/python/typing_extensions) and [deltalake](https://github.com/delta-io/delta-rs).
    
    
    Updates `boto3` from 1.34.69 to 1.34.74
    - [Release notes](https://github.com/boto/boto3/releases)
    - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
    - [Commits](boto/boto3@1.34.69...1.34.74)
    
    Updates `botocore` from 1.34.74 to 1.34.79
    - [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
    - [Commits](boto/botocore@1.34.74...1.34.79)
    
    Updates `typing-extensions` from 4.10.0 to 4.11.0
    - [Release notes](https://github.com/python/typing_extensions/releases)
    - [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
    - [Commits](python/typing_extensions@4.10.0...4.11.0)
    
    Updates `deltalake` from 0.16.3 to 0.16.4
    - [Release notes](https://github.com/delta-io/delta-rs/releases)
    - [Changelog](https://github.com/delta-io/delta-rs/blob/main/CHANGELOG.md)
    - [Commits](delta-io/delta-rs@rust-v0.16.3...rust-v0.16.4)
    
    ---
    updated-dependencies:
    - dependency-name: boto3
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: botocore
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: typing-extensions
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: production-dependencies
    - dependency-name: deltalake
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 8, 2024

    Verified

    a78d349
  2. chore(deps-dev): bump the development-dependencies group with 3 updates (#2764)
    
    Bumps the development-dependencies group with 3 updates: [boto3-stubs](https://github.com/youtype/mypy_boto3_builder), [ruff](https://github.com/astral-sh/ruff) and [moto](https://github.com/getmoto/moto).
    
    
    Updates `boto3-stubs` from 1.34.74 to 1.34.79
    - [Release notes](https://github.com/youtype/mypy_boto3_builder/releases)
    - [Commits](https://github.com/youtype/mypy_boto3_builder/commits)
    
    Updates `ruff` from 0.3.4 to 0.3.5
    - [Release notes](https://github.com/astral-sh/ruff/releases)
    - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
    - [Commits](astral-sh/ruff@v0.3.4...v0.3.5)
    
    Updates `moto` from 5.0.4 to 5.0.5
    - [Release notes](https://github.com/getmoto/moto/releases)
    - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md)
    - [Commits](getmoto/moto@5.0.4...5.0.5)
    
    ---
    updated-dependencies:
    - dependency-name: boto3-stubs
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: ruff
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: moto
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: jaidisido <jaidisido@gmail.com>
    dependabot[bot] and jaidisido authored Apr 8, 2024

    Verified

    b59008d

Commits on Apr 9, 2024

  1. [skip ci] fix: remove Glue for Ray in install section

    jaidisido committed Apr 9, 2024
    66b6589
  2. fix: trickle down s3_output in to_iceberg (#2767)

    jaidisido authored Apr 9, 2024

    Verified

    de48a86
  3. fix: respect order of columns in to_iceberg (#2768)

    jaidisido authored Apr 9, 2024

    Verified

    c95a5d0

Commits on Apr 10, 2024

  1. docs: Fix YAML formatting in Ray Remote tutorial (#2770)

    LeonLuttenberger authored Apr 10, 2024

    Verified

    db7c449

Commits on Apr 19, 2024

  1. chore(deps): bump idna from 3.6 to 3.7 (#2772)

    Bumps [idna](https://github.com/kjd/idna) from 3.6 to 3.7.
    - [Release notes](https://github.com/kjd/idna/releases)
    - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
    - [Commits](kjd/idna@v3.6...v3.7)
    
    ---
    updated-dependencies:
    - dependency-name: idna
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 19, 2024

    Verified

    2338d6f
  2. chore(deps): bump aiohttp from 3.9.3 to 3.9.4 (#2777)

    Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.3 to 3.9.4.
    - [Release notes](https://github.com/aio-libs/aiohttp/releases)
    - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
    - [Commits](aio-libs/aiohttp@v3.9.3...v3.9.4)
    
    ---
    updated-dependencies:
    - dependency-name: aiohttp
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 19, 2024

    Verified

    b1f5792
  3. fix: add PyArrow fixed_size_binary dtype support (#2775)

    jaidisido authored Apr 19, 2024

    Verified

    a7756b6

Commits on Apr 22, 2024

  1. chore(deps-dev): bump the development-dependencies group with 4 updates (#2781)
    
    Bumps the development-dependencies group with 4 updates: [boto3-stubs](https://github.com/youtype/mypy_boto3_builder), [ruff](https://github.com/astral-sh/ruff), [bump-my-version](https://github.com/callowayproject/bump-my-version) and [jupyterlab](https://github.com/jupyterlab/jupyterlab).
    
    
    Updates `boto3-stubs` from 1.34.79 to 1.34.88
    - [Release notes](https://github.com/youtype/mypy_boto3_builder/releases)
    - [Commits](https://github.com/youtype/mypy_boto3_builder/commits)
    
    Updates `ruff` from 0.3.5 to 0.4.1
    - [Release notes](https://github.com/astral-sh/ruff/releases)
    - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
    - [Commits](astral-sh/ruff@v0.3.5...v0.4.1)
    
    Updates `bump-my-version` from 0.20.0 to 0.20.1
    - [Release notes](https://github.com/callowayproject/bump-my-version/releases)
    - [Changelog](https://github.com/callowayproject/bump-my-version/blob/master/CHANGELOG.md)
    - [Commits](callowayproject/bump-my-version@0.20.0...0.20.1)
    
    Updates `jupyterlab` from 4.1.5 to 4.1.6
    - [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
    - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/lsp@4.1.6/CHANGELOG.md)
    - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.1.5...@jupyterlab/lsp@4.1.6)
    
    ---
    updated-dependencies:
    - dependency-name: boto3-stubs
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: ruff
      dependency-type: direct:development
      update-type: version-update:semver-minor
      dependency-group: development-dependencies
    - dependency-name: bump-my-version
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: jupyterlab
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 22, 2024

    Verified

    229a9d1
  2. fix: Opensearch serverless vector search collections - remove default `_id` (#2784)
    
    * fix: Opensearch serverless vector search collections - allow no _id
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    * mypy
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    ---------
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    kukushking authored Apr 22, 2024

    Verified

    c8b7b03
  3. fix: Missing keys in list_to_arrow_table (#2778)

    kukushking authored Apr 22, 2024

    Verified

    60beb95
  4. fix: prevent athena.to_iceberg overwrite to delete table in order to preserve Iceberg transactions history (#2776)

    erwan-simon authored Apr 22, 2024

    Verified

    6dd28e4
  5. chore: Bump version to 3.7.3 (#2785)

    LeonLuttenberger authored Apr 22, 2024

    Verified

    3d5a4e8
  6. docs: update layers.rst

    LeonLuttenberger committed Apr 22, 2024

    Unverified

    This user has not yet uploaded their public signing key.
    d554bea
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "3.7.2"
+current_version = "3.7.3"
 commit = false
 tag = false
 tag_name = "{new_version}"
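
The hunk above is the whole release bump for this file: `current_version` moves from 3.7.2 to 3.7.3. As a rough illustration of what those settings describe, the Python sketch below increments the patch component and expands the `tag_name` template. It assumes the template is filled in like a Python format string with a `new_version` field. The `config` dict and the `bump_patch` helper are hypothetical stand-ins, not bump-my-version's own implementation.

```python
# Values copied from the [tool.bumpversion] hunk above (state before the bump).
config = {
    "current_version": "3.7.2",
    "commit": False,
    "tag": False,
    "tag_name": "{new_version}",
}


def bump_patch(version: str) -> str:
    """Increment the PATCH component of a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


new_version = bump_patch(config["current_version"])

# Assumption: the template is expanded like a str.format() call.
tag_name = config["tag_name"].format(new_version=new_version)

print(new_version)
print(tag_name)
```

Under these assumptions both values come out as `3.7.3`, which matches the new `current_version` in the diff.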
19 changes: 0 additions & 19 deletions .github/ISSUE_TEMPLATE/support-data-wrangler.md

This file was deleted.

135 changes: 41 additions & 94 deletions README.md
@@ -1,12 +1,10 @@
 # AWS SDK for pandas (awswrangler)

-AWS Data Wrangler is now **AWS SDK for pandas (awswrangler)**. We’re changing the name we use when we talk about the library, but everything else will stay the same. You’ll still be able to install using `pip install awswrangler` and you won’t need to change any of your code. As part of this change, we’ve moved the library from AWS Labs to the main AWS GitHub organisation but, thanks to the GitHub’s redirect feature, you’ll still be able to access the project by its old URLs until you update your bookmarks. Our documentation has also moved to [aws-sdk-pandas.readthedocs.io](https://aws-sdk-pandas.readthedocs.io), but old bookmarks will redirect to the new site.
-
 *Pandas on AWS*

 Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

-![AWS SDK for pandas](docs/source/_static/logo2.png?raw=true "AWS SDK for pandas")
+![AWS SDK for pandas](https://github.com/aws/aws-sdk-pandas/blob/main/docs/source/_static/logo2.png?raw=true "AWS SDK for pandas")
 ![tracker](https://d3tiqpr4kkkomd.cloudfront.net/img/pixel.png?asset=GVOYN2BOOQ573LTVIHEW)

 > An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com
@@ -29,17 +27,13 @@ Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, Q
 > ⚠️ **Starting version 3.0, optional modules must be installed explicitly:**<br>
 ➡️`pip install 'awswrangler[redshift]'`

-Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
-
 ## Table of contents

 - [Quick Start](#quick-start)
 - [At Scale](#at-scale)
 - [Read The Docs](#read-the-docs)
 - [Getting Help](#getting-help)
-- [Community Resources](#community-resources)
 - [Logging](#logging)
-- [Who uses AWS SDK for pandas?](#who-uses-aws-sdk-for-pandas)

 ## Quick Start

@@ -100,27 +94,27 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 ## At scale
 AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

-The quickest way to get started is to use AWS Glue with Ray. Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/scale.html), our blogs ([1](https://aws.amazon.com/blogs/big-data/scale-aws-sdk-for-pandas-workloads-with-aws-glue-for-ray/)/[2](https://aws.amazon.com/blogs/big-data/advanced-patterns-with-aws-sdk-for-pandas-on-aws-glue-for-ray/)), or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to discover even more features.
+Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.

 > ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
 ## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

-- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/about.html)
-- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html)
-- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#pypi-pip)
-- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#conda)
-- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#aws-lambda-layer)
-- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#aws-glue-python-shell-jobs)
-- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#aws-glue-pyspark-jobs)
-- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#amazon-sagemaker-notebook)
-- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#amazon-sagemaker-notebook-lifecycle)
-- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#emr)
-- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/install.html#from-source)
-- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/scale.html)
-- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/scale.html#getting-started)
-- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/scale.html#supported-apis)
-- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/scale.html#resources)
+- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/about.html)
+- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html)
+- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#pypi-pip)
+- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#conda)
+- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#aws-lambda-layer)
+- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#aws-glue-python-shell-jobs)
+- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#aws-glue-pyspark-jobs)
+- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#amazon-sagemaker-notebook)
+- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#amazon-sagemaker-notebook-lifecycle)
+- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#emr)
+- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/install.html#from-source)
+- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/scale.html)
+- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/scale.html#getting-started)
+- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/scale.html#supported-apis)
+- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/scale.html#resources)
 - [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
 - [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
 - [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -156,36 +150,35 @@ The quickest way to get started is to use AWS Glue with Ray. Read our [docs](htt
 - [033 - Amazon Neptune](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb)
 - [034 - Distributing Calls Using Ray](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/034%20-%20Distributing%20Calls%20using%20Ray.ipynb)
 - [035 - Distributing Calls on Ray Remote Cluster](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/035%20-%20Distributing%20Calls%20on%20Ray%20Remote%20Cluster.ipynb)
-- [036 - Distributing Calls with Glue Interactive Sessions on Ray](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/036%20-%20Distributing%20Calls%20with%20Glue%20Interactive%20Sessions%20on%20Ray.ipynb)
 - [037 - Glue Data Quality](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/037%20-%20Glue%20Data%20Quality.ipynb)
 - [038 - OpenSearch Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/038%20-%20OpenSearch%20Serverless.ipynb)
 - [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
 - [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
 - [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
-- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html)
-- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-s3)
-- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#aws-glue-catalog)
-- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-athena)
-- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-redshift)
-- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#postgresql)
-- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#mysql)
-- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#sqlserver)
-- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#oracle)
-- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#data-api-redshift)
-- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#data-api-rds)
-- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#opensearch)
-- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#aws-glue-data-quality)
-- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-neptune)
-- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#dynamodb)
-- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-timestream)
-- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-emr)
-- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-cloudwatch-logs)
-- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-chime)
-- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#amazon-quicksight)
-- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#aws-sts)
-- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#aws-secrets-manager)
-- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#global-configurations)
-- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.7.2/api.html#distributed-ray)
+- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html)
+- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-s3)
+- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#aws-glue-catalog)
+- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-athena)
+- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-redshift)
+- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#postgresql)
+- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#mysql)
+- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#sqlserver)
+- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#oracle)
+- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#data-api-redshift)
+- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#data-api-rds)
+- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#opensearch)
+- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#aws-glue-data-quality)
+- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-neptune)
+- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#dynamodb)
+- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-timestream)
+- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-emr)
+- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-cloudwatch-logs)
+- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-chime)
+- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#amazon-quicksight)
+- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#aws-sts)
+- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#aws-secrets-manager)
+- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#global-configurations)
+- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.7.3/api.html#distributed-ray)
 - [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

@@ -198,19 +191,6 @@ You may also find help on these community resources:
 and tag it with `awswrangler`
 * [Runbook](https://github.com/aws/aws-sdk-pandas/discussions/1815) for AWS SDK for pandas with Ray

-## Community Resources
-
-Please [send a Pull Request](https://github.com/aws/aws-sdk-pandas/edit/main/README.md) with your resource reference and @githubhandle.
-
-- [YouTube channel](https://www.youtube.com/playlist?list=PL7bE4nSzLSWdDdlfRgfKo2JBplB4p_v5O) [[@AdrianoNicolucci](https://github.com/AdrianoNicolucci)]
-- [Optimize Python ETL by extending Pandas with AWS SDK for pandas](https://aws.amazon.com/blogs/big-data/optimize-python-etl-by-extending-pandas-with-aws-data-wrangler/) [[@igorborgest](https://github.com/igorborgest)]
-- [Reading Parquet Files With AWS Lambda](https://aprakash.wordpress.com/2020/04/14/reading-parquet-files-with-aws-lambda/) [[@anand086](https://github.com/anand086)]
-- [Transform AWS CloudTrail data using AWS SDK for pandas](https://aprakash.wordpress.com/2020/09/17/transform-aws-cloudtrail-data-using-aws-data-wrangler/) [[@anand086](https://github.com/anand086)]
-- [Rename Glue Tables using AWS SDK for pandas](https://ananddatastories.com/rename-glue-tables-using-aws-sdk-pandas/) [[@anand086](https://github.com/anand086)]
-- [Getting started on AWS SDK for pandas and Athena](https://medium.com/@dheerajsharmainampudi/getting-started-on-aws-sdk-pandas-and-athena-7b446c834076) [[@dheerajsharma21](https://github.com/dheerajsharma21)]
-- [Simplifying Pandas integration with AWS data related services](https://medium.com/@bv_subhash/aws-sdk-pandas-simplifying-pandas-integration-with-aws-data-related-services-2b3325c12188) [[@bvsubhash](https://github.com/bvsubhash)]
-- [Build an ETL pipeline using AWS S3, Glue and Athena](https://www.linkedin.com/pulse/build-etl-pipeline-using-aws-s3-glue-athena-data-wrangler-tom-reid/) [[@taupirho](https://github.com/taupirho)]
-
 ## Logging

 Enabling internal logging examples:
@@ -228,36 +208,3 @@ Into AWS lambda:
import logging
logging.getLogger("awswrangler").setLevel(logging.DEBUG)
```

-## Who uses AWS SDK for pandas?
-
-Knowing which companies are using this library is important to help prioritize the project internally.
-If you would like us to include your company’s name and/or logo in the README file to indicate that your company is using the AWS SDK for pandas, please raise a "Support Us" issue. If you would like us to display your company’s logo, please raise a linked pull request to provide an image file for the logo. Note that by raising a Support Us issue (and related pull request), you are granting AWS permission to use your company’s name (and logo) for the limited purpose described here and you are confirming that you have authority to grant such permission.
-
-- [Amazon](https://www.amazon.com/)
-- [AWS](https://aws.amazon.com/)
-- [Cepsa](https://cepsa.com) [[@alvaropc](https://github.com/alvaropc)]
-- [Cognitivo](https://www.cognitivo.ai/) [[@msantino](https://github.com/msantino)]
-- [Digio](https://www.digio.com.br/) [[@afonsomy](https://github.com/afonsomy)]
-- [DNX](https://www.dnx.solutions/) [[@DNXLabs](https://github.com/DNXLabs)]
-- [Fortescue Future Industries](https://ffi.com.au/) [[@spencervoorend](https://github.com/spencervoorend)]
-- [Funcional Health Tech](https://www.funcionalcorp.com.br/) [[@webysther](https://github.com/webysther)]
-- [Funding Circle](https://www.fundingcircle.com/) [[@pfig](https://github.com/pfig)]
-- [Infomach](https://www.infomach.com.br/)
-- [Informa Markets](https://www.informamarkets.com/en/home.html) [[@mateusmorato]](http://github.com/mateusmorato)
-- [LINE TV](https://www.linetv.tw/) [[@bryanyang0528](https://github.com/bryanyang0528)]
-- [LogicalCube](https://www.logicalcube.com) [[@zolabud](https://github.com/zolabud)]
-- [Magnataur](https://magnataur.com) [[@brianmingus2](https://github.com/brianmingus2)]
-- [M4U](https://www.m4u.com.br/) [[@Thiago-Dantas](https://github.com/Thiago-Dantas)]
-- [NBCUniversal](https://www.nbcuniversal.com/) [[@vibe](https://github.com/vibe)]
-- [nrd.io](https://nrd.io/) [[@mrtns](https://github.com/mrtns)]
-- [OKRA Technologies](https://okra.ai) [[@JPFrancoia](https://github.com/JPFrancoia), [@schot](https://github.com/schot)]
-- [Pier](https://www.pier.digital/) [[@flaviomax](https://github.com/flaviomax)]
-- [Pismo](https://www.pismo.io/) [[@msantino](https://github.com/msantino)]
-- [ringDNA](https://www.ringdna.com/) [[@msropp](https://github.com/msropp)]
-- [Serasa Experian](https://www.serasaexperian.com.br/) [[@andre-marcos-perez](https://github.com/andre-marcos-perez)]
-- [Shipwell](https://shipwell.com/) [[@zacharycarter](https://github.com/zacharycarter)]
-- [strongDM](https://www.strongdm.com/) [[@mrtns](https://github.com/mrtns)]
-- [Thinkbumblebee](https://www.thinkbumblebee.com/) [[@dheerajsharma21]](https://github.com/dheerajsharma21)
-- [VTEX](https://vtex.com/us-en/) [[@igorborgest]](https://github.com/igorborgest)
-- [Zillow](https://www.zillow.com/) [[@nicholas-miles]](https://github.com/nicholas-miles)
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-3.7.2
+3.7.3
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
-__version__: str = "3.7.2"
+__version__: str = "3.7.3"
__license__: str = "Apache License 2.0"
5 changes: 3 additions & 2 deletions awswrangler/_data_types.py
@@ -46,7 +46,7 @@ def pyarrow2athena( # noqa: PLR0911,PLR0912
         return "timestamp"
     if pa.types.is_date(dtype):
         return "date"
-    if pa.types.is_binary(dtype):
+    if pa.types.is_binary(dtype) or pa.types.is_fixed_size_binary(dtype):
         return "binary"
     if pa.types.is_dictionary(dtype):
         return pyarrow2athena(dtype=dtype.value_type, ignore_null=ignore_null)
@@ -308,6 +308,7 @@ def _split_map(s: str) -> list[str]:

 def athena2pyarrow(dtype: str) -> pa.DataType: # noqa: PLR0911,PLR0912
     """Athena to PyArrow data types conversion."""
+    dtype = dtype.strip()
     if dtype.startswith(("array", "struct", "map")):
         orig_dtype: str = dtype
         dtype = dtype.lower().replace(" ", "")
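
Review note on the hunk above: the added `dtype.strip()` matters because the nested-type branch is chosen with `str.startswith`, which is sensitive to leading whitespace. The following is a minimal, stdlib-only sketch of that effect. The helper `is_nested` and the sample string are hypothetical and mirror only the prefix test visible in the diff; they are not part of awswrangler.

```python
# Prefixes taken from the startswith() call shown in the hunk above.
NESTED_PREFIXES = ("array", "struct", "map")


def is_nested(dtype: str, strip: bool) -> bool:
    """Hypothetical helper: reproduces only the prefix test, optionally stripping first."""
    if strip:
        dtype = dtype.strip()
    return dtype.startswith(NESTED_PREFIXES)


# Illustrative input: a type string carrying a leading space (assumed, not from the source).
padded = " array<int>"

print(is_nested(padded, strip=False))  # prefix test misses the padded string
print(is_nested(padded, strip=True))   # prefix test matches after stripping
```

Without stripping, a padded nested type would fall through to whatever handling follows the prefix check, which is presumably the situation the one-line change guards against.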
@@ -375,7 +376,7 @@ def athena2pandas(dtype: str, dtype_backend: str | None = None) -> str: # noqa:
         return "decimal" if dtype_backend != "pyarrow" else "double[pyarrow]"
     if dtype in ("binary", "varbinary"):
         return "bytes" if dtype_backend != "pyarrow" else "binary[pyarrow]"
-    if dtype in ("array", "row", "map"):
+    if any(dtype.startswith(t) for t in ["array", "row", "map", "struct"]):
         return "object"
     if dtype == "geometry":
         return "string"
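
Review note on the `athena2pandas` hunk above: the old condition was an exact membership test, so it could only match the bare names `array`, `row` and `map`; the new condition is a prefix test that also covers parametrized type strings and adds `struct`. A minimal sketch comparing the two conditions in isolation follows. The function names `old_check` and `new_check` and the sample type strings are hypothetical and chosen for illustration only.

```python
def old_check(dtype: str) -> bool:
    """Condition before the change: exact match against bare type names."""
    return dtype in ("array", "row", "map")


def new_check(dtype: str) -> bool:
    """Condition after the change: prefix match, with "struct" added."""
    return any(dtype.startswith(t) for t in ["array", "row", "map", "struct"])


# Illustrative type strings (assumed examples, not taken from the source).
samples = ["array", "array<int>", "map<string,int>", "struct<a:int>", "varchar"]

for dtype in samples:
    print(f"{dtype!r}: old={old_check(dtype)} new={new_check(dtype)}")
```

Only the condition itself is compared here; what the library does with a type string that matches neither branch is outside the lines shown in the diff.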