
Comparing changes

base repository: aws/aws-sdk-pandas
base: 3.6.0
...
head repository: aws/aws-sdk-pandas
compare: 3.7.0

Commits on Feb 16, 2024

  1. fix: Index columns removed on s3.to_parquet (#2655)

    * first go at a failing test
    
    * pass missing dataset flag in test
    
    * because we partition, do not specify full parquet paths during write
    
    * use proper path in tests
    
    * use reset_index to allow dropping the entire index
    
    * test partitioning on full and partial index
    
    * need to validate schema on read for issue to surface
    
    * need to sort on index
    
    * cross-test without partitioning
    
    * print assertion error for remote debugging
    
    * simplify test to just assert schema validation
    
    * consistently handle regular and index columns casts
    
    * use equality assertion utility, drop unnecessary sort
    
    * add index partition test
    
    * reformat
    
    * undo categorical-specific dataframe creation in test
    
    * try again to expect the right dtypes
    
    * pull out toparquet kwargs
    
    * expect test to fail when using modin and partitioning on full index
    
    * manually assert unpartitioned index is still present, then reset full index
    
    * handle change in promotion kwargs for pyarrow 14+
    
    * move packaging import to correct location
    
    * fix types for promotion kwargs
    
    * test and handle unnamed index levels as well
    
    ---------
    
    Co-authored-by: Robert Schmidtke <robert.schmidtke@trailstonegroup.com>
    Co-authored-by: kukushking <kukushkin.anton@gmail.com>
    Co-authored-by: Leon Luttenberger <LeonLuttenberger@users.noreply.github.com>
    4 people authored Feb 16, 2024
    95e37bf View commit details

Commits on Feb 19, 2024

  1. chore(deps): bump cryptography from 42.0.0 to 42.0.2 (#2679)

    Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
    - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
    - [Commits](pyca/cryptography@42.0.0...42.0.2)
    
    ---
    updated-dependencies:
    - dependency-name: cryptography
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 19, 2024
    d6e1d66 View commit details
  2. chore(deps): bump the production-dependencies group with 2 updates (#2680)
    
    Bumps the production-dependencies group with 2 updates: [boto3](https://github.com/boto/boto3) and [botocore](https://github.com/boto/botocore).
    
    
    Updates `boto3` from 1.34.34 to 1.34.39
    - [Release notes](https://github.com/boto/boto3/releases)
    - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
    - [Commits](boto/boto3@1.34.34...1.34.39)
    
    Updates `botocore` from 1.34.39 to 1.34.44
    - [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
    - [Commits](boto/botocore@1.34.39...1.34.44)
    
    ---
    updated-dependencies:
    - dependency-name: boto3
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: botocore
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 19, 2024
    2cca135 View commit details
  3. chore(deps-dev): bump the development-dependencies group with 6 updates (#2681)
    
    Bumps the development-dependencies group with 6 updates:
    
    | Package | From | To |
    | --- | --- | --- |
    | [boto3-stubs](https://github.com/youtype/mypy_boto3_builder) | `1.34.39` | `1.34.44` |
    | [ruff](https://github.com/astral-sh/ruff) | `0.2.1` | `0.2.2` |
    | [moto](https://github.com/getmoto/moto) | `5.0.1` | `5.0.2` |
    | [pytest](https://github.com/pytest-dev/pytest) | `8.0.0` | `8.0.1` |
    | [tox](https://github.com/tox-dev/tox) | `4.12.1` | `4.13.0` |
    | [jupyterlab](https://github.com/jupyterlab/jupyterlab) | `4.1.0` | `4.1.1` |
    
    
    Updates `boto3-stubs` from 1.34.39 to 1.34.44
    - [Release notes](https://github.com/youtype/mypy_boto3_builder/releases)
    - [Commits](https://github.com/youtype/mypy_boto3_builder/commits)
    
    Updates `ruff` from 0.2.1 to 0.2.2
    - [Release notes](https://github.com/astral-sh/ruff/releases)
    - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
    - [Commits](astral-sh/ruff@v0.2.1...v0.2.2)
    
    Updates `moto` from 5.0.1 to 5.0.2
    - [Release notes](https://github.com/getmoto/moto/releases)
    - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md)
    - [Commits](getmoto/moto@5.0.1...5.0.2)
    
    Updates `pytest` from 8.0.0 to 8.0.1
    - [Release notes](https://github.com/pytest-dev/pytest/releases)
    - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
    - [Commits](pytest-dev/pytest@8.0.0...8.0.1)
    
    Updates `tox` from 4.12.1 to 4.13.0
    - [Release notes](https://github.com/tox-dev/tox/releases)
    - [Changelog](https://github.com/tox-dev/tox/blob/main/docs/changelog.rst)
    - [Commits](tox-dev/tox@4.12.1...4.13.0)
    
    Updates `jupyterlab` from 4.1.0 to 4.1.1
    - [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
    - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.1.0...@jupyterlab/lsp@4.1.1)
    
    ---
    updated-dependencies:
    - dependency-name: boto3-stubs
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: ruff
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: moto
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: pytest
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    - dependency-name: tox
      dependency-type: direct:development
      update-type: version-update:semver-minor
      dependency-group: development-dependencies
    - dependency-name: jupyterlab
      dependency-type: direct:development
      update-type: version-update:semver-patch
      dependency-group: development-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: kukushking <kukushkin.anton@gmail.com>
    dependabot[bot] and kukushking authored Feb 19, 2024
    0d95026 View commit details
  4. fix: Missing timezone metadata (#2682)

    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    kukushking authored Feb 19, 2024
    784978d View commit details

Commits on Feb 21, 2024

  1. fix: Update RDS cert bundle (#2670)

    * fix: Update RDS cert bundle
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    * use s3 url
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    * use CUSTOM_JDBC_CERT_STRING
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    * bump cdk version
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    
    * Revert "use CUSTOM_JDBC_CERT_STRING"
    
    This reverts commit 2f7e552.
    
    ---------
    
    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    kukushking authored Feb 21, 2024
    d0081d3 View commit details
  2. 3b96385 View commit details
  3. chore(deps): bump cryptography from 42.0.2 to 42.0.4 (#2683)

    Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
    - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
    - [Commits](pyca/cryptography@42.0.2...42.0.4)
    
    ---
    updated-dependencies:
    - dependency-name: cryptography
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 21, 2024
    186a534 View commit details

Commits on Feb 23, 2024

  1. fix: Tests - move certs to a public bucket (#2687)

    Signed-off-by: Anton Kukushkin <kukushkin.anton@gmail.com>
    kukushking authored Feb 23, 2024
    f673ab4 View commit details

Commits on Feb 26, 2024

  1. 54afd65 View commit details

Commits on Feb 27, 2024

  1. chore(deps): bump the production-dependencies group with 2 updates (#2691)
    
    Bumps the production-dependencies group with 2 updates: [boto3](https://github.com/boto/boto3) and [botocore](https://github.com/boto/botocore).
    
    
    Updates `boto3` from 1.34.44 to 1.34.49
    - [Release notes](https://github.com/boto/boto3/releases)
    - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
    - [Commits](boto/boto3@1.34.44...1.34.49)
    
    Updates `botocore` from 1.34.49 to 1.34.50
    - [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
    - [Commits](boto/botocore@1.34.49...1.34.50)
    
    ---
    updated-dependencies:
    - dependency-name: boto3
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: botocore
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 27, 2024
    676e1c1 View commit details

Commits on Feb 28, 2024

  1. fe2b225 View commit details

Commits on Feb 29, 2024

  1. 00d3df0 View commit details
  2. 9b63eff View commit details

Commits on Mar 1, 2024

  1. bf204b0 View commit details
  2. e81fd99 View commit details

Commits on Mar 4, 2024

  1. chore(deps): bump the production-dependencies group with 2 updates (#2699)
    
    Bumps the production-dependencies group with 2 updates: [boto3](https://github.com/boto/boto3) and [botocore](https://github.com/boto/botocore).
    
    
    Updates `boto3` from 1.34.49 to 1.34.50
    - [Release notes](https://github.com/boto/boto3/releases)
    - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
    - [Commits](boto/boto3@1.34.49...1.34.50)
    
    Updates `botocore` from 1.34.50 to 1.34.54
    - [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
    - [Commits](boto/botocore@1.34.50...1.34.54)
    
    ---
    updated-dependencies:
    - dependency-name: boto3
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    - dependency-name: botocore
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: production-dependencies
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Mar 4, 2024
    4816e5e View commit details
  2. remove awswrangler README from site-packages folder (#2698)

    Co-authored-by: Al Johri <aljohri@gmail.com>
    Co-authored-by: jaidisido <jaidisido@gmail.com>
    3 people authored Mar 4, 2024
    e7ecd81 View commit details

Commits on Mar 5, 2024

  1. 5b20362 View commit details
  2. fix: indent categories correctly in pyarrow_additional_kwargs (#2701)

    Co-authored-by: Leon Luttenberger <LeonLuttenberger@users.noreply.github.com>
    jaidisido and LeonLuttenberger authored Mar 5, 2024
    77f2dc2 View commit details
  3. d291252 View commit details
  4. 5f36e61 View commit details
Showing with 1,140 additions and 2,941 deletions.
  1. +1 −1 .bumpversion.toml
  2. +1 −1 CONTRIBUTING.md
  3. +40 −42 README.md
  4. +1 −1 VERSION
  5. +0 −2 awswrangler/__init__.py
  6. +1 −1 awswrangler/__metadata__.py
  7. +9 −6 awswrangler/_arrow.py
  8. +0 −20 awswrangler/_config.py
  9. +18 −3 awswrangler/_data_types.py
  10. +6 −15 awswrangler/_utils.py
  11. +10 −10 awswrangler/athena/_read.py
  12. +4 −16 awswrangler/athena/_utils.py
  13. +8 −0 awswrangler/athena/_write_iceberg.py
  14. +3 −14 awswrangler/catalog/_add.py
  15. +17 −77 awswrangler/catalog/_create.py
  16. +4 −25 awswrangler/catalog/_delete.py
  17. +9 −135 awswrangler/catalog/_get.py
  18. +1 −23 awswrangler/catalog/_utils.py
  19. +0 −2 awswrangler/distributed/ray/_register.py
  20. +2 −0 awswrangler/distributed/ray/datasources/arrow_parquet_base_datasource.py
  21. +3 −0 awswrangler/distributed/ray/modin/s3/_read_parquet.py
  22. +0 −30 awswrangler/distributed/ray/modin/s3/_write_dataset.py
  23. +2 −0 awswrangler/distributed/ray/modin/s3/_write_orc.py
  24. +2 −0 awswrangler/distributed/ray/modin/s3/_write_parquet.py
  25. +0 −4 awswrangler/exceptions.py
  26. +0 −28 awswrangler/lakeformation/__init__.py
  27. +0 −309 awswrangler/lakeformation/_read.py
  28. +0 −347 awswrangler/lakeformation/_utils.py
  29. +1 −4 awswrangler/s3/_read_excel.py
  30. +2 −2 awswrangler/s3/_read_orc.py
  31. +42 −6 awswrangler/s3/_read_parquet.py
  32. +9 −1 awswrangler/s3/_read_parquet.pyi
  33. +3 −3 awswrangler/s3/_read_text.py
  34. +7 −39 awswrangler/s3/_write.py
  35. +13 −100 awswrangler/s3/_write_dataset.py
  36. +1 −4 awswrangler/s3/_write_excel.py
  37. +6 −29 awswrangler/s3/_write_orc.py
  38. +33 −31 awswrangler/s3/_write_parquet.py
  39. +9 −103 awswrangler/s3/_write_text.py
  40. +28 −3 awswrangler/typing.py
  41. +0 −18 docs/source/api.rst
  42. +1 −1 docs/source/install.rst
  43. +289 −289 docs/source/layers.rst
  44. +0 −4 docs/source/scale.rst
  45. +122 −137 poetry.lock
  46. +6 −6 pyproject.toml
  47. +1 −0 test_infra/app.py
  48. +13 −31 test_infra/poetry.lock
  49. +4 −6 test_infra/pyproject.toml
  50. +1 −20 test_infra/stacks/base_stack.py
  51. +6 −34 test_infra/stacks/cleanrooms_stack.py
  52. +2 −18 test_infra/stacks/databases_stack.py
  53. +5 −12 tests/_utils.py
  54. +54 −5 tests/conftest.py
  55. +0 −27 tests/load/test_databases.py
  56. +40 −24 tests/unit/test_athena.py
  57. +53 −0 tests/unit/test_athena_iceberg.py
  58. +31 −100 tests/unit/test_catalog.py
  59. +8 −8 tests/unit/test_cleanrooms.py
  60. +0 −2 tests/unit/test_config.py
  61. +0 −219 tests/unit/test_lakeformation.py
  62. +1 −1 tests/unit/test_metadata.py
  63. +0 −24 tests/unit/test_pandas_pyarrow_dtype_backend.py
  64. +11 −38 tests/unit/test_routines.py
  65. +161 −0 tests/unit/test_s3_parquet.py
  66. +10 −10 tutorials/001 - Introduction.ipynb
  67. +15 −15 tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server, Oracle.ipynb
  68. +3 −3 tutorials/014 - Schema Evolution.ipynb
  69. +1 −11 tutorials/021 - Global Configurations.ipynb
  70. +1 −1 tutorials/022 - Writing Partitions Concurrently.ipynb
  71. +1 −1 tutorials/023 - Flexible Partitions Filter.ipynb
  72. +4 −4 tutorials/030 - Data Api.ipynb
  73. +0 −435 tutorials/032 - Lake Formation Governed Tables.ipynb
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "3.6.0"
+current_version = "3.7.0"
 commit = false
 tag = false
 tag_name = "{new_version}"
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -224,7 +224,7 @@ To run all database MySQL tests (Using 8 parallel processes):

 ``pytest -n 8 tests/unit/test_mysql.py``

-To run all tests for all python versions (assuming Amazon QuickSight is activated and the optional stacks deployed):
+To run all tests for all python versions (assuming Amazon QuickSight is activated and the optional stack deployed):

 ``./test.sh``

82 changes: 40 additions & 42 deletions README.md
@@ -100,27 +100,27 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 ## At scale
 AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

-The quickest way to get started is to use AWS Glue with Ray. Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/scale.html), our blogs ([1](https://aws.amazon.com/blogs/big-data/scale-aws-sdk-for-pandas-workloads-with-aws-glue-for-ray/)/[2](https://aws.amazon.com/blogs/big-data/advanced-patterns-with-aws-sdk-for-pandas-on-aws-glue-for-ray/)), or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to discover even more features.
+The quickest way to get started is to use AWS Glue with Ray. Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html), our blogs ([1](https://aws.amazon.com/blogs/big-data/scale-aws-sdk-for-pandas-workloads-with-aws-glue-for-ray/)/[2](https://aws.amazon.com/blogs/big-data/advanced-patterns-with-aws-sdk-for-pandas-on-aws-glue-for-ray/)), or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to discover even more features.

 > ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
 ## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

-- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/about.html)
-- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html)
-- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#pypi-pip)
-- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#conda)
-- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#aws-lambda-layer)
-- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#aws-glue-python-shell-jobs)
-- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#aws-glue-pyspark-jobs)
-- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#amazon-sagemaker-notebook)
-- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#amazon-sagemaker-notebook-lifecycle)
-- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#emr)
-- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/install.html#from-source)
-- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/scale.html)
-- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/scale.html#getting-started)
-- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/scale.html#supported-apis)
-- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/scale.html#resources)
+- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/about.html)
+- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html)
+- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#pypi-pip)
+- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#conda)
+- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-lambda-layer)
+- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-glue-python-shell-jobs)
+- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-glue-pyspark-jobs)
+- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#amazon-sagemaker-notebook)
+- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#amazon-sagemaker-notebook-lifecycle)
+- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#emr)
+- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#from-source)
+- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html)
+- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#getting-started)
+- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#supported-apis)
+- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#resources)
 - [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
 - [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
 - [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -153,7 +153,6 @@ The quickest way to get started is to use AWS Glue with Ray. Read our [docs](htt
 - [029 - S3 Select](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/029%20-%20S3%20Select.ipynb)
 - [030 - Data Api](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/030%20-%20Data%20Api.ipynb)
 - [031 - OpenSearch](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/031%20-%20OpenSearch.ipynb)
-- [032 - Lake Formation Governed Tables](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/032%20-%20Lake%20Formation%20Governed%20Tables.ipynb)
 - [033 - Amazon Neptune](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb)
 - [034 - Distributing Calls Using Ray](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/034%20-%20Distributing%20Calls%20using%20Ray.ipynb)
 - [035 - Distributing Calls on Ray Remote Cluster](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/035%20-%20Distributing%20Calls%20on%20Ray%20Remote%20Cluster.ipynb)
@@ -163,31 +162,30 @@ The quickest way to get started is to use AWS Glue with Ray. Read our [docs](htt
 - [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
 - [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
 - [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
-- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html)
-- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-s3)
-- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#aws-glue-catalog)
-- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-athena)
-- [AWS Lake Formation](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#aws-lake-formation)
-- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-redshift)
-- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#postgresql)
-- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#mysql)
-- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#sqlserver)
-- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#oracle)
-- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#data-api-redshift)
-- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#data-api-rds)
-- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#opensearch)
-- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#aws-glue-data-quality)
-- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-neptune)
-- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#dynamodb)
-- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-timestream)
-- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-emr)
-- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-cloudwatch-logs)
-- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-chime)
-- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#amazon-quicksight)
-- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#aws-sts)
-- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#aws-secrets-manager)
-- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#global-configurations)
-- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.6.0/api.html#distributed-ray)
+- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html)
+- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-s3)
+- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-glue-catalog)
+- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-athena)
+- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-redshift)
+- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#postgresql)
+- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#mysql)
+- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#sqlserver)
+- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#oracle)
+- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#data-api-redshift)
+- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#data-api-rds)
+- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#opensearch)
+- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-glue-data-quality)
+- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-neptune)
+- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#dynamodb)
+- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-timestream)
+- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-emr)
+- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-cloudwatch-logs)
+- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-chime)
+- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-quicksight)
+- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-sts)
+- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-secrets-manager)
+- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#global-configurations)
+- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#distributed-ray)
 - [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-3.6.0
+3.7.0
2 changes: 0 additions & 2 deletions awswrangler/__init__.py
@@ -19,7 +19,6 @@
     emr,
     emr_serverless,
     exceptions,
-    lakeformation,
     mysql,
     neptune,
     opensearch,
@@ -58,7 +57,6 @@
     "s3",
     "sts",
     "redshift",
-    "lakeformation",
     "mysql",
     "neptune",
     "postgresql",
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "3.6.0"
+__version__: str = "3.7.0"
 __license__: str = "Apache License 2.0"
15 changes: 9 additions & 6 deletions awswrangler/_arrow.py
@@ -70,12 +70,15 @@ def _apply_timezone(df: pd.DataFrame, metadata: dict[str, Any]) -> pd.DataFrame:
         else:
             continue
         if col_name in df.columns and c["pandas_type"] == "datetimetz":
-            timezone: datetime.tzinfo = pa.lib.string_to_tzinfo(c["metadata"]["timezone"])
-            _logger.debug("applying timezone (%s) on column %s", timezone, col_name)
-            if hasattr(df[col_name].dtype, "tz") is False:
-                df[col_name] = df[col_name].dt.tz_localize(tz="UTC")
-            if timezone is not None and timezone != pytz.UTC and hasattr(df[col_name].dt, "tz_convert"):
-                df[col_name] = df[col_name].dt.tz_convert(tz=timezone)
+            column_metadata: dict[str, Any] = c["metadata"] if c.get("metadata") else {}
+            timezone_str: str | None = column_metadata.get("timezone")
+            if timezone_str:
+                timezone: datetime.tzinfo = pa.lib.string_to_tzinfo(timezone_str)
+                _logger.debug("applying timezone (%s) on column %s", timezone, col_name)
+                if hasattr(df[col_name].dtype, "tz") is False:
+                    df[col_name] = df[col_name].dt.tz_localize(tz="UTC")
+                if timezone is not None and timezone != pytz.UTC and hasattr(df[col_name].dt, "tz_convert"):
+                    df[col_name] = df[col_name].dt.tz_convert(tz=timezone)
     return df
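
The guard added to `_apply_timezone` can be illustrated in isolation. The following is a minimal sketch, not the library's code: `timezone_from_column` is a hypothetical helper, and the dictionaries only imitate the shape of the column entries the hunk reads (`pandas_type`, `metadata`, `timezone`). It shows that a column whose `metadata` is absent, `None`, or lacks a `timezone` key yields `None` instead of raising.

```python
from __future__ import annotations

from typing import Any


def timezone_from_column(column: dict[str, Any]) -> str | None:
    """Return the timezone string stored for a column, or None if absent.

    Hypothetical helper mirroring the lookup in the hunk above: a missing
    or null "metadata" entry is treated as an empty dict.
    """
    column_metadata: dict[str, Any] = column["metadata"] if column.get("metadata") else {}
    return column_metadata.get("timezone")


# Illustrative column entries (made-up values, not taken from a real file).
with_tz = {"name": "ts", "pandas_type": "datetimetz", "metadata": {"timezone": "UTC"}}
null_meta = {"name": "ts", "pandas_type": "datetimetz", "metadata": None}
no_meta = {"name": "ts", "pandas_type": "datetimetz"}

print(timezone_from_column(with_tz))    # UTC
print(timezone_from_column(null_meta))  # None
print(timezone_from_column(no_meta))    # None
```

By contrast, the direct lookup `c["metadata"]["timezone"]` used before the change raises `TypeError` when `metadata` is `None` and `KeyError` when either key is missing.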


20 changes: 0 additions & 20 deletions awswrangler/_config.py
@@ -44,7 +44,6 @@ class _ConfigArg(NamedTuple):
     "max_local_cache_entries": _ConfigArg(dtype=int, nullable=False, parent_parameter_key="athena_cache_settings"),
     "athena_query_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
     "cloudwatch_query_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
-    "lakeformation_query_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
     "neptune_load_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
     "timestream_batch_load_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
     "emr_serverless_job_wait_polling_delay": _ConfigArg(dtype=float, nullable=False),
@@ -61,7 +60,6 @@ class _ConfigArg(NamedTuple):
     "redshift_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
     "kms_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
     "emr_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
-    "lakeformation_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
     "dynamodb_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
     "secretsmanager_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
     "timestream_query_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True, loaded=True),
@@ -353,15 +351,6 @@ def cloudwatch_query_wait_polling_delay(self) -> float:
     def cloudwatch_query_wait_polling_delay(self, value: float) -> None:
         self._set_config_value(key="cloudwatch_query_wait_polling_delay", value=value)

-    @property
-    def lakeformation_query_wait_polling_delay(self) -> float:
-        """Property lakeformation_query_wait_polling_delay."""
-        return cast(float, self["lakeformation_query_wait_polling_delay"])
-
-    @lakeformation_query_wait_polling_delay.setter
-    def lakeformation_query_wait_polling_delay(self, value: float) -> None:
-        self._set_config_value(key="lakeformation_query_wait_polling_delay", value=value)
-
     @property
     def neptune_load_wait_polling_delay(self) -> float:
         """Property neptune_load_wait_polling_delay."""
@@ -497,15 +486,6 @@ def emr_endpoint_url(self) -> str | None:
     def emr_endpoint_url(self, value: str | None) -> None:
         self._set_config_value(key="emr_endpoint_url", value=value)

-    @property
-    def lakeformation_endpoint_url(self) -> str | None:
-        """Property lakeformation_endpoint_url."""
-        return cast(Optional[str], self["lakeformation_endpoint_url"])
-
-    @lakeformation_endpoint_url.setter
-    def lakeformation_endpoint_url(self, value: str | None) -> None:
-        self._set_config_value(key="lakeformation_endpoint_url", value=value)
-
     @property
     def dynamodb_endpoint_url(self) -> str | None:
         """Property dynamodb_endpoint_url."""
21 changes: 18 additions & 3 deletions awswrangler/_data_types.py
@@ -563,10 +563,12 @@ def pyarrow_types_from_pandas( # noqa: PLR0912,PLR0915
         for field in fields:
             name = str(field.name)
             # Check if any of the index columns must be ignored
-            if name not in ignore_cols:
+            if name in ignore_cols:
+                cols_dtypes[name] = None
+            else:
                 _logger.debug("Inferring PyArrow type from index: %s", name)
                 cols_dtypes[name] = field.type
-                indexes.append(name)
+            indexes.append(name)

     # Merging Index
     sorted_cols: list[str] = indexes + list(df.columns) if index_left is True else list(df.columns) + indexes
@@ -693,13 +695,26 @@ def pyarrow_schema_from_pandas(
         df=df, index=index, ignore_cols=ignore_plus
     )
     for k, v in casts.items():
-        if (k in df.columns) and (k not in ignore):
+        if (k not in ignore) and (k in df.columns or _is_index_name(k, df.index)):
             columns_types[k] = athena2pyarrow(dtype=v)
     columns_types = {k: v for k, v in columns_types.items() if v is not None}
     _logger.debug("columns_types: %s", columns_types)
     return pa.schema(fields=columns_types)


+def _is_index_name(name: str, index: pd.Index) -> bool:
+    if name in index.names:
+        # named index level
+        return True
+
+    if (match := re.match(r"__index_level_(?P<level>\d+)__", name)) is not None:
+        # unnamed index level
+        if len(index.names) > (level := int(match.group("level"))):
+            return index.names[level] is None
+
+    return False
+
+
 def athena_types_from_pyarrow_schema(
     schema: pa.Schema,
     ignore_null: bool = False,
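
The matching rule in `_is_index_name` can be tried out without pandas. The sketch below is a standalone adaptation under stated assumptions, not the library's function: `is_index_name` is a hypothetical name, and it takes a plain sequence of level names (with `None` for an unnamed level) in place of a `pd.Index`. The `__index_level_N__` pattern is copied from the hunk.

```python
from __future__ import annotations

import re
from typing import Optional, Sequence


def is_index_name(name: str, index_names: Sequence[Optional[str]]) -> bool:
    """Check whether ``name`` refers to one of the given index levels.

    Standalone adaptation of the helper in the hunk above: it accepts a
    sequence of level names instead of a pd.Index.
    """
    if name in index_names:
        # Named index level.
        return True

    match = re.match(r"__index_level_(?P<level>\d+)__", name)
    if match is not None:
        # Placeholder name for an unnamed level: accepted only if that
        # level exists and really has no name.
        level = int(match.group("level"))
        if level < len(index_names):
            return index_names[level] is None

    return False


names = ["year", None]  # level 0 is named, level 1 is unnamed
print(is_index_name("year", names))               # True
print(is_index_name("__index_level_1__", names))  # True
print(is_index_name("__index_level_0__", names))  # False
print(is_index_name("__index_level_5__", names))  # False
print(is_index_name("value", names))              # False
```

A placeholder such as `__index_level_0__` is rejected when level 0 has a real name, so a cast keyed on the placeholder cannot silently apply to a named level.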