Comparing changes

base repository: aws/aws-sdk-pandas
base: 3.7.0
head repository: aws/aws-sdk-pandas
compare: 3.7.1
  • 7 commits
  • 31 files changed
  • 3 contributors

Commits on Mar 7, 2024

  1. docs: fix redshift.to_sql doc indentation error (#2706)

    LeonLuttenberger authored Mar 7, 2024

    Verified

    67cfd73
  2. fix: pin pyarrow to 8+ (#2709)

    jaidisido authored Mar 7, 2024

    Verified

    4e6dd49
  3. chore: Bump version to 3.7.1 (#2712)

    LeonLuttenberger authored Mar 7, 2024

    Verified

    495c1ec
  4. test: Fix testing infrastructure for Neptune (#2713)

    LeonLuttenberger authored Mar 7, 2024

    Verified

    41a8db2
  5. fix: reverse introduced breaking change in _create_table (#2711)

    Co-authored-by: Leon Luttenberger <LeonLuttenberger@users.noreply.github.com>
    jaidisido and LeonLuttenberger authored Mar 7, 2024

    Verified

    0d0e78d
  6. fix: Pin Cython version to workaround issue when building PyArrow from source (#2714)

    LeonLuttenberger authored Mar 7, 2024

    Verified

    3c73628
  7. docs: update layers.rst

    LeonLuttenberger committed Mar 7, 2024

    Unverified

    da4ba40
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "3.7.0"
+current_version = "3.7.1"
 commit = false
 tag = false
 tag_name = "{new_version}"
21 changes: 13 additions & 8 deletions .github/workflows/cfn-nag.yml
@@ -42,11 +42,20 @@ jobs:
           npm install -g aws-cdk
           cdk --version
       - uses: actions/checkout@v4
-      - name: Set up Python ${{ matrix.python-version }}
+      - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: ${{ matrix.python-version }}
+          python-version: 3.11
+      - name: Install Requirements
+        run: |
+          cd test_infra
+          python -m pip install --upgrade pip
+          python -m pip install poetry
+          poetry env use python
+          poetry env info
+          source $(poetry env info --path)/bin/activate
+          poetry install -vvv
       - name: Set up cdk.json
         run: |
           cd test_infra
           cat <<EOT >> cdk.context.json
@@ -61,12 +70,8 @@ jobs:
             ]
           }
           EOT
-          python -m pip install --upgrade pip
-          python -m pip install poetry
-          poetry env use python
-          poetry env info
-          source $(poetry env info --path)/bin/activate
-          poetry install -vvv
+          cat cdk.json | jq -r '.context.databases.neptune = true' | jq -r '.context.databases.oracle = true' | jq -r '.context.databases.sqlserver = true' > overwrite.cdk.json
+          rm cdk.json && mv overwrite.cdk.json cdk.json
       - name: CDK Synth
         run: |
           cd test_infra
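
Review note: the new `jq` pipeline in the "Set up cdk.json" step rewrites `cdk.json` so that three database flags under `context.databases` are switched on before synthesis. A minimal Python sketch of the same transformation follows. `enable_databases` is a hypothetical helper written only for illustration, and the sample input is invented; the real `cdk.json` in `test_infra` is not shown in this diff.

```python
import copy


def enable_databases(cdk_config, names=("neptune", "oracle", "sqlserver")):
    """Return a copy of the config with context.databases.<name> set to True.

    Mirrors the effect of chaining `.context.databases.<name> = true` in jq:
    missing intermediate objects are created, other keys are left untouched.
    """
    updated = copy.deepcopy(cdk_config)
    databases = updated.setdefault("context", {}).setdefault("databases", {})
    for name in names:
        databases[name] = True
    return updated


# Invented sample input, not the repository's actual cdk.json.
sample = {"app": "python app.py", "context": {"databases": {"neptune": False}}}
result = enable_databases(sample)
print(result["context"]["databases"])
```

The helper returns a new dictionary rather than mutating its argument, which matches the workflow's approach of writing to `overwrite.cdk.json` first and only then replacing `cdk.json`.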
80 changes: 40 additions & 40 deletions README.md
@@ -100,27 +100,27 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 ## At scale
 AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

-The quickest way to get started is to use AWS Glue with Ray. Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html), our blogs ([1](https://aws.amazon.com/blogs/big-data/scale-aws-sdk-for-pandas-workloads-with-aws-glue-for-ray/)/[2](https://aws.amazon.com/blogs/big-data/advanced-patterns-with-aws-sdk-for-pandas-on-aws-glue-for-ray/)), or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to discover even more features.
+The quickest way to get started is to use AWS Glue with Ray. Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/scale.html), our blogs ([1](https://aws.amazon.com/blogs/big-data/scale-aws-sdk-for-pandas-workloads-with-aws-glue-for-ray/)/[2](https://aws.amazon.com/blogs/big-data/advanced-patterns-with-aws-sdk-for-pandas-on-aws-glue-for-ray/)), or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to discover even more features.

 > ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
 ## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

-- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/about.html)
-- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html)
-  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#pypi-pip)
-  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#conda)
-  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-lambda-layer)
-  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-glue-python-shell-jobs)
-  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#aws-glue-pyspark-jobs)
-  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#amazon-sagemaker-notebook)
-  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#amazon-sagemaker-notebook-lifecycle)
-  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#emr)
-  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/install.html#from-source)
-- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html)
-  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#getting-started)
-  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#supported-apis)
-  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/scale.html#resources)
+- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/about.html)
+- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html)
+  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#pypi-pip)
+  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#conda)
+  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#aws-lambda-layer)
+  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#aws-glue-python-shell-jobs)
+  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#aws-glue-pyspark-jobs)
+  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#amazon-sagemaker-notebook)
+  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#amazon-sagemaker-notebook-lifecycle)
+  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#emr)
+  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/install.html#from-source)
+- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/scale.html)
+  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/scale.html#getting-started)
+  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/scale.html#supported-apis)
+  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/scale.html#resources)
 - [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
   - [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
   - [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -162,30 +162,30 @@ The quickest way to get started is to use AWS Glue with Ray. Read our [docs](htt
   - [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
   - [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
   - [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
-- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html)
-  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-s3)
-  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-glue-catalog)
-  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-athena)
-  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-redshift)
-  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#postgresql)
-  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#mysql)
-  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#sqlserver)
-  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#oracle)
-  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#data-api-redshift)
-  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#data-api-rds)
-  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#opensearch)
-  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-glue-data-quality)
-  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-neptune)
-  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#dynamodb)
-  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-timestream)
-  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-emr)
-  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-cloudwatch-logs)
-  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-chime)
-  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#amazon-quicksight)
-  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-sts)
-  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#aws-secrets-manager)
-  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#global-configurations)
-  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.7.0/api.html#distributed-ray)
+- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html)
+  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-s3)
+  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#aws-glue-catalog)
+  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-athena)
+  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-redshift)
+  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#postgresql)
+  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#mysql)
+  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#sqlserver)
+  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#oracle)
+  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#data-api-redshift)
+  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#data-api-rds)
+  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#opensearch)
+  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#aws-glue-data-quality)
+  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-neptune)
+  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#dynamodb)
+  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-timestream)
+  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-emr)
+  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-cloudwatch-logs)
+  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-chime)
+  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#amazon-quicksight)
+  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#aws-sts)
+  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#aws-secrets-manager)
+  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#global-configurations)
+  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.7.1/api.html#distributed-ray)
 - [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-3.7.0
+3.7.1
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "3.7.0"
+__version__: str = "3.7.1"
 __license__: str = "Apache License 2.0"
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -792,11 +792,11 @@ def read_sql_query(
 **Related tutorial:**
-- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/006%20-%20Amazon%20Athena.html>`_
-- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/019%20-%20Athena%20Cache.html>`_
-- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/021%20-%20Global%20Configurations.html>`_
 **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -860,7 +860,7 @@ def read_sql_query(
 /athena.html#Athena.Client.get_query_execution>`_ .
 For a practical example check out the
-`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
@@ -1137,11 +1137,11 @@ def read_sql_table(
 **Related tutorial:**
-- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/006%20-%20Amazon%20Athena.html>`_
-- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/019%20-%20Athena%20Cache.html>`_
-- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/021%20-%20Global%20Configurations.html>`_
 **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -1205,7 +1205,7 @@ def read_sql_table(
 /athena.html#Athena.Client.get_query_execution>`_ .
 For a practical example check out the
-`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.7.0/
+`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.7.1/
 tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
9 changes: 5 additions & 4 deletions awswrangler/catalog/_create.py
@@ -143,12 +143,13 @@ def _create_table( # noqa: PLR0912,PLR0915
         DatabaseName=database,
         TableInput=table_input,
     )
-    if table_exist and mode in ("overwrite", "update"):
+    if table_exist:
         _logger.debug("Updating table (%s)...", mode)
         args["SkipArchive"] = skip_archive
         if mode == "overwrite":
             delete_all_partitions(table=table, database=database, catalog_id=catalog_id, boto3_session=boto3_session)
-        client_glue.update_table(**args)
+        if mode in ["overwrite", "update"]:
+            client_glue.update_table(**args)
     else:
         try:
             _logger.debug("Creating table (%s)...", mode)
@@ -1079,7 +1080,7 @@ def create_csv_table(
 If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
 (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
 Related tutorial:
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/014%20-%20Schema%20Evolution.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/014%20-%20Schema%20Evolution.html
 sep : str
 String of length 1. Field delimiter for the output file.
 skip_header_line_count : Optional[int]
@@ -1260,7 +1261,7 @@ def create_json_table(
 If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
 (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
 Related tutorial:
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/014%20-%20Schema%20Evolution.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/014%20-%20Schema%20Evolution.html
 serde_library : Optional[str]
 Specifies the SerDe Serialization library which will be used. You need to provide the Class library name
 as a string.
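
Review note: the behavioural change in the first `_create_table` hunk of this file is easiest to see by comparing which Glue actions each version reaches. The sketch below is not the library's code. Both functions are hypothetical helpers that only mirror the branch structure visible in the hunk and return labels instead of calling Glue. The label `"try_create_table"` stands for the `try:` block whose body is cut off in the diff, so what happens inside it is an assumption.

```python
from typing import List


def actions_before(table_exist: bool, mode: str) -> List[str]:
    """Branch structure on the 3.7.0 side of the hunk."""
    actions: List[str] = []
    if table_exist and mode in ("overwrite", "update"):
        if mode == "overwrite":
            actions.append("delete_all_partitions")
        actions.append("update_table")
    else:
        # Reached for every other mode, even when the table already exists.
        actions.append("try_create_table")
    return actions


def actions_after(table_exist: bool, mode: str) -> List[str]:
    """Branch structure on the 3.7.1 side of the hunk."""
    actions: List[str] = []
    if table_exist:
        if mode == "overwrite":
            actions.append("delete_all_partitions")
        if mode in ["overwrite", "update"]:
            actions.append("update_table")
    else:
        actions.append("try_create_table")
    return actions


for mode in ("overwrite", "update", "append"):
    print(mode, actions_before(True, mode), actions_after(True, mode))
```

In this model the two versions agree for `"overwrite"` and `"update"`, and for any mode when the table does not exist. They differ only for an existing table with another mode such as `"append"`: the 3.7.0 structure falls through to the create branch, while the 3.7.1 structure enters the update branch and issues no `update_table` call.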
2 changes: 1 addition & 1 deletion awswrangler/redshift/_write.py
@@ -122,7 +122,7 @@ def to_sql(
 Dictionary of columns names and Redshift types to be casted.
 Useful when you have columns with undetermined or mixed data types.
 (e.g. {'col name': 'VARCHAR(10)', 'col2 name': 'FLOAT'})
-diststyle : str
+diststyle : str
 Redshift distribution styles. Must be in ["AUTO", "EVEN", "ALL", "KEY"].
 https://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html
 distkey : str, optional
4 changes: 2 additions & 2 deletions awswrangler/s3/_read_orc.py
@@ -224,7 +224,7 @@ def read_orc(
 must return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 columns : List[str], optional
 List of columns to read from the file(s).
 validate_schema : bool, default False
@@ -386,7 +386,7 @@ def read_orc_table(
 must return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 columns : List[str], optional
 List of columns to read from the file(s).
 validate_schema : bool, default False
4 changes: 2 additions & 2 deletions awswrangler/s3/_read_parquet.py
@@ -397,7 +397,7 @@ def read_parquet(
 must return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 columns : List[str], optional
 List of columns to read from the file(s).
 validate_schema : bool, default False
@@ -639,7 +639,7 @@ def read_parquet_table(
 must return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-sdk-pandas.readthedocs.io/en/3.7.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-sdk-pandas.readthedocs.io/en/3.7.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 columns : List[str], optional
 List of columns to read from the file(s).
 validate_schema : bool, default False
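
Review note: the docstring context in these hunks describes the partition filter only through a one-line lambda. A rough sketch of what such a callback does is shown below, run against an invented list of partitions so that it needs no AWS access. `keep_partition` and `partitions` are hypothetical names, and the assumption that partition values arrive as strings is taken from the string comparisons in the docstring example.

```python
from typing import Dict, List


def keep_partition(partition: Dict[str, str]) -> bool:
    """Return True to read the partition, False to ignore it.

    Equivalent to the docstring's lambda; `cond` alone already yields a bool,
    so the `True if cond else False` wrapper is not needed.
    """
    return partition["year"] == "2020" and partition["month"] == "1"


# Invented partition values for illustration only.
partitions: List[Dict[str, str]] = [
    {"year": "2019", "month": "12"},
    {"year": "2020", "month": "1"},
    {"year": "2020", "month": "2"},
]

selected = [p for p in partitions if keep_partition(p)]
print(selected)
```

Because the comparison is against `"2020"` and `"1"` as strings, a filter written with integers (`x["year"] == 2020`) would never match under this assumption.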