Decoding Error when Reading Parquet File - "RLE: Decoded Run-Length Block" #329

Open
youen opened this issue Aug 28, 2023 · 0 comments
youen commented Aug 28, 2023

Description:
I get an error when querying a Parquet file with OctoSQL. The file comes from the "TLC Trip Record Data" dataset (available here) and contains NYC taxi trip records.

The error message I'm encountering is as follows:

Error: couldn't run query: couldn't run source: couldn't read row: decoding page 0 of column "VendorID": decoding definition levels of data page v1: RLE: decoded run-length block cannot have more than 1048576 values

This looks like a decoding failure on the definition levels of the "VendorID" column. The same file reads fine with other tools such as dsq, and it opens in online Parquet viewers like parquetreader.com and tablab.app.
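
For context, here is a rough Python sketch of how a single run header in Parquet's RLE/bit-packing hybrid encoding (the encoding used for definition levels) can declare more values than the 1048576 figure in the error message. This is an illustration under assumptions, not a confirmed diagnosis: the row count is hypothetical, the helper names are made up for this sketch, and it does not reflect OctoSQL's actual decoder.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def read_uvarint(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def describe_run(buf: bytes, bit_width: int):
    """Describe the first run in an RLE/bit-packing hybrid stream.

    Returns (kind, value_count, repeated_value_or_None).
    """
    header, pos = read_uvarint(buf)
    if header & 1:
        # Bit-packed run: header >> 1 is the number of 8-value groups.
        return "bit-packed", (header >> 1) * 8, None
    # RLE run: header >> 1 is the repeat count, followed by the repeated
    # value stored in ceil(bit_width / 8) little-endian bytes.
    count = header >> 1
    width = (bit_width + 7) // 8
    value = int.from_bytes(buf[pos:pos + width], "little")
    return "rle", count, value


LIMIT = 1048576          # figure quoted in the error message
ROWS = 3_000_000         # hypothetical page with no nulls in the column

# One RLE run saying "definition level 1, repeated ROWS times".
run = encode_uvarint(ROWS << 1) + bytes([1])
kind, count, value = describe_run(run, bit_width=1)
print(kind, count, value, "exceeds limit" if count > LIMIT else "within limit")
```

If the page in this file really does hold that many values in one run, a reader that caps run length would reject it while a reader without the cap would accept it, which would be consistent with dsq reading the file fine.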

Steps to Reproduce:

  1. Install OctoSQL version 0.12.2.
  2. Query a Parquet file from the "TLC Trip Record Data" dataset using OctoSQL.

Expected Behavior:
OctoSQL should be able to successfully query the Parquet file without encountering decoding errors.

Actual Behavior:
OctoSQL encounters a decoding error related to the "VendorID" column, as mentioned above.

