GH-41502: [Python] Fix reading column index with decimal values #41503
base: main
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also:
Hi @jrversteegh, thank you for the contribution! I think this is an elegant solution. Not sure if this was discussed before or not, can't find any similar issue on our issue tracker. I am sure @jorisvandenbossche will know straight away if this change fits or not. From my side, a test needs to be added in python/pyarrow/tests/parquet/test_pandas.py.
That indeed looks like a good fix. The error itself should already happen with just a roundtrip from pandas->pyarrow->pandas (without parquet), so you can add a test for this in
Thanks for that suggestion. I tried, but this issue appears more involved than I expected. It looks like pyarrow expects column names to be strings. If not, it converts them (in turn because the parquet format expects this?).
@AlenkaF @jorisvandenbossche I've added a test and restored the decimal index from strings. This looks like a bit of a kludge. I think it's because both numpy and pandas don't understand
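For readers following along, here is a rough sketch of what "restoring the decimal index from strings" amounts to, using only the standard library. It is not the code from this PR and does not call pyarrow; the label values and variable names are made up for illustration. It only shows that a `Decimal` column label survives a trip through `str` and back, which is the property the workaround relies on.

```python
from decimal import Decimal

# Hypothetical column labels (illustrative only, not from the PR's test).
labels = [Decimal("1.10"), Decimal("2.5"), Decimal("3")]

# Step 1: the labels get stringified, mimicking column names being
# converted to strings on the way into Arrow.
as_strings = [str(label) for label in labels]

# Step 2: on the way back, rebuild Decimal objects from the stored strings.
restored = [Decimal(text) for text in as_strings]

print(as_strings)
print(restored == labels)
```

Because `str(Decimal)` keeps the digits and exponent, the restored labels compare equal to the originals and keep trailing zeros such as `1.10`.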
Fix for #41502
Convert pandas "decimal" to "object" in numpy.