Support string columns in load_csv #3320

jl-wynen · 2023-11-08T14:43:20Z

We are not handling strings well.
Given a file like

idx,name
1,a
2,b
3,c

we get

>>> sc.io.load_csv('asd.csv')
<scipp.Dataset>
Dimensions: Sizes[row:3, ]
Data:
  idx                         int64  [dimensionless]  (row)  [1, 2, 3]
  name                     PyObject        <no unit>  (row)  [a, b, c]

The problem is that the dtype of name is PyObject. I think this happens because pandas defaults to an object dtype. If it used StringDtype, this might work. See https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
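
To illustrate the pandas side of this, here is a minimal sketch that reads the same data with plain pandas, once with default settings and once requesting the dedicated string dtype. It only shows pandas behaviour; whether load_csv can pass such a dtype through, and whether scipp then maps it to its own string dtype, is an assumption that is not verified here.

```python
import io

import pandas as pd

# Same content as the example file above, held in memory.
CSV_TEXT = "idx,name\n1,a\n2,b\n3,c\n"

# Default parsing: the dtype chosen for the text column depends on the
# pandas version (object in the versions current when this was reported).
default_df = pd.read_csv(io.StringIO(CSV_TEXT))
print(default_df.dtypes)

# Explicitly request the dedicated string dtype for the text column.
string_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"name": "string"})
print(string_df.dtypes)

# Alternatively, convert the column after loading.
converted_df = default_df.astype({"name": "string"})
print(converted_df.dtypes)
```

Either variant leaves the idx column untouched and only changes how the name column is stored.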


YooSunYoung · 2023-11-15T08:28:41Z

I also ran into this problem when using from_pandas.
I tried setting the dtype to string if all elements in the column are strings.
But I'm not sure what should be done when a column contains both ints and strings.
Should it just fall back to PyObject?
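
One possible policy, sketched below with pandas only: convert an object column to the string dtype when every non-missing element is a string, and leave it unchanged otherwise. The helper name normalize_text_column is hypothetical and is not part of scipp or pandas; pandas.api.types.infer_dtype is an existing pandas function. This is an illustration of the idea, not what from_pandas currently does.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def normalize_text_column(column: pd.Series) -> pd.Series:
    """Hypothetical helper: use the string dtype only for all-string columns."""
    # infer_dtype inspects the elements; it reports "string" only when
    # every non-missing element is a str.
    if column.dtype == object and infer_dtype(column, skipna=True) == "string":
        return column.astype("string")
    # Mixed content (e.g. ints and strings) is left as an object column,
    # which would correspond to the PyObject fallback.
    return column


all_strings = pd.Series(["a", "b", "c"], dtype=object)
mixed = pd.Series([1, "b", 3], dtype=object)

print(normalize_text_column(all_strings).dtype)
print(normalize_text_column(mixed).dtype)
```

With this policy, mixed columns keep their current behaviour, so nothing that works today would change.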

YooSunYoung · 2023-11-15T08:31:16Z

Or could we enable astype from PyObject to string where that is possible?

SimonHeybrock changed the title ~~Support string columns in lead_csv~~ Support string columns in load_csv Nov 9, 2023
