When I write a Parquet file using arrow and then read it using DuckDB, the unit of the timestamp is incorrect. I'm not sure whether this is a duckdb-rs issue or a DuckDB issue, but I decided to submit it here first.
Code to generate the Parquet file with arrow:
```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{BooleanArray, Float64Array, TimestampNanosecondArray};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;

let schema = Schema::new(vec![
    Field::new("time", DataType::Timestamp(TimeUnit::Nanosecond, None), true),
    Field::new("value", DataType::Float64, true),
    Field::new("valid", DataType::Boolean, true),
]);
let n = 1_000_000;
let timestamps: Vec<Option<i64>> = (0..n).map(Some).collect();
let values: Vec<Option<f64>> = (0..n).map(|x| Some((x as f64).sin())).collect();
let validities: Vec<Option<bool>> = vec![Some(true); n as usize];
let batch = RecordBatch::try_new(
    Arc::new(schema),
    vec![
        Arc::new(TimestampNanosecondArray::from(timestamps)),
        Arc::new(Float64Array::from(values)),
        Arc::new(BooleanArray::from(validities)),
    ],
)
.expect("Failed to make batch for writing to Parquet test");
let file = File::create("tvv.parquet").unwrap();
let props = WriterProperties::builder()
    .set_compression(Compression::UNCOMPRESSED)
    .build();
let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props)).unwrap();
writer.write(&batch).expect("Writing batch");
writer.close().unwrap();
```
Code to read the file:
```rust
use duckdb::Connection;

let conn = Connection::open_in_memory().unwrap();
let mut stmt = conn.prepare("SELECT * FROM tvv.parquet").unwrap();
let query = stmt.query_arrow([]).unwrap();
let schema = query.get_schema();
println!("{:?}", schema);
```
When I read the Parquet file using some tool like Parquet Viewer, the schema is as I expect, with a Nanosecond unit. But the output of my code is:
```
Schema { fields: [Field { name: "time", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "valid", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }
```
I.e., the unit is Microsecond instead of Nanosecond.
Is this a bug? Or am I doing something wrong?