On deleting a column from a Hudi table, the column is still present when querying from Presto using the Hive connector. All rows have null for the deleted column, but it still appears in the output of SELECT *. Also, DESCRIBE TABLE shows the column is still present.
Your Environment
hudi-bundle: org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.1 (used to create a Hudi table synced to the metastore)
spark: 3.3.2
Expected Behavior
On deleting a Hudi column, I expected the column to no longer be present when querying from Presto.
Current Behavior
All rows have null for the deleted column, but it still appears in the output of SELECT *. Also, DESCRIBE TABLE shows the column is still present.
Possible Solution
TBD
Steps to Reproduce
1. Create a Dataproc cluster and make the following changes:
spark-default.conf
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED
spark.sql.legacy.avro.datetimeRebaseModeInWrite=CORRECTED
spark.sql.legacy.avro.datetimeRebaseModeInRead=CORRECTED
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar
hive-site.xml
hive.metastore.disallow.incompatible.col.type.changes = false
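For reference, a minimal sketch of setting the same options programmatically on a pyspark session instead of editing spark-default.conf and hive-site.xml; the app name is illustrative, the datetime rebase options would be set the same way, and the Hudi bundle jar is assumed to be on the classpath:

from pyspark.sql import SparkSession

# Sketch only: mirrors the spark-default.conf and hive-site.xml entries above.
spark = (
    SparkSession.builder
    .appName("hudi-drop-column-repro")  # illustrative name
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
    .config("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar")
    # spark.hadoop. prefix passes the hive-site.xml setting through to the Hive client
    .config("spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes", "false")
    .enableHiveSupport()
    .getOrCreate()
)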
2. Create a spark-sql engine as pyspark (using DataSource V1) and create the table from pyspark:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType
from datetime import datetime
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True),
StructField("surname", StringType(), True),
StructField("ts", TimestampType(), True) # Adding timestamp field
])
data = [
(1, "John", "Doe", datetime.now()),
(2, "Jane", "Smith", datetime.now()),
(3, "Michael", "Johnson", datetime.now()),
(4, "Emily", "Williams", datetime.now())
]
df = spark.createDataFrame(data, schema)
(
    df.write
    .format("org.apache.hudi")
    .option("hoodie.table.name", "hoodie_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.keyprefix", "ts")
    .option("hoodie.schema.on.read.enable", "true")
    .mode("overwrite")
    .save("gs://xxxx/subham_test_metastore_13")
)
spark.sql("CREATE TABLE default.subham_test_metastore_13 USING hudi LOCATION 'gs://xxxx/subham_test_metastore_13' ")
3. From spark-sql, drop the column:
set hoodie.schema.on.read.enable=true
ALTER TABLE default.subham_test_metastore_13 DROP COLUMN surname;
On doing this, we get the following error message as well, but the column is nevertheless dropped on the Hudi side:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
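Since Presto's Hive connector reads the table schema from the Hive metastore, this failed metastore ALTER is presumably why Presto still sees the column after the drop. To confirm the drop did take effect on the Spark side, an illustrative check (not in the original report):

# Illustrative check: Spark's view of the table after DROP COLUMN
spark.sql("DESCRIBE TABLE default.subham_test_metastore_13").show(truncate=False)
spark.sql("SELECT * FROM default.subham_test_metastore_13").printSchema()  # surname should be gone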
4. Update a row in the table from pyspark:
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True),
StructField("ts", TimestampType(), True) # Adding timestamp field
])
data = [
(1,"Johny", datetime.now())
]
df = spark.createDataFrame(data, schema)
(
    df.write
    .format("org.apache.hudi")
    .option("hoodie.table.name", "hoodie_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.keyprefix", "ts")
    .option("hoodie.schema.on.read.enable", "true")
    .mode("append")
    .save("gs://xxxx/subham_test_metastore_13")
)
Now, from spark-sql and pyspark, we can see that the column no longer appears, but it still shows up when querying from Presto.
Screenshots (if appropriate)
Context
We have a lakehouse on Hudi and use Presto with the Hive connector to query Hudi tables. We want to delete a column and are facing this problem there.