Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect spark.catalog.currentDatabase instead of hardcoded "default" #27

Open
NJordan72 opened this issue Apr 13, 2018 · 3 comments
Open

Comments

@NJordan72
Copy link

In the event that a database isn't defined the behavior falls back to using a hardcoded "default" database. Instead we should respect spark.catalog.currentDatabase.

I have a simple pull request prepared if folks are interested.

@rdblue
Copy link
Contributor

rdblue commented Apr 18, 2018

I'm open to changing this, but I don't think that spark.catalog.currentDatabase is necessarily the right choice. That is the current database for the global catalog, but there's no guarantee that Iceberg tables use that catalog. Should Iceberg have a way to set the default database instead? Or maybe this should be handled at a higher level and database should always be required.

@NJordan72
Copy link
Author

I don't have strong opinions here. To add a little more information here I ran into an issue as part of the Convert Table to Iceberg example. As part of SparkSchemaUtil and the Hive hack.

Both use "default" if no database is passed in. When passing in the database I was getting errors around unsupported characters in the database (even with back-ticks), likely because we're using AWSGlueDataCatalog as the store we're trying to convert. I'm not sure how EMR typically solves this, but by having those objects use spark.catalog.currentDatabase instead of default it "just worked."

The net net is that this may not be important generally as the problem is probably both related to Glue metastores and EMR.

@rdblue
Copy link
Contributor

rdblue commented Apr 20, 2018

So it sounds like the problem was that those classes parse the table name instead of passing db and table separately? We'd definitely accept a PR that separated those options out and moved the single string parsing to a separate method. Supporting backticks would be great, too, but we generally expect to rely on Spark or a higher layer to provide that.

Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/iceberg that referenced this issue Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants