Support for AWS Glue as an alternative Hive metastore implementation #112

Open
ryanrupp opened this issue Nov 28, 2018 · 5 comments

@ryanrupp

Similar to the functionality in Presto, I was wondering whether Glue could be substituted in as an alternative implementation of the Hive metastore. Looking at the current HiveTableOperations, it relies on:

- get table
- create table
- alter table
- an exclusive lock

The locking mechanism would be the problematic part, as I don't believe an equivalent API is available in Glue. Possibly there's another approach, or another service could be used for the locking functionality, e.g. DynamoDB.
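
To make the role of the lock concrete, here is a minimal sketch of how the four operations above could combine into a commit. It assumes the exclusive lock exists to make a read-check-write of the table's metadata pointer atomic. `InMemoryMetastore` and `CommitSketch` are hypothetical names for illustration only; this is not the actual HiveTableOperations code and it talks to no real metastore.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for a metastore: table name -> metadata location.
class InMemoryMetastore {
    private final ConcurrentMap<String, String> tables = new ConcurrentHashMap<>();
    private final ReentrantLock exclusiveLock = new ReentrantLock();

    // "create table": fails if the table is already registered.
    void createTable(String name, String metadataLocation) {
        if (tables.putIfAbsent(name, metadataLocation) != null) {
            throw new IllegalStateException("Table already exists: " + name);
        }
    }

    // "get table": returns the current metadata location, or null if absent.
    String getTable(String name) {
        return tables.get(name);
    }

    // "alter table": unconditionally overwrites the metadata location.
    void alterTable(String name, String metadataLocation) {
        tables.put(name, metadataLocation);
    }

    // "exclusive lock": one global lock, for simplicity.
    void lock() {
        exclusiveLock.lock();
    }

    void unlock() {
        exclusiveLock.unlock();
    }
}

class CommitSketch {
    // Swaps the metadata pointer from expected to updated while holding the lock.
    // Returns false if another writer changed the pointer first.
    static boolean commit(InMemoryMetastore ms, String table, String expected, String updated) {
        ms.lock();
        try {
            String current = ms.getTable(table);
            if (current == null || !current.equals(expected)) {
                return false; // stale view of the table; caller should refresh and retry
            }
            ms.alterTable(table, updated);
            return true;
        } finally {
            ms.unlock();
        }
    }
}
```

Without the lock, two writers could both pass the check and both call alter table, and one update would be silently lost. That is why a metastore without an equivalent lock needs some other way to serialize the check and the write.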

@rdblue
Contributor

rdblue commented Dec 7, 2018

I thought Glue exposed the same Thrift API that Hive uses. If that's the case, then we should be able to use the same lock API and code.

@ryanrupp
Author

ryanrupp commented Dec 7, 2018

I believe the API is only partially implemented and unfortunately doesn't include the locking mechanisms. Looking into it a bit: when running Spark on EMR, for instance, the HiveMetaStoreClientFactory can be overridden to specify AWSGlueDataCatalogHiveClientFactory (see here). The implementation used there covers the basic Hive metastore operations, e.g. create/alter/get table (calling back to the Glue public API), but throws UnsupportedOperationException for the lock method.

So I was thinking the lock piece could be abstracted out: the generic Hive implementation would keep using the lock method via the Hive metastore, while a Glue override could use some other mechanism. At this point it's mainly a limitation of the Glue implementation, but I wanted to toss this out there as a nice-to-have for people not running their own Hive metastore.
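
As a rough sketch of what such an abstraction might look like: the interface and class names below (`TableLock`, `ConditionalWriteLock`) are hypothetical and do not exist in Iceberg, Hive, or the Glue client. The second class simulates a lock built on a conditional "put if absent" write, the kind of primitive an external key-value service could offer, using an in-memory map in place of any real service.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical abstraction: table operations depend on this interface
// instead of calling the metastore's lock method directly.
interface TableLock {
    // Returns true if the caller now holds the lock for the table.
    boolean tryAcquire(String table, String owner);

    // Releases the lock only if the given owner currently holds it.
    void release(String table, String owner);
}

// Assumption: the backing store supports an atomic "write only if the key is absent".
// A ConcurrentMap stands in for that store here.
class ConditionalWriteLock implements TableLock {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    @Override
    public boolean tryAcquire(String table, String owner) {
        // putIfAbsent is atomic: exactly one caller sees null and wins the lock.
        return owners.putIfAbsent(table, owner) == null;
    }

    @Override
    public void release(String table, String owner) {
        // Two-argument remove deletes the entry only when the value matches,
        // so a non-owner cannot release someone else's lock.
        owners.remove(table, owner);
    }
}
```

A Hive-backed implementation of the same interface would delegate to the metastore's own lock call, and a Glue-backed setup could plug in something like the conditional-write variant. A real version would also need lock expiry so that a crashed writer does not hold the lock forever; that is left out here.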

@ryanrupp
Author

The Glue client source has now been made available for reference (see announcement). AWSCatalogMetastoreClient implements Hive's IMetaStoreClient and delegates to GlueMetastoreClientDelegate, but it only implements a subset of the functionality; lock, for instance, just throws an unsupported operation exception (see here).

@rdblue
Contributor

rdblue commented Feb 13, 2019

I think that Glue should implement locking as required by the interface it exposes. I'd be fine adding a solution specific to Glue in Iceberg as well, but I'm not sure what that would look like. Good to know that Glue won't work though.

@teabot

teabot commented Feb 13, 2019 via email
