Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getSchema() to SourceClient interface #294

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

coded9
Copy link

@coded9 coded9 commented Dec 16, 2023

Important Read

Addresses #77

What is the purpose of the pull request

This change adds getSchema() method to SourceClient interface

Brief change log

  • Add getSchema() method to SourceClient
  • Added unit tests for getSchema() in TestIcebergSourceClient

Verify this pull request

Run getSchema test in TestIcebergSourceClient

cc: @ashvina

@@ -48,6 +49,14 @@ public interface SourceClient<COMMIT> extends Closeable {
*/
SchemaCatalog getSchemaCatalog(OneTable table, COMMIT commit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can eliminate the SchemaCatalog class with this change?
It was one of the objectives of #77

@@ -48,6 +49,14 @@ public interface SourceClient<COMMIT> extends Closeable {
*/
SchemaCatalog getSchemaCatalog(OneTable table, COMMIT commit);

/**
* Extracts the {@link OneSchema} as of the latest state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method requires a OneTable object as an argument, which corresponds to the table format at a certain point in time. The method should return the schema that matches the OneTable object, not the most recent one.

Copy link
Contributor

@ashvina ashvina Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-other-tim-brown, could you please confirm this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, for the point-in-time schema, we would need the commit/snapshot param. I will make the changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coded9 the changes required for this are quite straightforward. You can copy the implementation for the getSchemaCatalog that already handles this but simply return the schema and not the schema catalog object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wonder if this getSchema method is even required when the schema itself is already present in the OneTable object.

/**
* Extracts the {@link OneSchema} as of the latest state.
*
* @param table the current state of the table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment might require an update, depending on Tim's response.

@@ -82,6 +83,11 @@ public SchemaCatalog getSchemaCatalog(OneTable table, HoodieInstant commit) {
return HudiSchemaCatalogExtractor.catalogWithTableSchema(table);
}

@Override
public OneSchema getSchema(OneTable table) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for Hudi and Delta sources also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants