Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination BigQuery: Consolidation of objects to StreamConfig, cleanup #38131

Merged
merged 1 commit into from May 13, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented May 11, 2024

What

Removing redundant references and duplicate information passed around using WriteConfig objects. No functional changes and resurrected all the information needed through StreamConfig and adapted changes accordingly.

This PR should be in a mergeable state with no functional changes after the ones down the stack are published.

Review guide

  • Removed references of BigQueryWriteConfig and reused already built StreamConfig
  • Removing unnecessary StagingOperations interface and made concrete class, this will help for later adding a shim on this and refactoring without large changes
  • Removed other unnecessary references of getting dynamic schema, WriteDispostion etc. Probably remnant of bigquery-denormalized bespoke connector.

User Impact

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented May 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 13, 2024 5:47pm

Copy link
Contributor Author

gisripa commented May 11, 2024

@gisripa gisripa changed the title Bigquery cdk signature changes Destination BigQuery: CDK signature changes May 11, 2024
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch 2 times, most recently from b43ce70 to dc2606b Compare May 11, 2024 00:20
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from dc2606b to f6903b7 Compare May 12, 2024 19:17
@gisripa gisripa changed the base branch from cdk-ops-refactor to cdk_generation_id May 12, 2024 22:50
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from f6903b7 to cf090a7 Compare May 12, 2024 22:50
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from cf090a7 to 9c0dfc5 Compare May 13, 2024 03:13
@gisripa gisripa changed the title Destination BigQuery: CDK signature changes Destination BigQuery: Consolidation of objects to StreamConfig, cleanup May 13, 2024
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 9c0dfc5 to 4328f0a Compare May 13, 2024 03:27
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple minor comments, nothing blocking (and some of them are about the CDK PR this is stacked on :P )

@@ -234,7 +236,8 @@ public SerializedAirbyteMessageConsumer getSerializedMessageConsumer(final JsonN
final String datasetLocation = BigQueryUtils.getDatasetLocation(config);
final BigQuerySqlGenerator sqlGenerator = new BigQuerySqlGenerator(config.get(BigQueryConsts.CONFIG_PROJECT_ID).asText(), datasetLocation);
final Optional<String> rawNamespaceOverride = TypingAndDedupingFlag.getRawNamespaceOverride(RAW_DATA_DATASET);
final ParsedCatalog parsedCatalog = parseCatalog(config, catalog, datasetLocation, rawNamespaceOverride);
final ParsedCatalog parsedCatalog = parseCatalog(sqlGenerator, defaultNamespace,
rawNamespaceOverride.orElse(JavaBaseConstants.DEFAULT_AIRBYTE_INTERNAL_NAMESPACE), catalog);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thought: now that we're in kotlin, do you want to make catalogparser accept rawNamespace: String? and do rawNamespace ?: DEFAULT_AIRBYTE_INTERNAL? then we don't need to copy this logic into every connector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense to me.

final ParsedCatalog parsedCatalog,
final Function<JsonNode, BigQueryRecordFormatter> recordFormatterCreator,
final Function<String, String> tmpTableNameTransformer) {
private Map<StreamDescriptor, StreamConfig> createWriteConfigs(final ConfiguredAirbyteCatalog catalog,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this just be parsedCatalog.getStreams().stream()? afaict we don't actually need the raw protocol models for anything

(and then we don't need to plumb the raw configured catalog into this method)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but the called method iterated over the Configuredcatalog and populates the map. Didn't want to change any functional logic for the fear of missing some defaultNamespace plumbing. up the stack the whole method will be removed.

// In Destinations V2, we will always use the 'airbyte' schema/namespace for raw tables
BigQueryRecordFormatter.SCHEMA_V2, streamConfig.getId().getOriginalName(),
tableId, streamConfig.getId().getOriginalName());
// In Destinations V2, we will always use the 'airbyte' schema/originalNamespace for raw tables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// In Destinations V2, we will always use the 'airbyte' schema/originalNamespace for raw tables
// In Destinations V2, we will always use the 'airbyte_internal' schema/originalNamespace for raw tables

😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

DestinationSyncMode.APPEND_DEDUP,
List.of(new ColumnId("foo", "bar", "fizz")),
Optional.empty(),
new LinkedHashMap<>());
new LinkedHashMap<>(), 0, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.... do you want me to set default values for the generation/sync ID args? These diffs seem kind of dumb

@gisripa gisripa marked this pull request as ready for review May 13, 2024 16:14
@gisripa gisripa requested a review from a team as a code owner May 13, 2024 16:14
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 4328f0a to 3de0d51 Compare May 13, 2024 16:36
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label May 13, 2024
Base automatically changed from cdk_generation_id to master May 13, 2024 16:59
@edgao edgao requested a review from a team as a code owner May 13, 2024 16:59
@gisripa gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 3de0d51 to 6bf8684 Compare May 13, 2024 17:43
@gisripa gisripa merged commit e0225c1 into master May 13, 2024
33 checks passed
@gisripa gisripa deleted the gireesh/05-10-Bigquery_cdk_signature_changes branch May 13, 2024 18:09
@edgao edgao mentioned this pull request May 15, 2024
2 tasks
@gisripa gisripa mentioned this pull request May 16, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/bigquery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants