bigquery/storage/managedwriter: multiplexed writes #7103

Closed
shollyman opened this issue Nov 29, 2022 · 3 comments
Labels: api: bigquery, priority: p2, type: feature request

Comments

@shollyman
Contributor

This issue tracks the PRs related to supporting multiplex connections in managedwriter.

shollyman added the type: feature request, api: bigquery, and priority: p2 labels Nov 29, 2022
shollyman self-assigned this Nov 29, 2022
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Nov 29, 2022
This PR adds a new internal mechanism to simplify duplicating flow
controllers, and does some preliminary work to wire in a UUID-based
ID for managed stream instances.  Neither is used elsewhere.

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Dec 15, 2022
This PR establishes new unexported connection and connection pool
abstractions.  The implementation is partial, and key areas where
implementation is missing are generally marked with TODOs.

This PR does not alter existing functionality, but continues to lay
groundwork for later refactors.

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Dec 16, 2022
This PR adds two references to live code:
* a pool reference on ManagedStream
* a reference to ManagedStream on pendingWrite

The ManagedStream->pool reference is to allow a writer to
resolve where to find its associated connection, retries, lookups, etc.

The reference on the pendingWrite is primarily in service of
retries, particularly when we need to re-enqueue and thus potentially
re-resolve what connection is associated with the writer.  This PR also
moves some of the retry processing code onto the connectionPool in
service to that goal.  As before, this is new code that isn't yet
referenced from existing functionality.

This PR also more substantially starts to carve out connection
management in the pool, providing a basic connection resolver and
eviction capabilities.  This initial implementation is primitive, and
aligns with our current behavior (single unshared connection per
writer).

We also add some testing of the mapping behavior to ensure we're
consistently updating the map for resolution and eviction.
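
The resolve-and-evict mapping described above can be sketched roughly as follows. This is a minimal illustration only, assuming one unshared connection per writer; the names `pool`, `conn`, `resolve`, and `evict` are hypothetical and are not taken from the managedwriter source.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a pooled connection.
type conn struct{ id string }

// pool maps writer IDs to connections. All names here are illustrative.
type pool struct {
	mu    sync.Mutex
	conns map[string]*conn
}

func newPool() *pool { return &pool{conns: make(map[string]*conn)} }

// resolve returns the connection for a writer, creating one on first use.
// Each writer gets its own unshared connection.
func (p *pool) resolve(writerID string) *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[writerID]; ok {
		return c
	}
	c := &conn{id: "conn-for-" + writerID}
	p.conns[writerID] = c
	return c
}

// evict removes the writer's mapping and reports whether one existed.
func (p *pool) evict(writerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	_, ok := p.conns[writerID]
	delete(p.conns, writerID)
	return ok
}

func main() {
	p := newPool()
	first := p.resolve("writer-a")
	// Repeat lookups return the same connection until it is evicted.
	fmt.Println("stable lookup:", first == p.resolve("writer-a"))
	fmt.Println("evicted:", p.evict("writer-a"))
}
```

Keeping both operations behind one mutex is what lets a test assert that the map stays consistent across resolution and eviction.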

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Dec 28, 2022
This PR augments the base client with fuller support for view-based
resolution of GetWriteStream metadata.  This PR also adds an integration
test that compares behaviors between different stream types (default vs
explicitly created).

Towards: googleapis#7103
shollyman added a commit that referenced this issue Dec 29, 2022
This PR augments the base client with fuller support for view-based
resolution of GetWriteStream metadata.  This PR also adds an integration
test that compares behaviors between different stream types (default vs
explicitly created).

Towards: #7103
codyoss pushed a commit that referenced this issue Jan 9, 2023
This PR includes much of the rewiring of the existing ManagedStream abstraction, but doesn't cut over to the new implementation yet.

We add a reference to the origin writer as part of the pendingWrite which retains information about a single write request and response. This allows us to resolve retry settings for a given write by checking if the writer has a custom retry policy. In other cases, we use the default settings of the connection pool.

We introduce internal UUID identifiers to the core abstractions (pool, connection, writer) so that we can add observability later to see which components are responsible for processing requests.

We remove the notion of adding connections from the connection pool contract. Instead, we introduce a new interface in the pool called a poolRouter. By interface contract, it's responsible for picking the correct connection for a given write. This allows us to abstract away different implementations of pool behavior and make each the responsibility of an individual router.

Further, this PR adds the most simplistic router we'll use for the initial migration to multiplexing (simpleRouter): it supports a single connection, and routes all traffic to it.
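
A minimal sketch of the router idea, under the assumption that the interface boils down to a single "pick a connection for this write" method. The type and method names below echo the description but are illustrative; the real unexported signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// connection and pendingWrite are hypothetical placeholders.
type connection struct{ id string }
type pendingWrite struct{ writerID string }

// poolRouter decides which connection should carry a given write.
type poolRouter interface {
	pickConnection(pw *pendingWrite) (*connection, error)
}

// simpleRouter owns a single connection and routes all traffic to it.
type simpleRouter struct {
	conn *connection
}

func (r *simpleRouter) pickConnection(pw *pendingWrite) (*connection, error) {
	if r.conn == nil {
		return nil, errors.New("no connection available")
	}
	return r.conn, nil
}

// connectionPool delegates routing, so pool behavior can change by
// swapping the router implementation.
type connectionPool struct {
	router poolRouter
}

func (p *connectionPool) selectConn(pw *pendingWrite) (*connection, error) {
	return p.router.pickConnection(pw)
}

func main() {
	p := &connectionPool{router: &simpleRouter{conn: &connection{id: "c1"}}}
	for _, w := range []string{"writer-a", "writer-b"} {
		c, err := p.selectConn(&pendingWrite{writerID: w})
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(w, "routed to", c.id)
	}
}
```

Because the pool only sees the interface, a later router that balances writers across several connections could replace `simpleRouter` without touching the pool.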

This PR also moves over more internal functionality from the ManagedStream, namely appendWithRetry() and lockingAppend(). The implementations still remain on the ManagedStream at this time; we'll remove most of the functionality when we cut over to using pools/connections.

Towards: #7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Feb 7, 2023
This PR does more settings consolidation, and updates/adds new
WriterOption options to propagate settings.

In particular, this PR:
* moves the AppendRows call options into streamSettings
* adds a multiplex flag and option to streamSettings
* adds a call function option into streamSettings

This PR also updates managed stream to use the new option(s) as
appropriate, but most of this is unused here and is in preparation
for a larger cutover of functionality related to the new connection
abstractions.

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Feb 23, 2023
This PR revisits the expected behavior for config knobs in the client.
Previously, all configuration was done when instantiating a writer (aka
a ManagedStream).  There are some chicken-and-egg problems related to
multiplex settings, as connection options are decoupled from individual
writers.

This PR adds the following unexported custom client options (but does
not yet use them for anything):

* enableMultiplex
* defaultInflightRequests
* defaultInflightBytes
* defaultAppendRowsCallOption

This PR also removes the still-unexported enableMultiplex from
the set of defined WriterOption options which can be passed when
instantiating individual writers.

Towards: googleapis#7103
shollyman added a commit that referenced this issue Feb 27, 2023
…7490)

* refactor(bigquery/storage/managedwriter): add custom client options

This PR revisits the expected behavior for config knobs in the client.
Previously, all configuration was done when instantiating a writer (aka
a ManagedStream).  There are some chicken-and-egg problems related to
multiplex settings, as connection options are decoupled from individual
writers.

This PR adds the following unexported custom client options (but does
not yet use them for anything):

* enableMultiplex
* defaultInflightRequests
* defaultInflightBytes
* defaultAppendRowsCallOption

This PR also removes the still-unexported enableMultiplex from
the set of defined WriterOption options which can be passed when
instantiating individual writers.

Additionally, this refactor includes a correctness fix for the traceID option that was causing the traceID to duplicate the initial token.

Towards: #7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Mar 2, 2023
This PR extends the poolRouter interface to allow writers to be
registered and removed, and augments the existing simpleRouter to
support the contract.

This PR also adds a basic test of the router.

A future refactor (when we wire up the new abstractions) will hook
up the functionality properly.

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Mar 17, 2023
This PR allows the flowcontroller to report bytes in flight for flow
controllers with a bounded byte definition.  The primary connection
load signals for a connection are the inserts/bytes in flight as
reported by the flow controller, and this makes the bytes in flight a
signal we can use.

Important note: an unbounded flow controller will not report any bytes
in flight.  This avoids introducing odd situations due to size
normalization where bytes tracked and the actual capacity of the
semaphore could get out of sync.
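
To illustrate the reporting rule only (a bounded controller reports tracked bytes, an unbounded one always reports zero), here is a simplified sketch. It assumes a mutex-guarded counter with a non-blocking `tryAcquire`, which is a deliberate simplification: the description above refers to a semaphore and size normalization, neither of which is modeled here. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// flowController is a simplified, hypothetical byte-based flow controller.
// maxBytes <= 0 means the controller is unbounded.
type flowController struct {
	mu       sync.Mutex
	maxBytes int
	inFlight int
}

// tryAcquire reserves n bytes if capacity allows. Unbounded controllers
// admit everything and track nothing.
func (fc *flowController) tryAcquire(n int) bool {
	if fc.maxBytes <= 0 {
		return true
	}
	fc.mu.Lock()
	defer fc.mu.Unlock()
	if fc.inFlight+n > fc.maxBytes {
		return false
	}
	fc.inFlight += n
	return true
}

// release returns n bytes of capacity on a bounded controller.
func (fc *flowController) release(n int) {
	if fc.maxBytes <= 0 {
		return
	}
	fc.mu.Lock()
	defer fc.mu.Unlock()
	fc.inFlight -= n
	if fc.inFlight < 0 {
		fc.inFlight = 0
	}
}

// bytes reports bytes in flight. An unbounded controller reports zero,
// so it never contributes a byte-based load signal.
func (fc *flowController) bytes() int {
	if fc.maxBytes <= 0 {
		return 0
	}
	fc.mu.Lock()
	defer fc.mu.Unlock()
	return fc.inFlight
}

func main() {
	bounded := &flowController{maxBytes: 1024}
	bounded.tryAcquire(256)
	unbounded := &flowController{}
	unbounded.tryAcquire(256)
	fmt.Println("bounded:", bounded.bytes(), "unbounded:", unbounded.bytes())
}
```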

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Mar 24, 2023
This PR switches newConnection to a typed argument to make it less prone
to invoke incorrectly.  Raised during a review on a related PR.
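
The thread does not say what the typed argument looks like, so the following is only a generic sketch of the pattern: replacing a bare bool or string parameter with a named type and constants. `connectionMode`, `simplexMode`, and `multiplexMode` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// connectionMode is a hypothetical named type for the constructor argument.
type connectionMode string

const (
	simplexMode   connectionMode = "SIMPLEX"
	multiplexMode connectionMode = "MULTIPLEX"
)

type connection struct {
	mode connectionMode
}

// newConnection takes a typed mode rather than a bare bool, so call sites
// read as newConnection(multiplexMode) instead of newConnection(true).
// Unknown values are rejected explicitly.
func newConnection(mode connectionMode) (*connection, error) {
	switch mode {
	case simplexMode, multiplexMode:
		return &connection{mode: mode}, nil
	default:
		return nil, fmt.Errorf("unknown connection mode %q", mode)
	}
}

func main() {
	c, err := newConnection(multiplexMode)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created connection in mode", c.mode)
}
```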

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Mar 31, 2023
With recent multiplex refactors, call options were not being propagated
properly for non-multiplex writers as we formerly created a
pool-per-writer.

This PR allows the router to build exclusive connections using the
writer's settings, namely overrides to flow control and call options
propagated to the underlying AppendRows RPC.

Towards: googleapis#7103
@shollyman
Contributor Author

At this point, experimental multiplexing is available at head, but not baked into a release version. The next release is 1.51.0, but it will likely be deferred until the week of April 17.

@shollyman
Contributor Author

This has been released as part of bigquery/v1.51.0

@shollyman
Contributor Author

While there continue to be smaller features related to multiplexing, going forward we'll track those individually rather than via this umbrella issue.
