Optimise View.asList() side inputs for iterating rather than for indexing. #31087

Merged
merged 9 commits into from
Apr 30, 2024

Commits on Apr 23, 2024

  1. Optimise View.asList() side inputs for iterating rather than for indexing.
    
    The current implementation is, essentially, a distributed hashmap from
    integer keys to the list contents. Each upstream worker starts assigning keys
    at a random value to minimize overlaps, and emits enough metadata to map these
    keys onto the contiguous range [0, N). This provides optimal *random-access*
    performance but very poor *iteration* performance: every advance requires a
    key lookup, and because the keys are hashed and distributed rather than
    clustered numerically, there is little to no amortization across lookups for
    adjacent items.
    
    Given that most uses of List side inputs merely gather a collection of
    values (the user has no control over the ordering when materialized), and
    given the high cost of providing random access, this is probably the wrong
    tradeoff for most pipelines.
    
    This is an update-incompatible change, so it is guarded by the update
    compatibility version flag. The old behavior can be explicitly requested
    via a new AsList#withRandomAccess() method.
    robertwb committed Apr 23, 2024
    b163a54
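
The tradeoff described in the first commit message can be modelled in plain Java. The sketch below is a hypothetical simulation, not Beam's code: `ListSideInputSketch`, `IndexedLayout`, `IterableLayout`, and the `lookups`/`fetches` counters are illustrative names, and an in-memory `HashMap` stands in for the distributed key-value store. It shows workers writing scattered key runs plus range metadata, the metadata being used to map keys onto [0, N), and why iterating that layout costs one lookup per element while a bundle-oriented layout needs only one read per stored bundle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical model of the two List side-input layouts; not Beam's actual implementation.
public class ListSideInputSketch {

  // Random-access layout: scattered long keys -> elements, plus per-worker range metadata.
  static final class IndexedLayout<T> {
    private final Map<Long, T> entries = new HashMap<>();
    private final List<long[]> ranges = new ArrayList<>(); // {startKey, count} per worker
    private long[] starts = new long[0];  // range start keys, sorted ascending
    private long[] offsets = new long[0]; // offsets[k] = number of elements before range k
    private int size = 0;
    long lookups = 0; // number of simulated key lookups performed

    // Each upstream worker starts at a random key so that overlaps are unlikely.
    void emitBundle(List<T> bundle, Random rng) {
      long start = rng.nextLong() >>> 2; // non-negative, leaves headroom against overflow
      for (int i = 0; i < bundle.size(); i++) {
        entries.put(start + i, bundle.get(i));
      }
      ranges.add(new long[] {start, bundle.size()});
    }

    // Uses the range metadata to map the scattered keys onto the contiguous range [0, N).
    void seal() {
      ranges.sort((a, b) -> Long.compare(a[0], b[0]));
      starts = new long[ranges.size()];
      offsets = new long[ranges.size()];
      long running = 0;
      for (int k = 0; k < ranges.size(); k++) {
        starts[k] = ranges.get(k)[0];
        offsets[k] = running;
        running += ranges.get(k)[1];
      }
      size = (int) running;
    }

    int size() { return size; }

    // Random access: binary-search the metadata, then perform exactly one key lookup.
    T get(int index) {
      if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
      int lo = 0;
      int hi = offsets.length - 1;
      while (lo < hi) { // find the last range whose offset <= index
        int mid = (lo + hi + 1) >>> 1;
        if (offsets[mid] <= index) lo = mid; else hi = mid - 1;
      }
      lookups++;
      return entries.get(starts[lo] + (index - offsets[lo]));
    }

    // Iterating this layout still costs one key lookup per advance.
    List<T> iterate() {
      List<T> out = new ArrayList<>();
      for (int i = 0; i < size; i++) out.add(get(i));
      return out;
    }
  }

  // Iteration-friendly layout: bundles are stored whole and read back sequentially.
  static final class IterableLayout<T> {
    private final List<List<T>> bundles = new ArrayList<>();
    long fetches = 0; // number of simulated bulk reads

    void emitBundle(List<T> bundle) { bundles.add(new ArrayList<>(bundle)); }

    List<T> iterate() {
      List<T> out = new ArrayList<>();
      for (List<T> b : bundles) {
        fetches++; // one bulk read per stored bundle, amortized over its elements
        out.addAll(b);
      }
      return out;
    }
  }

  public static void main(String[] args) {
    List<List<String>> bundles = List.of(List.of("a", "b"), List.of("c"), List.of("d", "e", "f"));
    IndexedLayout<String> indexed = new IndexedLayout<>();
    IterableLayout<String> iterable = new IterableLayout<>();
    Random rng = new Random(42);
    for (List<String> b : bundles) {
      indexed.emitBundle(b, rng);
      iterable.emitBundle(b);
    }
    indexed.seal();
    System.out.println(indexed.iterate() + " lookups=" + indexed.lookups);
    System.out.println(iterable.iterate() + " fetches=" + iterable.fetches);
  }
}
```

With the three bundles in `main`, iterating six elements through the indexed layout performs six key lookups, while the bundle layout performs three reads. The indexed order depends on the random start keys, which mirrors the point that users have no control over list ordering. Pipelines that genuinely need indexed access can still request the old behavior through `AsList#withRandomAccess()`.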

Commits on Apr 24, 2024

  1. 03fc0c4
  2. checkstyle

    robertwb committed Apr 24, 2024
    f38690c
  3. fix the fix

    robertwb committed Apr 24, 2024
    bf3eae5
  4. 295b440

Commits on Apr 26, 2024

  1. 9ab2906

Commits on Apr 29, 2024

  1. Better naming for ListViewFn3, restrict to global windows.

    (I kept the name for ListViewFn2 just in case there are pipelines serializing it as data.)
    robertwb committed Apr 29, 2024
    d64e194
  2. fix

    robertwb committed Apr 29, 2024
    7582e80

Commits on Apr 30, 2024

  1. 3673ee6