Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DYOD] Support window function queries #2607

Open
wants to merge 170 commits into
base: master
Choose a base branch
from

Conversation

niklasmohrin
Copy link
Contributor

@niklasmohrin niklasmohrin commented Aug 17, 2023

For intermediate review, see #2597. Additionally, we factored out the implementation of the parallel inplace merge sort into #2605.

  • check for nullptr chunks
  • investigate slow sorting for query 51 (it seems that we are (de-)allocating too many vectors and strings)
  • implement and benchmark different fan-outs / -ins for partition_and_sort ([DYOD] Add generic implementation of Parallel Inplace Merge Sort #2605)
  • increase test coverage / cover everything except unreachable code (not sure where we stand currently) (missing: reverse(equal) and deep copy)
  • better operator description in query plans
  • more documentation
  • change constructor to not take the WindowFunctionExpression

Possible follow up work:

  • benchmark against another implementation
  • reduce allocations in general (make HashPartitionedData only have one vector and save offsets for hash buckets?)
  • improve performance for string columns, for example by avoiding copies (something like AllTypeVariantView?)

Furthermore: Currently coverage says "warning: 100 functions have mismatched data"

niklasmohrin and others added 30 commits August 17, 2023 22:11
Upstream said that we should create new `Chunk`s that re-use the
previous segments (they are stored as shared pointers).
Assigning does not work, because `operator[]` returns `AllTypeVariant`,
not a reference.
It appears that someone is renaming our column after the operator has
executed.
This is needed to prevent creating a chunk that has both
`ReferenceSegment`s and `ValueSegments`s, as our output column is always
represented by `ValueSegment`s.
We now reuse all segments from the input table.
If the input table is a reference table, we create a new table with our output value segments and then add
reference segments to this new table in our output table.
This adds `_templated_on_execute` so that the input column type and
window function are compile time constants the whole time. This way, we
don't get template instantiation errors down the line when we resolve
types again.
@Finn-HPI Finn-HPI added the FullCI Run all CI tests (slow, but required for merge) label Aug 23, 2023
@niklasmohrin niklasmohrin marked this pull request as ready for review August 24, 2023 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FullCI Run all CI tests (slow, but required for merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants