feat: optimize delete/update by subquery #15403

Closed
wants to merge 10 commits into from

lichuang
Collaborator

@lichuang lichuang commented May 6, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

feat: optimize subquery

In the previous implementation of update/delete by subquery:

  1. the subqueries were evaluated first, yielding row_id arrays as a ScalarExpr;
  2. the ScalarExpr returned by step 1 was then used as the condition for modifying data from the source.

After this optimization:

  1. all subqueries are evaluated at runtime, as part of the modify physical plan;
  2. the subquery result is joined with the target source data. For example, if the original target source data is [1,2,3,4], the join returns [1,2,3,4] plus a marker column such as [true, false, true, false], where the marker column records whether or not each row is filtered by the subquery;
  3. a new transformer, TransformMutationSubquery, modifies the data:
       • delete: it filters out the rows marked as false;
       • update: it evaluates the update operations on the data;
  4. the modified data is committed to storage.
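
The marker-column flow described above can be sketched roughly as follows. This is an illustrative Python model only, not the Rust code in this PR: the helper names (mark_rows, apply_delete, apply_update) are hypothetical, rows are plain integers, and the sketch assumes that a true marker means "this row was matched by the subquery", so a delete drops matched rows and an update rewrites them. The actual TransformMutationSubquery may use a different marker polarity.

```python
from typing import Callable, Iterable, List, Tuple

Row = int
MarkedRow = Tuple[Row, bool]


def mark_rows(target: List[Row], subquery_result: Iterable[Row]) -> List[MarkedRow]:
    """Join the target rows with the subquery result and append a marker.

    Every target row is kept; the marker only records whether the row
    appears in the subquery result (assumed meaning of the marker column).
    """
    matched = set(subquery_result)
    return [(row, row in matched) for row in target]


def apply_delete(marked: List[MarkedRow]) -> List[Row]:
    """Keep only the rows that were not matched (assumed delete semantics)."""
    return [row for row, is_matched in marked if not is_matched]


def apply_update(marked: List[MarkedRow], update: Callable[[Row], Row]) -> List[Row]:
    """Apply the update expression to matched rows; pass the rest through."""
    return [update(row) if is_matched else row for row, is_matched in marked]


# Target data from the example in the description: [1, 2, 3, 4].
target = [1, 2, 3, 4]
# Hypothetical subquery result that matches rows 1 and 3.
marked = mark_rows(target, [1, 3])

markers = [is_matched for _, is_matched in marked]
remaining_after_delete = apply_delete(marked)
values_after_update = apply_update(marked, lambda value: value * 10)
```

Because the join keeps every target row and only adds a boolean column, delete and update can share one pass over the marked rows, and the row_id arrays no longer need to be materialized as a ScalarExpr beforehand.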

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label May 6, 2024
@BohuTANG BohuTANG added the ci-benchmark Benchmark: run all test label May 13, 2024
Contributor

Docker Image for PR

  • tag: pr-15403-6a4d9bd

note: this image tag is only available for internal use,
please check the internal doc for more details.

@lichuang lichuang changed the title feat: optimize subquery feat: optimize delete/update by subquery May 14, 2024
@lichuang lichuang closed this May 20, 2024
Labels
ci-benchmark Benchmark: run all test pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify execution of DML statements and queries
2 participants