Skip to content

ZhengtongYan/TONIC

 
 

Repository files navigation

TONIC

Repository of the paper: Turbo-Charging SPJ Query Plans with Learned Physical Join Operator Selections. [paper] [presentation]

VLDB'22

This repository contains scripts to evaluate core characteristics of the QEP-S. Each sub-directory contains a folder with precomputed query feedback. The Quick Evaluation scripts do not require any actual database connection. The steps for (re)computing the respective feedback and statistics are detailed in Verbose Evaluation.

Requirements:

  • sudo apt install build-essential cmake python3-pip libpq-dev python3-dev
  • sudo pip3 install psycopg2
  • sudo pip3 install glob2

Quick Evaluation:

  1. Individual Experiments: To run individual experiments use run_qeps.sh in the corresponding sub-directories. Running the experiments may require the installation of additional modules via pip3 install. Please run dataShift/run_qeps.sh at least once as other experiments, e.g., the adaptivity evaluation, may require a pretrained QEPS-S from the reduced data set.

  2. All Experiments: To run all experiments subsequently, please execute run_all_experiments.sh.

Verbose Evaluation:

Recomputation of the query feedback and statistics requires the following steps:

  1. PostgreSQL: Install Postgres and load the (frozen,) official IMDB data or [CSV files].
  2. Reduced-Data: Create another JOB instance where half the tuples from tables with at least 100k tuples are randomly dropped.
  3. Query-Feedback: Install the plan_hint_extension (see documentation here). Using the extension, execute each query with all possible physical operator combinations. For details see utils/operatorPermutation.
  4. Miscellaneous: To stop QEP-S branching for already empty join results, utils/getIntermediateSize.py recomputes the necessary intermediate result sizes. Lastly, filter expressions for the filter-aware QEP-S have been extracted according to utils/filterDict.py.

Vision: Two-Stage Cardinality-Estimation-Free Optimizer Design

In the future we seek to update this repository to provide a generic end-to-end implementation of TONIC combined with Simplicity (UES).

Starting with a fresh workload, the two-stage design works as follows:

  1. In the first stage, TONIC receives the join order determined by the Simplicity query optimizer and searches for the longest prefix match within the QEP-S to find the most similar case. Since there is no workload history yet, the QEP-S is unable to identify a prefix match for the first query. In accordance, to the UES policy, we apply hash joins for the initial execution of an unknown join order.

  2. We use the actual join cardinalities as input for the default optimizer's cost model to determine the cost of every physical join operator alternative. Based on the cost feedback, we revise the operator decision of the previous step.

  3. For the next queries, we again generate the logical plans according to UES. However, TONIC additionally uses the QEP-S to determine the most cost effective join operators for already known join orders. For (partial) join orders that are not contained in the QEP-S, TONIC falls back to hash joins and updates the QEP-S with the respective query feedback, afterward.

  4. Interestingly, since TONIC accumulates the cost of operator alternatives, we can approximate an index significance rating from the QEP-S cost history. TONIC can use this rating to automatically detect benecficial or unnecessary indices.

Releases

No releases published

Packages

No packages published

Languages

  • Python 96.6%
  • C++ 2.1%
  • Shell 1.3%