This repository contains code and instructions used in Simpli-Squared: A Very Simple Yet Unexpectedly Powerful Join Ordering Algorithm.
1. join-orderingSimplified.py : This file has the code for the join ordering simplified algorithm.
- Input : JOB(Join Ordering Benchmark) queries.
- Output : plans in plans.txt
file. Sample plan file can be found here https://github.com/Asoke26/Simpli-Squared/blob/main/plans.txt
2. sql-generator.py : This code transforms the output plans from plans.txt
into sql.
- Input : JOB(Join Ordering Benchmark) queries and plans.txt
file.
- Output : Rewritten SQL query in explicit representation.
Prerequisites -
- python 3.X.
collections, operator
(Python Packages)- PostgreSQL 14.2.
Step 1 : Create a database using imdb dataset. Step-by-step instructions can be found here(https://github.com/gregrahn/join-order-benchmark). Dataset used in paper can be found in http://homepages.cwi.nl/~boncz/job/imdb.tgz .
Step 2 : Run join ordering simplified alorithm to obtain the plans.
python3 join-orderingSimplified.py
Step 3 : Run SQL generator to translate the plans from step 1 into SQL queries.
python3 sql-generator.py
The reported result in paper [figure 3] are obtained by running these queries in PostgreSQL database.
We change below parameters in PostgreSQL for our experiemnts. They can be found /usr/local/pgsql/data/postgresql.conf
(default location unless set otherwise during installation)
shared_buffers = 128 GB
work_mem = 128 GB
effective_cache_size = 128 GB
geqo_threshold = 18
cpu_tuple_cost = 0.01,0.1,1
force_parallel_mode = on
from_collapse_limit = 1 [To prevent sub-query merging]
join_collapse_limit = 1 [To prevent reordering join order]
System requires a restart to re-initialize these variables.
Note :
In experiemnts, we use imdb schema with referential integrity which can be found here. Data should be inserted following dependency matric.
https://github.com/Asoke26/Simpli-Squared/blob/main/join-order-benchmark/schema-with-referential-integrity.sql