These are course files (instructional materials) for both the BBT3104: Advanced Database Systems and the MIT8107: Advanced Database Systems courses.
Key | Value |
---|---|
Course Code | BBT3104 and MIT8107 |
Course Name | Advanced Database Systems (same name for both courses) |
URL | https://elearning.strathmore.edu/course/view.php?id=3932 and https://elearning.strathmore.edu/course/view.php?id=3277 respectively |
Semester Duration | 14th August - 25th November 2024 |
Lecturer | Allan Omondi |
Contact | [email protected] |
Omondi, A. O. & Kirui, J. C. (2024). BBT3104-Lab6of6-QueryOptimization (v1.3) [Source code]. https://github.com/course-files/BBT3104-Lab6of6-QueryOptimization
Bibtex:
@misc{omondi_bbt3104-lab6of6-queryoptimization_2024,
address = {Nairobi, Kenya},
title = {{BBT3104}-{Lab6of6}-{QueryOptimization}},
copyright = {GPL-3.0},
shorttitle = {Query {Optimization}},
url = {https://github.com/course-files/BBT3104-Lab6of6-QueryOptimization},
abstract = {Instructional materials (course files) for the BBT3104 course (Advanced Database Systems) and the MIT8107 course (Advanced Database Systems). Topic: Query Optimization. Based on the IMDb dataset using PostgreSQL and the Join Order Benchmark (JOB).},
urldate = {2024-07-10},
author = {Omondi, Allan Odhiambo and Kirui, Julliet Chepngeno},
month = jul,
year = {2024},
note = {v1.3 [Source code]},
keywords = {query-optimization, query-processing},
}
Please refer to the Lab Manual available here for the detailed instructions: https://elearning.strathmore.edu/
IMDb captures more than 2.5 million movie titles produced over 133 years by 234,997 different companies with over 4 million actors. It subsequently contains many join-crossing correlations thus making it challenging for cardinality estimators. This is unlike TPC-H and TPC-DS which are considered trivial for cardinality estimators.
-
Recommended: The version of IMDb used in Leis et al. (2018) can be downloaded from here: http://homepages.cwi.nl/~boncz/job/imdb.tgz. This version was created in May 2013.
-
The current version of IMDb can be downloaded from here: http://www.imdb.com/interfaces
-
Create the PostgreSQL Docker container using Docker-Compose.yaml
-
Create the IMDb database in Postgres using schema.sql
-
Download the
.tgz
file, decompress it, and import each.csv
file into its respective table. You can use DBeaver (recommended) or any other similar database tool when importing. A slightly similar video tutorial is available here on YouTube.
Alternatively, you can use cinemagoer, formerly IMDbPy, to automate this step.
- Proceed to execute the Join Order Benchmark (JOB) queries available here on the IMDb database in PostgreSQL.
- Execute the q-error computation code to quantitatively determine the error between the actual cardinality and the estimated cardinality per node for each query here.
Initial code has been sourced from join-order-benchmark by Greg Rahn, Moritz Eyssen, maahl, and Max Halford.
Additional code has also been sourced from learnedcardinalities by Andreas Kipf
Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., & Kemper, A. (2019). Learned cardinalities: Estimating correlated joins with deep learning. 9th Biennial Conference on Innovative Data Systems Research (CIDR ‘19), CA, USA. https://arxiv.org/abs/1809.00677
Leis, V., Radke, B., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2018). Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal, 27(5), 643–668. https://doi.org/10.1007/s00778-017-0480-7