Query Optimization

These are course files (instructional materials) for both the BBT3104: Advanced Database Systems and the MIT8107: Advanced Database Systems courses.

Key	Value
Course Code	BBT3104 and MIT8107
Course Name	Advanced Database Systems (same name for both courses)
URL	https://elearning.strathmore.edu/course/view.php?id=3932 and https://elearning.strathmore.edu/course/view.php?id=3277 respectively
Semester Duration	14^th August - 25^th November 2024
Lecturer	Allan Omondi
Contact	[email protected]

Recommended Citation

Omondi, A. O. & Kirui, J. C. (2024). BBT3104-Lab6of6-QueryOptimization (v1.3) [Source code]. https://github.com/course-files/BBT3104-Lab6of6-QueryOptimization

Bibtex:

@misc{omondi_bbt3104-lab6of6-queryoptimization_2024,
	address = {Nairobi, Kenya},
	title = {{BBT3104}-{Lab6of6}-{QueryOptimization}},
	copyright = {GPL-3.0},
	shorttitle = {Query {Optimization}},
	url = {https://github.com/course-files/BBT3104-Lab6of6-QueryOptimization},
	abstract = {Instructional materials (course files) for the BBT3104 course (Advanced Database Systems) and the MIT8107 course (Advanced Database Systems). Topic: Query Optimization. Based on the IMDb dataset using PostgreSQL and the Join Order Benchmark (JOB).},
	urldate = {2024-07-10},
	author = {Omondi, Allan Odhiambo and Kirui, Julliet Chepngeno},
	month = jul,
	year = {2024},
	note = {v1.3 [Source code]},
	keywords = {query-optimization, query-processing},
}

Lab Manual

Please refer to the Lab Manual available here for the detailed instructions: https://elearning.strathmore.edu/

Internet Movie Database (IMDb)

IMDb captures more than 2.5 million movie titles produced over 133 years by 234,997 different companies with over 4 million actors. It subsequently contains many join-crossing correlations thus making it challenging for cardinality estimators. This is unlike TPC-H and TPC-DS which are considered trivial for cardinality estimators.

Recommended: The version of IMDb used in Leis et al. (2018) can be downloaded from here: http://homepages.cwi.nl/~boncz/job/imdb.tgz. This version was created in May 2013.
The current version of IMDb can be downloaded from here: http://www.imdb.com/interfaces

Data Import

Create the PostgreSQL Docker container using Docker-Compose.yaml
Create the IMDb database in Postgres using schema.sql
Download the .tgz file, decompress it, and import each .csv file into its respective table. You can use DBeaver (recommended) or any other similar database tool when importing. A slightly similar video tutorial is available here on YouTube.
Alternatively, you can use cinemagoer, formerly IMDbPy, to automate this step.

Join Order Benchmark (JOB) queries

Proceed to execute the Join Order Benchmark (JOB) queries available here on the IMDb database in PostgreSQL.

Q-Error Calculations

Execute the q-error computation code to quantitatively determine the error between the actual cardinality and the estimated cardinality per node for each query here.

Acknowledgments

Initial code has been sourced from join-order-benchmark by Greg Rahn, Moritz Eyssen, maahl, and Max Halford.

Additional code has also been sourced from learnedcardinalities by Andreas Kipf

References

Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., & Kemper, A. (2019). Learned cardinalities: Estimating correlated joins with deep learning. 9th Biennial Conference on Innovative Data Systems Research (CIDR ‘19), CA, USA. https://arxiv.org/abs/1809.00677

Leis, V., Radke, B., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2018). Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal, 27(5), 643–668. https://doi.org/10.1007/s00778-017-0480-7

Name		Name	Last commit message	Last commit date
Latest commit History 273 Commits
.cspell		.cspell
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
.vscode		.vscode
Company-Database--Elmasri-and-Navathe-2016		Company-Database--Elmasri-and-Navathe-2016
DreamHome-database		DreamHome-database
Internet-Movie-Database--IMDb		Internet-Movie-Database--IMDb
Join-Order-Benchmark-queries		Join-Order-Benchmark-queries
MSCN		MSCN
container-volumes		container-volumes
q-error		q-error
query-featurization		query-featurization
query-workload		query-workload
.gitignore		.gitignore
Docker-Compose.yaml		Docker-Compose.yaml
LICENSE		LICENSE
README.md		README.md
lab_submission.md		lab_submission.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Query Optimization

Recommended Citation

Lab Manual

Internet Movie Database (IMDb)

Data Import

Join Order Benchmark (JOB) queries

Q-Error Calculations

Acknowledgments

References

About

Contributors 4

Languages

License

course-files/BBT3104-Lab6of6-QueryOptimization

Folders and files

Latest commit

History

Repository files navigation

Query Optimization

Recommended Citation

Lab Manual

Internet Movie Database (IMDb)

Data Import

Join Order Benchmark (JOB) queries

Q-Error Calculations

Acknowledgments

References

About

Topics

Resources

License

Stars

Watchers

Forks

Contributors 4

Languages