GitHub - lingo-db/subop-vldb-2023-reproducibility

Required Dataset

Fetch the New York Taxi dataset (2016-01, the version with location data) and unpack it into the file yellow_tripdata_2016-01.csv
Fetch the modified JOB dataset and unpack its CSV files into job-data/*.csv
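
Before running any experiment it can help to confirm that both datasets are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_datasets` is hypothetical, and it assumes the paths are resolved relative to the repository root and that any CSV file inside job-data/ is enough to count the JOB data as present.

```python
from pathlib import Path


def missing_datasets(root="."):
    """Return the expected dataset paths that are absent under `root`.

    Hypothetical helper: the file names come from the README, the
    check itself is an illustration only.
    """
    root = Path(root)
    missing = []

    # The taxi data must be unpacked into exactly this file name.
    taxi = root / "yellow_tripdata_2016-01.csv"
    if not taxi.is_file():
        missing.append(str(taxi))

    # The README only says "job-data/*.csv", so any CSV file counts here.
    job_dir = root / "job-data"
    if not (job_dir.is_dir() and any(job_dir.glob("*.csv"))):
        missing.append(str(job_dir / "*.csv"))

    return missing
```

Calling `missing_datasets()` from the repository root returns an empty list when both datasets are in place, and the list of missing paths otherwise.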

Setup

python3 -m venv venv
. venv/bin/activate
pip3 install -r requirements.txt

Run Experiments (DuckDB+Python & Hyper)

# pagerank with scikit-network and DuckDB
python imdb-pagerank.py
# pagerank with hyper and recursive SQL
python imdb-pagerank-recursive-hyper.py
# similarity join with python and DuckDB
python imdb-similarity-join.py
# k-means with scikit-learn and DuckDB
python taxi-kmeans.py
# k-means with hyper and recursive SQL
python taxi-kmeans-recursive-hyper.py
# blackscholes with numpy
python blackscholes_numpy.py
# haversine with numpy
python haversine-numpy.py
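
For readers unfamiliar with the haversine workload, the sketch below shows the great-circle distance formula that such a benchmark evaluates. It is an illustration only and not the code in haversine-numpy.py: it works on single coordinate pairs with the standard library instead of NumPy arrays, and the function name and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A vectorized version would apply the same arithmetic to whole coordinate columns at once, which is what makes the workload a useful benchmark.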

Run Experiments (Weld)

mkdir weld-data
# generate data required for pagerank
python imdb-generate-pagerank-edges.py
line_count=$(wc -l < "weld-data/imdb-pagerank-data.csv")
line_count=$((line_count - 1))
sed -i "1s/.*/$line_count/" "weld-data/imdb-pagerank-data.csv"

# build pyweld docker container
docker build -t pyweld pyweld/
# build c++ weld docker container
docker build -t cppweld cppweld/


# run blackscholes with pyweld docker
docker run --rm -v "$(pwd)/blackscholes-weld.py:/app/blackscholes-weld.py" pyweld python /app/blackscholes-weld.py
# run haversine with pyweld docker
docker run --rm -v "$(pwd)/haversine-weld.py:/app/haversine-weld.py" pyweld python /app/haversine-weld.py

# run pagerank with cppweld docker
docker run --rm -v "$(pwd)/weld-data:/data:ro" cppweld /experiments/build/weld_experiments pagerank
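
The `wc` / `sed` step above overwrites the first line of weld-data/imdb-pagerank-data.csv with the number of lines that follow it. On systems where `sed -i` behaves differently (for example BSD/macOS), the same rewrite can be sketched in Python. The helper name `set_count_header` is hypothetical and this is not part of the repository. It also differs from `wc -l` in one detail: a final line without a trailing newline is still counted here.

```python
def set_count_header(path):
    """Replace line 1 of `path` with the number of lines that follow it.

    Hypothetical helper mirroring the wc/sed commands in the README.
    Reads the whole file into memory, which is fine for a sketch.
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        raise ValueError(f"{path} is empty")

    # Everything after the first line is data; the first line is replaced.
    count = len(lines) - 1
    lines[0] = f"{count}\n"

    with open(path, "w") as f:
        f.writelines(lines)
    return count
```

Usage would be `set_count_header("weld-data/imdb-pagerank-data.csv")`, run once after imdb-generate-pagerank-edges.py has written the file.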

Run Experiments (LingoDB)

git submodule init
git submodule update 
cd lingo-db
# install dependencies as described in the LingoDB repository
[...]
git submodule update --init --recursive
make dependencies
make build-debug
make build-release
# run TPC-H and TPC-DS benchmarks
make run-benchmarks
# load job data
make resources/data/job/.stamp
cd ..
# load taxi dataset
mkdir taxidb
./lingo-db/build/lingodb-release/sql taxidb < ./lingo-db/resources/sql/taxi/initialize.sql
# run similarity join
./lingo-db/build/lingodb-release/run-mlir imdb-similarity-join.mlir ./lingo-db/resources/data/job
./lingo-db/build/lingodb-release/run-mlir imdb-similarity-join-pushdown.mlir ./lingo-db/resources/data/job
# run pagerank (time for iterations is reported as Timing: ... ms)
env LINGODB_TIMING=ON ./lingo-db/build/lingodb-release/run-mlir imdb-pagerank.mlir ./lingo-db/resources/data/job
# run k-means (time for iterations is reported as Timing: ... ms)
env LINGODB_TIMING=ON ./lingo-db/build/lingodb-release/run-mlir taxi-kmeans.mlir taxidb
# run blackscholes (time for actual computation is reported as Timing: ... ms)
python blackscholes-subop.py
env LINGODB_TIMING=ON ./lingo-db/build/lingodb-release/run-mlir blackscholes.mlir
# run haversine (time for actual computation is reported as Timing: ... ms)
python haversine-subop.py
env LINGODB_TIMING=ON ./lingo-db/build/lingodb-release/run-mlir haversine.mlir
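
When collecting results from several runs, the reported timings can be pulled out of the captured output with a small script. The sketch below is an assumption-laden illustration: the README only states that times appear as `Timing: ... ms`, so the pattern assumes a plain decimal number between `Timing:` and `ms`, and the function name is hypothetical.

```python
import re

# Assumed format: "Timing: <decimal number> ms". The README elides the
# exact value format, so adjust the pattern if the real output differs.
TIMING_RE = re.compile(r"Timing:\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def extract_timings_ms(output):
    """Return every timing value (in milliseconds) found in `output`."""
    return [float(value) for value in TIMING_RE.findall(output)]
```

The output of a `run-mlir` invocation can be redirected to a file and its contents passed to `extract_timings_ms`; lines that do not match the assumed format are ignored.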
