This repository contains code for using Frequent Directions (FD) in ridge regression, as proposed in Shi & Phillips 2020.
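
A minimal NumPy sketch of the idea follows. It assumes the textbook Frequent Directions algorithm (a buffer of 2·`ell` rows that is shrunk via an SVD whenever it fills). It also assumes that the ridge solve replaces `X^T X` with the sketch's `B^T B` while computing `X^T y` exactly. The names `frequent_directions`, `_shrink` and `fd_ridge` are hypothetical and do not come from `src`. The scripts there may manage the buffer or the regularizer differently.

```python
import numpy as np


def _shrink(B, ell):
    """FD shrink step: subtract the ell-th largest squared singular value from all of them."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
    out = np.zeros_like(B)
    out[: len(s)] = np.sqrt(s2)[:, None] * Vt  # rows ell-1 onward become zero
    return out


def frequent_directions(X, ell):
    """Stream the rows of X (n x d) into an ell x d Frequent Directions sketch B."""
    _, d = X.shape
    assert 1 <= ell <= d, "this sketch assumes 1 <= ell <= d"
    buf = np.zeros((2 * ell, d))  # 2*ell-row buffer, shrunk whenever it fills up
    free = 0  # index of the next empty (all-zero) row in the buffer
    for row in X:
        if free == 2 * ell:
            buf = _shrink(buf, ell)
            free = ell - 1  # after a shrink, rows ell-1 onward are zero
        buf[free] = row
        free += 1
    return _shrink(buf, ell)[:ell]


def fd_ridge(X, y, ell, gamma):
    """Ridge regression with X^T X replaced by the FD sketch B^T B.

    Assumption: X^T y is computed exactly (one extra pass over the data);
    this is an illustrative estimator, not necessarily the one used in src/.
    """
    B = frequent_directions(X, ell)
    d = X.shape[1]
    return np.linalg.solve(B.T @ B + gamma * np.eye(d), X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 40)) * np.geomspace(10, 0.1, 40)  # decaying spectrum
    y = X @ rng.standard_normal(40) + 0.1 * rng.standard_normal(1000)
    w_fd = fd_ridge(X, y, ell=10, gamma=1.0)
    w_exact = np.linalg.solve(X.T @ X + np.eye(40), X.T @ y)
    print("relative error:", np.linalg.norm(w_fd - w_exact) / np.linalg.norm(w_exact))
```

Under these assumptions, FD guarantees that `X^T X - B^T B` is positive semidefinite with spectral norm at most `||X||_F^2 / ell`. Also, `(B^T B + gamma I)^{-1}` has norm at most `1 / gamma`. Together these imply that the sketched solution differs from the exact ridge solution `w*` by at most `||X||_F^2 / (ell * gamma) * ||w*||`.
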
The repo is split into `src` and `notebooks`. The notebooks contain experimental/draft versions of the code that is later moved into `src`, along with some discussion explaining its functionality.
We test on open datasets such as the California Housing dataset and the YearPredictionMSD dataset. The latter cannot be stored in the repo because it exceeds the 100 MB file size limit, but details for downloading and preprocessing it can be found in `notebooks`.
`src/california_housing_regression.py` and `src/year_predictions_regression.py` showcase how FD is used in ridge regression for training, validation and test evaluation. The data for `src/california_housing_regression.py` can be read directly from the repository, as shown in the script. For `src/year_predictions_regression.py`, however, `notebook/SongPredictions/YearPredictions.ipynb` should first be executed (or at least read) to obtain the data.

`src/experiments/` contains the experiments shown in Dickens 2020, primarily `src/experiments/bias_variance_tradeoff.py` and `src/experiments/iterative_sketching.py`.
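
The train / validate / test workflow of the regression scripts can be sketched as follows. This rests on several assumptions. Synthetic data stand in for the real datasets. The 60/20/20 random split and the grid of `gammas` are hypothetical. The FD ridge estimator is assumed to be the sketched normal equations `(B^T B + gamma I) w = X^T y`. One useful design point is that the sketch `B` only needs to be computed once. With `B = U S V^T`, every `gamma` on the grid then has a closed-form solve, because `(B^T B + gamma I)^{-1} = V (S^2 + gamma I)^{-1} V^T + (I - V V^T) / gamma`.

```python
import numpy as np


def _shrink(B, ell):
    """FD shrink step: subtract the ell-th largest squared singular value from all of them."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
    out = np.zeros_like(B)
    out[: len(s)] = np.sqrt(s2)[:, None] * Vt
    return out


def frequent_directions(X, ell):
    """Stream the rows of X (n x d) into an ell x d FD sketch (assumes ell <= d)."""
    _, d = X.shape
    buf, free = np.zeros((2 * ell, d)), 0
    for row in X:
        if free == 2 * ell:  # buffer full: shrink, then rows ell-1 onward are free
            buf, free = _shrink(buf, ell), ell - 1
        buf[free] = row
        free += 1
    return _shrink(buf, ell)[:ell]


def ridge_path_from_sketch(B, Xty, gammas):
    """Solve (B^T B + gamma I) w = X^T y for every gamma > 0 using a single SVD of B."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    proj = Vt @ Xty            # coordinates of X^T y in the sketch's row space
    resid = Xty - Vt.T @ proj  # part of X^T y orthogonal to that row space
    return [Vt.T @ (proj / (s ** 2 + g)) + resid / g for g in gammas]


def fit_validate_test(X, y, ell, gammas, seed=0):
    """Hypothetical 60/20/20 split: sketch the training rows, tune gamma, report test MSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_tr, n_va = int(0.6 * len(y)), int(0.2 * len(y))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    B = frequent_directions(X[tr], ell)  # one streaming pass over the training rows
    weights = ridge_path_from_sketch(B, X[tr].T @ y[tr], gammas)
    val_mse = [np.mean((X[va] @ w - y[va]) ** 2) for w in weights]
    best = int(np.argmin(val_mse))  # gamma with the lowest validation MSE
    test_mse = float(np.mean((X[te] @ weights[best] - y[te]) ** 2))
    return gammas[best], test_mse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 30)) * np.geomspace(10, 0.1, 30)
    y = X @ rng.standard_normal(30) + rng.standard_normal(2000)
    gamma, mse = fit_validate_test(X, y, ell=10, gammas=[0.1, 1.0, 10.0, 100.0])
    print("chosen gamma:", gamma, "test MSE:", mse)
```
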
This code runs on Python 3.7.6 and uses only standard scientific Python libraries (e.g., NumPy, scikit-learn, matplotlib, pandas).
- Many works require the top m components via a power method or truncated SVD; can FD be substituted in?
- How does FD performance compare to power iteration?
- Multiple-response ridge regression comparing Frequent Directions and co-occurring directions
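
For the first two open questions, a small harness comparing the two routes to a top-m subspace might look like the following. It is a sketch under assumptions. The data are synthetic. Quality is measured as the squared Frobenius error of projecting `X` onto the recovered m-dimensional subspace, for which the truncated SVD is optimal. The functions are hypothetical and not part of `src`. The main trade-off it exposes is the number of passes over the data: FD reads each row once, whereas every block power step multiplies by `X` and `X^T` again.

```python
import numpy as np


def _shrink(B, ell):
    """FD shrink step: subtract the ell-th largest squared singular value from all of them."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
    out = np.zeros_like(B)
    out[: len(s)] = np.sqrt(s2)[:, None] * Vt
    return out


def frequent_directions(X, ell):
    """Stream the rows of X (n x d) into an ell x d FD sketch (assumes ell <= d)."""
    _, d = X.shape
    buf, free = np.zeros((2 * ell, d)), 0
    for row in X:
        if free == 2 * ell:
            buf, free = _shrink(buf, ell), ell - 1
        buf[free] = row
        free += 1
    return _shrink(buf, ell)[:ell]


def top_m_fd(X, m, ell):
    """Top-m right singular vectors of an FD sketch (one pass over X; assumes m < ell <= d)."""
    _, _, Vt = np.linalg.svd(frequent_directions(X, ell), full_matrices=False)
    return Vt[:m].T


def top_m_power(X, m, iters, seed=0):
    """Block power (subspace) iteration on X^T X; each iteration is another pass over X."""
    Q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((X.shape[1], m)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(X.T @ (X @ Q))  # multiply by X^T X, then re-orthonormalize
    return Q


def projection_error(X, V):
    """Squared Frobenius error of projecting the rows of X onto span(V)."""
    return float(np.sum((X - (X @ V) @ V.T) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m = 5
    X = rng.standard_normal((2000, 50)) * np.geomspace(10, 0.1, 50)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    print("optimal (truncated SVD):", projection_error(X, Vt[:m].T))
    for ell in (m + 1, 2 * m, 4 * m):
        print(f"FD, ell={ell}:", projection_error(X, top_m_fd(X, m, ell)))
    for iters in (1, 2, 5):
        print(f"power, {iters} iters:", projection_error(X, top_m_power(X, m, iters)))
```
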