admm-l1-2-logistic-regression

Description

This repo is the final project of COMP633 at UNC-Chapel Hill. We used Alternating Direction Method of Multipliers (ADMM) optimization methods to solve the L-1/L-2 regularized binary logistic regression. In this repo, we provide efficient squential and distributed implementations of ADMM powered by GNU Scientific Library (GSL) and Message Passing Interface (MPI) such that the program can handle millions of samples within couple of minutes.

Dependencies

GCC
GNU Scientific Library (GSL)
Message Passing Interface (MPI)

Usage

Compile

We use gcc/icc to compile the whole project. The standard Makefile has been provided. You may need to change the Makefile slightly to accommodate your local include path and library path. The program has been compiled successfully in MacOS and Linux environment.

Runner

logit: Sequential Runner
mpi_logit: Distributed Runner

usage: logit/mpi_logit [-A Feature Matrix] [-b Response Vector] [-e Relative Tolerance]
                       [-E Absolute Telerance] [-r Regularization Type] 
                       [-t Maximum Iteration] [-o Output File] [-p Print Progress]
arguments:
  -A:       The path to feature matrix,  e.g. data/input.dat;
  -b:       The path to response vevtor, e.g. data/output.dat;
  -e:       The relative precision tolerance in ADMM algorithm;
  -E:       The absolute precision tolerance in ADMM algorithm;
  -r:       Regularization type to use: only support L-1 and L-2, e.g. -r l2;
  -t:       The maximum number of iterations;
  -o:       The path to the result file, e.g. data/solution.dat
  -p:       Whether to print the optimization progress.

Note: Matrix and vector uses the standard Matrix Market File Format, i.e. Matrix Market Exchange Formats.

Distributed Runner

mpirun -np {\# of cores} ./mpi_logit [Arguments]

Note: For the distributed runner, we follow the data-parallel principle. Every process reads the part of feature matrix and response vector (partitioned by sample). Therefore, the argument -A and b should be the prefix path of data file, e.g. we use -A data/A when A1.dat--A{N}.dat are located in data.

Performance

Reference

Boyd, Stephen, et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers." Foundations and Trends® in Machine learning 3.1 (2011): 1-122.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
header		header
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
data_generation.py		data_generation.py
logit.c		logit.c
mmio.c		mmio.c
mpi_logit.c		mpi_logit.c
solver.c		solver.c
submit_admm_logi.sbatch		submit_admm_logi.sbatch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

admm-l1-2-logistic-regression

Description

Dependencies

Usage

Compile

Runner

Distributed Runner

Performance

Reference

About

Releases

Packages

Contributors 2

Languages

License

HaidYi/admm-l1-2-logistic-regression

Folders and files

Latest commit

History

Repository files navigation

admm-l1-2-logistic-regression

Description

Dependencies

Usage

Compile

Runner

Distributed Runner

Performance

Reference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages