Merge pull request rapidsai#3 from mt-jones/cudf-refactor
[REVIEW] adding notebooks and utilities for the mortgage workflow
harrism authored Dec 6, 2018
2 parents 4aa6cc5 + 9b06b31 commit e54f8bb
Showing 8 changed files with 979 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
# notebooks
RAPIDS Sample Notebooks
# RAPIDS Notebooks and Utilities

* `mortgage`: contains the notebook which runs ETL + ML on the Mortgage Dataset derived from [Fannie Mae’s Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Download the mortgage dataset for use with the notebook [here](https://rapidsai.github.io/datasets/).
* `utils`: contains a set of useful scripts for interacting with RAPIDS
673 changes: 673 additions & 0 deletions mortgage/E2E.ipynb

Large diffs are not rendered by default.

95 changes: 95 additions & 0 deletions utils/README.md
@@ -0,0 +1,95 @@
# Utility Scripts

## Summary

* `start-jupyter.sh`: starts a JupyterLab environment for interacting with, and running, notebooks
* `stop-jupyter.sh`: identifies all process IDs associated with Jupyter and kills them
* `dask-cluster.py`: launches a configured Dask cluster (a set of nodes) for use within a notebook
* `dask-setup.sh`: a low-level script for constructing a set of Dask workers on a single node

## start-jupyter

Typical output for `start-jupyter.sh` will be of the following form:

```bash

jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token=''


[I 09:58:01.481 LabApp] Writing notebook server cookie secret to /run/user/10060/jupyter/notebook_cookie_secret
[W 09:58:01.928 LabApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 09:58:01.945 LabApp] JupyterLab extension loaded from /conda/envs/cudf/lib/python3.6/site-packages/jupyterlab
[I 09:58:01.945 LabApp] JupyterLab application directory is /conda/envs/cudf/share/jupyter/lab
[W 09:58:01.946 LabApp] JupyterLab server extension not enabled, manually loading...
[I 09:58:01.949 LabApp] JupyterLab extension loaded from /conda/envs/cudf/lib/python3.6/site-packages/jupyterlab
[I 09:58:01.949 LabApp] JupyterLab application directory is /conda/envs/cudf/share/jupyter/lab
[I 09:58:01.950 LabApp] Serving notebooks from local directory: /workspace/notebooks/notebooks
[I 09:58:01.950 LabApp] The Jupyter Notebook is running at:
[I 09:58:01.950 LabApp] http://(dgx15 or 127.0.0.1):8888/
[I 09:58:01.950 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
```

`jupyter-lab` exposes a JupyterLab server on port `:8888`. Opening a web browser and navigating to `http://YOUR.IP.ADDRESS:8888` provides a GUI which can be used to edit and run code.

## stop-jupyter

Sometimes a server needs to be forcibly shut down. Running

```bash
notebooks$ bash utils/stop-jupyter.sh
```

will kill any and all JupyterLab servers running on the machine.
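The grep pipeline in `stop-jupyter.sh` matches lines from `ps aux` that mention Jupyter, extracts the PID column, and passes it to `kill -9`. The same extraction can be sketched in Python; `jupyter_pids` is a hypothetical helper for illustration, not part of this repo:

```python
import re

def jupyter_pids(ps_output, user):
    """Extract PIDs of a user's Jupyter processes from `ps aux` text.

    Illustrative re-implementation of the grep pipeline in
    stop-jupyter.sh: keep lines mentioning "jupyter" that start with
    the given user, then take the PID (the second column).
    """
    pids = []
    for line in ps_output.splitlines():
        if "jupyter" not in line:
            continue
        match = re.match(rf"{re.escape(user)}\s+(\d+)", line)
        if match:
            pids.append(int(match.group(1)))
    return pids
```

The `kill -9` step would then be `os.kill(pid, signal.SIGKILL)` for each returned PID.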

## dask-cluster

This is a Python script used to launch a Dask cluster. A configuration file is provided at `/path/to/notebooks/utils/dask.conf`.

```bash
notebooks$ cat utils/dask.conf

ENVNAME cudf

NWORKERS 8

12.34.567.890 MASTER

DASK_SCHED_PORT 8786
DASK_SCHED_BOKEH_PORT 8787
DASK_WORKER_BOKEH_PORT 8790

DEBUG
```

* `ENVNAME cudf`: a keyword to tell `dask-cluster.py` the name of the virtual environment where `cudf` is installed
* `NWORKERS 8`: a keyword to tell `dask-cluster.py` how many workers to instantiate on the node which called `dask-cluster.py`
* `12.34.567.890 MASTER`: maps an IP address to a node role (`MASTER` or `WORKER`)
* `DASK_SCHED_PORT 8786`: a keyword to tell `dask-cluster.py` which port is assigned to the Dask scheduler
* `DASK_SCHED_BOKEH_PORT 8787`: a keyword to tell `dask-cluster.py` which port is assigned to the scheduler's visual front-end
* `DASK_WORKER_BOKEH_PORT 8790`: a keyword to tell `dask-cluster.py` which port is assigned to the worker's visual front-end
* `DEBUG`: a keyword to tell `dask-cluster.py` to launch all Dask workers with log-level set to DEBUG
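The format above is simple enough to parse into a plain dictionary; the sketch below mirrors the token-by-token parsing that `dask-cluster.py` performs. `parse_dask_conf` is a hypothetical helper shown only to make the file format concrete:

```python
def parse_dask_conf(text):
    """Parse dask.conf-style text: whitespace-separated tokens per line.

    Keyword lines ("ENVNAME cudf") become key -> value entries; an
    "IP.ADDRESS MASTER" line is stored under "MASTER_IPADDR"; a bare
    "DEBUG" token becomes a boolean flag. Illustrative only.
    """
    conf = {"DEBUG": False}
    keywords = {"ENVNAME", "NWORKERS", "DASK_SCHED_PORT",
                "DASK_SCHED_BOKEH_PORT", "DASK_WORKER_BOKEH_PORT"}
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "DEBUG":
            conf["DEBUG"] = True
        elif tokens[0] in keywords:
            conf[tokens[0]] = tokens[1]
        elif len(tokens) > 1 and tokens[1] == "MASTER":
            conf["MASTER_IPADDR"] = tokens[0]
    return conf
```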

## dask-setup

`dask-setup.sh` expects several positional inputs, in this exact order:

* `ENVNAME`: name of the virtual environment where `cudf` is installed
* `NWORKERS`: number of workers to create
* `DASK_SCHED_PORT`: port to assign the scheduler
* `DASK_SCHED_BOKEH_PORT`: port to assign the scheduler's front-end
* `DASK_WORKER_BOKEH_PORT`: port to assign the worker's front-end
* `YOUR.IP.ADDRESS`: machine's IP address
* `{WORKER/MASTER}`: the node's role
* `DEBUG`: log-level (optional, case-sensitive)

The script is called as follows:

```bash
notebooks$ bash utils/dask-setup.sh cudf 8 8786 8787 8790 12.34.567.890 MASTER DEBUG
```

Note: `DEBUG` is optional. This script was designed to be called by `dask-cluster.py`; the only time to invoke it directly is to shut down all running Dask workers:

```bash
notebooks$ bash utils/dask-setup.sh 0
```
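Internally, `dask-setup.sh` launches each worker with a rotated `CUDA_VISIBLE_DEVICES` list, so each worker enumerates a different GPU first. That rotation can be sketched in Python; `visible_devices` is an illustrative name, not a function in this repo:

```python
def visible_devices(worker_id, nworkers):
    """Rotate GPU ordinals the way dask-setup.sh builds CUDA_VISIBLE_DEVICES:
    worker i (1-based) sees devices i-1 .. n-1 first, then 0 .. i-2."""
    start = worker_id - 1
    order = list(range(start, nworkers)) + list(range(0, start))
    return ",".join(str(d) for d in order)
```

For example, with 4 workers, worker 1 sees `0,1,2,3` while worker 2 sees `1,2,3,0`, so each worker's "first" GPU is distinct.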
72 changes: 72 additions & 0 deletions utils/dask-cluster.py
@@ -0,0 +1,72 @@
import subprocess

dask_conf_path = "../utils/dask.conf"
with open(dask_conf_path, "r") as file:
    dask_conf = file.read()

# keep only non-empty lines, split into whitespace-separated tokens
_dask_conf = dask_conf.split("\n")
dask_conf = list()
for line in _dask_conf:
    line = line.split()
    if 0 < len(line):
        dask_conf.append(line)

cmd = "bash ../utils/dask-setup.sh 0"

print(cmd)

process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()

cmd = "hostname --all-ip-addresses"
process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
IPADDR = str(output.decode()).split()[0]

ENVNAME = None
NWORKERS = None
DASK_SCHED_PORT = None
DASK_SCHED_BOKEH_PORT = None
DASK_WORKER_BOKEH_PORT = None
MASTER_IPADDR = None
WHOAMI = None
DEBUG = None

for line in dask_conf:
    if line[0] == "ENVNAME":
        ENVNAME = line[1]
    if line[0] == "NWORKERS":
        NWORKERS = line[1]
    if line[0] == "DASK_SCHED_PORT":
        DASK_SCHED_PORT = line[1]
    if line[0] == "DASK_SCHED_BOKEH_PORT":
        DASK_SCHED_BOKEH_PORT = line[1]
    if line[0] == "DASK_WORKER_BOKEH_PORT":
        DASK_WORKER_BOKEH_PORT = line[1]
    # guard the index: single-token lines like "DEBUG" have no line[1]
    if len(line) > 1 and line[1] == "MASTER":
        MASTER_IPADDR = line[0]
    if line[0] == IPADDR:
        WHOAMI = line[1]
    if line[0] == "DEBUG":
        DEBUG = "DEBUG"

cmd = "bash ../utils/dask-setup.sh " + str(ENVNAME)
cmd = cmd + " " + str(NWORKERS)
cmd = cmd + " " + str(DASK_SCHED_PORT)
cmd = cmd + " " + str(DASK_SCHED_BOKEH_PORT)
cmd = cmd + " " + str(DASK_WORKER_BOKEH_PORT)
cmd = cmd + " " + str(MASTER_IPADDR)
cmd = cmd + " " + str(WHOAMI)
if DEBUG is not None:
    cmd = cmd + " " + str(DEBUG)

print(cmd)

process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()

cmd = "screen -list"

process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
print(output.decode())
112 changes: 112 additions & 0 deletions utils/dask-setup.sh
@@ -0,0 +1,112 @@
#!/bin/bash
export NCCL_P2P_DISABLE=1
# export NCCL_SOCKET_IFNAME=ib

export DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING=False
export DASK_DISTRIBUTED__SCHEDULER__BANDWIDTH=1

ENVNAME=$1
NWORKERS=$2
DASK_SCHED_PORT=$3
DASK_SCHED_BOKEH_PORT=$4
DASK_WORKER_BOKEH_PORT=$5
MASTER_IPADDR=$6
WHOAMI=$7
DEBUG=$8

DASK_LOCAL_DIR=./.dask
NUM_GPUS=$(nvidia-smi --list-gpus | wc --lines)
MY_IPADDR=($(hostname --all-ip-addresses))

mkdir -p $DASK_LOCAL_DIR

echo -e "\n"

echo "shutting down current dask cluster if it exists..."
NUM_SCREENS=$(screen -list | grep --only-matching --extended-regexp '[0-9]\ Socket|[0-9]{1,10}\ Sockets' | grep --only-matching --extended-regexp '[0-9]{1,10}')
SCREENS=($(screen -list | grep --only-matching --extended-regexp '[0-9]{1,10}\.dask|[0-9]{1,10}\.gpu' | grep --only-matching --extended-regexp '[0-9]{1,10}'))
if [[ $NUM_SCREENS -gt 0 ]]; then
  screen -wipe
  for screen_id in $(seq 1 $NUM_SCREENS);
  do
    index=$((screen_id - 1))
    echo ${SCREENS[$index]}
    screen -S ${SCREENS[$index]} -X quit
  done
fi
echo "... cluster shut down"

echo -e "\n"

if [[ "0" -lt "$NWORKERS" ]] && [[ "$NWORKERS" -le "$NUM_GPUS" ]]; then

  if [[ "$WHOAMI" = "MASTER" ]]; then
    echo "initializing dask scheduler..."
    screen -dmS dask_scheduler bash -c "source activate $ENVNAME && dask-scheduler"
    sleep 5
    echo "... scheduler started"
  fi

  echo -e "\n"

  echo "starting $NWORKERS worker(s)..."
  declare -a WIDS
  for worker_id in $(seq 1 $NWORKERS);
  do
    # rotate GPU visibility so each worker enumerates a different device first
    start=$(( worker_id - 1 ))
    end=$(( NWORKERS - 1 ))
    other=$(( start - 1 ))
    devs=$(seq --separator=, $start $end)
    second=$(seq --separator=, 0 $other)
    if [ "$second" != "" ]; then
      devs="$devs,$second"
    fi
    echo "... starting gpu worker $worker_id"

    if [[ "$DEBUG" = "DEBUG" ]]; then
      export create_worker="source activate $ENVNAME && \
        cuda-memcheck dask-worker $MASTER_IPADDR:$DASK_SCHED_PORT \
        --host=${MY_IPADDR[0]} --no-nanny \
        --nprocs=1 --nthreads=1 \
        --memory-limit=0 --name ${MY_IPADDR[0]}_gpu_$worker_id \
        --local-directory $DASK_LOCAL_DIR/$name"
      export logfile="${DASK_LOCAL_DIR}/gpu_worker_${worker_id}_log.txt"
      env CUDA_VISIBLE_DEVICES=$devs screen -dmS gpu_worker_$worker_id \
        bash -c 'script -c "$create_worker" "$logfile"'
    else
      export create_worker="source activate $ENVNAME && \
        dask-worker $MASTER_IPADDR:$DASK_SCHED_PORT \
        --host=${MY_IPADDR[0]} --no-nanny \
        --nprocs=1 --nthreads=1 \
        --memory-limit=0 --name ${MY_IPADDR[0]}_gpu_$worker_id \
        --local-directory $DASK_LOCAL_DIR/$name"
      env CUDA_VISIBLE_DEVICES=$devs screen -dmS gpu_worker_$worker_id \
        bash -c "$create_worker"
    fi

    WIDS[$worker_id]=$!
  done
  sleep 5

  echo -e "\n"

  echo "... $NWORKERS worker(s) successfully started"

  echo -e "\n"
fi

if [[ "$NWORKERS" -eq "0" ]]; then
  NUM_SCREENS=$(screen -list | grep --only-matching --extended-regexp '[0-9]\ Socket|[0-9]{1,10}\ Sockets' | grep --only-matching --extended-regexp '[0-9]{1,10}')
  if [[ $NUM_SCREENS == "" ]]; then
    echo "cluster shut down successfully"
    echo "verifying status:"
    screen -list
  fi
fi

if [[ "0" -lt "$NWORKERS" ]]; then
  echo "printing status ..."
  echo -e "\n"
  screen -list
  echo -e "\n"
fi
11 changes: 11 additions & 0 deletions utils/dask.conf
@@ -0,0 +1,11 @@
ENVNAME cudf

NWORKERS 8

12.34.567.890 MASTER

DASK_SCHED_PORT 8786
DASK_SCHED_BOKEH_PORT 8787
DASK_WORKER_BOKEH_PORT 8790

DEBUG
5 changes: 5 additions & 0 deletions utils/start-jupyter.sh
@@ -0,0 +1,5 @@
#!/bin/bash
echo -e "\n"
echo "jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token=''"
echo -e "\n"
jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token=''
7 changes: 7 additions & 0 deletions utils/stop-jupyter.sh
@@ -0,0 +1,7 @@
#!/bin/bash
ps aux | grep jupyter | \
grep --extended-regexp "$USER[\ ]{1,10}[0-9]{1,10}" | \
grep --only-matching --extended-regexp "$USER[\ ]{1,10}[0-9]{1,10}" | \
grep --only-matching --extended-regexp "[\ ]{1,10}[0-9]{1,10}" | \
xargs kill -9
sleep 2
