Additional updates to provide full set of runtime numbers and to fix notebook descriptions, typos, etc.
rlratzel committed Mar 29, 2019
1 parent 54a3ca6 commit d96617d
Showing 1 changed file with 78 additions and 50 deletions.
128 changes: 78 additions & 50 deletions cuml/README.md
@@ -8,7 +8,7 @@
* `tsvd_demo`: notebook showcasing truncated singular value decomposition (tsvd) algorithm comparison between cuML and scikit-learn.
* `linear_regression`: notebook showcasing linear regression comparison between cuML and scikit-learn.
* `ridge_regression`: notebook showcasing ridge regression comparison between cuML and scikit-learn.
* `sgd`: notebook showcasing ridge regression comparison between cuML and scikit-learn.
* `sgd`: notebook showcasing stochastic gradient descent comparison between cuML and scikit-learn.
* `umap`: notebook showcasing and evaluating cuML's UMAP dimension reduction technique.

## dbscan_demo
@@ -37,8 +37,6 @@ Note that the timing differences depend upon the exact dataset being used. Also,

## knn_demo

The `knn_demo` notebook demonstrates how cuml establishes interoperability between `cudf` and `faiss-gpu`. There is native support for this demo with CUDA 9.2. With CUDA 10.0, the user must build `faiss-gpu` from source with [GPU support](https://github.com/facebookresearch/faiss/blob/master/INSTALL.md).

Typical output of the cells processing knn looks like:

- For scikit-learn knn:
@@ -111,114 +109,144 @@ compare tsvd: cuml vs sklearn transformed results equal
Typical output of the cells processing linear_regression looks like:

- For scikit-learn:

Fit:

```
CPU times: user 1min 8s, sys: 21.2 s, total: 1min 30s
Wall time: 6.06 s
```

Predict:

```
CPU times: user s, sys: s, total: s
Wall time: s
CPU times: user 5.46 s, sys: 312 ms, total: 5.77 s
Wall time: 471 ms
```


- For cuML:

Fit:

```
CPU times: user s, sys: s, total: s
Wall time: s
CPU times: user 504 ms, sys: 347 ms, total: 851 ms
Wall time: 1.08 s
```

Predict:

```
CPU times: user 144 ms, sys: 7.71 ms, total: 152 ms
Wall time: 145 ms
```
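
Output in the `CPU times: ... / Wall time: ...` format above is what IPython's `%%time` cell magic prints. As a rough illustration of the two quantities, the following sketch measures both with the standard library. The helper name `time_call` is hypothetical and is not part of the notebooks.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, cpu_seconds, wall_seconds).

    Hypothetical helper for illustration only; the notebooks rely on
    IPython's %%time magic rather than code like this.
    """
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # user + sys CPU time of this process
    result = fn(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, cpu_seconds, wall_seconds


if __name__ == "__main__":
    total, cpu, wall = time_call(sum, range(1_000_000))
    print(f"CPU time: {cpu:.3f} s")
    print(f"Wall time: {wall:.3f} s")
```

Process CPU time is summed over all threads, which is why a multithreaded CPU fit can report a CPU total far larger than its wall time.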


Final cell of the notebook should output:

```
SKL MSE(y):
3.4750864e-13
5.6481553e-05
CUML MSE(y):
5.827862e-07
7.246567e-07
```
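
For readers unfamiliar with the metric, the sketch below shows how a mean squared error such as the `MSE(y)` values above can be computed. It is a plain-Python illustration with toy data, not the code used in the notebook. The function and variable names are assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Toy values for illustration only; these are not notebook results.
    y_test = [1.0, 2.0, 3.0]
    skl_predictions = [1.0, 2.0, 3.5]    # hypothetical scikit-learn output
    cuml_predictions = [1.0, 2.1, 3.0]   # hypothetical cuML output
    print("SKL MSE(y):")
    print(mean_squared_error(y_test, skl_predictions))
    print("CUML MSE(y):")
    print(mean_squared_error(y_test, cuml_predictions))
```

A lower value means the predictions are closer to the held-out targets, so the two printed numbers give a quick check that both libraries fit comparable models.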

## ridge_regression_demo

Typical output of the cells processing ridge_regression looks like:

- For scikit-learn:
```
CPU times: user s, sys: s, total: s
Wall time: s
```

- For cuML:
Fit:

```
CPU times: user s, sys: s, total: s
Wall time: s
CPU times: user 1min 1s, sys: 1.99 s, total: 1min 3s
Wall time: 5.02 s
```

Final cell of the notebook should output:
Predict:

```
SKL MSE(y):
1.8886121326984265e-08
CUML MSE(y):
1.9204549e-08
CPU times: user 2.66 s, sys: 75.6 ms, total: 2.74 s
Wall time: 180 ms
```

## sgd_demo

Typical output of the cells processing sgd looks like:
- For cuML:

Fit:

- For scikit-learn:
```
CPU times: user s, sys: s, total: s
Wall time: s
CPU times: user 518 ms, sys: 309 ms, total: 827 ms
Wall time: 831 ms
```

- For cuML:
Predict:

```
CPU times: user s, sys: s, total: s
Wall time: s
CPU times: user 146 ms, sys: 7.31 ms, total: 154 ms
Wall time: 149 ms
```

Final cell of the notebook should output:

```
SKL MSE(y):
1.1356839999498491e-07
0.0204307456949534
CUML MSE(y):
1.04257616e-07
0.00012496959
```

## umap_demo
## sgd_demo

Typical output of the cells processing umap looks like:
Typical output of the cells processing sgd looks like:

- For blobs clustering test:
```
CPU times: user 15.4 s, sys: 1.71 s, total: 17.1 s
Wall time: 4.53 s
```
- For scikit-learn:

Final cell of blobs clustering test should output:
Fit:

```
Cluster demonstration completed successfully
CPU times: user 10min 22s, sys: 417 ms, total: 10min 22s
Wall time: 10min 20s
```

Predict:

- For trustworthiness evaluation using random initialization:
```
CPU times: user 9.84 s, sys: 380 ms, total: 10.2 s
Wall time: 945 ms
CPU times: user 146 ms, sys: 63 ms, total: 209 ms
Wall time: 166 ms
```

Final cell of trustworthiness evaluation with random initialization should output:

- For cuML:

Fit:

```
Trustworthiness on random initialization passed successfully
CPU times: user 2min 13s, sys: 8.1 s, total: 2min 21s
Wall time: 2min 18s
```

- For trustworthiness evaluation using spectral initialization:
Predict:

```
CPU times: user 9.83 s, sys: 296 ms, total: 10.1 s
Wall time: 832 ms
CPU times: user 139 ms, sys: 10.9 ms, total: 150 ms
Wall time: 142 ms
```

Final cell of trustworthiness evaluation with spectral initialization should output:
Final cell of the notebook should output:

```
Trustworthiness on spectral initialization passed successfully
SKL MSE(y):
1.144686926876654e-07
CUML MSE(y):
1.0390148e-07
```

## umap_demo

This notebook currently performs assertions and does not print any output. It contains two types of tests that evaluate UMAP's ability to preserve local neighborhood structure.

The first test verifies that when the input contains blobs generated from several different clusters, the UMAP output produces low-dimensional blobs with the same number of clusters.

The second test demonstrates that the neighborhoods of the low-dimensional embeddings are similar to the neighborhoods of the inputs. A score known as trustworthiness, popularized by t-SNE, is used to evaluate the UMAP embeddings for both random and spectral initialization strategies.
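
To make the trustworthiness score concrete, here is a minimal pure-Python sketch of the commonly used definition: each point is penalized for neighbors that appear among its `k` nearest in the embedding but rank beyond `k` in the input space. This is a brute-force illustration under those assumptions, not the implementation the notebook or cuML uses. The function names are hypothetical.

```python
import math


def _neighbors_by_distance(points, i):
    """Indices of all other points, nearest to points[i] first."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def trustworthiness(original, embedded, k):
    """Score in which 1.0 means no false neighbors in the embedding."""
    n = len(original)
    if len(embedded) != n:
        raise ValueError("original and embedded must have the same length")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    penalty = 0
    for i in range(n):
        # Rank of every other point in the input space (1 = nearest).
        order = _neighbors_by_distance(original, i)
        rank = {j: r for r, j in enumerate(order, start=1)}
        # Neighbors in the embedding that were not close in the input.
        for j in _neighbors_by_distance(embedded, i)[:k]:
            if rank[j] > k:
                penalty += rank[j] - k
    return 1.0 - (2.0 / (n * k * (2 * n - 3 * k - 1))) * penalty


if __name__ == "__main__":
    # Toy 1-D data for illustration only.
    inputs = [[float(i)] for i in range(6)]
    print(trustworthiness(inputs, inputs, k=2))
```

The brute-force ranking is quadratic in the number of points, so this form is only practical for small inputs.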
