diff --git a/cuml/README.md b/cuml/README.md
index f1bd4b8d..2d99dd41 100644
--- a/cuml/README.md
+++ b/cuml/README.md
@@ -8,7 +8,7 @@
 * `tsvd_demo`: notebook showcasing truncated singular value decomposition (tsvd) algorithm comparison between cuML and scikit-learn.
 * `linear_regression`: notebook showcasing linear regression comparison between cuML and scikit-learn.
 * `ridge_regression`: notebook showcasing ridge regression comparison between cuML and scikit-learn.
-* `sgd`: notebook showcasing ridge regression comparison between cuML and scikit-learn.
+* `sgd`: notebook showcasing stochastic gradient descent comparison between cuML and scikit-learn.
 * `umap`: notebook showcasing and evaluating cuML's UMAP dimension reduction technique.
 
 ## dbscan_demo
@@ -37,8 +37,6 @@ Note that the timing differences depend upon the exact dataset being used. Also,
 
 ## knn_demo
 
-The `knn_demo` notebook demonstrates how cuml establishes interoperability between `cudf` and `faiss-gpu`. There is native support for this demo with CUDA 9.2. With CUDA 10.0, the user must build `faiss-gpu` from source with [GPU support](https://github.com/facebookresearch/faiss/blob/master/INSTALL.md).
-
 Typical output of the cells processing knn looks like:
 
 - For scikit-learn knn:
@@ -111,24 +109,46 @@ compare tsvd: cuml vs sklearn transformed results equal
 Typical output of the cells processing linear_regression looks like:
 
 - For scikit-learn:
+
+Fit:
+
+```
+CPU times: user 1min 8s, sys: 21.2 s, total: 1min 30s
+Wall time: 6.06 s
+```
+
+Predict:
+
 ```
-CPU times: user s, sys: s, total: s
-Wall time: s
+CPU times: user 5.46 s, sys: 312 ms, total: 5.77 s
+Wall time: 471 ms
 ```
+
 - For cuML:
+
+Fit:
+
 ```
-CPU times: user s, sys: s, total: s
-Wall time: s
+CPU times: user 504 ms, sys: 347 ms, total: 851 ms
+Wall time: 1.08 s
 ```
 
 
+Predict:
+
+```
+CPU times: user 144 ms, sys: 7.71 ms, total: 152 ms
+Wall time: 145 ms
+```
+
+
 Final cell of the notebook should output:
 
 ```
 SKL MSE(y):
-3.4750864e-13
+5.6481553e-05
 CUML MSE(y):
-5.827862e-07
+7.246567e-07
 ```
 
 ## ridge_regression_demo
@@ -136,89 +156,97 @@ CUML MSE(y):
 Typical output of the cells processing ridge_regression looks like:
 
 - For scikit-learn:
-```
-CPU times: user s, sys: s, total: s
-Wall time: s
-```
-- For cuML:
+Fit:
+
 ```
-CPU times: user s, sys: s, total: s
-Wall time: s
+CPU times: user 1min 1s, sys: 1.99 s, total: 1min 3s
+Wall time: 5.02 s
 ```
 
 
-Final cell of the notebook should output:
+Predict:
 
 ```
-SKL MSE(y):
-1.8886121326984265e-08
-CUML MSE(y):
-1.9204549e-08
+CPU times: user 2.66 s, sys: 75.6 ms, total: 2.74 s
+Wall time: 180 ms
 ```
 
-## sgd_demo
 
-Typical output of the cells processing sgd looks like:
+- For cuML:
+
+Fit:
 
-- For scikit-learn:
 ```
-CPU times: user s, sys: s, total: s
-Wall time: s
+CPU times: user 518 ms, sys: 309 ms, total: 827 ms
+Wall time: 831 ms
 ```
-- For cuML:
+Predict:
+
 ```
-CPU times: user s, sys: s, total: s
-Wall time: s
+CPU times: user 146 ms, sys: 7.31 ms, total: 154 ms
+Wall time: 149 ms
 ```
 
 
 Final cell of the notebook should output:
 
 ```
 SKL MSE(y):
-1.1356839999498491e-07
+0.0204307456949534
 CUML MSE(y):
-1.04257616e-07
+0.00012496959
 ```
 
-## umap_demo
+## sgd_demo
 
-Typical output of the cells processing umap looks like:
+Typical output of the cells processing sgd looks like:
 
-- For blobs clustering test:
-```
-CPU times: user 15.4 s, sys: 1.71 s, total: 17.1 s
-Wall time: 4.53 s
-```
+- For scikit-learn:
 
-Final cell of blobs clustering test should output:
+Fit:
 
 ```
-Cluster demonstration completed successfully
+CPU times: user 10min 22s, sys: 417 ms, total: 10min 22s
+Wall time: 10min 20s
 ```
 
+Predict:
 
-- For trustworthiness evaluation using random initialization:
 ```
-CPU times: user 9.84 s, sys: 380 ms, total: 10.2 s
-Wall time: 945 ms
+CPU times: user 146 ms, sys: 63 ms, total: 209 ms
+Wall time: 166 ms
 ```
 
-Final cell of trustworthiness evaluation with random initialization should output:
+
+- For cuML:
+
+Fit:
 
 ```
-Trustworthiness on random initialization passed successfully
+CPU times: user 2min 13s, sys: 8.1 s, total: 2min 21s
+Wall time: 2min 18s
 ```
 
-- For trustworthiness evaluation using spectral initialization:
+Predict:
 ```
-CPU times: user 9.83 s, sys: 296 ms, total: 10.1 s
-Wall time: 832 ms
+CPU times: user 139 ms, sys: 10.9 ms, total: 150 ms
+Wall time: 142 ms
 ```
 
-Final cell of trustworthiness evaluation with spectral initialization should output:
+Final cell of the notebook should output:
 
 ```
-Trustworthiness on spectral initialization passed successfully
+SKL MSE(y):
+1.144686926876654e-07
+CUML MSE(y):
+1.0390148e-07
 ```
+
+## umap_demo
+
+This notebook currently performs assertions and does not print any output. It contains two types of tests that evaluate UMAP's ability to preserve local neighborhood structure.
+
+The first test verifies that when the input contains blobs generated from several different clusters, the UMAP output produces low-dimensional blobs with the same number of clusters.
+
+The second test demonstrates that the neighborhoods of the low-dimensional embeddings are similar to the neighborhoods of the inputs. A score known as trustworthiness, popularized by t-SNE, is used to evaluate the UMAP embeddings for both random and spectral initialization strategies.
 
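The trustworthiness score mentioned in the `umap_demo` description can be sketched in a few lines of NumPy. This is a minimal, brute-force illustration of the standard definition (the one scikit-learn documents for `sklearn.manifold.trustworthiness`), not the code the cuML notebook runs. The names `trustworthiness` and `_sq_dists` and the default neighborhood size `k=5` are assumptions made for this example. For every point, the sketch takes its `k` nearest neighbors in the embedding and penalizes each one that is not among its `k` nearest neighbors in the input, in proportion to how far beyond rank `k` it sits in the input-space ordering.

```python
import numpy as np


def _sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A (n x n).

    Brute force: builds an (n, n, d) difference array, so it is only
    meant for small illustrative inputs.
    """
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)


def trustworthiness(X, Y, k=5):
    """Trustworthiness of the embedding Y (n x d_low) of the input X (n x d_high).

    Returns 1.0 when each point's k nearest neighbors in Y are also among its
    k nearest neighbors in X, and lower values when the embedding pulls in
    points that were far away in the input space.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    dX = _sq_dists(X)
    dY = _sq_dists(Y)
    np.fill_diagonal(dX, np.inf)  # a point is never its own neighbor
    np.fill_diagonal(dY, np.inf)

    rows = np.arange(n)[:, None]
    # rank[i, j] = 1-based position of j in i's neighbor ordering in X
    rank = np.empty((n, n), dtype=np.int64)
    rank[rows, np.argsort(dX, axis=1)] = np.arange(1, n + 1)

    # Indices of the k nearest neighbors of every point in the embedding
    nn_Y = np.argsort(dY, axis=1)[:, :k]

    # Only embedding neighbors ranked beyond k in the input add to the penalty
    excess = rank[rows, nn_Y] - k
    penalty = excess[excess > 0].sum()
    return 1.0 - 2.0 / (n * k * (2.0 * n - 3.0 * k - 1.0)) * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    # Reusing the input as its own "embedding" preserves every neighborhood.
    print("identity embedding:", trustworthiness(X, X))
    # An embedding unrelated to the input scores much lower.
    print("random embedding:", trustworthiness(X, rng.normal(size=(200, 2))))
```

The full distance matrices keep the sketch short but cost O(n²) memory; for large inputs a nearest-neighbor index would be the usual choice.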