Merge pull request #1 from sahandha/paper
Paper
mgckind authored Oct 1, 2018
2 parents 2d3d38b + b431d2c commit 7554885
Showing 23 changed files with 84,066 additions and 169,068 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
.ipynb_checkpoints*
paper/images/
paper/other_files/
paper/Makefile
paper/joss-logo.png
paper/latex.template
*.pyc
build/
dist/
eif.egg-info/
895 changes: 0 additions & 895 deletions Notebooks/.ipynb_checkpoints/IsolationForest-checkpoint.ipynb

This file was deleted.

790 changes: 0 additions & 790 deletions Notebooks/.ipynb_checkpoints/TreeGraphPlotting-checkpoint.ipynb

This file was deleted.

83,433 changes: 0 additions & 83,433 deletions Notebooks/.ipynb_checkpoints/TreeVisualization-checkpoint.ipynb

This file was deleted.

221 changes: 0 additions & 221 deletions Notebooks/.ipynb_checkpoints/general_3D_examples-checkpoint.ipynb

This file was deleted.

1,262 changes: 1,262 additions & 0 deletions Notebooks/EIF.ipynb

Large diffs are not rendered by default.

895 changes: 0 additions & 895 deletions Notebooks/IsolationForest.ipynb

This file was deleted.

165,012 changes: 82,228 additions & 82,784 deletions Notebooks/TreeVisualization.ipynb

Large diffs are not rendered by default.

82 changes: 70 additions & 12 deletions Notebooks/general_3D_examples.ipynb
@@ -1,9 +1,28 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook is similar to IsolationForest.ipynb in that it demonstrates a use case of the EIF. Unlike that notebook, however, it works with three-dimensional data. "
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"# Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"metadata": {
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
@@ -31,15 +50,19 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"## Produce Data"
"## Generate Data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -76,20 +99,31 @@
"## Trees and training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we are working with three-dimensional data, there are two extension levels beyond the standard Isolation Forest (extension level 0). Extension level 2 is the fully extended case. We provide examples for each extension level."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"F0 = iso.iForest(X,ntrees=500, sample_size=sample, ExtensionLevel=0)\n",
"F0 = iso.iForest(X,ntrees=500, sample_size=sample, ExtensionLevel=0) # Extension level 0 is the same as the standard Isolation Forest. \n",
"F1 = iso.iForest(X,ntrees=500, sample_size=sample, ExtensionLevel=1)\n",
"F2 = iso.iForest(X,ntrees=500, sample_size=sample, ExtensionLevel=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"## Scores and distributions"
]
@@ -98,7 +132,8 @@
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": [
@@ -107,10 +142,21 @@
"S2 = F2.compute_paths(X_in=X)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"The distribution of anomaly scores is shown. By definition, anomalies are points that occur less frequently, so it makes sense that the number of points decreases as the anomaly score increases. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -133,10 +179,21 @@
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Here we plot the points and highlight the 10 points with the highest and the 10 points with the lowest anomaly scores. The two plots provide a comparison between the two algorithms."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -180,7 +237,8 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": []
@@ -203,7 +261,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.6.1"
},
"toc": {
"nav_menu": {},
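
Note on the notebook change above: the new markdown cell says that three-dimensional data admits extension levels 0, 1 and 2, with level 0 matching the standard Isolation Forest. The sketch below illustrates one way to read that statement. It is not taken from the `eif` source; the helper `random_split_normal` is hypothetical, and the rule it uses (zero out `dim - extension_level - 1` randomly chosen components of a random normal vector) is an assumption about how the extension level constrains the branching hyperplane.

```python
import random


def random_split_normal(dim, extension_level, rng=random):
    """Draw a random normal vector for a branching hyperplane.

    Hypothetical helper for illustration only, not part of the eif API.
    Assumption: an extension level of k leaves k + 1 components free
    and forces the remaining dim - k - 1 components to zero.
    """
    if not 0 <= extension_level <= dim - 1:
        raise ValueError("extension_level must be between 0 and dim - 1")

    # Start from a normal vector with independent Gaussian components.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Zero out randomly chosen components so the cut stays parallel
    # to the corresponding axes.
    n_zeroed = dim - extension_level - 1
    for idx in rng.sample(range(dim), n_zeroed):
        normal[idx] = 0.0
    return normal


if __name__ == "__main__":
    # For 3D data the valid levels are 0, 1 and 2.
    for level in range(3):
        normal = random_split_normal(3, level)
        free = sum(1 for c in normal if c != 0.0)
        print(f"ExtensionLevel={level}: normal={normal}, free components={free}")
```

Under this reading, level 0 gives a cut along a single coordinate (an axis-parallel split), while level 2 in three dimensions lets the hyperplane take any orientation, which is the fully extended case mentioned in the cell.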
8 changes: 6 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
<a href="https://github.com/sahandha/eif/releases/tag/v1.0.1"> <img src="https://img.shields.io/badge/release-v1.0.1-blue.svg" alt="latest release" /></a><a href="https://pypi.org/project/eif/1.0.1/"><img src="https://img.shields.io/badge/pypi-v1.0.1-orange.svg" alt="pypi version"/></a>
<a href="https://github.com/sahandha/eif/releases/tag/v1.0.2"> <img src="https://img.shields.io/badge/release-v1.0.2-blue.svg" alt="latest release" /></a><a href="https://pypi.org/project/eif/1.0.2/"><img src="https://img.shields.io/badge/pypi-v1.0.2-orange.svg" alt="pypi version"/></a>
# Extended Isolation Forest

This is a simple package implementing the Extended Isolation Forest method. It improves on the original Isolation Forest algorithm, which is described (among other places) in this [paper](icdm08b.pdf), for detecting anomalies and outliers in a distribution of data points. The original code can be found at [https://github.com/mgckind/iso_forest](https://github.com/mgckind/iso_forest)
@@ -28,12 +28,16 @@ In addition, it also contains means to draw the trees created using the [igraph]

See these notebooks for examples on how to use it

- [Basics](Notebooks/IsolationForest.ipynb)
- [Basics](Notebooks/EIF.ipynb)
- [3D Example](Notebooks/general_3D_examples.ipynb)
- [Tree visualizations](Notebooks/TreeVisualization.ipynb)

## Release

### v1.0.2
#### 2018-OCT-01
- Added documentation, examples and software paper

### v1.0.1
#### 2018-AUG-08
- Bugfix for multidimensional data