K-NEAREST NEIGHBOURS
A classification algorithm that labels a new data point by a majority vote of its nearest neighbours; it copes well with irregular data.
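Before the tutorial steps, here is a minimal, self-contained sketch of the core idea (an illustration of my own, not the tutorial's code): measure the distance from a new point to every training point, take the k closest, and let them vote.
```python
import numpy as np

# Minimal KNN sketch (illustration only): classify a point by majority vote
# of its k nearest training points, using Euclidean distance.
def knn_predict(train_x, train_y, point, k=3):
    distances = np.linalg.norm(np.asarray(train_x) - np.asarray(point), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest training points
    labels = [train_y[i] for i in nearest]
    return max(set(labels), key=labels.count)    # majority vote

# Tiny usage example with made-up 2D points:
print(knn_predict([[0, 0], [1, 1], [5, 5]], ["a", "a", "b"], [0.5, 0.5], k=3))  # -> "a"
```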
We will work with a car evaluation dataset (car.data). Its attributes are:
- Buying cost ("buying")
- Maintenance cost ("maint")
- Number of doors ("door")
- Number of persons ("persons")
- Boot size ("lug_boot")
- Safety degree ("safety")
- class = ["unacc", "acc", "good", "vgood"]
Libraries used:

pandas
```python
import pandas as pd
```

NumPy
```python
import numpy as np
```

sklearn
```python
import sklearn
from sklearn import linear_model, preprocessing
from sklearn import model_selection  # needed for train_test_split in STEP 4
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
```
STEP 1:
- We read in our dataset using pandas:
```python
data = pd.read_csv("car.data")
```
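Optionally, a quick sanity check (my addition, not one of the original steps) is to print the first few rows; this assumes car.data has a header line with the column names listed above, so pandas can find them by name:
```python
print(data.head())  # show the first five rows and the column names
```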
STEP 2:
- We encode the non-integer (string) data values into numeric labels:
```python
encode = preprocessing.LabelEncoder()
buying = encode.fit_transform(list(data["buying"]))
maint = encode.fit_transform(list(data["maint"]))
door = encode.fit_transform(list(data["door"]))
persons = encode.fit_transform(list(data["persons"]))
lug_boot = encode.fit_transform(list(data["lug_boot"]))
safety = encode.fit_transform(list(data["safety"]))
cls = encode.fit_transform(list(data["class"]))
```
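To see what LabelEncoder actually does, here is a small illustrative sketch on a made-up column (the integer codes follow the alphabetical order of the unique values):
```python
sample = ["low", "med", "high", "vhigh", "med"]
enc = preprocessing.LabelEncoder()
print(enc.fit_transform(sample))  # [1 2 0 3 2] -- each string replaced by its integer code
print(list(enc.classes_))         # ['high', 'low', 'med', 'vhigh'] -- codes 0, 1, 2, 3 respectively
```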
STEP 3:
- Here we define what we want to predict (the label) and collect the features:
```python
predict = "cls"
x = list(zip(buying, maint, door, persons, lug_boot, safety))
y = list(cls)
```
- 'x' holds the attributes (features) that will be used for prediction.
- 'y' holds only the encoded 'class' values, i.e. the labels.
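As a quick sanity check (illustrative only; the exact numbers depend on your data and the encoder), each element of x is a tuple of six integer codes and each element of y is a single integer class code:
```python
print(x[0])  # e.g. (3, 3, 0, 0, 2, 1)  -- one encoded car
print(y[0])  # e.g. 2                   -- its encoded class
```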
STEP 4:
- We divide x and y into four sets ('x_train', 'x_test', 'y_train', 'y_test') using sklearn:
```python
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
```
- This line splits our data into a TRAINING set and a TESTING set.
- 'test_size=0.1' means that 10% of the dataset will be used for testing; the remaining 90% is used for training.
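By default the split is random, so the accuracy changes a little every run. If you want a reproducible split, train_test_split accepts a random_state parameter (this is my addition, not part of the original steps):
```python
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    x, y, test_size=0.1, random_state=42)  # any fixed integer gives the same split each run
```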
STEP 5:
- Create and train the model:
```python
model = KNeighborsClassifier(n_neighbors=7)
model.fit(x_train, y_train)
```
- 'n_neighbors=7' means the model looks at the 7 nearest neighbours when classifying a point.
- You can use more or fewer, e.g. 'n_neighbors=5', 'n_neighbors=11', etc. (a small experiment comparing values is sketched after this step).
- Odd numbers are preferred to reduce the chance of tied votes.
```python
accuracy = model.score(x_test, y_test)
print(accuracy)
```
- 'model.score' measures how accurate the model is on the test set.
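A quick way to choose between different n_neighbors values (a sketch of my own, reusing the variables defined above) is to loop over a few odd values and compare test accuracy; the numbers will vary with the random split:
```python
for k in [3, 5, 7, 9, 11]:
    m = KNeighborsClassifier(n_neighbors=k)
    m.fit(x_train, y_train)
    print("k =", k, "accuracy =", m.score(x_test, y_test))
```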
STEP 6:
```python
prediction = model.predict(x_test)
# LabelEncoder assigns integer codes in alphabetical order,
# so the names must be listed as: acc=0, good=1, unacc=2, vgood=3
names = ["acc", "good", "unacc", "vgood"]
for x in range(len(prediction)):
    print(names[prediction[x]], x_test[x], names[y_test[x]])
    # or
    print("Predicted: ", names[prediction[x]], "Actual: ", names[y_test[x]])
```
- On the first line our model makes its predictions for the whole test set.
- A for loop iterates through each prediction.
- The 'names' list maps the integer codes back to readable class names in the output (a sketch for inspecting the actual neighbours is shown below).
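If you want to see which training points the model treated as "nearest" for a given test example, KNeighborsClassifier provides a kneighbors method that returns the distances and the indices of those points in the training data (this extra check is my addition):
```python
distances, indices = model.kneighbors([x_test[0]], n_neighbors=7)
print("Distances to the 7 nearest neighbours:", distances)
print("Their positions in x_train:", indices)
```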
STEP 7:
- Saving our model:
- If we were to save this model it would consume a lot of space; that is one limitation of this algorithm.
- This is because KNN keeps the entire training set, and every prediction has to compute the distance from the new point to all of those stored points.
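If you did want to persist it anyway, a common approach is Python's pickle module (a sketch of my own, with a made-up filename); note that the saved file effectively contains the whole training set:
```python
import pickle

# save the trained model (the file name is just an example)
with open("knn_car_model.pickle", "wb") as f:
    pickle.dump(model, f)

# load it back later and use it exactly like before
with open("knn_car_model.pickle", "rb") as f:
    loaded_model = pickle.load(f)
print(loaded_model.score(x_test, y_test))
```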
The END