CAR CLASSIFICATION PROJECT

K-NEAREST NEIGHBOURS

A classification algorithm that labels a point by the majority vote of its nearest neighbours; it works well with irregular, non-linearly separable data.

Attributes

  1. Buying cost ("buying")
  2. Maintenance cost ("maint")
  3. Number of doors ("door")
  4. Number of persons ("persons")
  5. Boot size ("lug_boot")
  6. Safety degree ("safety")

Label / The prediction

  1. class = ["unacc", "acc", "good", "vgood"]
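
  • For reference, each row of car.data stores the six attribute values followed by the class label; the first row of the raw UCI car evaluation file looks like this:

    vhigh,vhigh,2,2,small,low,unacc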

Requirements

  1. pandas

    import pandas as pd

  2. numpy

    import numpy as np

  3. sklearn ('model_selection' is needed for the train/test split in STEP 4)

    import sklearn
    from sklearn import linear_model, preprocessing, model_selection
    from sklearn.utils import shuffle
    from sklearn.neighbors import KNeighborsClassifier

Steps of the project:

STEP 1:

  • First we read in our dataset, using pandas.

    data = pd.read_csv("car.data")
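  • Note: the code below assumes car.data has a header row with these column names. The raw UCI file has none, so if yours doesn't, pass the names yourself (a sketch, assuming the UCI column order):

    data = pd.read_csv("car.data",
                       names=["buying", "maint", "door", "persons",
                              "lug_boot", "safety", "class"])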

STEP 2:

  • Encode the non-numeric (string) values as integers, since KNN needs numbers to compute distances.

    encode = preprocessing.LabelEncoder()
    
    buying = encode.fit_transform(list(data["buying"]))
    maint = encode.fit_transform(list(data["maint"]))
    door = encode.fit_transform(list(data["door"]))
    persons = encode.fit_transform(list(data["persons"]))
    lug_boot = encode.fit_transform(list(data["lug_boot"]))
    safety = encode.fit_transform(list(data["safety"]))
    cls = encode.fit_transform(list(data["class"]))
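
  • A quick sketch of what LabelEncoder does (hypothetical values): each distinct string is mapped to an integer, with the strings numbered in alphabetical order.

    # encode.fit_transform(["low", "med", "high", "med"])
    # -> array([1, 2, 0, 2])    ("high"=0, "low"=1, "med"=2)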

STEP 3:

  • Here we define what we want to predict (the label) and the attributes used to predict it.

    predict = "class"  # the name of the label column
    x = list(zip(buying, maint, door, persons, lug_boot, safety))
  • This line combines the encoded attributes into one list of feature tuples for prediction.

    y = list(cls)
  • This line holds only the encoded 'class' values, i.e. the label.

STEP 4:

  • We split x and y into four sets ('x_train', 'x_test', 'y_train', 'y_test'). Using sklearn!

    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
  • This line splits our data x and y into TRAINING and TESTING portions (the rows are shuffled by default).

  • 'test_size=0.1' means that 10% of our dataset will be held out for testing.

STEP 5:

  • Create and train the model.

    model = KNeighborsClassifier(n_neighbors=7)
    model.fit(x_train, y_train)
  • 'n_neighbors=7' means each point is classified by a vote among its 7 nearest neighbours.

  • You can use more or fewer, e.g. n_neighbors=5, n_neighbors=11, etc.

  • Odd values are preferred, since they reduce the chance of tied votes.

    accuracy = model.score(x_test, y_test)
    print(accuracy)
  • model.score() measures how accurate the model is on the test data (the fraction of correct predictions).
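
  • A minimal sketch (assuming 'x_train', 'y_train', 'x_test', 'y_test' from STEP 4) for comparing a few candidate values of k:

    for k in (3, 5, 7, 9, 11):
        m = KNeighborsClassifier(n_neighbors=k)
        m.fit(x_train, y_train)
        print(k, m.score(x_test, y_test))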

HOW K-NEAREST NEIGHBOURS WORKS

(Whiteboard sketches illustrating how KNN classifies a point by looking at its nearest neighbours.)
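
In short: to classify a new point, KNN measures the distance from that point to every training point, takes the k closest ones, and lets them vote on the class. A minimal from-scratch sketch of the idea (illustrative only, not sklearn's actual implementation; 'knn_predict' is a hypothetical helper):

    import numpy as np
    from collections import Counter

    def knn_predict(x_train, y_train, point, k=7):
        # Euclidean distance from the query point to every training point
        dists = np.linalg.norm(np.asarray(x_train) - np.asarray(point), axis=1)
        # indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # majority vote among their labels
        return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]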

STEP 6:

```python
prediction = model.predict(x_test)
# LabelEncoder numbers the classes alphabetically, so index 0 is "acc":
names = ["acc", "good", "unacc", "vgood"]

for i in range(len(prediction)):
    print(names[prediction[i]], x_test[i], names[y_test[i]])
    # or, more readably:
    print("Predicted:", names[prediction[i]], "Actual:", names[y_test[i]])
```
  • On the first line our model makes a prediction for every sample in the test set.
  • The for loop iterates through each prediction.
  • The names list maps each encoded label back to its readable class name.

STEP 7:

  • Saving our model.

  • If we were to save our model it would consume a lot of space; this is one of the limitations of this algorithm.

  • That is because KNN keeps the entire training set: saving the model means saving every training point, since distances to all of them must be computed for each new prediction.
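
  • Still, if you do want to persist it, here is a minimal sketch using pickle (the file name "knn_model.pickle" is just an example):

    import pickle

    with open("knn_model.pickle", "wb") as f:
        pickle.dump(model, f)

    with open("knn_model.pickle", "rb") as f:
        model = pickle.load(f)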

The END