Class 9 Exercise: Glass Identification

Let's practice what we've learned using the Glass Identification dataset.

Read the data into a DataFrame.
Briefly explore the data to make sure the DataFrame matches your expectations.
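One way to approach the first two steps is sketched below. The column names, the `load_glass` helper, and the URL are assumptions rather than part of the exercise; the raw file is assumed to be comma-separated with no header row. The two sample rows are placeholders in that layout (not real measurements), so the sketch runs offline.

```python
import io

import pandas as pd

# Assumed column names; the raw file is assumed to have no header row.
COLUMNS = ["id", "ri", "na", "mg", "al", "si", "k", "ca", "ba", "fe", "glass_type"]

# Assumed location of the raw file; swap in a local path if you have one.
GLASS_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data"


def load_glass(source):
    """Read the glass file from a URL, a path, or a file-like object."""
    return pd.read_csv(source, names=COLUMNS, index_col="id")


# Placeholder rows in the same layout (illustrative values only).
sample = io.StringIO(
    "1,1.52,13.6,4.5,1.1,71.8,0.06,8.8,0.0,0.0,1\n"
    "2,1.51,14.8,0.0,2.0,73.0,0.00,8.5,1.6,0.0,7\n"
)
glass = load_glass(sample)

# Quick checks that the DataFrame looks as expected.
print(glass.shape)
print(glass.dtypes)
print(glass["glass_type"].value_counts())
```

To work with the full dataset, call `load_glass(GLASS_URL)` instead of passing the sample.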
Let's convert this into a binary classification problem. Create a new DataFrame column called "binary":
- If the glass type is 1, 2, 3, or 4, set binary = 0.
- If the glass type is 5, 6, or 7, set binary = 1.
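The mapping above could be written with pandas as follows. This is a minimal sketch on a toy DataFrame; the column name `glass_type` is an assumption, so substitute whatever name your DataFrame uses.

```python
import pandas as pd

# Toy stand-in for the glass data; "glass_type" is an assumed column name.
glass = pd.DataFrame({"glass_type": [1, 2, 3, 5, 6, 7]})

# Types 1-4 map to 0, types 5-7 map to 1.
type_to_binary = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}
glass["binary"] = glass["glass_type"].map(type_to_binary)

print(glass)
```

An explicit dictionary keeps the rule readable; a comparison such as `(glass["glass_type"] >= 5).astype(int)` would give the same result for these seven types.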
Create a feature matrix "X" using all features. (Think carefully about which columns are actually features!)
Create a response vector "y" from the "binary" column.
Split X and y into training and testing sets.
Fit a KNN model on the training set using K=5.
Make predictions on the testing set and calculate testing accuracy.
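The steps from building X and y through testing accuracy might look like the sketch below, using scikit-learn. The data here is randomly generated as a stand-in so the example runs on its own; the column names and the `random_state` values are assumptions. Note that the feature matrix leaves out both the original type column and the derived "binary" column, since neither is a feature.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data with assumed column names (not the real glass values).
rng = np.random.default_rng(0)
feature_cols = ["ri", "na", "mg", "al", "si", "k", "ca", "ba", "fe"]
glass = pd.DataFrame(rng.normal(size=(200, 9)), columns=feature_cols)
glass["glass_type"] = rng.integers(1, 8, size=200)  # integers 1 through 7
glass["binary"] = (glass["glass_type"] >= 5).astype(int)

# Feature matrix: measurements only, excluding glass_type and binary.
X = glass[feature_cols]
# Response vector: the binary label.
y = glass["binary"]

# Hold out 25% of the rows (the default) for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit KNN with K=5 on the training set only.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict on the testing set and compare against the true labels.
y_pred = knn.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```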
Write a for loop that computes the testing accuracy for a range of K values.
Plot the K value versus testing accuracy to help you choose an optimal value for K.
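The loop and plot could be sketched as follows. The data comes from scikit-learn's `make_classification` as a stand-in, and the range of K values (1 through 30) is an arbitrary choice for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X and y (not the real glass data).
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Compute the testing accuracy for each candidate K.
k_values = list(range(1, 31))
scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, knn.predict(X_test)))

# Plot K on the x-axis against testing accuracy on the y-axis.
fig, ax = plt.subplots()
ax.plot(k_values, scores)
ax.set_xlabel("Value of K")
ax.set_ylabel("Testing accuracy")
```

In an interactive session, calling `plt.show()` afterwards displays the figure.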
Calculate the testing accuracy that could be achieved by always predicting the most frequent class in the testing set. (This is known as the "null accuracy".)
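A small worked example of null accuracy, assuming the testing labels are held in a pandas Series (the values below are made up for illustration):

```python
import pandas as pd

# Made-up testing labels: six 0s and two 1s.
y_test = pd.Series([0, 0, 0, 1, 0, 1, 0, 0])

# Proportion of each class, then take the largest one.
null_accuracy = y_test.value_counts(normalize=True).max()
print(null_accuracy)  # 0.75
```

Any model worth keeping should beat this baseline on the same testing set.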
Bonus: Explore the data to determine which features look like good predictors, and then redo this exercise using only those features to see if you can achieve a higher testing accuracy!
