This project focuses on building machine learning models to classify celestial objects into stars, galaxies, and quasars using a provided dataset. The following sections detail the workflow and the steps taken in the notebook.
- Introducing Dataset
- Importing Necessary Libraries and Modules
- Exploring the Dataset
- Preparing Data for the Model
- Scaling the Data and Checking Distribution Plots
- Building ML Models and Evaluating Results
The dataset used for this project contains information about celestial objects. The classification task involves predicting whether an object is a star, galaxy, or quasar based on various features provided in the dataset.
We use several Python libraries for data manipulation, visualization, and machine learning. The notebook imports the following:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
data = pd.read_csv("/content/Skyserver_SQL2_27_2018 6_51_39 PM.csv")
data.head()
data.shape
data.describe()
data.drop(['objid','specobjid'], axis=1, inplace=True)
data.head(10)
data.info()
The dataset is complete with no missing values.
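One way to confirm completeness beyond reading the `data.info()` output is to count nulls per column. The sketch below uses a small hypothetical frame (`toy`, with illustrative column names and one missing value planted on purpose) rather than the real dataset, so the check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (assumption:
# the column names and values here are illustrative only).
toy = pd.DataFrame({
    "ra": [183.5, 183.6, np.nan],   # one missing value planted on purpose
    "dec": [0.09, 0.13, 0.12],
    "class": ["STAR", "GALAXY", "QSO"],
})

# Count missing values per column; all zeros would mean the frame is complete.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# A single boolean answering "is anything missing at all?"
has_missing = bool(toy.isnull().values.any())
print(has_missing)
```

Running the same two calls on `data` should report zero for every column if the dataset is complete as stated.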
le = LabelEncoder().fit(data['class'])
data['class'] = le.transform(data['class'])
data.head(10)
data.info()
X = data.drop('class', axis=1)
y = data['class']
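After label encoding, the class column holds integers, so it helps to keep track of which integer stands for which class. A minimal sketch follows, assuming the class strings are `STAR`, `GALAXY`, and `QSO` (the exact strings in the dataset may differ). `LabelEncoder` assigns codes in sorted order of the class names.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed class strings, for illustration only.
labels = ["STAR", "GALAXY", "QSO", "GALAXY", "STAR"]

demo_le = LabelEncoder().fit(labels)
encoded = demo_le.transform(labels)

# classes_ is sorted, and the position of each name is its integer code.
mapping = {str(name): code for code, name in enumerate(demo_le.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the codes.
decoded = demo_le.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted encoder around (here `demo_le`, in the notebook `le`) makes it possible to translate model predictions back into readable class names.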
We standardize the features so that each one has a mean of 0 and a standard deviation of 1.
scaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = scaler.fit_transform(X)
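The notebook fits the scaler on the full feature matrix. A common alternative, shown here only as a sketch and not as the notebook's method, is to split first and fit the scaler on the training portion alone, so that test-set statistics do not influence the transformation. The data below is synthetic and the names `X_demo`, `y_demo`, and `demo_scaler` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: 200 rows, 4 features, 3 classes).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y_demo = rng.integers(0, 3, size=200)

# Split first, so the test rows stay unseen during scaler fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit on the training split only, then apply the same transform to both.
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

# Training features now have mean ~0 and standard deviation ~1 per column.
print(X_tr_scaled.mean(axis=0).round(6))
print(X_tr_scaled.std(axis=0).round(6))
```

The test split will generally not have exactly zero mean or unit variance after this transform, which is expected.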
We employ several machine learning models to classify the data:
- Decision Tree Classifier
- Logistic Regression
- Naive Bayes
- K-Nearest Neighbors
- Support Vector Machine
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Example for Decision Tree Classifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(f"Decision Tree Accuracy: {accuracy_score(y_test, y_pred)}")
# Repeat similar steps for other models
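The "repeat similar steps" instruction can be sketched as a loop over the listed models. This is an illustration on synthetic data from `make_classification`, not the notebook's actual code or results. Note one substitution: `GaussianNB` is used here instead of the imported `MultinomialNB`, because `MultinomialNB` rejects negative inputs and standardized features contain negative values. The `max_iter` and `random_state` settings are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the real dataset.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_syn = StandardScaler().fit_transform(X_syn)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# One entry per model family listed in this README.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),  # swapped in for MultinomialNB (see note above)
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
}

# Fit each model and record its accuracy on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracies[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name} Accuracy: {accuracies[name]:.3f}")
```

Collecting the scores in a dictionary makes it straightforward to compare models side by side or plot them afterwards.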
This program is released under the MIT License.
This notebook demonstrates the process of loading a dataset, performing exploratory data analysis, preparing the data, scaling it, and finally building and evaluating several machine learning models to classify celestial objects.
Upload this notebook to a Kaggle session, link the dataset, and run the cells to reproduce the analysis and results.