
Data analysis and machine learning project focused on assessing water quality using Decision Tree and Random Forest models


haticeozbolat01/Water-Quality-Analysis


Water Quality Exploratory Data Analysis

Overview

This project analyzes water quality data with statistical and machine learning techniques. It covers bivariate and multivariate data analysis, correlation analysis, data preprocessing, and classification with Decision Tree and Random Forest models.

What You Will Learn

  1. Bivariate and Multivariate Data Analysis: Explore relationships between different variables in the dataset.
  2. Correlation Analysis: Understand how different features are related to each other.
  3. Data Preprocessing:
    • Handling missing values.
    • Splitting the data into training and testing sets.
    • Normalizing the data for better model performance.
  4. Modeling:
    • Implementing Decision Tree and Random Forest classifiers to predict water quality.
    • Visualizing the Decision Tree to interpret model decisions.
    • Tuning hyperparameters of the Random Forest to optimize model accuracy.
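The preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (ph, hardness, sulfate, potable) are hypothetical, and median imputation with min-max scaling is only one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the water quality table; column names are hypothetical.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "ph": rng.normal(7.0, 1.5, 200),
    "hardness": rng.normal(200.0, 30.0, 200),
    "sulfate": rng.normal(330.0, 40.0, 200),
    "potable": rng.integers(0, 2, 200),
})
# Blank out some values to imitate missing measurements.
df.loc[rng.choice(200, 20, replace=False), "ph"] = np.nan
df.loc[rng.choice(200, 30, replace=False), "sulfate"] = np.nan

X = df.drop(columns="potable")
y = df["potable"]

# Split first so that test-set statistics never leak into preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fill missing values with medians computed on the training split only.
medians = X_train.median()
X_train = X_train.fillna(medians)
X_test = X_test.fillna(medians)

# Scale features using the training range; test values may fall slightly outside [0, 1].
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the imputation values and the scaler on the training split, then reusing them on the test split, keeps the evaluation honest because no information from the held-out data influences preprocessing.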

Dataset

The dataset contains water quality measurements. Its features describe chemical properties of the water and are used to determine its quality.
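A first look at a dataset of this kind usually counts missing values per column and computes pairwise correlations. The sketch below assumes a pandas DataFrame; the data is synthetic and the column names (hardness, solids, ph) are hypothetical, since the actual columns are not listed here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 300
hardness = rng.normal(200.0, 30.0, n)
df = pd.DataFrame({
    "hardness": hardness,
    # Built from hardness plus noise, so the two columns correlate by construction.
    "solids": 100.0 * hardness + rng.normal(0.0, 2000.0, n),
    "ph": rng.normal(7.0, 1.5, n),
})
# Blank out some pH readings to imitate missing measurements.
df.loc[rng.choice(n, 25, replace=False), "ph"] = np.nan

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Pearson correlation matrix; pandas skips missing values pair by pair.
corr = df.corr()

print(missing_per_column)
print(corr.round(2))
```

With seaborn installed, the same matrix can be drawn as a heatmap, which is often easier to scan than the raw numbers.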

Project Structure

  • Data Preprocessing: Includes steps for handling missing data, normalization, and train-test splitting.
  • Exploratory Data Analysis (EDA): Visualizations and statistical summaries to understand data distribution and feature relationships.
  • Modeling: Implementation of machine learning models such as Decision Tree and Random Forest, along with their evaluation.
  • Visualization: Graphical representation of the Decision Tree model.
  • Hyperparameter Tuning: Optimization of Random Forest parameters for better performance.
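The modeling, visualization, and tuning stages could look roughly like the sketch below. It runs on synthetic data from make_classification rather than the project's dataset, the parameter grid is an arbitrary assumption, and the tree is printed as text with export_text instead of being drawn (scikit-learn's plot_tree produces a graphical version when matplotlib is available).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary classification data standing in for the water quality table.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# A shallow Decision Tree is easy to read and interpret.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
tree_accuracy = accuracy_score(y_test, tree.predict(X_test))
tree_rules = export_text(tree, feature_names=feature_names)

# Grid search over a small, illustrative Random Forest parameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
forest_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))

print(tree_rules)
print("Decision Tree test accuracy:", tree_accuracy)
print("Best Random Forest parameters:", search.best_params_)
print("Random Forest test accuracy:", forest_accuracy)
```

GridSearchCV scores every parameter combination with cross-validation on the training split only, so the test split remains an untouched check on the final model.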

Requirements

  • Python 3.x
  • Jupyter Notebook
  • Libraries:
    • pandas
    • numpy
    • scikit-learn
    • matplotlib
    • seaborn

How to Run the Project

  1. Clone the repository or download the notebook file.
  2. Install the required libraries listed above, for example with pip install pandas numpy scikit-learn matplotlib seaborn.
  3. Open the notebook file in Jupyter Notebook.
  4. Run the cells sequentially to execute the analysis.

Results

The analysis identifies the features that most strongly affect water quality and produces a Random Forest classifier whose hyperparameters were tuned for accuracy.
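One common way to rank features with a Random Forest is its impurity-based feature_importances_ attribute. The sketch below shows the idea on synthetic data in which only the first two columns carry signal. The feature names are hypothetical, and the notebook's own rankings are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False the informative features are the first two columns.
X, y = make_classification(
    n_samples=400,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor features with many distinct values, so they are best read as a rough ordering rather than an exact measure of influence.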

Conclusion

This project demonstrates the application of data analysis and machine learning techniques to assess water quality, offering valuable insights and predictive capabilities.
