This repository includes my House Prices Multi-Variate Linear Regression-Flatiron School Module 2 Project. In this project I made use of the OSEMN methodology incorporating packages such as Pandas, NumPy, Matplotlib, Seaborn, and Scikit-Learn.

lopez-christian/House-Prices-Linear-Regression-Project

Multi-Variate Linear Regression Model

Building a Multi-Variate Linear Regression Model Using the King County, WA House Prices Dataset

Using Scikit-Learn

Screen Shot 2020-04-27 at 10 03 24 PM

This project uses Pandas and NumPy for data exploration, and Matplotlib and Seaborn for visualizations. Scikit-Learn is then used to fit the multi-variate linear regression. Categorical features are handled by creating dummy variables, and the continuous features are log-transformed.
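To make the two preprocessing steps concrete, here is a minimal sketch on a tiny made-up frame. The column names (`price`, `sqft_living`, `zipcode`) and the `preprocess` helper are assumptions for illustration; the notebook may treat different columns or use different options.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names are assumed, not copied from the notebook.
df = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "sqft_living": [1180, 2570, 770, 1960],
    "zipcode": [98178, 98125, 98028, 98136],
})

def preprocess(frame, categorical, continuous):
    """Hypothetical helper: log-transform continuous columns, one-hot encode categorical ones."""
    out = frame.copy()
    for col in continuous:
        # The natural log compresses right-skewed, strictly positive values.
        out[col] = np.log(out[col])
    # drop_first=True drops one level per category to avoid perfect collinearity.
    return pd.get_dummies(out, columns=categorical, drop_first=True)

processed = preprocess(df, categorical=["zipcode"], continuous=["price", "sqft_living"])
print(processed.head())
```

Dropping one dummy level per category is a common choice for linear regression with an intercept, since the dropped level becomes the baseline.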

Columns we will be examining:

Screen Shot 2020-04-25 at 6 14 27 PM

Example of dataframe head and tail:

Screen Shot 2020-04-25 at 6 12 02 PM

Purpose of project:

The purpose of this project is to find ways to maximize profitability for sellers of homes in King County, WA. We will search for actionable insights that can serve as guidance to these sellers, which requires a thorough understanding of the dynamics of the housing market.

Three recommendations for sellers:

Recommendation # 1:

Screen Shot 2020-04-25 at 6 27 13 PM

My first recommendation to sellers would be to make living space square footage their focal point. The correlation between living space square footage and home price is fairly high compared to the other features. Larger homes clearly command higher asking prices, so homes on the larger end of the spectrum tend to generate the most revenue.
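As a rough illustration of how that correlation can be checked, the sketch below computes the Pearson correlation between living space and price. The rows are made up and the column names `sqft_living` and `price` are assumed, so the printed value says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not results from the King County data.
df = pd.DataFrame({
    "price": [221900, 538000, 180000, 604000, 510000, 1225000],
    "sqft_living": [1180, 2570, 770, 1960, 1680, 5420],
})

# Series.corr defaults to the Pearson correlation coefficient.
r = df["sqft_living"].corr(df["price"])
print(f"Pearson r between sqft_living and price: {r:.2f}")
```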

Recommendation # 2:

Screen Shot 2020-04-25 at 6 35 30 PM

My second recommendation would be to pay particular attention to the location of the home. House prices are clustered by zipcode. Many factors tied to the zipcode may influence the price either positively or negatively, and sellers should be mindful of that.
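One simple way to see this clustering is to summarize price per zipcode. This is a sketch on made-up rows with assumed column names (`zipcode`, `price`); the median is used here because it is less sensitive to a few very expensive homes, which is my choice for the example rather than something taken from the notebook.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "zipcode": [98004, 98004, 98178, 98178, 98125],
    "price": [1200000, 1400000, 220000, 260000, 540000],
})

# Median price and number of sales per zipcode, most expensive first.
by_zip = (
    df.groupby("zipcode")["price"]
    .agg(["median", "count"])
    .sort_values("median", ascending=False)
)
print(by_zip)
```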

Recommendation # 3:

Screen Shot 2020-04-25 at 6 41 15 PM

My third recommendation would be to pay attention to the grade King County gives the home. It has a strong influence on the price of the home. In general, as the grade increases, the price increases as well, which reflects the positive linear correlation between the two.
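A quick way to check the "price rises with grade" pattern is to compare the median price at each grade. The rows below are made up and the column names `grade` and `price` are assumed; this only sketches the kind of check, not the notebook's actual analysis.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "grade": [6, 6, 7, 7, 8, 8, 9, 10],
    "price": [210000, 250000, 320000, 400000, 480000, 560000, 750000, 1100000],
})

# groupby sorts by grade, so the result is ordered from lowest to highest grade.
median_by_grade = df.groupby("grade")["price"].median()
print(median_by_grade)

# True when each grade's median price is at least as high as the previous grade's.
print(median_by_grade.is_monotonic_increasing)
```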

Screen Shot 2020-04-25 at 6 50 05 PM

Sidenote: The grade distribution follows an approximately normal curve, which suggests that grades are being issued in a forthright and diligent manner. It would be interesting to see what goes into the grading of the homes, but that's a project for another time.

Screen Shot 2020-04-25 at 6 58 08 PM

This correlation heatmap guided me through the feature selection process throughout the project, and it may be very helpful in finding other interesting correlations to experiment with.
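The numbers behind a heatmap like this come from a pairwise correlation matrix. The sketch below builds one on made-up rows and ranks features by the absolute value of their correlation with price; the column names and the ranking step are assumptions for illustration, not the exact procedure from the notebook.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumed.
df = pd.DataFrame({
    "price": [221900, 538000, 180000, 604000, 510000, 1225000],
    "sqft_living": [1180, 2570, 770, 1960, 1680, 5420],
    "grade": [7, 7, 6, 7, 8, 11],
    "yr_built": [1955, 1951, 1933, 1965, 1987, 2001],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank candidate features by the strength of their correlation with price.
ranked = corr["price"].drop("price").abs().sort_values(ascending=False)
print(ranked)
```

Passing `corr` to `seaborn.heatmap` (for example with `annot=True`) would render the matrix as a figure.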

Multi-variate linear regression model using Scikit-Learn:

Screen Shot 2020-04-25 at 7 11 13 PM

Screen Shot 2020-04-25 at 7 11 32 PM

Screen Shot 2020-04-25 at 7 11 42 PM

Screen Shot 2020-04-25 at 7 13 02 PM

Screen Shot 2020-04-25 at 7 13 16 PM
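For readers who cannot view the screenshots, here is a minimal sketch of fitting a multi-variate linear regression with Scikit-Learn. It runs on synthetic data generated in the snippet itself, with two assumed features (log of living space and grade) and made-up coefficients, so it does not reproduce the model or the scores from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
sqft_living = rng.uniform(500, 5000, n)
grade = rng.integers(4, 13, n)  # integer grades from 4 to 12
log_price = 6 + 0.8 * np.log(sqft_living) + 0.15 * grade + rng.normal(0, 0.1, n)

# Feature matrix: log-transformed living space plus grade.
X = np.column_stack([np.log(sqft_living), grade])

# Hold out 20% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("test R^2:", round(r2, 3))
```

Because the target here is log price, each coefficient describes a change in log price, not a change in dollars.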

Please take a look at the Jupyter notebook included with this repository. It includes bonus recommendations and future work/research to keep in mind if you hope to expand on my work.

Key takeaways:

1. Make living space square footage your number one feature to look out for.

2. Location is an extremely important feature when evaluating the price of a home.

3. The grade given to a home by the King County Housing Department has a strong influence on the price.
