In this project, you will apply basic machine learning concepts to data collected on housing prices in the Boston, Massachusetts area to predict the selling price of a new home. You will first explore the data to obtain important features and descriptive statistics about the dataset. Next, you will properly split the data into testing and training subsets, and determine a suitable performance metric for this problem. You will then analyze performance graphs for a learning algorithm with varying parameters and training set sizes. This will enable you to pick the optimal model that best generalizes to unseen data. Finally, you will test this optimal model on a new sample and compare the predicted selling price to your statistics.
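The first steps described above (descriptive statistics, a train/test split, and a performance metric) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's dataset. The variable names, the helper functions `split_train_test` and `r2_score`, and the choice of R² as the metric are all assumptions made for the example. It uses only NumPy so that it runs on its own; scikit-learn offers comparable helpers.

```python
import numpy as np

# Synthetic stand-in data (NOT the real Boston Housing dataset).
rng = np.random.default_rng(0)
rooms = rng.uniform(4.0, 9.0, size=100)                      # hypothetical feature
prices = 50_000 * rooms + rng.normal(0.0, 20_000, size=100)  # hypothetical target

# Descriptive statistics of the target variable.
stats = {
    "min": prices.min(),
    "max": prices.max(),
    "mean": prices.mean(),
    "median": np.median(prices),
    "std": prices.std(),
}


def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(len(y) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


X_train, X_test, y_train, y_test = split_train_test(rooms, prices)

# Fit a straight line on the training subset only, then score on the test subset.
slope, intercept = np.polyfit(X_train, y_train, deg=1)
test_r2 = r2_score(y_test, slope * X_test + intercept)

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
print(f"R^2 on held-out data: {test_r2:.3f}")
```

Scoring on the held-out subset, rather than on the data used for fitting, is what gives an estimate of how the model behaves on unseen data.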
This project is designed to acquaint you with working with datasets in Python and applying basic machine learning techniques using NumPy and Scikit-Learn. Before being expected to use many of the available algorithms in the sklearn library, it will be helpful to first practice analyzing and interpreting the performance of your model.
Things you will learn by completing this project:
- How to use NumPy to investigate the latent features of a dataset.
- How to analyze various learning performance plots for variance and bias.
- How to determine the best-guess model for predictions from unseen data.
- How to evaluate a model's performance on unseen data using previous data.
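To make the idea of reading performance plots for bias and variance more concrete, here is a small sketch that computes the numbers behind a model complexity curve. Everything in it is an assumption made for illustration: the synthetic data, the use of polynomial degree as the complexity parameter, mean squared error as the metric, and the helper names `mse` and `complexity_curve`. The project itself may use a different learning algorithm and metric.

```python
import numpy as np

# Synthetic data: a smooth nonlinear signal plus noise (not the project dataset).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, size=60))
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=60)

# Alternate points between a training set and a validation set.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


def complexity_curve(degrees):
    """Fit one polynomial per degree; return (degree, train_error, val_error)."""
    results = []
    for degree in degrees:
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        train_err = mse(y_train, np.polyval(coeffs, x_train))
        val_err = mse(y_val, np.polyval(coeffs, x_val))
        results.append((degree, train_err, val_err))
    return results


curve = complexity_curve(range(1, 10))
for degree, train_err, val_err in curve:
    print(f"degree={degree}  train MSE={train_err:.4f}  validation MSE={val_err:.4f}")
```

When reading such a table or the plot made from it, a model whose training and validation errors are both high is typically underfitting (high bias), while a model whose training error is low but whose validation error is noticeably higher is typically overfitting (high variance).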
The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you've come across the Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients' homes.
This project uses the following software and Python libraries:
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
For this assignment, you can find the boston_housing folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we'll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains three files:
- boston_housing.ipynb: This is the main file where you will be performing your work on the project.
- housing.csv: The project dataset. You'll load this data in the notebook.
- visuals.py: This Python script provides supplementary visualizations for the project. Do not modify.
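As a rough idea of what loading a CSV dataset with NumPy can look like, the sketch below reads a tiny in-memory table instead of the real file. The column names `rooms` and `median_value` and all of the values are invented for the example; the actual columns of housing.csv are not listed here, so check the file's header before adapting this.

```python
import io

import numpy as np

# Hypothetical stand-in for housing.csv: column names and values are invented.
sample_csv = io.StringIO(
    "rooms,median_value\n"
    "6.5,300000\n"
    "5.9,250000\n"
    "7.1,410000\n"
)

# names=True reads the header row and returns a structured array,
# so each column can be accessed by its name.
data = np.genfromtxt(sample_csv, delimiter=",", names=True)

features = data["rooms"]        # hypothetical feature column
prices = data["median_value"]   # hypothetical target column

print("columns:", data.dtype.names)
print("number of rows:", data.shape[0])
print("highest price:", prices.max())
```

In the notebook, you would pass the path to the dataset file in place of the in-memory buffer.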
In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook boston_housing.ipynb to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook or ipython notebook and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.
Your project will be reviewed by a Udacity reviewer against the Predicting Boston Housing Prices project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named boston_housing for ease of access:
- The boston_housing.ipynb notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
When you're ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at [email protected] or visit us in the discussion forums.
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!