Solution: homework-1.ipynb.
You need to install Python, NumPy, Pandas, Matplotlib and Seaborn. For that, you can follow the instructions from 06-environment.md.
What's the version of NumPy that you installed?
You can get the version information using the `__version__` field:
np.__version__
What's the version of Pandas?
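The same field exists on Pandas. A minimal sketch, assuming both libraries are imported under the conventional aliases `np` and `pd` (the variable names `numpy_version` and `pandas_version` are only illustrative):

```python
import numpy as np
import pandas as pd

# Both libraries expose their version as a plain string attribute.
numpy_version = np.__version__
pandas_version = pd.__version__

print("NumPy:", numpy_version)
print("Pandas:", pandas_version)
```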
For this homework, we'll use the same dataset as for the next session - the car price dataset.
Download it from here.
You can do it with wget:
wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv
Or just open it with your browser and click "Save as...".
Now read it with Pandas.
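Reading a CSV file with Pandas could look roughly like the sketch below. To keep it runnable without the download, it parses a tiny in-memory sample instead of `data.csv`. The sample rows are made up, and the column names `Make`, `Year` and `MSRP` are assumptions about the dataset.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for data.csv;
# the real file has more rows and columns.
sample_csv = io.StringIO(
    "Make,Year,Engine HP,MSRP\n"
    "BMW,2011,335,46135\n"
    "Audi,2016,,38950\n"
    "BMW,2015,240,33150\n"
)

# For the downloaded file this would be: df = pd.read_csv("data.csv")
df = pd.read_csv(sample_csv)

print(df.shape)   # (rows, columns)
print(df.head())  # first rows, useful for a quick sanity check
```

An empty field in the CSV (the Audi row above) is read as a missing value (`NaN`).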
What's the average price of BMW cars in the dataset?
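One possible way to compute a per-make average is a boolean mask followed by `mean()`. The sketch below uses toy numbers, not the real data, and assumes the make is stored in a column called `Make` and the price in a column called `MSRP`.

```python
import pandas as pd

# Toy data; "Make" and "MSRP" are assumed column names, the values are made up.
df = pd.DataFrame({
    "Make": ["BMW", "Audi", "BMW", "Volvo"],
    "MSRP": [40000, 35000, 60000, 30000],
})

# The boolean mask keeps only the BMW rows; mean() then averages their prices.
bmw_mean_price = df[df["Make"] == "BMW"]["MSRP"].mean()

print(bmw_mean_price)
```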
Select a subset of cars after year 2015 (inclusive, i.e. 2015 and after). How many of them have missing values for Engine HP?
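Filtering by year and counting missing values can be sketched as follows. The data is a toy example, and the column name `Year` is an assumption ("Engine HP" is the name used in this homework).

```python
import numpy as np
import pandas as pd

# Toy data; the values are made up for illustration.
df = pd.DataFrame({
    "Year": [2014, 2015, 2016, 2017, 2010],
    "Engine HP": [200.0, np.nan, 310.0, np.nan, np.nan],
})

# ">=" makes the filter inclusive: 2015 itself is kept.
recent = df[df["Year"] >= 2015]

# isnull() marks missing entries with True; sum() counts them.
missing_hp = recent["Engine HP"].isnull().sum()

print(missing_hp)
```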
- Calculate the average "Engine HP" in the dataset.
- Use the `fillna` method to fill the missing values in "Engine HP" with the mean value from the previous step.
- Now, calculate the average of "Engine HP" again.
- Has it changed?
Round both means before answering this question. You can use the `round` function for that:
print(round(mean_hp_before))
print(round(mean_hp_after))
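Putting these steps together on a toy column might look like the sketch below. The values are made up, and `mean_hp_before` and `mean_hp_after` simply reuse the names from the snippet above.

```python
import numpy as np
import pandas as pd

# Toy column with two missing values.
df = pd.DataFrame({"Engine HP": [100.0, np.nan, 200.0, 300.0, np.nan]})

# mean() skips NaN values by default.
mean_hp_before = df["Engine HP"].mean()

# fillna returns a new Series, so assign it back to the column.
df["Engine HP"] = df["Engine HP"].fillna(mean_hp_before)

mean_hp_after = df["Engine HP"].mean()

print(round(mean_hp_before))
print(round(mean_hp_after))
```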
- Select all the "Rolls-Royce" cars from the dataset.
- Select only columns "Engine HP", "Engine Cylinders", "highway MPG".
- Now drop all duplicated rows using the `drop_duplicates` method (you should get a dataframe with 7 rows).
- Get the underlying NumPy array. Let's call it `X`.
- Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
- Invert `XTX`.
- What's the sum of all the elements of the result?
Hint: if the result is negative, re-read the task one more time
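The steps above can be sketched on a toy dataframe as follows. The rows are made up (so the result will not match the real answer), and the column name `Make` is an assumption. `np.linalg.inv` is used for the inversion.

```python
import numpy as np
import pandas as pd

# Toy data with one duplicated Rolls-Royce row; the values are made up.
df = pd.DataFrame({
    "Make": ["Rolls-Royce", "Rolls-Royce", "Rolls-Royce",
             "Rolls-Royce", "Rolls-Royce", "BMW"],
    "Engine HP": [453.0, 563.0, 453.0, 624.0, 453.0, 300.0],
    "Engine Cylinders": [12.0, 12.0, 12.0, 12.0, 12.0, 6.0],
    "highway MPG": [19, 21, 19, 20, 20, 30],
})

# Select the make, then the three columns, then drop duplicated rows.
rolls = df[df["Make"] == "Rolls-Royce"]
features = rolls[["Engine HP", "Engine Cylinders", "highway MPG"]]
features = features.drop_duplicates()

# .values returns the underlying NumPy array.
X = features.values

# X.T is the transpose; dot performs matrix-matrix multiplication.
XTX = X.T.dot(X)

# Invert the 3x3 matrix and sum all of its elements.
XTX_inv = np.linalg.inv(XTX)
print(XTX_inv.sum())
```

Multiplying `XTX` by `XTX_inv` should give (approximately) the identity matrix, which is a quick way to check the inversion.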
- Create an array `y` with values `[1000, 1100, 900, 1200, 1000, 850, 1300]`.
- Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
- What's the value of the first element of `w`?
Note: You just implemented linear regression. We'll talk about it in the next lesson.
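The computation above is w = (XᵀX)⁻¹Xᵀy. A small sanity check of the idea, on a hypothetical toy matrix rather than the homework data: if `y` is built from known weights `w_true` (an illustrative name), the formula should recover them.

```python
import numpy as np

# Toy design matrix (4 samples, 2 features) and known weights, both made up.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
])
w_true = np.array([10.0, 5.0])

# Targets generated exactly from the known weights (no noise).
y = X.dot(w_true)

# Same steps as in the task: invert X^T X, multiply by X^T, then by y.
XTX = X.T.dot(X)
XTX_inv = np.linalg.inv(XTX)
w = XTX_inv.dot(X.T).dot(y)

print(w)
```

With noise-free targets the recovered `w` matches `w_true` up to floating-point error. With real data it is the least-squares fit instead.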
Submit your results here: https://forms.gle/aiunQqRtqcay8Wwo9.
If your answer doesn't match options exactly, select the closest one.
The deadline for submitting is 13 September 2021, 17:00 CET. After that, the form will be closed.