MoRe

Movie Recommendation System Using Cosine Similarity

Based on past user behavior, MoRe recommends movies to a user according to how similar they are to movies that user already likes. It suggests movies whose recommendation rate is greater than that user's preference rate for the movie. In other words, it can surface movies that other users have not liked but that this particular user might enjoy.

Image Credit - [mohamed_hassan](https://pixabay.com/users/mohamed_hassan-5229782/)

Introduction to Recommendation System

Recommendation systems are designed to recommend things to a user based on many different factors. They predict the items a user is most likely to purchase or be interested in. Large companies such as Google, Amazon, and Netflix use recommendation systems to help their users find products or movies. A recommendation system can suggest items based on your own past activity, which is known as content-based filtering, or based on the preferences of other users who are similar to you, which is known as collaborative filtering.

Cosine Similarity

Cosine similarity is a metric used to measure how similar two items are. Mathematically, it is the cosine of the angle between two vectors in a multidimensional space. It is advantageous when two similar documents are far apart by Euclidean distance (for example, because the documents differ in size) but are still oriented close together. The smaller the angle, the higher the cosine similarity.

1 - cosine-similarity = cosine-distance
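
As a minimal sketch of the idea above, the snippet below computes cosine similarity by hand using only the standard library. The helper name `cosine` and the three toy vectors are assumptions made for illustration; they are not part of the MoRe notebook.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (made up): long_doc points in the same direction as
# short_doc but is ten times larger; unrelated is orthogonal to short_doc.
short_doc = [1, 2, 3]
long_doc = [10, 20, 30]
unrelated = [3, 0, -1]

print(math.dist(short_doc, long_doc))     # large Euclidean distance
print(cosine(short_doc, long_doc))        # approximately 1.0
print(1 - cosine(short_doc, unrelated))   # cosine distance of orthogonal vectors
```

The first two vectors are far apart by Euclidean distance yet have a cosine similarity of about 1, because only the direction of the vectors matters.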

Code

The Jupyter notebook is available on nbviewer.

Download the dataset from here

Importing the important libraries

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Loading the dataset and converting it into dataframe using pandas

df = pd.read_csv("movie_dataset.csv")

Features list

We'll choose the features that are most relevant to us and store them in a list named features.

features = ['keywords', 'cast', 'genres', 'director']

Removing null values

The data must be preprocessed before proceeding further. Null values in the selected features are replaced with empty strings so that the columns can be safely concatenated later.

for feature in features:
    df[feature] = df[feature].fillna('')

Combined features

Combine all the selected features into a single string and add it as a new column, combined_features, to the existing dataset.

def combined_features(row):
    return row['keywords']+" "+row['cast']+" "+row['genres']+" "+row['director']

df['combined_features'] = df.apply(combined_features,axis = 1)
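
To illustrate why missing values are handled before the features are joined, here is a small pandas-free sketch. The function `combine_features` and the sample row are hypothetical; `None` stands in for the NaN values that pandas would hold for missing cells.

```python
features = ["keywords", "cast", "genres", "director"]

def combine_features(row, feature_names):
    # Treat a missing value (None here; NaN in pandas) as an empty string,
    # then join the fields with single spaces, as the notebook does.
    parts = [row[name] if row[name] is not None else "" for name in feature_names]
    return " ".join(parts)

# Made-up row with a missing "cast" value.
row = {
    "keywords": "space mission",
    "cast": None,
    "genres": "Sci-Fi",
    "director": "Jane Doe",
}

print(repr(combine_features(row, features)))
```

Without the empty-string substitution, concatenating a missing value with a string would raise a TypeError.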

Extracting features

Now we'll extract the features using sklearn's feature_extraction module, which converts raw data into a numeric format supported by machine learning algorithms.

CountVectorizer's fit_transform learns the vocabulary of the combined text and counts how many times each word occurs in each document.

cv = CountVectorizer()
count_matrix = cv.fit_transform(df['combined_features'])
print("Count Matrix: ",count_matrix.toarray())
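
The following is a simplified stand-in for what the count matrix contains, written with the standard library only. It is a sketch, not CountVectorizer itself: the name `count_matrix_sketch` and the two toy documents are assumptions, and the sketch only mimics the default behavior of lowercasing text and keeping tokens of two or more word characters.

```python
import re
from collections import Counter

def count_matrix_sketch(documents):
    # Lowercase each document and keep tokens of two or more word characters.
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    # The vocabulary is the sorted set of all tokens; one column per term.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

docs = ["space adventure space", "space drama"]
vocabulary, matrix = count_matrix_sketch(docs)
print(vocabulary)  # ['adventure', 'drama', 'space']
print(matrix)      # [[1, 0, 2], [0, 1, 1]]
```

Each row is one document (one movie's combined features) and each column is one word of the vocabulary.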

Cosine similarity

sklearn provides the cosine_similarity function (in sklearn.metrics.pairwise), which we'll use to compute the similarity between every pair of rows in the count matrix.

cosine_sim = cosine_similarity(count_matrix)

cosine_sim is a NumPy array in which the entry at row i, column j holds the cosine similarity between movie i and movie j.
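
A rough sketch of how such a pairwise similarity matrix can be built from count rows is shown below. The function `pairwise_cosine` and the toy rows are assumptions for illustration; the notebook itself relies on sklearn for this step.

```python
import math

def pairwise_cosine(rows):
    # Precompute the length of every row vector.
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            denom = norms[i] * norms[j]
            # An all-zero row has no direction; report 0 similarity for it.
            sim[i][j] = dot / denom if denom else 0.0
    return sim

# Toy count rows (made up): row 2 is row 0 scaled by two.
toy_rows = [[1, 0, 2], [0, 1, 1], [2, 0, 4]]
toy_sim = pairwise_cosine(toy_rows)
for line in toy_sim:
    print([round(value, 3) for value in line])
```

The result is symmetric, has values of about 1 on the diagonal, and gives rows 0 and 2 a similarity of about 1 because they differ only in scale.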

Content the user likes (for content-based filtering)

Now we'll store the input movie in the movie_user_like variable. Since we're building a content-based recommendation system, we need to know which content the user likes in order to find similar movies.

movie_user_like = "Dead Poets Society"

def get_index_from(title):
    # Find the row with this title and return its value in the dataset's "index" column.
    return df[df.title == title]["index"].values[0]

movie_index = get_index_from(movie_user_like)

Generating similar movies matrix

similar_movies = list(enumerate(cosine_sim[movie_index]))

Sorting the similar movies in descending order

sorted_similar_movies = sorted(similar_movies, key = lambda x:x[1], reverse = True)
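
The ranking step can be sketched on its own as follows. The helper `top_similar` and the sample similarity row are hypothetical; unlike the plain sort above, this version also leaves out the query movie, which would otherwise rank first with a similarity of 1 to itself.

```python
def top_similar(similarity_row, query_index, k=3):
    # Pair every movie index with its score, leaving out the query movie.
    scored = [(i, s) for i, s in enumerate(similarity_row) if i != query_index]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up similarity row for movie 0.
toy_row = [1.0, 0.2, 0.9, 0.0, 0.5]
print(top_similar(toy_row, query_index=0, k=3))  # [(2, 0.9), (4, 0.5), (1, 0.2)]
```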

Printing the similar movies

def get_title_from_index(index):
    # Return the title of the movie stored at this row position.
    return df[df.index == index]["title"].values[0]

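# Print the 15 most similar movies, skipping the query movie itself.
top_n = 15
shown = 0
for index, score in sorted_similar_movies:
    if index == movie_index:
        continue
    print(get_title_from_index(index))
    shown += 1
    if shown >= top_n:
        break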
