Group project assignment for AI6122
Submission:
- Li Kaiyu
- Chen Lei
- Li Jiayi
- Chen Yueqi
- Chang Lo-Wei
The following software needs to be installed on your system:
- Anaconda: Can be downloaded from https://docs.anaconda.com/anaconda/install/index.html
- Jupyter Notebook: Can be installed via pip using the following command.
pip install jupyter notebook
- Amazon Product Review Dataset: The datasets are available at https://jmcauley.ucsd.edu/data/amazon/. Please go to the website and download 'Digital_Music_5.json' and 'Kindle_Store_5.json' under the section titled "Small" subsets for experimentation, i.e., the 5-core datasets.
- Open the Anaconda Prompt console and make sure nlp.yaml is in the console's current directory. Next, enter the following command to create an Anaconda environment named 'nlp'. All required packages will be installed automatically.
conda env create -f nlp.yaml
- Activate the 'nlp' environment with the following command.
conda activate nlp
- Create a new directory named 'data' under the root directory of the project code, then put 'Digital_Music_5.json' and 'Kindle_Store_5.json' into the 'data' directory.
- Open the Jupyter notebook "Data Analysis.ipynb"
- Run each code cell in order from top to bottom. The first line in each cell explains what the cell does.
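Before running the notebook, it can help to confirm that the two dataset files are in place and readable. The sketch below is not taken from the project notebooks: `load_reviews` and `find_missing` are hypothetical helpers, and the code assumes each file stores one JSON review object per line.

```python
import json
from pathlib import Path


def load_reviews(path, limit=None):
    """Read a file with one JSON object per line into a list of dicts.

    Hypothetical helper; the one-object-per-line layout is an assumption.
    """
    reviews = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            reviews.append(json.loads(line))
            if limit is not None and len(reviews) >= limit:
                break
    return reviews


def find_missing(data_dir="data",
                 names=("Digital_Music_5.json", "Kindle_Store_5.json")):
    """Return the expected dataset files that are absent from data_dir."""
    return [name for name in names if not (Path(data_dir) / name).is_file()]
```

For example, `find_missing()` returns an empty list once both files are in 'data', and `load_reviews("data/Digital_Music_5.json", limit=5)` gives a quick look at the first few records.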
- Start the system with the following command.
python Search\ Engine.py
- Enter your query in the format "reviewerID* asin* plain-text*" (the order is interchangeable, and * means zero or more occurrences of the preceding term), then press Enter to confirm.
- To quit the system, type q and press Enter to confirm.
- The sample output is a table of search results with 6 columns: Rank, DocID, ReviewerID, asin, Snippets, and Score.
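To illustrate the query format, here is a minimal sketch of how a mixed query could be split into reviewer IDs, product IDs, and free text. It is not the parser used in Search Engine.py. The function name `parse_query` and both regular expressions are assumptions made for illustration: they guess that a reviewer ID starts with "A" and has at least 12 uppercase alphanumeric characters, and that an asin has exactly 10 uppercase alphanumeric characters with at least one digit.

```python
import re

# Heuristic patterns, assumed for illustration only.
REVIEWER_ID = re.compile(r"A[A-Z0-9]{11,20}")
ASIN = re.compile(r"(?=.*\d)[A-Z0-9]{10}")


def parse_query(query):
    """Sort whitespace-separated tokens into the three query term types."""
    parsed = {"reviewerID": [], "asin": [], "text": []}
    for token in query.split():
        if REVIEWER_ID.fullmatch(token):
            parsed["reviewerID"].append(token)
        elif ASIN.fullmatch(token):
            parsed["asin"].append(token)
        else:
            parsed["text"].append(token)
    return parsed
```

Because each token is classified on its own, the terms can appear in any order and each type can occur zero or more times, which matches the format described above.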
- Open the Jupyter notebook "Recommender System (Collaborative Filtering System).ipynb".
- Run the code cells from top to bottom.
- You can adjust the number in the first [] of "sorted_processed_reviewText" in code blocks 10 and 11 to select a different product.
- The output of our summarizer appears below code block 19.
- The outputs of code blocks 12-16 come from the baseline models. From top to bottom, they are TextRank, YAKE!, TfIdf, and TopicRank.
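For readers unfamiliar with the TfIdf baseline, the sketch below shows the general idea of ranking the terms of one product's review text against a collection of other texts. It is a plain-Python illustration, not the notebook's implementation: `tokenize` and `tfidf_keywords` are hypothetical names, the smoothed idf formula is one common variant, and no stop-word removal is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_keywords(target_doc, corpus, top_k=5):
    """Rank terms of target_doc by tf * idf, with idf computed over corpus."""
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_terms)
    counts = Counter(tokenize(target_doc))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        df = sum(1 for terms in doc_terms if term in terms)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed idf (assumed variant)
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

Frequent function words can still rank highly in this sketch because nothing filters them out, so a real baseline would normally add a stop-word list.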
- Open the Jupyter notebook "Recommender System (Collaborative Filtering System).ipynb".
- Run the code blocks from top to bottom.
- In code block 19, the output shows sample results of a test SVD model and its RMSE value. In code block 21, you can adjust the number in [] to get a product ID in the output. Then change the i in code block 22 according to the obtained product ID and run the code blocks below it. The top 10 recommended products will be shown in code block 24.
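As a rough guide to the two quantities mentioned above, the sketch below computes an RMSE value and picks the top-N products most similar to a chosen product. It does not reproduce the notebook: the names `rmse`, `cosine`, and `most_similar` are hypothetical, and using cosine similarity over per-product latent vectors is an assumption about how a product-to-product ranking could be derived from an SVD model.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(item_vectors, query_item, n=10):
    """Return up to n product IDs ranked by similarity to query_item."""
    query = item_vectors[query_item]
    scored = [(cosine(query, vec), item)
              for item, vec in item_vectors.items() if item != query_item]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:n]]
```

A lower RMSE means the predicted ratings are closer to the held-out ratings. The query product itself is excluded from the returned list.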