This project scrapes product information and reviews from sephora.ca for analysis. It uses Selenium to collect the data.
sephora_scrapping.py is the main script for scraping product information and reviews. In the script, I used the Chrome browser. To run the code, download ChromeDriver into the same directory as sephora_scrapping.py (this assumes you already have Chrome installed). If you prefer another browser, download that browser's driver into the same directory as sephora_scrapping.py instead.
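As a rough illustration of the driver setup described above, the sketch below locates a driver file next to the script and starts Chrome with it. The helper names `local_driver_path` and `make_chrome_driver` are hypothetical and not taken from sephora_scrapping.py, and the `Service`-based call assumes Selenium 4 or later.

```python
import os
import sys


def local_driver_path(script_dir, name="chromedriver"):
    """Return the expected driver path next to the script (.exe on Windows)."""
    if sys.platform.startswith("win"):
        name += ".exe"
    return os.path.join(script_dir, name)


def make_chrome_driver(script_dir):
    """Start Chrome using the driver stored in script_dir (assumes Selenium 4+)."""
    # Imported lazily so the path helper works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(local_driver_path(script_dir)))
```

Calling `make_chrome_driver(os.path.dirname(os.path.abspath(__file__)))` from the script would then open a Chrome window, provided the driver version matches the installed browser.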
sephora_scrapping.py does the following:
Given a base link, for example the link to all eye cream products on sephora.ca, it collects each product's name and link and assigns the product an index, as follows:
index,product_name,product_link
0,OLEHENRIKSEN Banana Bright Eye Crème,https://www.sephora.com/ca/en/product/banana-bright-eye-creme-P426339?icid2=products%20grid:p426339:product
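The index format above could be produced along these lines once the names and links have been collected. This is only a sketch: `write_product_index` is a hypothetical helper, and the actual script may write its output differently.

```python
import csv
import io


def write_product_index(products, out):
    """Write (name, link) pairs as index,product_name,product_link rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["index", "product_name", "product_link"])
    # The index is simply the position of the product in the scraped list.
    for index, (name, link) in enumerate(products):
        writer.writerow([index, name, link])


buffer = io.StringIO()
write_product_index(
    [(
        "OLEHENRIKSEN Banana Bright Eye Crème",
        "https://www.sephora.com/ca/en/product/banana-bright-eye-creme-P426339"
        "?icid2=products%20grid:p426339:product",
    )],
    buffer,
)
index_csv = buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand keeps the file valid if a product name ever contains a comma.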
It can also scrape a product's overall information and overall rating:
- product_index
- product_details
- total_reviews_count
- number of 5-star ratings
- number of 4-star ratings
- number of 3-star ratings
- number of 2-star ratings
- number of 1-star ratings
- overall rating
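One possible sanity check on these fields is to recompute a total and a mean rating from the per-star counts and compare them with the scraped values. The sketch below assumes every review carries exactly one star rating, which may not hold on the site; `summarize_star_counts` is a hypothetical helper and the counts are made up.

```python
def summarize_star_counts(star_counts):
    """Return (total, mean rating) from a {stars: count} mapping."""
    total = sum(star_counts.values())
    if total == 0:
        # No ratings yet, so a mean is undefined.
        return 0, None
    mean = sum(stars * count for stars, count in star_counts.items()) / total
    return total, round(mean, 2)


# Made-up counts for illustration only, not scraped data.
example_counts = {5: 6, 4: 2, 3: 1, 2: 0, 1: 1}
total, mean = summarize_star_counts(example_counts)
```

A large gap between the recomputed mean and the scraped overall rating would suggest that some counts were missed during scraping.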
sephora_scrapping.py can also scrape the following fields for each individual review:
- review_user_id
- review_user_info
- rate
- review_date
- review_content
- not_helpful
- helpful
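The review fields listed above could be held in a small record type and written out as CSV, as in the sketch below. The field types, the `Review` class, and `write_reviews` are assumptions made for illustration, and the sample review is made up rather than scraped.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Review:
    # Field names follow the list above; the types are assumptions.
    review_user_id: str
    review_user_info: str
    rate: int
    review_date: str
    review_content: str
    not_helpful: int
    helpful: int


def write_reviews(reviews, out):
    """Write Review records as CSV with one column per field."""
    names = [f.name for f in fields(Review)]
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for review in reviews:
        writer.writerow(asdict(review))


# Made-up review for illustration only.
sample = Review("user123", "dry skin", 4, "2020-01-01", "Great, but pricey", 1, 7)
buffer = io.StringIO()
write_reviews([sample], buffer)
reviews_csv = buffer.getvalue()
```

Review text often contains commas and quotes, so letting `csv.DictWriter` handle the quoting avoids broken rows.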