This project is a Python web scraper built on the Scrapy framework that extracts property data from https://www.openrent.co.uk. The spider walks property pages by iterating over property IDs and logs each entry to a local CSV file as well as to an online Google Drive sheet via the Google Drive API. It can be deployed to cloud platforms such as Railway.
Demo video: openrent-scrapper.demo1.mp4
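The ID-based scan can be sketched as a small helper that turns a starting property ID into a sequence of candidate page URLs. This is an illustrative sketch only; the `https://www.openrent.co.uk/<id>` URL pattern is an assumption and may differ from the site's real scheme.

```python
def property_urls(start_id: int, count: int):
    """Yield candidate OpenRent property-page URLs for sequential IDs.

    The ``https://www.openrent.co.uk/<id>`` pattern is an assumption;
    adjust it to match the site's actual URL scheme.
    """
    base = "https://www.openrent.co.uk/"
    for pid in range(start_id, start_id + count):
        yield f"{base}{pid}"
```

A Scrapy spider's `start_requests()` could then yield a `scrapy.Request` for each of these URLs.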
To use this scraper, you need Python 3 and the Scrapy library installed on your system, plus a Google API key for accessing the Google Drive API.
- Clone this repository onto your local system:

  ```shell
  git clone https://github.com/[username]/property-scraper.git
  ```
- Navigate to the project directory:

  ```shell
  cd property-scraper
  ```
- Copy your Google API key to the main folder, then set your starting property ID in openrentupdatespider/openrentupdatespider/spiders/openrentupdatespider.py.
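For illustration, the starting property ID could be read as in the hypothetical helper below; the real name and location in openrentupdatespider.py may differ, and the environment-variable fallback is an assumption that also suits the Railway deployment described later.

```python
import os

def starting_property_id(default: int = 2000000) -> int:
    """Return the property ID the scan should start from.

    Hypothetical helper: the variable name START_PROPERTY_ID and the
    default value are assumptions, not taken from the project itself.
    Reading from the environment lets the same code work locally and
    on a cloud host where the ID is set as an environment variable.
    """
    return int(os.environ.get("START_PROPERTY_ID", default))
```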
Once you have installed the necessary dependencies and set your starting property ID, you can run the scraper locally or deploy it to Railway.
To run the scraper locally, navigate to the project directory in your terminal and run:

```shell
python main.py
```
This will start the scraper and log the data in both the local CSV file and the Google Drive sheet.
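The local CSV side of that logging can be as simple as appending one row per scraped property. This is a hedged sketch, not the project's actual pipeline: the column names are assumptions, and the parallel upload to the Google Drive sheet is omitted.

```python
import csv
from pathlib import Path

def log_row(csv_path: str, row: list) -> None:
    """Append one scraped record to the local CSV log, writing a
    header line first if the file does not exist yet.

    The header columns below are illustrative assumptions; match
    them to whatever fields the spider actually extracts.
    """
    header = ["property_id", "title", "price"]
    new_file = not Path(csv_path).exists()
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(header)
        writer.writerow(row)
```

Appending (mode `"a"`) rather than rewriting means an interrupted run keeps everything logged so far.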
To deploy the scraper to Railway, first create a new app on the Railway dashboard. Then, follow these steps:
1. Connect your app to this GitHub repository.
2. Set your Google API key as an environment variable in the Railway dashboard.
3. Set your starting property ID as a Railway environment variable.
4. Deploy the app.

Once the app is deployed, you should see new rows appear in the Google Drive sheet as the scraper runs.
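If Railway's default builder honors a Procfile, one can tell it how to start the scraper. This is a sketch assuming the entry point is main.py, as in the local-run instructions above:

```
worker: python main.py
```

A `worker` process type fits here because the scraper is a background job rather than a web server listening on a port.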