A simple web scraping script for Wuzzuf.com using Selenium, written as part of the AI-Pro internship at ITI.
- Python 3 (Preferably newer than 3.6)
- Selenium
pip install selenium
- Compatible Browser Driver
Header Item | Value Type | Representation | Notes |
---|---|---|---|
Job Title | String | The title of the required job | |
Company | String | The company offering the job | Can be "Confidential", meaning the company chose not to be named publicly |
Company Address | String | The address of the company | |
Posting Time | Date/Time | The time at which the job was posted to Wuzzuf | The time of day (hour:minutes) is precise only if the job was posted less than 24 hours ago |
Job Type(s) | String | Full Time, Part Time, Work From Home, etc. | One job can have multiple types separated with a pipe character '|' |
Career Level | String | Experienced, Entry Level, etc. | |
Years of Experience | String | The required years of experience to apply | Can be NULL |
Industries | String | The field to which the job relates | One job can relate to multiple industries separated with a pipe character '|' |
Skill(s) | String | The set of skills required to apply for the job | One job usually has multiple skills separated with a hyphen character '-' |
Link | String | The link to the job posting details | |
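The multi-valued columns can be split back into lists when the CSV is read. The sketch below is a minimal illustration, assuming the CSV header row uses the names from the table above and that the delimiters are exactly '|' and '-'. The helper name `split_multi_value_fields` and the sample row are made up for demonstration and are not part of the script.

```python
import csv
import io

# Column names are assumed to match the header table above.
MULTI_VALUE_DELIMITERS = {
    "Job Type(s)": "|",
    "Industries": "|",
    "Skill(s)": "-",
}


def split_multi_value_fields(row):
    """Return a copy of row with the multi-valued columns split into lists."""
    parsed = dict(row)
    for column, delimiter in MULTI_VALUE_DELIMITERS.items():
        raw = row.get(column) or ""  # missing or empty columns become []
        parsed[column] = [part.strip() for part in raw.split(delimiter) if part.strip()]
    return parsed


# Made-up sample data, not real scraper output.
sample_csv = io.StringIO(
    "Job Title,Job Type(s),Industries,Skill(s)\n"
    "Data Analyst,Full Time|Work From Home,Computer Software|Banking,Python-SQL-Excel\n"
)

for row in csv.DictReader(sample_csv):
    print(split_multi_value_fields(row))
```

Note that splitting on a bare hyphen would also break up a skill name that itself contains a hyphen, so the resulting skill list should be treated as approximate.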
- Download and extract the app files: `Scrapy.py`, `__init__.py`, `scrapper_config.json`
- Download the respective driver for your browser's type and version. Currently only these three are implemented:
Browser | Driver Link |
---|---|
Chrome | https://sites.google.com/chromium.org/driver/ |
Firefox | https://github.com/mozilla/geckodriver/releases |
Edge | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ |
- Note: If you are using the Edge driver, you also need to install Edge Selenium Tools:
pip install msedge-selenium-tools
- Place the driver file (without changing its original name) in the same directory as the Scrapy.py file.
- Edit the `scrapper_config.json` file as needed:
Variable | Options | Meaning |
---|---|---|
browser | Chrome | If you are using the Chrome driver |
browser | Firefox | If you are using the Firefox driver |
browser | Edge | If you are using the Edge driver |
headless | 0 | Run the browser in normal mode (the browser window will show up) |
headless | 1 | Run the browser in headless mode (the browser window won't show up) |
datetime_format | %d: day, %m: month, %Y: year, %H: hour, %M: minutes | You can choose any combination of these elements to customize the output date/time in the scraped data |
Windows | 0 | If you are using an operating system other than Windows |
Windows | 1 | If you are using the Windows operating system |
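As an illustration of the options above, the following sketch checks a configuration and shows how a `datetime_format` value renders. It is not taken from Scrapy.py: the `validate_config` helper is hypothetical, and the exact layout of `scrapper_config.json` (a flat JSON object with these four keys, with 0/1 stored as numbers) is an assumption.

```python
import json
from datetime import datetime

ALLOWED_BROWSERS = {"Chrome", "Firefox", "Edge"}


def validate_config(config):
    """Raise ValueError if a setting falls outside the options listed above."""
    if config.get("browser") not in ALLOWED_BROWSERS:
        raise ValueError(f"browser must be one of {sorted(ALLOWED_BROWSERS)}")
    for flag in ("headless", "Windows"):
        # Accept 0/1 as numbers or strings, since the stored type is an assumption.
        if str(config.get(flag)) not in {"0", "1"}:
            raise ValueError(f"{flag} must be 0 or 1")
    fmt = config.get("datetime_format")
    if not isinstance(fmt, str) or not fmt:
        raise ValueError("datetime_format must be a non-empty string")
    return config


# Assumed file layout; adjust to match the real scrapper_config.json.
example = json.loads("""
{
    "browser": "Firefox",
    "headless": 1,
    "datetime_format": "%d/%m/%Y %H:%M",
    "Windows": 0
}
""")

validate_config(example)
print(datetime(2022, 1, 15, 9, 30).strftime(example["datetime_format"]))  # 15/01/2022 09:30
```

The format string is passed to Python's standard `strftime`, so `%d/%m/%Y %H:%M` produces values such as `15/01/2022 09:30`.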
- You are now ready. Open a terminal (or CMD) in the folder containing Scrapy.py and the driver file.
- Run the following command:
python3 Scrapy.py
- Enter the search query of the job role you want to scrape
- Enter the number of pages you want to scrape
- Wait for the scraping to finish. A CSV file will be created in the `./Wuzzuf Scraped Data/` directory.
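Once a run has finished, the newest CSV can be picked up for further processing. The sketch below is one possible way to do that using only the standard library. The helper name `load_latest_csv` is hypothetical, and both the `.csv` extension and UTF-8 encoding of the output files are assumptions.

```python
import csv
from pathlib import Path


def load_latest_csv(output_dir="./Wuzzuf Scraped Data"):
    """Return (path, rows) for the most recently modified CSV in output_dir."""
    csv_files = sorted(Path(output_dir).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {output_dir!r}")
    latest = csv_files[-1]
    # UTF-8 is assumed here; change the encoding if the file was written differently.
    with latest.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    return latest, rows


try:
    path, rows = load_latest_csv()
    print(f"{path.name}: {len(rows)} job postings")
except FileNotFoundError as error:
    # Nothing has been scraped yet, or the script was run from another folder.
    print(error)
```

Each row is a dictionary keyed by the CSV header names, so individual fields can be read by column name.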
Tested on Ubuntu 20.04 using:
- Python 3.9.7
- Firefox 96.0
- Chrome 96.0.4664.110
This script is for educational purposes only and I am not responsible for misuse.