Scrapy

A simple web scraping script for Wuzzuf.com using Selenium, built as part of the AI-Pro internship at ITI.

Requirements

Python 3 (preferably newer than 3.6)
Selenium: `pip install selenium`
A compatible browser driver

Scraped Data Headers and Representations

Header Item	Value Type	Representation	Notes
Job Title	String	The title of the required job
Company	String	The company offering the job	Could be `Confidential` meaning they chose not to be public
Company Address	String	The address of the company
Posting Time	Date/Time	The time at which the job was posted to Wuzzuf	The time of day (hour:minutes) is precise only if the job was posted less than 24 hours ago
Job Type(s)	String	Full Time, Part Time, Work From Home, etc.	One job can have multiple types, separated by a pipe character '|'
Career Level	String	Experienced, Entry Level, etc.
Years of Experience	String	The required years of experience to apply	Can be NULL
Industries	String	The field to which the job relates	One job can relate to multiple industries, separated by a pipe character '|'
Skill(s)	String	The set of skills required to apply for the job	One job usually has multiple skills, separated by a hyphen character '-'
Link	String	The link to the job posting details
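
Because several columns pack more than one value into a single string, it can help to split them back into lists when reading the CSV. The following is a minimal sketch, assuming the CSV column names match the header names in the table above and that whitespace around separators can be trimmed. The sample row is made up for illustration and is not real scraper output.

```python
import csv
import io

# Assumed column names, copied from the header table above,
# mapped to the separator each multi-valued column uses.
MULTI_VALUE_SEPARATORS = {
    "Job Type(s)": "|",
    "Industries": "|",
    "Skill(s)": "-",
}


def split_multi_value_fields(row):
    """Return a copy of `row` with multi-valued columns split into lists."""
    parsed = dict(row)
    for column, separator in MULTI_VALUE_SEPARATORS.items():
        raw = row.get(column) or ""
        parsed[column] = [part.strip() for part in raw.split(separator) if part.strip()]
    return parsed


def read_jobs(csv_file):
    """Parse an open CSV file object into a list of row dictionaries."""
    return [split_multi_value_fields(row) for row in csv.DictReader(csv_file)]


# Hypothetical sample data for illustration only.
sample = io.StringIO(
    "Job Title,Company,Job Type(s),Industries,Skill(s)\n"
    "Data Analyst,Confidential,Full Time|Work From Home,IT|Finance,SQL-Python-Excel\n"
)
jobs = read_jobs(sample)
print(jobs[0]["Job Type(s)"])
print(jobs[0]["Skill(s)"])
```

Note that splitting skills on '-' will also break up any skill whose own name contains a hyphen, so treat the resulting list as approximate.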

Instructions

Download and extract the app files: Scrapy.py, __init__.py, scrapper_config.json
Download the driver that matches your browser and its version. Currently only these three browsers are supported:

Browser	Driver Link
Chrome	https://sites.google.com/chromium.org/driver/
Firefox	https://github.com/mozilla/geckodriver/releases
Edge	https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/

Note: If you are using the Edge driver, you also need to install Edge Selenium Tools: `pip install msedge-selenium-tools`
Place the driver file (without changing its original name) in the same directory as the Scrapy.py file.
Edit the scrapper_config.json file as needed:

Variable	Options	Meaning
`browser`	Chrome	If you are using Chrome driver
	Firefox	If you are using Firefox driver
	Edge	If you are using Edge driver
`headless`	0	Run the browser in normal mode (Browser window will show up)
`headless`	1	Run the browser in headless mode (Browser window won't show up)
`datetime_format`	%d: day, %m: month, %Y: year, %H: hour, %M: minutes	You can choose any combination of these elements to customize the output datetime in the scraped data
`Windows`	0	If you are using an operating system other than Windows
`Windows`	1	If you are using the Windows operating system
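
To make the options above concrete, here is a rough sketch of what a filled-in configuration might look like and how it could be checked before use. The key names come from the table. Storing `headless` and `Windows` as integers, the `validate_config` and `driver_filename` helpers, and the idea that the `Windows` flag only decides whether `.exe` is appended to the driver name are all assumptions made for illustration, not a description of how Scrapy.py actually reads the file.

```python
import json
from datetime import datetime

# Standard executable names of the three supported drivers.
DRIVER_NAMES = {
    "Chrome": "chromedriver",
    "Firefox": "geckodriver",
    "Edge": "msedgedriver",
}


def validate_config(config):
    """Raise ValueError if a setting falls outside the documented options."""
    if config["browser"] not in DRIVER_NAMES:
        raise ValueError(f"Unsupported browser: {config['browser']!r}")
    for flag in ("headless", "Windows"):
        if config[flag] not in (0, 1):
            raise ValueError(f"{flag} must be 0 or 1")
    return config


def driver_filename(config):
    """Return the expected driver file name for the configured browser."""
    # Assumption: the Windows flag only controls the ".exe" suffix.
    suffix = ".exe" if config["Windows"] == 1 else ""
    return DRIVER_NAMES[config["browser"]] + suffix


# Hypothetical configuration values for illustration.
example = validate_config(json.loads("""
{
    "browser": "Firefox",
    "headless": 1,
    "datetime_format": "%d-%m-%Y %H:%M",
    "Windows": 0
}
"""))

print(driver_filename(example))
# The format codes behave as in datetime.strftime:
print(datetime(2022, 1, 15, 14, 30).strftime(example["datetime_format"]))
```

With the values shown, the second line printed is `15-01-2022 14:30`; swapping the order of the codes in `datetime_format` changes the output accordingly.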

Now you are ready. Open a terminal (or CMD on Windows) in the folder containing Scrapy.py and the driver file.
Run the following command: `python3 Scrapy.py`
Enter the search query for the job role you want to scrape.
Enter the number of pages you want to scrape.
Wait for the scraping to finish. A CSV file will be created in the `./Wuzzuf Scraped Data/` directory.
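
For readers curious about the final step, the sketch below shows one way rows could be written to a CSV file in an output directory. It is not taken from Scrapy.py: the `save_jobs` function, the file naming scheme, and writing the text `NULL` for missing fields are hypothetical choices. Only the column names and the default directory name come from this README.

```python
import csv
from datetime import datetime
from pathlib import Path

# Column names as listed in the scraped data headers table.
HEADERS = [
    "Job Title", "Company", "Company Address", "Posting Time", "Job Type(s)",
    "Career Level", "Years of Experience", "Industries", "Skill(s)", "Link",
]


def save_jobs(rows, query, output_dir="Wuzzuf Scraped Data", now=None):
    """Write job dictionaries to a timestamped CSV file and return its path."""
    now = now or datetime.now()
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: "<query>_<timestamp>.csv".
    file_path = folder / f"{query.replace(' ', '_')}_{now:%Y%m%d_%H%M%S}.csv"
    with file_path.open("w", newline="", encoding="utf-8") as handle:
        # Fields missing from a row are written as the text "NULL" (an assumption).
        writer = csv.DictWriter(handle, fieldnames=HEADERS, restval="NULL")
        writer.writeheader()
        writer.writerows(rows)
    return file_path
```

A call such as `save_jobs([{"Job Title": "Data Analyst", "Company": "Confidential"}], "data analyst")` would create the directory if needed and return the path of the new file.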

Test Environment

Tested on Ubuntu 20.04 using:

Python 3.9.7
Firefox 96.0
Chrome 96.0.4664.110

Further features might be added in the future...

Disclaimer

This script is for educational purposes only and I am not responsible for misuse.

AbdElrahman-A-Eid/Scrapy

Folders and files

Latest commit

History

Repository files navigation

Scrapy

Table of Contents

Requirements

Scraped Data Headers and Representations

Instructions

Test Environment

Further features might be added in the future...

Disclaimer

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages