GitHub - njraladdin/job-listings-scraper-scheduler: A Python-based tool that scrapes multiple Swedish job listings sites, processes the data, updates Airtable, and reruns on a schedule. It also regularly updates the status of expiring jobs so the database stays up to date.

Scheduled Job Listings Scraper and Airtable Updater

A Python-based tool that scrapes multiple Swedish job listings sites, processes the data, updates Airtable, and reruns on a schedule. It also regularly updates the status of expiring jobs so the database stays up to date.

Features

Scraping: Extracts job data from multiple job sites.
Processing: Organizes and saves the scraped data into CSV files.
Updating: Uploads the new job data to Airtable.
Scheduling: Automates the entire process at regular intervals, repeating the scraping, processing, and updating steps.
Maintenance: Regularly updates the status of expiring jobs so the Airtable data stays current.
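
The scrape, process, update, and repeat cycle listed above can be sketched roughly as follows. This is only an illustration and not the repository's code: `run_pipeline`, `run_every`, and the callables passed to them are hypothetical names, and a plain loop stands in for the Schedule library so the sketch needs nothing beyond the standard library.

```python
import time
from typing import Callable, Dict, List

# Hypothetical type: a scraper is any callable returning a list of job dicts.
Scraper = Callable[[], List[dict]]


def run_pipeline(
    scrapers: Dict[int, Scraper],
    process: Callable[[int, List[dict]], List[dict]],
    upload: Callable[[int, List[dict]], None],
) -> Dict[int, int]:
    """Run scrape -> process -> upload once per site; return new-job counts."""
    counts: Dict[int, int] = {}
    for site_number, scrape in scrapers.items():
        try:
            jobs = scrape()
            new_jobs = process(site_number, jobs)
            upload(site_number, new_jobs)
            counts[site_number] = len(new_jobs)
        except Exception as exc:
            # One failing site should not stop the remaining sites.
            print(f"site {site_number} failed: {exc}")
            counts[site_number] = 0
    return counts


def run_every(
    interval_seconds: float,
    task: Callable[[], object],
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call task max_runs times, sleeping interval_seconds between calls."""
    runs = 0
    while runs < max_runs:
        task()
        runs += 1
        if runs < max_runs:
            sleep(interval_seconds)
    return runs
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. In a long-running deployment the bounded loop would typically be replaced by a scheduler that runs indefinitely.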

Quick Start

Clone the repository:

git clone https://github.com/aladynjr/scheduled-job-listings-scraper-airtable-updater.git

Install dependencies:
```
pip install -r requirements.txt
```
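Configure environment variables in a .env file, using the example file in the repository root as a template (.env.example in the project tree; the repository's file listing names it example.env). Set your Airtable API key, base ID, and table IDs.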
Run the script:
```
./start.sh
```
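
For the configuration step above, a minimal sketch of validating the Airtable settings at startup might look like this. The variable names (`AIRTABLE_API_KEY`, `AIRTABLE_BASE_ID`, `AIRTABLE_TABLE_ID`) and the `AirtableConfig` class are assumptions for illustration; the real names are defined in the repository's example env file. The sketch also assumes the values from .env have already been loaded into the process environment, and it handles a single table ID for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; check the example env file for the real ones.
REQUIRED_VARS = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE_ID")


@dataclass(frozen=True)
class AirtableConfig:
    api_key: str
    base_id: str
    table_id: str


def load_config(env: Mapping[str, str] = os.environ) -> AirtableConfig:
    """Build the config, failing early with a list of any missing variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return AirtableConfig(
        api_key=env["AIRTABLE_API_KEY"],
        base_id=env["AIRTABLE_BASE_ID"],
        table_id=env["AIRTABLE_TABLE_ID"],
    )
```

Failing before the first scrape starts is usually preferable to discovering a missing key only when the upload step runs.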

Output

Generates CSV files in the data directory:

site_<site_number>_scraped_data.csv: Job data for each site
site_<site_number>_new_jobs.csv: New job data for Airtable
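
How the new-jobs file is derived from the scraped data is not described here. One plausible approach, sketched below, is to keep only rows whose identifier has not been seen before. The `job_id` column and both function names are hypothetical, and the sketch uses the standard `csv` module rather than Pandas so it runs without extra dependencies.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Set


def select_new_jobs(
    scraped: Iterable[dict], known_ids: Set[str], key: str = "job_id"
) -> List[dict]:
    """Return rows whose key is not already known, skipping in-batch duplicates."""
    seen = set(known_ids)
    new_rows = []
    for row in scraped:
        job_id = row.get(key)
        if not job_id or job_id in seen:
            continue
        seen.add(job_id)
        new_rows.append(row)
    return new_rows


def write_new_jobs_csv(data_dir: Path, site_number: int, rows: List[dict]) -> Path:
    """Write rows to site_<site_number>_new_jobs.csv; all rows must share the same keys."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"site_{site_number}_new_jobs.csv"
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if fieldnames:
            writer.writeheader()
            writer.writerows(rows)
    return path
```

The set of known IDs could come from the previous scraped-data file or from Airtable itself; which source the project uses is not stated here.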

Project Structure

scheduled-job-listings-scraper-airtable-updater/
│
├── config.py
├── main.py
├── requirements.txt
├── .env.example
│
├── tasks/
│   ├── scrape_all_sites_task.py
│   └── update_expired_jobs_task.py
│
├── scrapers/
│   ├── job_site_1_scraper.py
│   ├── job_site_2_scraper.py
│   ├── job_site_3_scraper.py
│   ├── job_site_4_scraper.py
│   └── job_site_5_scraper.py
│
├── utils/
│   └── update_airtable_with_csv.py
│
└── start.sh
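
The tree lists tasks/update_expired_jobs_task.py, but its contents are not shown here. As a rough idea of what an expiry pass could involve, the sketch below picks out open jobs whose expiry date has passed. The field names (`id`, `status`, `expires_on`), the `"open"` status value, and the function name are all assumptions, not the project's actual schema.

```python
from datetime import date
from typing import Iterable, List


def find_expired(jobs: Iterable[dict], today: date) -> List[str]:
    """Return ids of open jobs whose ISO-formatted expiry date is before today."""
    expired = []
    for job in jobs:
        expires = job.get("expires_on")
        if not expires or job.get("status") != "open":
            continue  # skip jobs with no expiry date or already closed
        if date.fromisoformat(expires) < today:
            expired.append(job["id"])
    return expired
```

The returned IDs would then be sent to Airtable as status updates; that step is omitted because it depends on the table layout.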

Technologies

Python
Pandas
Schedule
Airtable API
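
When uploading through the Airtable REST API, a create request accepts at most 10 records, so larger sets of new jobs need to be split into batches. The sketch below only builds the request bodies and makes no network calls. The function name is hypothetical, and the repository's utils/update_airtable_with_csv.py may well use a client library instead.

```python
from typing import Iterator, List

# Airtable's REST API accepts at most 10 records per create request.
MAX_RECORDS_PER_REQUEST = 10


def build_create_payloads(
    rows: List[dict], batch_size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[dict]:
    """Yield request bodies of the form {"records": [{"fields": row}, ...]}."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        yield {"records": [{"fields": row} for row in batch]}
```

Each payload would be sent as a POST to the table's endpoint with the API key in a bearer `Authorization` header. Pausing briefly between requests helps stay within Airtable's rate limits.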

Disclaimer

This tool is for demonstration purposes only. The data scraped is publicly available, and the usage of this tool should comply with each job site's terms of service. Ensure you have the necessary permissions to scrape and use the data.

