This repository contains the source code for a web scraper that extracts comprehensive data from the PCPartPicker site. It not only retrieves basic data from listings but also visits each product's specifications page to fetch detailed attributes. The scraper targets PC component categories including GPUs, power supplies, cases, CPUs, CPU coolers, memory, storage, and motherboards. Implemented in Node.js, it uses the ZenRows API to bypass the site's anti-bot protections, enabling reliable data extraction.
The PCPartPicker website employs robust security measures that block typical scraping attempts. To work around this, the scraper routes requests through ZenRows, which renders JavaScript and bypasses basic anti-bot protections. The fetched HTML is parsed, and the extracted data is written to CSV files. The current setup achieves roughly 90% accuracy in data extraction, which is well suited to educational and research applications.
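As a rough illustration of the fetch step, the sketch below requests a rendered page through ZenRows' REST endpoint with its `js_render` option. The axios client is an assumption (the repository's scripts may use a different HTTP library), and `YOUR_ZENROWS_API_KEY` is a placeholder:

```javascript
// Fetch a PCPartPicker page through ZenRows, asking it to render JavaScript.
const axios = require('axios');

async function fetchPage(targetUrl) {
  const response = await axios.get('https://api.zenrows.com/v1/', {
    params: {
      apikey: 'YOUR_ZENROWS_API_KEY', // placeholder: your ZenRows API key
      url: targetUrl,
      js_render: 'true', // have ZenRows execute the page's JavaScript
    },
  });
  return response.data; // raw HTML of the rendered page
}

fetchPage('https://pcpartpicker.com/products/video-card/')
  .then((html) => console.log(html.slice(0, 200))) // peek at the start of the HTML
  .catch((err) => console.error('Fetch failed:', err.message));
```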
- Graphics Cards (GPUs)
- Power Supplies
- Computer Cases
- CPUs
- CPU Coolers
- Memory
- Storage (SSDs and HDDs)
- Motherboards
Each category is handled by a dedicated script that extracts detailed attributes such as manufacturer, model, part number, and technical specifications, storing the data in a correspondingly named CSV file (e.g., `gpus_detailed.csv`, `cpus_detailed.csv`).
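As an illustration of that per-category extraction, the sketch below parses a fetched specifications page with cheerio and collects label/value pairs. Both cheerio and the CSS selectors here are assumptions for the example; the actual class names on PCPartPicker's pages will differ:

```javascript
// Parse a product specifications page into a flat { label: value } record.
const cheerio = require('cheerio');

function parseSpecs(html) {
  const $ = cheerio.load(html);
  const specs = {};
  // Hypothetical selectors: each spec block holds a title and its value.
  $('.group--spec').each((_, el) => {
    const label = $(el).find('.group__title').text().trim();
    const value = $(el).find('.group__content').text().trim();
    if (label) specs[label] = value;
  });
  return specs; // e.g., { Manufacturer: '...', 'Part #': '...' }
}
```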
To use the scraper:
- Ensure Node.js is installed on your system.
- Clone this repository.
- Install dependencies with `npm install`.
- Configure your ZenRows API key in the scripts (see the sketch after this list).
- Run the script corresponding to the component you wish to scrape, e.g., `node scrape_cpus.js`.
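As a rough sketch, the key configuration might look like the line below; the actual constant name and its location vary by script:

```javascript
// Hypothetical key setup near the top of each scrape_*.js script.
// Replace the placeholder with your own ZenRows API key.
const ZENROWS_API_KEY = 'YOUR_ZENROWS_API_KEY';
```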
Data is output in CSV format, with files named according to the component type. Headers in each CSV correspond to the data fields extracted, providing a structured and comprehensive dataset.
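A minimal sketch of that output step, using only Node's built-in `fs` module (the actual scripts may use a CSV library instead, and the headers shown are just examples of the extracted fields):

```javascript
// Minimal CSV writer: quote every field, escape embedded quotes,
// then join the header row and data rows with newlines.
const fs = require('fs');

function writeCsv(filename, headers, rows) {
  const escape = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;
  const lines = [
    headers.map(escape).join(','),
    ...rows.map((row) => headers.map((h) => escape(row[h])).join(',')),
  ];
  fs.writeFileSync(filename, lines.join('\n'));
}

// Example: a single illustrative row written to cpus_detailed.csv.
writeCsv('cpus_detailed.csv', ['Manufacturer', 'Model', 'Part #'], [
  { Manufacturer: 'AMD', Model: 'Ryzen 7 5800X', 'Part #': '100-100000063WOF' },
]);
```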
This project is strictly for educational purposes, demonstrating advanced web scraping techniques and the handling of anti-scraping technologies. Users should comply with the terms of service of PCPartPicker and use the data responsibly.
Contributions are welcome, particularly those that improve scraping efficiency or expand the range of components covered. Feel free to fork the repository and submit pull requests.
For details on how the scripts operate and guidance on setup, refer to the individual script files in this repository.