havasd/pp-scraper

pp-scraper

Portfolio Performance-compatible scraper for Hungarian instruments.

Scraping runs every weekday around 18:00 UTC as a batch job covering all data sources, and the results are uploaded to the pp-data repository.
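If you consume the uploaded data, it can help to know when the next batch is due. The following is a minimal sketch that only encodes the "every weekday around 18:00 UTC" schedule stated above; the function name `next_scheduled_run` is hypothetical and not part of this repository, and the real job may start or finish later than the nominal time.

```python
from datetime import datetime, time, timedelta, timezone

# Nominal batch start, taken from the schedule described above (approximate).
RUN_TIME_UTC = time(18, 0)


def next_scheduled_run(now: datetime) -> datetime:
    """Return the next weekday 18:00 UTC slot strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), RUN_TIME_UTC, tzinfo=timezone.utc)
    # Step forward while the slot has already passed or falls on
    # Saturday (weekday 5) or Sunday (weekday 6).
    while candidate <= now or candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_scheduled_run(datetime.now(timezone.utc)))
```

Fresh data should only be expected some time after the returned timestamp, since the upload happens once the batch has finished.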

Primary features:

  • daily price data scraping, in JSON format per instrument, for the data sources listed below
  • optional historical price generation

Implemented Spiders

| Data source name | Spider name | Notes |
| --- | --- | --- |
| Alfa | alfa_nyugdij | Can scrape historical data |
| Allianz | allianz_nyugdij | Can scrape historical data |
| Aranykor | aranykor | Scrapes historical data |
| Bamosz | bamosz | Supports historical scraping with Splash |
| Budapest | budapest_nyugdij | Can scrape historical data; scrapes VPF and PPF funds |
| Erste | erste_nyugdij | Can scrape historical data from a hand-crafted CSV |
| Honved | honved_nyugdij | Can scrape historical data |
| Horizont | horizont_nyugdij | Can scrape historical data |
| MÁK | mak | Scrapes only the latest data |
| MÁK | mak_historical | Scrapes historical data from the PDF report generator endpoint for a given time range. Uses Tesseract OCR to extract data from the PDF files. Best effort: the OCR sometimes makes mistakes when parsing tables |
| MBH | mbh_nyugdij | Can scrape historical data |
| OTP | otp_nyugdij | Can scrape historical data |
| Pannónia | pannonia_nyugdij | Can scrape historical data |
| Szövetség | szovetseg_nyugdij | Scrapes historical data from Excel |

Installation

For local execution you need to install the following packages.

Ubuntu

  1. Install the system packages: sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev python3-venv docker.io tesseract-ocr
  2. Install the Python dependencies: pip install -r requirements.txt
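
After installing, a quick check that the external executables are reachable can save debugging time later. The sketch below is not part of the repository; the helper `missing_tools` is hypothetical, and the executable names (`python3`, `docker`, `tesseract`) are assumed to be the ones provided by the `python3`, `docker.io`, and `tesseract-ocr` packages listed above.

```python
import shutil

# Executables assumed to be provided by the apt packages listed above
# (python3, docker.io, tesseract-ocr).
REQUIRED_TOOLS = ["python3", "docker", "tesseract"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

This only confirms that the executables are on PATH; it does not verify versions or that the Docker daemon is running.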
