Portfolio Performance compatible scraper for Hungarian instruments.
Scraping runs every weekday around 18:00 UTC as a batch job covering all data sources, and the results are uploaded to the pp-data repository.
Primary features:
- daily price data scraping in JSON format per instrument for the data sources below
- optional historical price generation
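
As a rough illustration of per-instrument JSON output, the sketch below writes one file per instrument containing a date-sorted list of price records. The file naming, the `date` and `close` field names, and the `write_price_file` helper are all assumptions made for this example; the layout actually produced by the spiders may differ.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_price_file(out_dir, instrument_id, prices):
    """Write a JSON price file for one instrument and return its path.

    `prices` maps datetime.date -> float. The record layout used here
    ("date" / "close") is hypothetical, not the repository's real schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Sort by date so the file is stable between runs.
    records = [
        {"date": day.isoformat(), "close": value}
        for day, value in sorted(prices.items())
    ]

    path = out_dir / f"{instrument_id}.json"
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo with made-up values written to a temporary directory.
    demo_dir = tempfile.mkdtemp()
    demo_path = write_price_file(
        demo_dir,
        "example_fund",
        {date(2024, 1, 3): 1.25, date(2024, 1, 2): 1.2},
    )
    print(demo_path.read_text(encoding="utf-8"))
```

A flat list of dated records like this is easy to address with JSONPath expressions, which is one reason it is a plausible shape for a JSON quote feed.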
Data source name | Spider name | Notes |
---|---|---|
Alfa | alfa_nyugdij | Can scrape historical data |
Allianz | allianz_nyugdij | Can scrape historical data |
Aranykor | aranykor | Scrapes historical data |
Bamosz | bamosz | Supports historical scraping with Splash |
Budapest | budapest_nyugdij | Can scrape historical data; scrapes VPF and PPF funds |
Erste | erste_nyugdij | Can scrape historical data from a hand-crafted CSV |
Honved | honved_nyugdij | Can scrape historical data |
Horizont | horizont_nyugdij | Can scrape historical data |
MÁK | mak | Scrapes only latest data |
MÁK | mak_historical | Scrapes historical data for a given time range from the PDF report generator endpoint, using Tesseract OCR to extract data from the PDF files. Best effort: the OCR occasionally misreads table contents |
MBH | mbh_nyugdij | Can scrape historical data |
OTP | otp_nyugdij | Can scrape historical data |
Pannónia | pannonia_nyugdij | Can scrape historical data |
Szövetség | szovetseg_nyugdij | Scrapes historical data from Excel |
For local execution, install the following system packages and Python dependencies.
- Install
sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev python3-venv docker.io tesseract-ocr
pip install -r requirements.txt
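
With the dependencies installed, a spider from the table can presumably be started by name. The sketch below assumes this is a standard Scrapy project, which the README does not state explicitly (the spider names and the Splash reference only suggest it), and that the `scrapy` executable is on `PATH`. `build_crawl_command` and `run_spider` are hypothetical helpers written for this example.

```python
import shutil
import subprocess


def build_crawl_command(spider_name):
    """Return the argument list for `scrapy crawl <spider_name>`."""
    # Accept only simple identifiers such as "mak" or "otp_nyugdij".
    if not spider_name or not spider_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected spider name: {spider_name!r}")
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name, project_dir="."):
    """Run one spider from the project directory and return its exit code.

    Assumes a Scrapy project layout (scrapy.cfg in `project_dir`).
    """
    if shutil.which("scrapy") is None:
        raise RuntimeError("scrapy executable not found; is the virtualenv active?")
    completed = subprocess.run(
        build_crawl_command(spider_name),
        cwd=project_dir,
        check=False,
    )
    return completed.returncode
```

For example, `run_spider("mak")` would be the equivalent of running `scrapy crawl mak` from the project root, under the assumptions above.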