Python utilities to predict the future performance of upcoming IPOs (Initial Public Offerings).
Check out the accompanying paper for more details.
This project is a collection of datasets and Python code to perform Text Mining on raw SEC S-1 filings.
The goal of this project is to apply Text Mining tools and techniques to spot investment opportunities in upcoming IPOs. The system consists of three main modules. The first module retrieves IPO data from the SEC EDGAR system. The second module performs Text Mining on the retrieved filings. The third module is a classifier that predicts the performance of upcoming IPOs.
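The retrieval module is only described at a high level here. As a rough sketch of how S-1 filings can be located on EDGAR, the code below builds the URL of EDGAR's quarterly `master.idx` full index (a pipe-delimited list of filings: CIK, company name, form type, date filed, file name) and keeps only the S-1 rows. The function names, the made-up `SAMPLE_INDEX` rows and the choice of `master.idx` as the entry point are assumptions for illustration; `S-1 Downloader.ipynb` may locate filings differently. SEC's fair-access guidance asks automated clients to send a descriptive User-Agent, which is why `fetch_text` takes one.

```python
import urllib.request

EDGAR_ARCHIVES = "https://www.sec.gov/Archives/"


def master_index_url(year: int, quarter: int) -> str:
    """URL of the pipe-delimited EDGAR full index for one calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    return f"{EDGAR_ARCHIVES}edgar/full-index/{year}/QTR{quarter}/master.idx"


def fetch_text(url: str, user_agent: str) -> str:
    """Download a text file from EDGAR, identifying the client via User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("latin-1")


def parse_master_index(text: str, form_types: tuple = ("S-1",)) -> list:
    """Return the rows of a master.idx file whose Form Type is in form_types."""
    filings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows have exactly five fields and start with a numeric CIK;
        # this skips the preamble, the column header and the dashed separator.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        cik, company, form_type, date_filed, filename = fields
        if form_type in form_types:
            filings.append({
                "cik": cik,
                "company": company,
                "form_type": form_type,
                "date_filed": date_filed,
                "url": EDGAR_ARCHIVES + filename,
            })
    return filings


# Made-up rows in the master.idx layout, for illustration only.
SAMPLE_INDEX = """Description: Master Index of EDGAR Dissemination Feed
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1111111|EXAMPLE CLOUD INC|S-1|2020-01-15|edgar/data/1111111/0001111111-20-000001.txt
2222222|EXAMPLE BANK CORP|10-K|2020-01-16|edgar/data/2222222/0002222222-20-000002.txt
"""
```

In a live run, `parse_master_index(fetch_text(master_index_url(2020, 1), "Your Name your.email@example.com"))` would list that quarter's S-1 filings with the URL of each raw filing.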
Jupyter Notebooks are available for data retrieval, summarization, keyword extraction and Machine Learning.
Start by running all cells in the following notebooks:
- S-1 Downloader.ipynb - Download raw IPO data.
- Performance Downloader.ipynb - Download historical performance from Yahoo Finance.
- Summarizer.ipynb - Summarize raw S-1 filings.
- Keywords Extractor.ipynb - Extract keywords from S-1 filings.
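The README does not say which keyword-extraction method `Keywords Extractor.ipynb` uses. As one common baseline, TF-IDF scores each term by how often it appears in a filing relative to how many filings contain it, so boilerplate shared by every S-1 ranks low. The stdlib-only sketch below is an assumption, not the notebook's code; the stop-word list and function names are hypothetical, and the smoothed IDF, ln((1 + N) / (1 + df)) + 1, follows the convention of scikit-learn's `TfidfVectorizer` with `smooth_idf=True`.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "and", "our", "for", "with", "are", "this", "that", "may", "will"}


def tokenize(text: str) -> list:
    """Lower-case alphabetic tokens of at least three letters, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) >= 3 and t not in STOP_WORDS]


def tfidf_keywords(documents: list, top_k: int = 10) -> list:
    """Return the top_k terms of each document, ranked by TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of filings in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            # Term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_k])
    return keywords
```

For two toy "filings", `"cloud software cloud subscription revenue"` and `"biotech clinical trial drug revenue"`, the shared term `revenue` ranks below the terms unique to each document.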
Then run all cells in the following notebooks:
- 1 Baseline.ipynb - Transform raw IPO listings.
- 2 Sentiment Analysis.ipynb - Add Sentiment Analysis features.
- 3 Summarization.ipynb - Add summarization features.
- 4 Keywords.ipynb - Add keyword analysis features.
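Notebooks 2 to 4 each add a group of text-derived columns to the baseline IPO table. The README does not describe how the sentiment features are computed; a minimal dictionary-based sketch, assuming each IPO row carries its S-1 text (or summary) under a `text` key, might look like the following. The word lists, column names and `add_sentiment_features` are hypothetical; a real pipeline would more likely load a finance-specific lexicon such as the Loughran-McDonald word lists.

```python
import re

# Hypothetical mini word lists for illustration only, not a real sentiment lexicon.
POSITIVE = {"growth", "profitable", "leading", "innovative", "strong"}
NEGATIVE = {"loss", "losses", "litigation", "decline", "risk", "adverse"}


def sentiment_features(text: str) -> dict:
    """Dictionary-based tone features for one filing or summary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n_tokens = len(tokens) or 1  # avoid division by zero on empty text
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    # Net tone in [-1, 1]; 0.0 when no lexicon word is present.
    net_tone = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return {"pos_ratio": pos / n_tokens, "neg_ratio": neg / n_tokens, "net_tone": net_tone}


def add_sentiment_features(rows: list, text_key: str = "text") -> list:
    """Return copies of the IPO rows extended with the sentiment columns."""
    return [{**row, **sentiment_features(row[text_key])} for row in rows]
```

Keeping each feature group in its own function mirrors the notebook sequence: every step takes the previous table and only appends columns, so groups can be added or ablated independently.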
Making predictions:
- Run all cells in Predictor.ipynb to fetch upcoming IPOs and predict their performance.
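The README does not name the model behind `Predictor.ipynb`. As a stand-in, the sketch below trains a plain logistic-regression classifier with full-batch gradient descent on per-IPO feature vectors (for example, the sentiment, summarization and keyword columns), assuming a binary label such as 1 when an IPO's post-listing return is positive. The functions, the label definition and the toy data are hypothetical; in practice a library implementation such as scikit-learn's `LogisticRegression` would usually be preferred.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list, y: list, lr: float = 0.1, epochs: int = 1000) -> tuple:
    """Fit logistic-regression weights with full-batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is p - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(model: tuple, x: list) -> float:
    """Estimated probability that an IPO belongs to the positive class."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(model: tuple, x: list, threshold: float = 0.5) -> int:
    """Hard 0/1 prediction at the given probability threshold."""
    return int(predict_proba(model, x) >= threshold)
```

For example, `model = train_logistic(X_train, y_train)` on historical IPOs, followed by `predict(model, features)` for each upcoming filing, yields a 0/1 call; `predict_proba` exposes the underlying score if a stricter threshold is wanted.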
This project is intended for traders and researchers as a potential starting point to fork for alpha generation.
Repository structure:
- Notebooks - Python scripts and Jupyter notebooks.
- Data - Raw S-1 SEC filings since 2000. Sample filings are provided.
- Datasets - CSV files used for training and evaluating Machine Learning models.
- Keywords - Top keywords for S-1 SEC filings.
- Summary - Summarized S-1 SEC filings.