This repository is separated into five parts:

1. Data Preprocessing

./crawl.py

Output: ./input/raw_*

Crawl the ticker list and basic stock information from http://www.nasdaq.com/screening/company-list.aspx

Crawl financial statements from http://financials.morningstar.com/ratios/r.html?t=BIDU&region=USA&culture=en_US

Crawl historical stock prices from https://finance.yahoo.com/

2. Feature engineering

./feature engineering.py

Input: ./input/feature_projection
Output: ./input/feature_label_*, selected_feature_*

2.1 Feature format

Format the basic information into a (sample_n, feature_m) matrix, one row per sample and one column per feature:

	[ feature_1_sample_1, feature_2_sample_1, ... feature_m_sample_1]

	...

	[ feature_1_sample_n, feature_2_sample_n, ... feature_m_sample_n]

Before filtering, each sample has a feature dimension of 80 * 11 (880 financial ratio values)

2.2 Missing value filter

Delete a feature if more than N% of its values are missing (N can be set to 1, 5, or 10)
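
The filter above can be sketched as follows. This is a minimal illustration, not the code in `feature engineering.py`: the helper name `drop_sparse_features` and the use of `None` to mark a missing value are assumptions made for the example.

```python
def drop_sparse_features(rows, max_missing_pct):
    """Return (filtered_rows, kept_column_indices).

    A column is kept only if its share of missing values (None) is at
    most max_missing_pct percent. Hypothetical helper for illustration.
    """
    if not rows:
        return [], []
    n_samples = len(rows)
    n_features = len(rows[0])
    kept = []
    for j in range(n_features):
        missing = sum(1 for row in rows if row[j] is None)
        if 100.0 * missing / n_samples <= max_missing_pct:
            kept.append(j)
    filtered = [[row[j] for j in kept] for row in rows]
    return filtered, kept


# Toy data: 4 samples, 3 features.
# Column 1 is missing in 2 of 4 rows (50%), column 2 in 1 of 4 rows (25%).
rows = [
    [1.0, None, 0.3],
    [1.2, 2.0, 0.1],
    [0.9, None, 0.4],
    [1.1, 2.5, None],
]
filtered, kept = drop_sparse_features(rows, max_missing_pct=30)
```

With a 30% threshold only column 1 is dropped here. Tightening the threshold to 10% would also drop column 2.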

2.3 Feature time-horizon completeness check

Since we need a complete time window to shift, if a feature type (such as P/E or Asset turnover) does not have all 11 data points, delete every feature of that type
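
One way to express this check is shown below. It is a sketch under stated assumptions: each surviving feature column is identified by a hypothetical `(ratio_name, period_index)` pair, and `complete_feature_types` is an illustrative name, not a function from this repository.

```python
from collections import defaultdict

WINDOW = 11  # number of data points required per feature type


def complete_feature_types(columns, window=WINDOW):
    """Keep only columns whose feature type still has all `window` periods.

    `columns` is a list of (ratio_name, period_index) pairs that survived
    the earlier filters. This naming scheme is assumed for illustration.
    """
    periods = defaultdict(set)
    for ratio, period in columns:
        periods[ratio].add(period)
    complete = {ratio for ratio, seen in periods.items() if len(seen) == window}
    return [(ratio, period) for ratio, period in columns if ratio in complete]


# P/E has all 11 periods; Asset turnover lost period 4 in an earlier filter.
columns = [("P/E", t) for t in range(11)]
columns += [("Asset turnover", t) for t in range(11) if t != 4]
kept_columns = complete_feature_types(columns)
```

In this toy input all ten remaining Asset turnover columns are removed, because the type as a whole is incomplete.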

2.4 Feature date filter

Some stocks may only have data from 2005 to 2014

Some start in April 2007; we regard them as 2006

3. Data accuracy check

./dataCheck.py

4. Stock classification based on financial statements

./learning.py

Input: ./input/feature_label_*
Output: ./output/result_*, ./output/tickers_*

4.1 Train the model

Use financial ratios (2006 to Dec. 2014) to train the model

Increase the number of training samples by using rolling windows: 2006-2011, 2007-2012, 2008-2013, 2009-2014, 2010-2015
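
The window list above follows a simple pattern: a fixed span shifted forward one year at a time. A small sketch that reproduces it (the helper name `rolling_windows` is hypothetical):

```python
def rolling_windows(first_start, last_start, span):
    """Return (start_year, end_year) pairs, shifting the start by one year."""
    return [(start, start + span) for start in range(first_start, last_start + 1)]


windows = rolling_windows(first_start=2006, last_start=2010, span=5)
for start, end in windows:
    print(f"{start}-{end}")
```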

Labels are based on the Sortino ratio
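
For reference, the Sortino ratio divides the mean return in excess of a target by the downside deviation, which only counts returns below the target. The sketch below uses that standard definition. The `target` default, the `threshold` argument, and the rule "label 1 if the ratio exceeds the threshold" are assumptions for illustration; the labeling rule actually used in `learning.py` is not described here.

```python
import math


def sortino_ratio(returns, target=0.0):
    """Mean excess return over `target`, divided by downside deviation."""
    if not returns:
        raise ValueError("returns must not be empty")
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Only returns below the target contribute to the downside deviation.
    downside_sq = [min(0.0, e) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    if downside_dev == 0.0:
        return math.inf if mean_excess > 0 else 0.0
    return mean_excess / downside_dev


def sortino_label(returns, threshold):
    """Hypothetical rule: label 1 if the ratio exceeds `threshold`, else 0."""
    return 1 if sortino_ratio(returns) > threshold else 0


ratio = sortino_ratio([0.02, -0.01, 0.03, -0.02])
```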

4.2 Predict data and make comparison

In summary, gradient boosting gives the best performance (the highest precision for label 1 on average)
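
The comparison metric, precision for label 1, is the fraction of samples predicted as 1 that are truly 1. A minimal sketch with made-up labels (the function name and the toy data are illustrative, not results from this repository):

```python
def precision_for_label(y_true, y_pred, label=1):
    """Share of predictions equal to `label` whose true value is also `label`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted:
        return 0.0  # no predictions of this label; treat precision as 0
    return sum(1 for t in predicted if t == label) / len(predicted)


# Toy example: four samples are predicted as 1, two of them correctly.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
precision = precision_for_label(y_true, y_pred)
```

Averaging this value over the rolling windows gives one number per model to compare.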

5. Optimize the weights for your portfolio

./mean_variance_optimization.py

Use a brute-force method: randomly pick 15 stocks from the stock set and build a mean-variance portfolio with a no-short-selling constraint

Delete the portfolio with the worst CVaR
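
The CVaR filter can be sketched as below. CVaR (conditional value at risk) is taken here as the average loss over the worst fraction of periods, a simple discrete approximation. The `tail` parameter, the function names, and the toy return series are assumptions; this does not reproduce the optimization in `mean_variance_optimization.py`.

```python
def cvar(returns, tail=0.25):
    """Average loss over the worst `tail` fraction of periods.

    Losses are negated returns, so a larger value means a worse tail.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sorted((-r for r in returns), reverse=True)
    k = max(1, int(len(losses) * tail))  # number of worst periods to average
    return sum(losses[:k]) / k


def drop_worst_cvar(portfolios, tail=0.25):
    """Remove the portfolio with the largest CVaR.

    `portfolios` maps a portfolio name to its list of periodic returns.
    """
    worst = max(portfolios, key=lambda name: cvar(portfolios[name], tail))
    return {name: r for name, r in portfolios.items() if name != worst}


# Made-up return series for three candidate portfolios.
candidates = {
    "A": [0.01, -0.02, 0.03, -0.01],
    "B": [0.02, -0.05, 0.04, 0.01],
    "C": [0.00, -0.01, 0.01, 0.005],
}
remaining = drop_worst_cvar(candidates)
```

Here portfolio B has the largest tail loss, so it is the one removed.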