This is a project for my data science course. The main idea is an AI model that fuses stock news headlines with past stock prices to predict future stock prices.
It is important to define an evaluation metric for the different methods of imputing missing values. We know that the amount of missing data will be
Thus the loss function can be defined as:
$\widetilde{Loss} = \frac{\widehat{Loss}}{stock_{avg}} = \frac{\sum_{i=1}^{D} \left| SD_i - SDM_i \right|}{D \cdot p \cdot stock_{avg}}$
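Below is a minimal NumPy sketch of this loss. It rests on several assumptions: SD is the original price series, SDM is the series after masking and imputation, D is the number of days, and p is the mask ratio, so D·p is the number of masked days. It also assumes the differences are taken in absolute value. The function name `normalized_mask_loss` is hypothetical.

```python
import numpy as np

def normalized_mask_loss(true_prices, imputed_prices, mask):
    """Normalized imputation loss: sum_i |SD_i - SDM_i| / (D * p * stock_avg).

    true_prices    -- SD, the original price series (length D)
    imputed_prices -- SDM, the series after masking and imputation
    mask           -- boolean array, True where a price was hidden
    """
    true_prices = np.asarray(true_prices, dtype=float)
    imputed_prices = np.asarray(imputed_prices, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    D = true_prices.size
    p = mask.mean()                 # mask ratio: fraction of days hidden
    stock_avg = true_prices.mean()  # average price, normalizes across stocks
    # Unmasked days are copied unchanged, so only masked days add error.
    abs_err = np.abs(true_prices - imputed_prices).sum()
    return abs_err / (D * p * stock_avg)

# Toy example: two hidden days, each imputed 1.0 away from the truth.
loss = normalized_mask_loss([10, 12, 14, 16, 18],
                            [10, 11, 14, 17, 18],
                            [False, True, False, True, False])
print(f"{loss:.4%}")  # 2 / (2 * 14)
```

Dividing by the average price makes the loss comparable across stocks with very different price levels. Multiply by 100 to express it as a percentage.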
We compared the following interpolation methods: interp1d, UnivariateSpline, Rbf, make_interp_spline, and LinearRegression.
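One way to organize such a comparison is a small harness. It hides a random subset of days, imputes them with each method, and scores every method with the normalized loss. This is a NumPy-only sketch under several assumptions:

- `compare_methods`, `impute_linear`, and `impute_previous` are hypothetical names.
- The previous-value fill is a swapped-in baseline, not one of the methods listed above.
- The price series is synthetic, not real stock data.

`np.interp` gives the same values as `interp1d` with `kind="linear"` for interior gaps.

```python
import numpy as np

def impute_linear(days, prices, mask):
    # Linear interpolation over observed days; for interior gaps this matches
    # scipy.interpolate.interp1d(kind="linear").
    filled = prices.copy()
    filled[mask] = np.interp(days[mask], days[~mask], prices[~mask])
    return filled

def impute_previous(days, prices, mask):
    # Baseline: carry the last observed (or already filled) price forward.
    filled = prices.copy()
    for i in np.flatnonzero(mask):
        filled[i] = filled[i - 1]  # day 0 is never masked below
    return filled

def normalized_loss(true_prices, imputed, mask):
    # sum |SD_i - SDM_i| / (D * p * stock_avg), where D * p = number of masked days
    return np.abs(true_prices - imputed).sum() / (mask.sum() * true_prices.mean())

def compare_methods(prices, ratio, methods, seed=0):
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    days = np.arange(prices.size, dtype=float)
    # Hide a random subset of interior days; first and last stay observed.
    n_masked = int(round(ratio * prices.size))
    mask = np.zeros(prices.size, dtype=bool)
    mask[rng.choice(np.arange(1, prices.size - 1), size=n_masked, replace=False)] = True
    return {name: normalized_loss(prices, fn(days, prices, mask), mask)
            for name, fn in methods.items()}

# Synthetic geometric random walk (always positive), NOT real stock data.
rng = np.random.default_rng(42)
toy_prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))
methods = {"linear": impute_linear, "previous": impute_previous}
print(compare_methods(toy_prices, 0.2, methods))
```

The SciPy and scikit-learn methods can be added by wrapping each one in the same `(days, prices, mask)` signature. For example, `interp1d(days[~mask], prices[~mask], kind="cubic")(days[mask])` returns cubic estimates for the hidden days.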
Figure: loss value over 100 days (left) and over 1000 days (right).
Figure: mask value (left) and risk value (right).
In the end, the loss from using interp1d to interpolate these stocks is acceptable:
Method | META | GOOG | AMZN | NFLX | AAPL |
---|---|---|---|---|---|
interp1d | 1.462% | 1.261% | 1.636% | 1.817% | 1.351% |
Figure: stock prices over 2 months (left) and over 1.5 years (right).
After plotting the prices over 2 months and over 1.5 years, we found that each stock is strongly correlated with the others. For example, over the most recent 2 months, META (FB) and GOOG (Google) rise and fall at almost exactly the same rate. This is intuitive: they are the same type of company, so they face the same market conditions. Similarly, in the 1.5-year figure, the stocks rise and fall at the same times.
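The sketch below shows one way to put a number on this co-movement. It assumes correlation is measured on daily returns rather than raw prices, because two prices that both trend upward can look correlated even when their day-to-day moves are not. `return_correlation` is a hypothetical helper.

```python
import numpy as np

def return_correlation(prices_a, prices_b):
    """Pearson correlation of the daily percentage changes of two price series."""
    a = np.asarray(prices_a, dtype=float)
    b = np.asarray(prices_b, dtype=float)
    returns_a = np.diff(a) / a[:-1]  # day-over-day relative change
    returns_b = np.diff(b) / b[:-1]
    return np.corrcoef(returns_a, returns_b)[0, 1]

# Synthetic example: two series driven by the same daily moves, NOT real data.
moves = np.array([0.01, -0.02, 0.03, 0.005, -0.01])
stock_x = 100 * np.cumprod(1 + moves)
stock_y = 250 * np.cumprod(1 + moves)
print(return_correlation(stock_x, stock_y))
```

A value near 1 means the two stocks tend to rise and fall together on the same days, and a value near -1 means they tend to move in opposite directions.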
Figure: mask ratio vs. mask loss (left) and visualization of random masks at different mask ratios (right).
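Random masks at different ratios can be generated and printed as text. This sketch assumes that the masked days are drawn uniformly without replacement and that the first and last days always stay observed, so interpolation never has to extrapolate. `random_mask` and `mask_to_text` are hypothetical names.

```python
import numpy as np

def random_mask(n_days, ratio, seed=0):
    """Boolean mask hiding round(ratio * n_days) randomly chosen interior days."""
    rng = np.random.default_rng(seed)
    n_masked = int(round(ratio * n_days))
    if n_masked > n_days - 2:
        raise ValueError("ratio too large: first and last day must stay observed")
    mask = np.zeros(n_days, dtype=bool)
    # Sample without replacement from days 1 .. n_days-2.
    hidden = rng.choice(np.arange(1, n_days - 1), size=n_masked, replace=False)
    mask[hidden] = True
    return mask

def mask_to_text(mask):
    # "x" marks a hidden day, "." an observed one.
    return "".join("x" if m else "." for m in mask)

for ratio in (0.1, 0.3, 0.5):
    print(f"{ratio:.1f}  {mask_to_text(random_mask(40, ratio))}")
```

Keeping the endpoints observed is a design choice. Without it, the linear interpolation in `np.interp` would simply clamp to the nearest observed value at the edges, and `interp1d` would raise an error unless a fill value is given.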
Figure (one panel per stock): META, AMZN, AAPL, NFLX, GOOGL.
Figure (one panel each): positive, negative, and neutral news of FAANG.
Figure: distribution of negative news (left) and positive news (right).
Figure: correlation over 500 days (left) and over 100 days (right).
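The sketch below compares correlation over a long window and a short window. It assumes prices are held in a (days × stocks) array aligned on the same trading days, and that correlation is computed on daily returns. `trailing_correlation` is a hypothetical helper. The prices are synthetic, generated from a shared "market" factor plus stock-specific noise.

```python
import numpy as np

def trailing_correlation(prices, window):
    """Correlation matrix of daily returns over the last `window` days.

    prices -- array of shape (n_days, n_stocks), one column per ticker,
              rows aligned on the same trading days
    """
    prices = np.asarray(prices, dtype=float)
    recent = prices[-(window + 1):]  # window + 1 prices give `window` returns
    returns = np.diff(recent, axis=0) / recent[:-1]
    return np.corrcoef(returns, rowvar=False)  # columns are variables

# Synthetic prices, NOT real data: shared factor + independent noise per stock.
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, 600)
own = rng.normal(0, 0.01, (600, 5))
toy = 100 * np.cumprod(1 + market[:, None] + own, axis=0)
tickers = ["META", "GOOG", "AMZN", "NFLX", "AAPL"]

for window in (500, 100):
    corr = trailing_correlation(toy, window)
    print(window, [f"{t}={c:.2f}" for t, c in zip(tickers, corr[0])])
```

Row 0 of each matrix holds the correlation of the first ticker with every other ticker. A shorter window reacts faster to recent behavior but gives noisier estimates.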
`$ python correlation_gif.py`
This source code is based on FFN, NumPy, and pandas. Thanks to their authors for their wonderful work.