This application uses a deep reinforcement learning model, a Double Deep Q-Network (DDQN), to generate an optimal set of trades that maximizes daily return.
It takes a model-free approach and develops a variation of Deep Q-Learning to estimate the optimal actions of a trader.
The model is a fully connected network (FCN) trained using experience replay and Double DQN, with input features given by the current state of the limit order book, 33 additional technical indicators, and the available execution actions; the output is the Q-value function estimating future rewards under an arbitrary action.
We apply the model to ten stocks and observe that it occasionally outperforms the standard benchmark approach on most of them, as measured by Sharpe ratio.
Further details regarding the motivation, methods, and results of the implementation can be found in my presentation here.
- To play interactively with the model, visit the deployed Streamlit app here
- To run it locally:
git clone https://github.com/DeepNeuralAI/RL-DeepQLearning-Trading.git
conda env create -f environment.yml
conda activate deepRLTrader
streamlit run app.py
To train the model, use the following command:
$ python3 train.py --train data/GOOG.csv --valid data/GOOG_2018.csv --episode-count 50 --window-size 10
To evaluate the given model, use the following command:
$ python3 evaluate.py --eval data/GOOG.csv --model-name GOOG --window-size 10 --verbose True
RL-DeepQLearning-Trading
├── README.md
├── app.py
├── conda.txt
├── data
│ ├── CSV Data
├── evaluate.py
├── how_it_works.py
├── models
│ ├── Models
├── public
│ ├── Media Assets
├── requirements.txt
├── src
│ ├── Helper Methods
└── train.py
Inputs: Price & Volume
The model outputs an optimal set of trades (Buy/Sell/Hold), as observed in the figure below:
Based upon the previous figure, the model calculates the normalized portfolio value for:
- Buy & Hold Strategy: Baseline Model
- Heuristic:
- Buy if price below 2 standard deviations from the simple moving average
- Sell if price above 2 standard deviations from the simple moving average
- Double DQN: The trained policy of the Double Deep Q Network/RL Model
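As an illustration, the sketch below shows how the two baselines above could be computed with pandas. The 20-day window, the helper names, and the use of a plain price series are assumptions for this example and do not necessarily mirror the code in src/.

```python
import pandas as pd

def heuristic_signals(prices: pd.Series, window: int = 20) -> pd.Series:
    """Mean-reversion heuristic: buy below SMA - 2*std, sell above SMA + 2*std."""
    sma = prices.rolling(window).mean()
    std = prices.rolling(window).std()
    signals = pd.Series("Hold", index=prices.index)
    signals[prices < sma - 2 * std] = "Buy"
    signals[prices > sma + 2 * std] = "Sell"
    return signals

def buy_and_hold_value(prices: pd.Series, capital: float = 1.0) -> pd.Series:
    """Normalized portfolio value for the Buy & Hold baseline."""
    return capital * prices / prices.iloc[0]
```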
An API call is made to AlphaVantage Stock Time Series Data, specifically TIME_SERIES_DAILY_ADJUSTED
This API returns daily time series (date, daily open, daily high, daily low, daily close, daily volume, daily adjusted close, and split/dividend events) of the global equity specified, covering 20+ years of historical data.
The most recent data point contains the price and volume information of the current trading day, updated in real time.
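A minimal sketch of such a request against the standard AlphaVantage query endpoint is shown below; `YOUR_API_KEY` is a placeholder, and the app's actual fetch code may be structured differently.

```python
import requests
import pandas as pd

API_KEY = "YOUR_API_KEY"  # placeholder: free key from alphavantage.co

def fetch_daily_adjusted(symbol: str) -> pd.DataFrame:
    """Pull the full daily adjusted time series for one ticker."""
    params = {
        "function": "TIME_SERIES_DAILY_ADJUSTED",
        "symbol": symbol,
        "outputsize": "full",  # full 20+ year history instead of the last 100 points
        "apikey": API_KEY,
    }
    resp = requests.get("https://www.alphavantage.co/query", params=params, timeout=30)
    resp.raise_for_status()
    series = resp.json()["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(series, orient="index").astype(float)
    df.index = pd.to_datetime(df.index)
    return df.sort_index()

# df = fetch_daily_adjusted("GOOG")
```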
Technical indicators are derived from fundamental price and volume in the categories of:
- Trend
- Momentum
- Volatility
- Volume
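For illustration, the sketch below derives one example indicator per category with pandas, assuming columns named `close` and `volume`; the indicator choices and window lengths are assumptions and are not the exact 33 features used by the app.

```python
import pandas as pd

def example_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """One illustrative indicator per category (the app itself derives 33 features)."""
    out = pd.DataFrame(index=df.index)
    close, volume = df["close"], df["volume"]
    # Trend: 20-day simple moving average
    out["sma_20"] = close.rolling(20).mean()
    # Momentum: 10-day rate of change
    out["roc_10"] = close.pct_change(10)
    # Volatility: 20-day rolling standard deviation of daily returns
    out["vol_20"] = close.pct_change().rolling(20).std()
    # Volume: volume z-score relative to its 20-day window
    out["vol_z"] = (volume - volume.rolling(20).mean()) / volume.rolling(20).std()
    return out
```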
The data comprises a total of 33 technical features, which are normalized and fed through the Double DQN.
The RL agent is trained on 7-10 years of historical data.
The RL agent is tested on an unseen set of 1-2 years of price/volume data; in most cases, this is 2019 price/volume data.
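A minimal sketch of that chronological split, assuming a DataFrame indexed by date; the cut-off date is illustrative.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, test_start: str = "2019-01-01"):
    """Train on the earlier 7-10 years, hold out the final 1-2 years for testing."""
    train = df.loc[df.index < test_start]
    test = df.loc[df.index >= test_start]
    return train, test
```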
- The Agent observes the environment in the form of a state.
- Based on that state, the Agent takes an action according to its policy.
- For that state and action, the Agent receives a reward from the environment.
- The action mutates the environment, which transitions to a new state.
- Repeat (a code sketch of this loop follows below).
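The loop above can be sketched as follows, assuming hypothetical `env` and `agent` objects with a Gym-like interface; the method names are illustrative, not the repo's actual API.

```python
# Hypothetical agent/environment objects with a Gym-like interface;
# method names are illustrative, not this repo's actual API.
def run_episode(env, agent):
    state = env.reset()                                   # agent observes the initial state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.act(state)                         # action chosen from the current policy
        next_state, reward, done = env.step(action)       # environment returns reward and new state
        agent.remember(state, action, reward, next_state, done)  # store transition for experience replay
        state = next_state                                # transition, then repeat
        total_reward += reward
    return total_reward
```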
Q-learning is a model-free RL algorithm for learning a policy. The policy is arguably the agent's most important component, as it drives how the agent interacts with its environment. We define the "goodness" of an action using the action-value function Q(s,a).
The higher the Q-value, the higher the expected cumulative reward from taking action a in state s.
We can use a table, namely a Q-table, to store experience tuples: it takes a discrete state s and action a as input and outputs the associated Q-value. The main limitation of this method, despite its intuitiveness, is scalability. With continuous states such as a stock price, storing a table of n states by m actions quickly becomes computationally infeasible; chess, for example, would require a state space on the order of 10^120.
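For reference, a minimal tabular sketch of this idea, using the standard Q-learning update rule Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]; the learning rate and discount values are illustrative.

```python
from collections import defaultdict

# Q-table: maps (state, action) -> estimated action value
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Standard Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```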
Instead of storing a massive lookup table, this project approximates Q(s,a) with a neural network, namely a Deep Q-Network (DQN).
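A minimal Keras-style sketch of such a Q-network; the layer sizes and optimizer settings are illustrative assumptions, not the exact architecture used in this repo.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(state_size: int, action_size: int = 3) -> tf.keras.Model:
    """Fully connected network mapping a state vector to one Q-value per action."""
    model = tf.keras.Sequential([
        layers.Input(shape=(state_size,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(action_size, activation="linear"),  # Q(s, Buy), Q(s, Hold), Q(s, Sell)
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model
```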
In 2015, Google DeepMind showed that in stochastic environments, Q-learning and DQN tend to overestimate action values and can learn very poorly. From a high-level perspective, these overestimations result from a positive bias introduced by taking the maximum expected action value.
van Hasselt et al. proposed using a double estimator to construct the DQN and showed that the Double DQN (DDQN) converges to a more optimal policy and tends to estimate the true value more closely.
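A sketch of the Double DQN target computation, in which the online network selects the greedy action and the target network evaluates it; the model and argument names are illustrative, not the repo's exact code.

```python
import numpy as np

def double_dqn_targets(online_model, target_model, rewards, next_states, dones, gamma=0.99):
    """Double DQN: the online net picks the next action, the target net scores it."""
    next_q_online = online_model.predict(next_states, verbose=0)   # action selection
    next_q_target = target_model.predict(next_states, verbose=0)   # action evaluation
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * evaluated * (1.0 - dones)             # zero future value at episode end
```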
The figure below is the implementation used in this application: