This project was submitted to QHack Open Hackathon 2022. (Team Name: DynamicQuantumWorld)
For full tutorials, visit here
In general markets, the competitive equilibrium, or more generally the Dynamic Stochastic General Equilibrium (DSGE), is characterized by a set of state variables together with the consumption and production plans each agent chooses to maximize its utility. Such utility maximization problems have traditionally been handled with Lagrangian methods. In this hackathon project, we demonstrate a quantum approach to solving the utility maximization problem. Specifically, we employ Quantum Reinforcement Learning to train the policy that determines the agent's actions.
We first constructed a simplified DSGE model based on recent work by researchers at University College London [1]. In our model, we assume a single rational household agent and a single firm in the market. The agent is employed by the firm, and the per-period utility is given in the model summary below (see also the code sketch after the list):
- A single agent and a single firm
- Maximize $\sum_{t=0}^{T} \beta^t u_t$, where $u_t = \ln(c_t) - \frac{\theta}{2} n_t^2$, $\beta = 0.97$, and $T = 20$
- Budget constraint: $b_{t+1} = b_t + w_t n_t - p_t c_t$ with $b_T = 0$, where $p_t = 1$ and $w_t = 1$ or $0.5$
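To make this setup concrete, below is a minimal Python sketch of the model as a reinforcement-learning environment. The class name, the value of $\theta$, and the uniform wage draw are illustrative assumptions, not part of the model specification above; the discount factor $\beta$ is applied by the learner when summing rewards.

```python
import numpy as np

class DSGEEnvironment:
    """Minimal sketch of the single-agent, single-firm model.

    Assumptions (not fixed by the write-up above): theta = 1.0, and the
    wage is redrawn uniformly from {1.0, 0.5} each period.
    """

    def __init__(self, T=20, beta=0.97, theta=1.0):
        self.T, self.beta, self.theta = T, beta, theta
        self.p = 1.0  # price is fixed at 1
        self.reset()

    def reset(self):
        self.t = 0
        self.b = 0.0                            # bond holding b_0
        self.w = np.random.choice([1.0, 0.5])   # stochastic wage
        return (self.b, self.w, self.t)

    def step(self, c, n):
        """Apply the action (consumption c > 0, hours worked n)."""
        # Per-period utility: u_t = ln(c_t) - (theta / 2) * n_t^2
        reward = np.log(c) - 0.5 * self.theta * n ** 2
        # Budget constraint: b_{t+1} = b_t + w_t * n_t - p_t * c_t
        self.b = self.b + self.w * n - self.p * c
        self.w = np.random.choice([1.0, 0.5])
        self.t += 1
        done = self.t >= self.T  # terminal condition; b_T = 0 is enforced by the learner
        return (self.b, self.w, self.t), reward, done
```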
There have been attempts to solve this type of problem using classical machine learning or numerical analysis, but a quantum approach to such problems has been absent. We applied a Quantum Reinforcement Learning technique to solve the problem. Before explaining our solution, we first present some preliminaries on Reinforcement Learning needed to understand our approach.
In Reinforcement Learning problems, there are states, and actions that the agent performs at each time step. The agent receives a reward upon performing each action, and our goal is to find the states and actions that maximize the total reward over the entire period. To make this mathematically precise, we introduce the concept of a Markov Decision Process (MDP). An MDP is completely characterized by a tuple $(S, A, P, R, \beta)$: a set of states $S$, a set of actions $A$, transition probabilities $P(s' \mid s, a)$, a reward function $R(s, a)$, and a discount factor (here $\beta$).
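The value function referenced below is defined in the standard way, as the expected discounted return from following a policy $\pi$ starting in state $s$:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T} \beta^{t} r_{t} \,\middle|\, s_{0} = s\right]$$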
While the value function measures how preferable each state is, it is often more important to identify how preferable each (state, action) pair is. This measure is given by the Q-value function, defined in the standard way as

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T} \beta^{t} r_{t} \,\middle|\, s_{0} = s,\, a_{0} = a\right]$$

that is, the expected discounted return from taking action $a$ in state $s$ and following the policy $\pi$ thereafter.
The machine learning technique that systematically learns this Q-value function is known as Q-learning, and it is the method we used in this hackathon project to solve our problem.
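For illustration, a minimal tabular Q-learning update over discretized states and actions could look like the following sketch; the grid sizes and hyperparameters here are illustrative assumptions, not values from our experiments.

```python
import numpy as np

# Illustrative discretization: a small grid of states and actions.
n_states, n_actions = 50, 9
alpha, beta, eps = 0.1, 0.97, 0.1  # learning rate, discount, exploration rate

Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))."""
    target = r if done else r + beta * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    """Explore with probability eps, otherwise act greedily w.r.t. Q."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))
```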
The Quantum Reinforcement Learning model is virtually the same as its classical counterpart, except that states are now represented as quantum states. As described in Figure 1, an agent interacts with the environment, which is characterized by its state. The agent's actions influence the environment, and these actions are determined by the policy, which is chosen to maximize the expected reward over all time steps. A quantum circuit implementing this framework is sketched below.
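A minimal sketch of such a variational circuit, assuming PennyLane, is given below; the qubit count, layer count, embedding, and observables are our illustrative choices rather than the exact circuit used in the project.

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers = 3, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_circuit(state, weights):
    """Encode the (rescaled) classical state, apply trainable entangling
    layers, and read out one expectation value per qubit."""
    qml.AngleEmbedding(state, wires=range(n_qubits))               # state encoding
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))   # trainable layers
    # The expectation values below are post-processed classically into
    # Q-values / action choices; the observables set the reward readout.
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

# Randomly initialized trainable weights of the appropriate shape.
weights = np.random.uniform(
    0, 2 * np.pi,
    size=qml.StronglyEntanglingLayers.shape(n_layers, n_qubits),
    requires_grad=True,
)
```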
In the above, the reward is calculated through a proper choice of measurement observables.
Since we are solving a Quantum Reinforcement Learning problem, it is essential to define (1) rewards, (2) actions, and (3) states. The reward is straightforwardly given by the utility function. The actions, as specified in the problem statement, consist of consumption and the number of hours worked. We are then left with properly defining the states and encoding them into our quantum circuit. We first note that the agent's state is completely characterized by specifying the current bond holding $b_t$, the current wage $w_t$, and the time step $t$.
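One possible way to encode this state is to rescale each component into a rotation angle and feed the result to the angle embedding in the circuit above; the normalization constants below are illustrative assumptions.

```python
import numpy as np

def encode_state(b, w, t, T=20, b_scale=10.0):
    """Rescale (bond holding, wage, time step) into angles in [0, pi]
    suitable for AngleEmbedding. b_scale bounds the bond range."""
    b_angle = np.pi * (np.clip(b / b_scale, -1.0, 1.0) + 1.0) / 2.0  # bond  -> [0, pi]
    w_angle = np.pi * w                                              # wage {0.5, 1} -> {pi/2, pi}
    t_angle = np.pi * t / T                                          # time  -> [0, pi]
    return np.array([b_angle, w_angle, t_angle])
```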