Deep Reinforcement Learning for Smart Factory Optimization

This work describes an approach to optimizing manufacturing processes with reinforcement learning. As a case study, the factory system examined here is modeled on the manufacturing system at the Western Digital Corporation facility in San Jose. A simulation of the factory system was built first, and reinforcement learning algorithms, including Q-learning, Deep Q-Networks, Policy Gradients, and Policy Gradient Search, were then developed and implemented on that simulation. Results for these methods are compared.

Simulation

Reinforcement learning algorithms are trained through experience: they interact with an environment and update their policy in response to reward signals. It is not feasible to do this training on the actual factory system, so a simulated factory environment was created for the reinforcement learning algorithms to train on. This simulated environment was written in Python using the simulation package SimPy.
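As an illustration of "updating the policy in response to reward signals", here is a minimal sketch of a single tabular Q-learning update. It is the textbook update rule, not necessarily the code used in this repository; the state and action labels and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a mapping from (state, action) pairs to estimated returns.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen state-action pairs start at 0.0.
q_table = defaultdict(float)
new_value = q_update(q_table, "s0", "dispatch", 1.0, "s1", ["dispatch", "wait"])
```

In the factory setting, the state would come from the simulation (for example, queue contents and machine status) and the reward from whatever objective is being optimized; neither is specified here.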

In the simulated environment, a Python object is maintained for each machine and each cassette of wafers in the factory. The machine objects have methods that correspond to processing wafers on that machine. They also record the machine's current operational status, including whether it is currently processing a part and whether it is broken. Each machine can process only one cassette of wafers at a time, and the processing time for a cassette is determined by its head type and sequence step as well as by the number of wafers it contains. All experiments so far assume the same number of wafers in every cassette; future work may generalize this to variable numbers of wafers.

The wafer cassette objects maintain information about each cassette: the number of wafers it holds, its head type, and the sequence step of its wafers. Machines in the simulation are organized into stations, each containing a set of machines that are all capable of performing the same operation. Each head type has a recipe that gives the sequence of stations a cassette of that head type must be processed at in order to be completed. The recipe also includes the parameters used to calculate the processing time for each step in that sequence.
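The relationship between cassettes, recipes, and stations can be sketched as follows. The recipe layout, the station names, the timing values, and in particular the linear `setup + per_wafer * num_wafers` formula are all assumptions made for illustration; the source does not state how processing time is actually computed from the recipe parameters.

```python
from dataclasses import dataclass

# Hypothetical recipe table: head type -> ordered steps of
# (station name, setup time, time per wafer). All values are made up.
RECIPES = {
    "head_A": [("station_1", 2.0, 0.5), ("station_2", 1.0, 0.25)],
    "head_B": [("station_2", 1.5, 0.5)],
}


@dataclass
class Cassette:
    head_type: str
    num_wafers: int
    sequence_step: int = 0  # index of the next recipe step to perform


def next_station(cassette):
    """Return the station for the cassette's next step, or None if finished."""
    steps = RECIPES[cassette.head_type]
    if cassette.sequence_step >= len(steps):
        return None
    return steps[cassette.sequence_step][0]


def processing_time(cassette):
    """Assumed linear model: setup time plus a per-wafer time."""
    _, setup, per_wafer = RECIPES[cassette.head_type][cassette.sequence_step]
    return setup + per_wafer * cassette.num_wafers


# Hypothetical usage: a 10-wafer cassette of head type "head_A".
cassette = Cassette(head_type="head_A", num_wafers=10)
first_station = next_station(cassette)
first_time = processing_time(cassette)
```

Keeping the timing parameters in the recipe, rather than on the machine, matches the description above: any machine in a station can run the step, and the duration depends only on head type, sequence step, and wafer count.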