- Deep Learning cheatsheet
⟶
深度學習參考手冊
- Neural Networks
⟶
神經網路
- Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.
⟶
神經網路是一種透過層 (layer) 來建構的模型。常用的神經網路類型包括卷積神經網路 (CNN) 和遞迴神經網路 (RNN)。
- Architecture ― The vocabulary around neural networks architectures is described in the figure below:
⟶
架構 - 神經網路架構相關的詞彙如下圖所示:
- [Input layer, hidden layer, output layer]
⟶
[輸入層、隱藏層、輸出層]
- By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:
⟶
我們使用 i 來代表網路的第 i 層、j 來代表某一層中第 j 個隱藏神經元的話,我們可以得到下面的等式:
- where we note w, b, z the weight, bias and output respectively.
⟶
其中,我們分別使用 w 來代表權重、b 代表偏差項、z 代表輸出的結果。
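For reference, this notation resolves to the standard affine form below; the formula itself is not reproduced in this translation file, so this is a reconstruction of the usual expression rather than a copy of the original:

```latex
z_j^{[i]} = {w_j^{[i]}}^{T} x + b_j^{[i]}
```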
- Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:
⟶
激活函數 (Activation function) - 激活函數用於隱藏神經元的尾端,目的是為模型引入非線性。以下是幾種最常見的激活函數:
- [Sigmoid, Tanh, ReLU, Leaky ReLU]
⟶
[Sigmoid, Tanh, ReLU, Leaky ReLU]
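A minimal NumPy sketch of these four activations may help; the definitions are the standard ones, and the Leaky ReLU slope 0.01 is a common default rather than a value fixed by the cheatsheet:

```python
import numpy as np

def sigmoid(z):
    # Squashes inputs to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes inputs to (-1, 1).
    return np.tanh(z)

def relu(z):
    # max(0, z); gradient is zero for z < 0.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for z < 0 keeps units from going fully "dead".
    return np.where(z > 0, z, alpha * z)
```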
- Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:
⟶
交叉熵損失 - 在神經網路中,我們經常使用交叉熵損失 L(z,y),其定義如下:
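The formula itself is not reproduced in this file; for reference, the usual binary form matching the L(z,y) notation above is:

```latex
L(z, y) = -\Big[ y \log(z) + (1 - y) \log(1 - z) \Big]
```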
- Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.
⟶
學習速率 - 學習速率通常用 α 或 η 來表示,用來控制權重更新的速度。學習速率可以是固定值,也可以隨訓練過程自適應地改變。目前最熱門的方法是 Adam,一種會自適應調整學習速率的最佳化方法。
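A minimal sketch of a single Adam step, using the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8); the function name and calling convention are illustrative, not from the cheatsheet:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive learning rate.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Here m and v start as zero arrays of the same shape as w, and t counts update steps from 1.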
- Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:
⟶
反向傳播演算法 - 反向傳播演算法是一種在神經網路中用來更新權重的方法,更新時會同時考量實際輸出值和期望輸出值。損失對權重 w 的導數是透過連鎖律 (chain rule) 計算,通常會表示成下面的形式:
- As a result, the weight is updated as follows:
⟶
因此,權重會透過以下的方式來更新:
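For reference, with z the unit's output as defined earlier, a standard two-factor form of this chain rule and the resulting update (with learning rate α) is:

```latex
\frac{\partial L(z, y)}{\partial w} = \frac{\partial L(z, y)}{\partial z} \times \frac{\partial z}{\partial w}
\qquad \Longrightarrow \qquad
w \longleftarrow w - \alpha \frac{\partial L(z, y)}{\partial w}
```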
- Updating weights ― In a neural network, weights are updated as follows:
⟶
更新權重 - 在神經網路中,權重的更新會透過以下步驟進行:
- Step 1: Take a batch of training data.
⟶
步驟一:取出一個批次 (batch) 的訓練資料
- Step 2: Perform forward propagation to obtain the corresponding loss.
⟶
步驟二:執行前向傳播演算法 (forward propagation) 來得到對應的損失值
- Step 3: Backpropagate the loss to get the gradients.
⟶
步驟三:將損失值透過反向傳播演算法來得到梯度
- Step 4: Use the gradients to update the weights of the network.
⟶
步驟四:使用梯度來更新網路的權重
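A minimal sketch of these four steps for a one-layer sigmoid network with cross-entropy loss; all names and the plain gradient-descent update are illustrative choices, not prescribed by the cheatsheet:

```python
import numpy as np

def train_step(W, b, X_batch, y_batch, lr=0.1):
    # Step 1: (X_batch, y_batch) is one batch of training data.
    # Step 2: forward propagation to obtain predictions and the loss.
    z = 1.0 / (1.0 + np.exp(-(X_batch @ W + b)))   # sigmoid output in (0, 1)
    eps = 1e-12                                    # numerical safety only
    loss = -np.mean(y_batch * np.log(z + eps)
                    + (1 - y_batch) * np.log(1 - z + eps))
    # Step 3: backpropagate; for sigmoid + cross-entropy, the gradient of the
    # loss w.r.t. the pre-activation simplifies to (z - y).
    dz = (z - y_batch) / len(y_batch)
    dW = X_batch.T @ dz
    db = dz.sum()
    # Step 4: use the gradients to update the weights.
    W -= lr * dW
    b -= lr * db
    return W, b, loss
```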
- Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p
⟶
Dropout - Dropout 是一種透過丟棄神經網路中的神經元來避免過擬合訓練資料的技巧。在實務上,神經元會以機率 p 被丟棄,或以機率 1−p 被保留
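A minimal "inverted dropout" sketch; rescaling by 1/(1−p) at training time (so nothing changes at test time) is a common implementation choice, not something the text above specifies:

```python
import numpy as np

def dropout(a, p=0.5, training=True):
    # Drop each unit with probability p, keep it with probability 1 - p.
    if not training or p == 0.0:
        return a
    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
    # Inverted dropout: rescale so the expected activation is unchanged.
    return a * mask / (1.0 - p)
```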
- Convolutional Neural Networks
⟶
卷積神經網路 (CNN)
- Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:
⟶
卷積層的條件 - 我們用 W 表示輸入資料的維度大小、F 表示卷積層神經元 (filter) 的尺寸、P 表示補零 (zero padding) 的數量、S 表示步幅 (stride),則在給定維度中可容納的神經元數量 N 滿足以下公式:
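For reference, these quantities combine into the usual output-size formula; note that the stride S is carried over from the translation, since the English sentence above does not name it:

```latex
N = \frac{W - F + 2P}{S} + 1
```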
- Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:
⟶
批次正規化 (Batch normalization) - 這是一個透過超參數 γ、β 來正規化批次 {xi} 的步驟。我們用 μB、σ2B 表示欲對該批次修正的平均數和變異數,計算方式如下:
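For reference, the standard batch-normalization step (ε is a small constant added for numerical stability):

```latex
x_i \longleftarrow \gamma \, \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta
```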
- It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.
⟶
批次正規化通常在全連接層/卷積層之後、非線性層之前進行,目的是允許使用更高的學習速率,並減少模型對初始化的強烈依賴。
- Recurrent Neural Networks
⟶
遞迴神經網路 (RNN)
- Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:
⟶
閘的種類 - 以下是在典型的遞迴神經網路中會遇到的幾種閘:
- [Input gate, forget gate, gate, output gate]
⟶
[輸入閘、遺忘閘、閘、輸出閘]
- [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]
⟶
[要不要將資料寫入記憶區塊?要不要將儲存在記憶區塊中的資料清除?要寫多少資料到記憶區塊?要從記憶區塊中揭露多少資料?]
- LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.
⟶
長短期記憶模型 (LSTM) - 長短期記憶網路是一種遞迴神經網路,藉由加入遺忘閘的設計來避免梯度消失的問題
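For reference, one common formulation of the four gates and the cell update (notation varies across texts; this is a standard version, not copied from the cheatsheet):

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate (the ``gate'')} \\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```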
- Reinforcement Learning and Control
⟶
強化學習及控制
- The goal of reinforcement learning is for an agent to learn how to evolve in an environment.
⟶
強化學習的目標是讓代理人 (agent) 學習如何在環境中演變
- Definitions
⟶
定義
- Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:
⟶
馬可夫決策過程 - 一個馬可夫決策過程 (MDP) 是一個五元組 (S,A,{Psa},γ,R),其中:
- S is the set of states
⟶
S 是一組狀態的集合
- A is the set of actions
⟶
A 是一組行為的集合
- {Psa} are the state transition probabilities for s∈S and a∈A
⟶
{Psa} 指的是,當 s∈S、a∈A 時,狀態轉移的機率
- γ∈[0,1[ is the discount factor
⟶
γ∈[0,1[ 是衰減係數
- R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize
⟶
R:S×A⟶R 或 R:S⟶R 指的是獎勵函數,也就是演算法想要去最大化的目標函數
- Policy ― A policy π is a function π:S⟶A that maps states to actions.
⟶
策略 - 一個策略 π 指的是一個函數 π:S⟶A,這個函數會將狀態映射到行為
- Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).
⟶
注意:當給定一個狀態 s 時,若我們採取行動 a=π(s),就稱我們執行了給定的策略 π
- Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:
⟶
價值函數 - 給定一個策略 π 和狀態 s,我們定義價值函數 Vπ 為:
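The formula is not reproduced in this file; for reference, a standard form when the reward depends only on the state is:

```latex
V^{\pi}(s) = \mathbb{E}\!\left[\, \sum_{t \ge 0} \gamma^{t} R(s_t) \;\middle|\; s_0 = s,\ \pi \right]
```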
- Bellman equation ― The optimal Bellman equation characterizes the value function Vπ∗ of the optimal policy π∗:
⟶
貝爾曼方程 - 最佳貝爾曼方程描述了最佳策略 π∗ 的價值函數 Vπ∗:
- Remark: we note that the optimal policy π∗ for a given state s is such that:
⟶
注意:對於給定的狀態 s,最佳策略 π∗ 滿足:
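For reference, the standard optimality equation and the corresponding optimal policy are:

```latex
V^{\pi^*}(s) = R(s) + \max_{a \in A} \, \gamma \sum_{s' \in S} P_{sa}(s') \, V^{\pi^*}(s')
\qquad\text{and}\qquad
\pi^*(s) = \arg\max_{a \in A} \, \sum_{s' \in S} P_{sa}(s') \, V^{*}(s')
```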
- Value iteration algorithm ― The value iteration algorithm is in two steps:
⟶
價值迭代演算法 - 價值迭代演算法包含兩個步驟:
- 1) We initialize the value:
⟶
1) 初始化價值:
- 2) We iterate the value based on the values before:
⟶
2) 根據之前的值,迭代更新價值:
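A minimal tabular sketch of these two steps, assuming P[s][a] maps next states to probabilities and R[s] is a state reward (all names illustrative):

```python
def value_iteration(S, A, P, R, gamma=0.9, tol=1e-6):
    # Step 1: initialize the value of every state.
    V = {s: 0.0 for s in S}
    while True:
        # Step 2: iterate the value based on the previous values
        # (Bellman optimality backup).
        V_new = {
            s: R[s] + gamma * max(
                sum(p * V[s2] for s2, p in P[s][a].items()) for a in A
            )
            for s in S
        }
        if max(abs(V_new[s] - V[s]) for s in S) < tol:
            return V_new
        V = V_new
```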
- Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:
⟶
最大概似估計 - 針對狀態轉移機率的最大概似估計為:
- times took action a in state s and got to s′
⟶
在狀態 s 採取行動 a 並到達 s′ 的次數
- times took action a in state s
⟶
在狀態 s 採取行動 a 的次數
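For reference, the two counts above combine into the estimator:

```latex
P_{sa}(s') = \frac{\#\ \text{times took action } a \text{ in state } s \text{ and got to } s'}{\#\ \text{times took action } a \text{ in state } s}
```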
- Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:
⟶
Q-learning 演算法 - Q-learning 演算法是針對 Q 的一個 model-free 的估計,作法如下:
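A minimal sketch of the tabular Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max Q(s′,a′) − Q(s,a)]; the update rule is standard, while the surrounding names are illustrative:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Model-free: only the observed transition (s, a, r, s_next) is used;
    # the transition probabilities Psa are never needed.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next          # bootstrapped return estimate
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q
```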
- View PDF version on GitHub
⟶
前往 GitHub 閱讀 PDF 版本
- [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]
⟶
[神經網路, 架構, 激活函數 (Activation function), 反向傳播演算法, Dropout]
- [Convolutional Neural Networks, Convolutional layer, Batch normalization]
⟶
[卷積神經網路 (CNN), 卷積層, 批次正規化]
- [Recurrent Neural Networks, Gates, LSTM]
⟶
[遞迴神經網路 (RNN), 閘, 長短期記憶模型 (LSTM)]
- [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]
⟶
[強化學習, 馬可夫決策過程, 價值/策略迭代, 近似動態規劃, 策略搜尋]