http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17_2.html
Deep Learning (DL)
每个 "logistic regression"看成一个 “Neuron”,多个 “Neuron” 组成Neural Network 神经网络。
1958: Perceptron (linear model)
1969: Perceptron has limitations
1980s: Multi-layer perceptron
[Not significantly different from today's Deep Neural Networks (DNN)]
1986: Backpropagation
[Usually more than 3 hidden layers was not helpful]
1989: 1 hidden layer is "good enough"; why go deep?
[Breakthrough: a new name, "deep learning".]
2006: RBM (Restricted Boltzmann Machine) initialization (Geoffrey E. Hinton)
[Later shown to help little, but it was important for reviving research interest]
2009: GPU
2011: Started to become popular in speech recognition
2012: Won the ILSVRC image competition
sigmoid function → activation function (the general term)
A fully connected feedforward network is a function: input vector → output vector.
Matrix Operation
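A minimal numpy sketch of the layer-by-layer matrix operation, assuming sigmoid activations; the layer sizes and weights below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-layer fully connected feedforward network as pure matrix operations.
# Sizes (2 -> 3 -> 1) are arbitrary, for illustration only.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)   # hidden layer
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)   # output layer

x  = np.array([1.0, -1.0])         # input vector
a1 = sigmoid(W1 @ x + b1)          # hidden activations
y  = sigmoid(W2 @ a1 + b2)         # output vector
print(y)
```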
You need to decide the network structure so that a good function is contained in your function set (the number of layers, the number of neurons per layer, and which activation function to use).
special structure:
- Convolutional Neural Network (CNN)
Backpropagation
Chain Rule: ∂C/∂w = (∂z/∂w)(∂C/∂z)
Forward pass: compute ∂z/∂w for every weight (it equals the input activation connected to that weight).
Backward pass: compute ∂C/∂z for every z, propagating from the output layer backwards.
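A tiny numeric sketch of this decomposition for a single sigmoid neuron with squared-error loss (all values made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron, loss C = (a - t)^2 / 2.
# Chain rule: dC/dw = (dz/dw) * (dC/dz), with z = w*x + b, a = sigmoid(z).
x, t = 2.0, 1.0           # input and target (made-up numbers)
w, b = 0.5, 0.1

# Forward pass: compute z and a; dz/dw is just the input x.
z = w * x + b
a = sigmoid(z)
dz_dw = x

# Backward pass: propagate dC/dz back from the loss.
dC_da = a - t
da_dz = a * (1.0 - a)     # derivative of sigmoid
dC_dz = dC_da * da_dz

dC_dw = dz_dw * dC_dz
print(dC_dw)
```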
Keras
Using Keras is like stacking building blocks.
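A minimal sketch of the block-stacking style, assuming tf.keras and an MNIST-like 784-input / 10-class task:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stack layers like building blocks: a fully connected feedforward network.
# 784 inputs / 10 outputs assume MNIST-like data; adjust to your task.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(500, activation="relu"),
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=100, epochs=20)
```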
Tips for DNN
More layers do not guarantee better results, even on the training data, so first obtain good results on the training data.
Do not always blame overfitting.
First determine whether the results are bad on the training data or on the testing data.
Good results on training data? If not:
1. New activation function
Vanishing Gradient Problem
[sigmoid → ReLU (Rectified Linear Unit), ReLU variants (Leaky ReLU, Parametric ReLU, ELU),
Maxout (a learnable activation function; ReLU is a special case of Maxout)]
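A small numpy sketch of why ReLU helps: the sigmoid derivative is at most 0.25, so stacked layers shrink the gradient, while ReLU passes a gradient of exactly 1 through active units (illustrative values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, so gradients shrink per layer

def d_relu(z):
    return (z > 0).astype(float)  # exactly 1 where the unit is active

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(d_sigmoid(z))   # small everywhere -> vanishing gradient when stacked
print(d_relu(z))      # 0 or 1 -> gradient passes through active units
```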
2. Adaptive learning rate
[Adagrad → RMSProp,
Momentum,
Adam (RMSProp + Momentum)]
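A sketch of picking these optimizers in tf.keras; the learning rates shown are the library defaults, not tuned values:

```python
from tensorflow import keras

# Adaptive learning-rate optimizers; Adam combines RMSProp-style
# per-parameter scaling with Momentum.
adagrad = keras.optimizers.Adagrad(learning_rate=0.001)
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001)
sgd_mom = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
adam    = keras.optimizers.Adam(learning_rate=0.001)

# model.compile(optimizer=adam, loss="categorical_crossentropy",
#               metrics=["accuracy"])
```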
Good results on testing data? If not:
1. Early Stopping
2. Regularization
3. Dropout [dropout is a kind of ensemble] (all three combined in the sketch below)
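A tf.keras sketch combining the three techniques above; the L2 weight and dropout rate are illustrative, not recommended values:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Early stopping + L2 regularization + dropout in one model.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(500, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # 2. regularization
    layers.Dropout(0.5),                                     # 3. dropout
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(                  # 1. early stopping
    monitor="val_loss", patience=3, restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.1,
#           epochs=100, callbacks=[early_stop])
```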
CNN for Computer Vision
The network architecture can be designed.
A CNN is a simplified version of a fully connected network (with fewer parameters).
[Three reasons to use a CNN for images:
1. A neuron does not have to see the whole image to discover a pattern; connecting to a small region requires fewer parameters.
2. The same patterns appear in different regions.
3. Subsampling the pixels will not change the object.]
Reasons 1 and 2 → Convolution
Reason 3 → Max Pooling
Input → Convolution (layer) → Max Pooling (layer) → Convolution (layer) → Max Pooling (layer) → ... → Flatten → Fully connected feedforward network → Output.
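A tf.keras sketch of this pipeline, assuming 28×28 grayscale inputs; the filter counts and layer sizes are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input -> [Conv -> MaxPool] x 2 -> Flatten -> fully connected -> Output.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(25, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(50, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```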
Property 1: translation invariance
Property 2: spatial hierarchies of patterns
CNN – the Convolution operation
Filter (another, smaller image)
[size and values;
stride;
Feature Map]
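A minimal numpy sketch of the convolution operation: slide the filter over the image with the given stride and record each dot product in the feature map (toy 6×6 binary image and 3×3 filter):

```python
import numpy as np

def conv2d(image, filt, stride=1):
    """Slide `filt` over `image`; each dot product is one feature-map entry."""
    h, w = image.shape
    fh, fw = filt.shape
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+fh, j*stride:j*stride+fw]
            fmap[i, j] = np.sum(patch * filt)
    return fmap

image = np.random.default_rng(0).integers(0, 2, size=(6, 6)).astype(float)
filt = np.array([[ 1., -1., -1.],
                 [-1.,  1., -1.],
                 [-1., -1.,  1.]])     # detects a diagonal pattern
print(conv2d(image, filt, stride=1))   # 4x4 feature map
```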
Convolution vs. Fully Connected
Note: each filter produces one channel of the output feature map.
Note: the output width and height may differ from the input width and height [border effects and padding; strides].
CNN – the Max Pooling operation
Group the values and take the max of each group (producing an even smaller image).
Note: max pooling is not the only way to achieve this downsampling; you can use strides in the previous convolution layer instead, or use average pooling in place of max pooling.
Note: convolution typically uses a 3×3 window with stride 1; max pooling typically uses a 2×2 window with stride 2.
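A numpy sketch of max pooling with the usual 2×2 window and stride 2:

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Take the max over each size x size group, shrinking the map."""
    h, w = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = fmap[i*stride:i*stride+size,
                             j*stride:j*stride+size].max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))   # 2x2 output: max of each 2x2 group
```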
Analyzing the results of the CNN filters:
1. First convolution layer [typical-looking filters on the trained first layer]; how about higher layers? [which images make a specific neuron activate]
2. What does the CNN learn? [degree of activation of the k-th filter]
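A hedged tf.keras sketch of the "which image activates filter k most" idea: gradient ascent on a random input to maximize the mean activation of the k-th filter. The layer name, input size, step count, and step size are assumptions, and the model is assumed to expose `model.input`:

```python
import tensorflow as tf
from tensorflow import keras

def visualize_filter(model, layer_name, k, steps=30, lr=10.0):
    """Gradient ascent on the input to maximize the k-th filter's activation."""
    layer = model.get_layer(layer_name)
    feature_model = keras.Model(model.input, layer.output)
    img = tf.Variable(tf.random.uniform((1, 28, 28, 1)))  # start from noise
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = feature_model(img)
            # "degree of activation" of filter k = its mean response
            score = tf.reduce_mean(activation[..., k])
        grad = tape.gradient(score, img)
        img.assign_add(lr * grad)    # step uphill on the activation
    return img.numpy()[0]
```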
Image classification with small datasets
- Train a small model from scratch
- Use a pretrained network for feature extraction (sketched below)
- Fine-tune a pretrained network
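A tf.keras sketch of the feature-extraction strategy, using VGG16 with ImageNet weights as an example base; the input size and binary classifier head are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Feature extraction: freeze a pretrained convolutional base and train
# only a small classifier on top of it.
conv_base = keras.applications.VGG16(weights="imagenet",
                                     include_top=False,
                                     input_shape=(150, 150, 3))
conv_base.trainable = False          # freeze the pretrained features

model = keras.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # e.g. a binary task
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# For fine-tuning, later unfreeze the top convolution block and
# retrain with a small learning rate.
```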
RNN for Text and Sequences
Generative Deep Learning