Decode "Residual Network"???

"When I study Residual Network, it has made enormous confusion for me, therefore it is needed to make a memo post for later review as well as beginners' skip connection."

  • Basic idea

    • (Wikipedia quote) A residual neural network (ResNet) is an artificial neural network of a kind that builds on constructs known from pyramidal cells in the cerebral cortex.
      Residual neural networks do this by utilizing so-called skip connections, or shortcuts, to jump over some layers, in order to avoid the problems of vanishing gradients and training degradation.

    • Broadly, there are two kinds of networks to consider here: plain networks and deeper networks.

      • Plain networks usually contain fewer than about 25 layers, and their accuracy is roughly as good as it should be: the results are not bad, and adding layers still brings improvement.

      • Deeper networks contain more than about 25 layers. People often assume that the deeper the network, the better the accuracy; in reality this is often wrong, because the deeper the network, the higher the risk of vanishing or exploding gradients. Even if you add regularization to save the network from that, there is still a problem called degradation.

        • Vanishing or exploding gradients (this is simple to explain, so we omit it here).

        • Degradation - This problem has been observed while training deeper neural networks. As we increase the depth, accuracy first gets saturated, which is expected, since it takes more complex layers to model all the intricacies of the data. But if we keep adding layers past the saturation region, the accuracy of the network drops. We might suspect that this happens due to overfitting, but it actually does not: the additional layers in a deep model lead to higher training errors (training, not testing).

          [Figure: training error rising as plain-network depth increases past saturation]

          This is hard to grasp at first, since the common intuition when training a neural network is to make it deeper in order to achieve higher accuracy; we often assume more layers mean better results.

    • So, in order to avoid all those problems, researchers came up with the residual network, which has proven to be a good solution for deeper neural networks (25+ layers).

      [Figure: comparison between a plain network and a ResNet]

  • How

    • Intuition behind ResNet
      • What is a residual - A residual is the error in a result. For example, suppose the task is to find someone's age: if the actual age is 20 and you guessed 18, you are off by 2, and that 2 is the residual. In essence, the residual is what you should have added to your prediction to match the actual data. It is important to realize that when the residual is 0, we do nothing, since the prediction already matches the actual data.
      [Figure: prediction x, the residual() correction, and the Identity path to the Actual]

      In the diagram, x is our prediction and we want it to be equal to the Actual. However, if it is off by a margin, our residual function residual() kicks in and produces the residual of the operation, so as to correct our prediction to match the actual. If x == Actual, residual(x) is 0. The Identity function just copies x.
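
      As a toy sketch of the age example above (the numbers and the residual() helper are purely illustrative):

      # Toy residual correction: residual() returns what must be added
      # to the prediction to match the actual value.
      actual = 20
      prediction = 18

      def residual(x):
          return actual - x          # 0 when the prediction already matches

      corrected = prediction + residual(prediction)   # 18 + 2 == 20
      assert corrected == actual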

    • How ResNet works
      • We want to go deeper without degradation in accuracy or error rate. We can do this by injecting identity mappings.

      • We want to be able to learn the residuals so that our predictions are close to the actuals.

      • Shortcut connections are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers. Identity shortcut connections add neither extra parameter nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries without modifying the solvers.


        [Figure: a residual building block - stacked layers F(x) plus an identity shortcut x]

        H(x) = F(x) + x, where F(x) = W2 * relu(W1 * x + b1) + b2

        During training, the residual network learns the weights of its layers such that, if the identity mapping were optimal, all the weights would get set to 0. In effect F(x) becomes 0, so x gets directly mapped to H(x) and no correction needs to be made. These become your identity mappings, which help grow the network deep. And if there is a deviation from the optimal identity mapping, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
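
        As a minimal Keras sketch of H(x) = F(x) + x above (the 64-unit width is an arbitrary choice for illustration):

        from keras.layers import Input, Dense, add
        from keras.models import Model

        x = Input(shape=(64,))
        f = Dense(64, activation="relu")(x)   # relu(W1 * x + b1)
        f = Dense(64)(f)                      # W2 * (...) + b2, i.e. F(x)
        h = add([f, x])                       # H(x) = F(x) + x
        block = Model(inputs=x, outputs=h)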


Conclusion

Deep residual networks work well due to the flow of information from the very first layer to the last layer of the network. By formulating residual functions around identity mappings, information is able to flow unimpeded throughout the entire network. This allows any layer to be represented as a function of the original input. Pre-activation ResNets place batch normalization and ReLU before the convolution, so that the output of the addition becomes the output of the layer; this achieves the identity effect we desire (the Unit function in the PS-3 code below is exactly such a pre-activation block).
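
One way to see the "unimpeded flow" concretely: since H(x) = F(x) + x, differentiating gives dH/dx = dF/dx + 1, so even when dF/dx is tiny, the shortcut still contributes a gradient of 1 and the backpropagated signal never vanishes.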

[Figure: pre-activation residual unit, with batch normalization and ReLU before the convolution]


PS-2

  • Take a close look at the residual block
    [Figures: a residual block and the derivation of a[l+2] from a[l]]

    The main takeaway here is to make a[l+2] == relu(a[l]), so that the gradient at every single layer can be computed with the original input taken into consideration. Given the above equation, when G and H are identity functions, information always flows unimpeded and gradients never vanish, no matter how deep we go.
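
    Spelling this out in the same notation (a sketch of the standard argument; W[l+2] and b[l+2] denote the weights and biases of the second stacked layer):

    a[l+2] = g(z[l+2] + a[l]), where z[l+2] = W[l+2] * a[l+1] + b[l+2]

    If regularization drives W[l+2] and b[l+2] toward 0, then a[l+2] = g(a[l]) = relu(a[l]) = a[l], since a[l] is itself the output of a ReLU and hence non-negative. And because a[l] enters the sum directly, the gradient of a[l+2] with respect to a[l] contains an additive identity term, which is why it cannot vanish.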

PS-3 ResNet code example

#import needed classes
import keras
from keras.datasets import cifar10
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,AveragePooling2D,Dropout,BatchNormalization,Activation,Input
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.callbacks import ModelCheckpoint
from math import ceil
import os
from keras.preprocessing.image import ImageDataGenerator


def Unit(x,filters,pool=False):
    """A pre-activation residual unit: (BN -> ReLU -> Conv) twice, plus a shortcut."""
    res = x
    if pool:
        # Downsample the main path; use a strided 1x1 conv so the shortcut
        # matches the new spatial size and channel count before the addition.
        x = MaxPooling2D(pool_size=(2, 2))(x)
        res = Conv2D(filters=filters,kernel_size=[1,1],strides=(2,2),padding="same")(res)
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)

    out = BatchNormalization()(out)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)

    # H(x) = F(x) + x: merge the shortcut with the residual branch.
    out = keras.layers.add([res,out])

    return out

#Define the model


def MiniModel(input_shape):
    images = Input(input_shape)
    net = Conv2D(filters=32, kernel_size=[3, 3], strides=[1, 1], padding="same")(images)
    net = Unit(net,32)
    net = Unit(net,32)
    net = Unit(net,32)

    net = Unit(net,64,pool=True)
    net = Unit(net,64)
    net = Unit(net,64)

    net = Unit(net,128,pool=True)
    net = Unit(net,128)
    net = Unit(net,128)

    net = Unit(net, 256,pool=True)
    net = Unit(net, 256)
    net = Unit(net, 256)

    net = BatchNormalization()(net)
    net = Activation("relu")(net)
    net = Dropout(0.25)(net)

    net = AveragePooling2D(pool_size=(4,4))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)

    model = Model(inputs=images,outputs=net)

    return model

#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()

#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255

#Subtract the (global) mean from both train and test sets
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()

#Divide by the per-pixel standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)

# Generate batches of tensor image data with real-time data augmentation. 
# The data will be looped over (in batches).
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)

# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
# Note: only augmentation options are enabled above, so this fit is not strictly needed here.
datagen.fit(train_x)



#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)

#Build the model for 32x32 RGB CIFAR-10 images


input_shape = (32,32,3)
model = MiniModel(input_shape)

#Print a Summary of the model

model.summary()
#Specify the training components
model.compile(optimizer=Adam(0.001),loss="categorical_crossentropy",metrics=["accuracy"])



epochs = 50
steps_per_epoch = ceil(50000/128)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=(test_x, test_y),
                    epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)


#Evaluate on the test set (evaluate returns [loss, accuracy]) and save the model
scores = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")

Running result

<pre style="box-sizing: border-box; overflow: auto; font-family: monospace; font-size: inherit; display: block; padding: 1px 0px; margin: 0px; line-height: inherit; word-break: break-all; overflow-wrap: break-word; color: black; background-color: transparent; border: 0px; border-radius: 0px; white-space: pre-wrap; vertical-align: baseline;">Using TensorFlow backend.
</pre>

<pre style="box-sizing: border-box; overflow: auto; font-family: monospace; font-size: inherit; display: block; padding: 1px 0px; margin: 0px; line-height: inherit; word-break: break-all; overflow-wrap: break-word; color: black; background-color: transparent; border: 0px; border-radius: 0px; white-space: pre-wrap; vertical-align: baseline;">Downloading data from [https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)
170500096/170498071 [==============================] - 42s 0us/step
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 32, 32, 32)   896         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 32)   128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32, 32, 32)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 32, 32, 32)   9248        activation_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 32, 32, 32)   128         conv2d_2[0][0]                   
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 32, 32, 32)   0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 32, 32, 32)   9248        activation_2[0][0]               
__________________________________________________________________________________________________
add_1 (Add)                     (None, 32, 32, 32)   0           conv2d_1[0][0]                   
                                                                 conv2d_3[0][0]                   
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 32, 32, 32)   128         add_1[0][0]                      
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 32, 32, 32)   0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 32, 32, 32)   9248        activation_3[0][0]               
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32, 32, 32)   128         conv2d_4[0][0]                   
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 32, 32, 32)   0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 32, 32, 32)   9248        activation_4[0][0]               
__________________________________________________________________________________________________
add_2 (Add)                     (None, 32, 32, 32)   0           add_1[0][0]                      
                                                                 conv2d_5[0][0]                   
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 32, 32, 32)   128         add_2[0][0]                      
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 32, 32, 32)   0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 32, 32, 32)   9248        activation_5[0][0]               
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 32, 32, 32)   128         conv2d_6[0][0]                   
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 32, 32, 32)   0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 32, 32, 32)   9248        activation_6[0][0]               
__________________________________________________________________________________________________
add_3 (Add)                     (None, 32, 32, 32)   0           add_2[0][0]                      
                                                                 conv2d_7[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 32)   0           add_3[0][0]                      
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 16, 16, 32)   128         max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 16, 16, 32)   0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 64)   18496       activation_7[0][0]               
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 16, 16, 64)   256         conv2d_9[0][0]                   
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 16, 16, 64)   0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   2112        add_3[0][0]                      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 16, 16, 64)   36928       activation_8[0][0]               
__________________________________________________________________________________________________
add_4 (Add)                     (None, 16, 16, 64)   0           conv2d_8[0][0]                   
                                                                 conv2d_10[0][0]                  
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 16, 16, 64)   256         add_4[0][0]                      
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 16, 16, 64)   0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 16, 16, 64)   36928       activation_9[0][0]               
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 16, 16, 64)   256         conv2d_11[0][0]                  
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 16, 16, 64)   0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 16, 16, 64)   36928       activation_10[0][0]              
__________________________________________________________________________________________________
add_5 (Add)                     (None, 16, 16, 64)   0           add_4[0][0]                      
                                                                 conv2d_12[0][0]                  
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 16, 16, 64)   256         add_5[0][0]                      
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 16, 16, 64)   0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 16, 16, 64)   36928       activation_11[0][0]              
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 16, 16, 64)   256         conv2d_13[0][0]                  
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 16, 16, 64)   0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 16, 16, 64)   36928       activation_12[0][0]              
__________________________________________________________________________________________________
add_6 (Add)                     (None, 16, 16, 64)   0           add_5[0][0]                      
                                                                 conv2d_14[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 8, 8, 64)     0           add_6[0][0]                      
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 8, 8, 64)     256         max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 8, 8, 64)     0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 8, 8, 128)    73856       activation_13[0][0]              
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 8, 8, 128)    512         conv2d_16[0][0]                  
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 8, 8, 128)    0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 8, 8, 128)    8320        add_6[0][0]                      
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 8, 8, 128)    147584      activation_14[0][0]              
__________________________________________________________________________________________________
add_7 (Add)                     (None, 8, 8, 128)    0           conv2d_15[0][0]                  
                                                                 conv2d_17[0][0]                  
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 8, 8, 128)    512         add_7[0][0]                      
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 8, 8, 128)    0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 8, 8, 128)    147584      activation_15[0][0]              
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 8, 8, 128)    512         conv2d_18[0][0]                  
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 8, 8, 128)    0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 8, 8, 128)    147584      activation_16[0][0]              
__________________________________________________________________________________________________
add_8 (Add)                     (None, 8, 8, 128)    0           add_7[0][0]                      
                                                                 conv2d_19[0][0]                  
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 8, 8, 128)    512         add_8[0][0]                      
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 8, 8, 128)    0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 8, 8, 128)    147584      activation_17[0][0]              
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 8, 8, 128)    512         conv2d_20[0][0]                  
__________________________________________________________________________________________________
activation_18 (Activation)      (None, 8, 8, 128)    0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 8, 8, 128)    147584      activation_18[0][0]              
__________________________________________________________________________________________________
add_9 (Add)                     (None, 8, 8, 128)    0           add_8[0][0]                      
                                                                 conv2d_21[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 4, 4, 128)    0           add_9[0][0]                      
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 4, 4, 128)    512         max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
activation_19 (Activation)      (None, 4, 4, 128)    0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 4, 4, 256)    295168      activation_19[0][0]              
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 4, 4, 256)    1024        conv2d_23[0][0]                  
__________________________________________________________________________________________________
activation_20 (Activation)      (None, 4, 4, 256)    0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 4, 4, 256)    33024       add_9[0][0]                      
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 4, 4, 256)    590080      activation_20[0][0]              
__________________________________________________________________________________________________
add_10 (Add)                    (None, 4, 4, 256)    0           conv2d_22[0][0]                  
                                                                 conv2d_24[0][0]                  
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 4, 4, 256)    1024        add_10[0][0]                     
__________________________________________________________________________________________________
activation_21 (Activation)      (None, 4, 4, 256)    0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
conv2d_25 (Conv2D)              (None, 4, 4, 256)    590080      activation_21[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 4, 4, 256)    1024        conv2d_25[0][0]                  
__________________________________________________________________________________________________
activation_22 (Activation)      (None, 4, 4, 256)    0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_26 (Conv2D)              (None, 4, 4, 256)    590080      activation_22[0][0]              
__________________________________________________________________________________________________
add_11 (Add)                    (None, 4, 4, 256)    0           add_10[0][0]                     
                                                                 conv2d_26[0][0]                  
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 4, 4, 256)    1024        add_11[0][0]                     
__________________________________________________________________________________________________
activation_23 (Activation)      (None, 4, 4, 256)    0           batch_normalization_23[0][0]     
__________________________________________________________________________________________________
conv2d_27 (Conv2D)              (None, 4, 4, 256)    590080      activation_23[0][0]              
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 4, 4, 256)    1024        conv2d_27[0][0]                  
__________________________________________________________________________________________________
activation_24 (Activation)      (None, 4, 4, 256)    0           batch_normalization_24[0][0]     
__________________________________________________________________________________________________
conv2d_28 (Conv2D)              (None, 4, 4, 256)    590080      activation_24[0][0]              
__________________________________________________________________________________________________
add_12 (Add)                    (None, 4, 4, 256)    0           add_11[0][0]                     
                                                                 conv2d_28[0][0]                  
__________________________________________________________________________________________________
batch_normalization_25 (BatchNo (None, 4, 4, 256)    1024        add_12[0][0]                     
__________________________________________________________________________________________________
activation_25 (Activation)      (None, 4, 4, 256)    0           batch_normalization_25[0][0]     
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 4, 4, 256)    0           activation_25[0][0]              
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 1, 1, 256)    0           dropout_1[0][0]                  
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 256)          0           average_pooling2d_1[0][0]        
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 10)           2570        flatten_1[0][0]                  
==================================================================================================
Total params: 4,374,538
Trainable params: 4,368,714
Non-trainable params: 5,824
__________________________________________________________________________________________________

<pre style="box-sizing: border-box; overflow: auto; font-family: monospace; font-size: inherit; display: block; padding: 1px 0px; margin: 0px; line-height: inherit; word-break: break-all; overflow-wrap: break-word; color: black; background-color: transparent; border: 0px; border-radius: 0px; white-space: pre-wrap; vertical-align: baseline;">Epoch 1/50
391/391 [==============================] - 27s 68ms/step - loss: 1.2885 - acc: 0.5326 - val_loss: 1.6630 - val_acc: 0.4961
Epoch 2/50
391/391 [==============================] - 21s 53ms/step - loss: 0.8541 - acc: 0.7001 - val_loss: 1.0465 - val_acc: 0.6674
Epoch 3/50
391/391 [==============================] - 21s 54ms/step - loss: 0.6907 - acc: 0.7593 - val_loss: 0.9077 - val_acc: 0.7053
Epoch 4/50
391/391 [==============================] - 22s 56ms/step - loss: 0.6064 - acc: 0.7902 - val_loss: 0.6870 - val_acc: 0.7732
Epoch 5/50
391/391 [==============================] - 21s 53ms/step - loss: 0.5409 - acc: 0.8119 - val_loss: 0.6286 - val_acc: 0.7820
Epoch 6/50
391/391 [==============================] - 20s 52ms/step - loss: 0.4976 - acc: 0.8276 - val_loss: 0.6467 - val_acc: 0.7915
Epoch 7/50
391/391 [==============================] - 21s 53ms/step - loss: 0.4554 - acc: 0.8428 - val_loss: 0.7318 - val_acc: 0.7812
Epoch 8/50
391/391 [==============================] - 21s 54ms/step - loss: 0.4276 - acc: 0.8515 - val_loss: 0.5955 - val_acc: 0.8024
Epoch 9/50
391/391 [==============================] - 20s 51ms/step - loss: 0.4037 - acc: 0.8592 - val_loss: 0.7164 - val_acc: 0.7742
Epoch 10/50
391/391 [==============================] - 20s 52ms/step - loss: 0.3785 - acc: 0.8691 - val_loss: 0.5306 - val_acc: 0.8272
Epoch 11/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3606 - acc: 0.8747 - val_loss: 0.6534 - val_acc: 0.8090
Epoch 12/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3378 - acc: 0.8816 - val_loss: 0.4706 - val_acc: 0.8475
Epoch 13/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3182 - acc: 0.8888 - val_loss: 0.4721 - val_acc: 0.8438
Epoch 14/50
391/391 [==============================] - 21s 54ms/step - loss: 0.3070 - acc: 0.8941 - val_loss: 0.5304 - val_acc: 0.8327
Epoch 15/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2959 - acc: 0.8972 - val_loss: 0.5714 - val_acc: 0.8310
Epoch 16/50
391/391 [==============================] - 22s 56ms/step - loss: 0.2757 - acc: 0.9032 - val_loss: 0.5431 - val_acc: 0.8413
Epoch 17/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2722 - acc: 0.9045 - val_loss: 0.5690 - val_acc: 0.8257
Epoch 18/50
391/391 [==============================] - 21s 54ms/step - loss: 0.2542 - acc: 0.9105 - val_loss: 0.5157 - val_acc: 0.8502
Epoch 19/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2447 - acc: 0.9150 - val_loss: 0.4588 - val_acc: 0.8625
Epoch 20/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2299 - acc: 0.9180 - val_loss: 0.5702 - val_acc: 0.8410
Epoch 21/50
391/391 [==============================] - 20s 51ms/step - loss: 0.2238 - acc: 0.9207 - val_loss: 0.5116 - val_acc: 0.8418
Epoch 22/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2201 - acc: 0.9242 - val_loss: 0.4404 - val_acc: 0.8655
Epoch 23/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2071 - acc: 0.9270 - val_loss: 0.3913 - val_acc: 0.8784
Epoch 24/50
391/391 [==============================] - 21s 55ms/step - loss: 0.2007 - acc: 0.9300 - val_loss: 0.4831 - val_acc: 0.8581
Epoch 25/50
391/391 [==============================] - 20s 52ms/step - loss: 0.1993 - acc: 0.9298 - val_loss: 0.4367 - val_acc: 0.8684
Epoch 26/50
391/391 [==============================] - 24s 61ms/step - loss: 0.1902 - acc: 0.9327 - val_loss: 0.3972 - val_acc: 0.8818
Epoch 27/50
391/391 [==============================] - 25s 64ms/step - loss: 0.1804 - acc: 0.9355 - val_loss: 0.4377 - val_acc: 0.8714
Epoch 28/50
391/391 [==============================] - 24s 62ms/step - loss: 0.1751 - acc: 0.9396 - val_loss: 0.4713 - val_acc: 0.8644
Epoch 29/50
391/391 [==============================] - 23s 60ms/step - loss: 0.1686 - acc: 0.9399 - val_loss: 0.4441 - val_acc: 0.8689
Epoch 30/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1619 - acc: 0.9436 - val_loss: 0.5143 - val_acc: 0.8729
Epoch 31/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1562 - acc: 0.9439 - val_loss: 0.4043 - val_acc: 0.8834
Epoch 32/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1512 - acc: 0.9463 - val_loss: 0.3830 - val_acc: 0.8895
Epoch 33/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1456 - acc: 0.9482 - val_loss: 0.3707 - val_acc: 0.8900
Epoch 34/50
391/391 [==============================] - 23s 58ms/step - loss: 0.1415 - acc: 0.9498 - val_loss: 0.4362 - val_acc: 0.8788
Epoch 35/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1423 - acc: 0.9501 - val_loss: 0.4081 - val_acc: 0.8881
Epoch 36/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1350 - acc: 0.9523 - val_loss: 0.4355 - val_acc: 0.8809
Epoch 37/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1343 - acc: 0.9526 - val_loss: 0.4465 - val_acc: 0.8825
Epoch 38/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1314 - acc: 0.9526 - val_loss: 0.3857 - val_acc: 0.8941
Epoch 39/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1207 - acc: 0.9574 - val_loss: 0.5319 - val_acc: 0.8636
Epoch 40/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1206 - acc: 0.9569 - val_loss: 0.4038 - val_acc: 0.8907
Epoch 41/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1191 - acc: 0.9578 - val_loss: 0.3672 - val_acc: 0.8963
Epoch 42/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1148 - acc: 0.9596 - val_loss: 0.4449 - val_acc: 0.8819
Epoch 43/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1116 - acc: 0.9591 - val_loss: 0.4252 - val_acc: 0.8844
Epoch 44/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1097 - acc: 0.9612 - val_loss: 0.5019 - val_acc: 0.8774
Epoch 45/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1066 - acc: 0.9619 - val_loss: 0.4458 - val_acc: 0.8822
Epoch 46/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1032 - acc: 0.9634 - val_loss: 0.4647 - val_acc: 0.8833
Epoch 47/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1027 - acc: 0.9634 - val_loss: 0.4329 - val_acc: 0.8845
Epoch 48/50
391/391 [==============================] - 22s 56ms/step - loss: 0.0990 - acc: 0.9644 - val_loss: 0.4254 - val_acc: 0.8880
Epoch 49/50
391/391 [==============================] - 22s 57ms/step - loss: 0.0935 - acc: 0.9676 - val_loss: 0.4516 - val_acc: 0.8850
Epoch 50/50
391/391 [==============================] - 22s 55ms/step - loss: 0.0969 - acc: 0.9660 - val_loss: 0.3984 - val_acc: 0.8995
10000/10000 [==============================] - 1s 143us/step
