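I've spent the past few days reviewing some code, so I hand-rolled a softmax model to sort out my thinking. This post walks through building a complete neural network model from scratch. The full code is on my GitHub: https://github.com/LittletreeZou/Python-Projects
The new year has started, and one of my New Year's resolutions is to put real effort into my Jianshu blog: learn more, write more, share more ヾ(◍°∇°◍)ﾉﾞ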

I. What is softmax?

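In mathematics, and especially in probability theory and related fields, the softmax function is a generalization of the logistic function. It "squashes" a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector. Every element of the result lies between 0 and 1, and all the elements sum to 1. Each element can therefore be read as the probability of belonging to the corresponding class: the larger the value, the more likely the sample belongs to that class.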
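To make the definition concrete: for a vector z = (z_1, ..., z_K), softmax(z)_j = exp(z_j) / Σ_k exp(z_k). Below is a minimal NumPy sketch; the names softmax_1d and sigmoid are mine and only serve as illustration. Subtracting max(z) before exponentiating leaves the result unchanged, because the factor exp(-max(z)) cancels between numerator and denominator. It does keep exp from overflowing. The last check illustrates the claim that softmax generalizes the logistic function: with K = 2 and logits (z, 0), the first output is exactly sigmoid(z).

```python
import numpy as np

def softmax_1d(z):
    """Softmax of a 1-D vector of real numbers."""
    z = np.asarray(z, dtype=float)
    exp_z = np.exp(z - np.max(z))   # shifting by the max avoids overflow in exp
    return exp_z / np.sum(exp_z)

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

p = softmax_1d([2.0, 1.0, 0.1])
print(p)          # every entry lies strictly between 0 and 1
print(p.sum())    # the entries sum to 1 (up to floating-point rounding)

# K = 2 with logits (z, 0): softmax reduces to the logistic function
z = 1.5
print(softmax_1d([z, 0.0])[0], sigmoid(z))

# A naive exp(1000) would overflow; the shifted version is fine
print(softmax_1d([1000.0, 1000.0]))   # [0.5 0.5]
```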

II. What is softmax used for?

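Softmax is widely used for multi-class classification in machine learning and deep learning. In deep learning it is typically the activation function of the last layer of a multi-class network, where it outputs the probability that a sample belongs to each class.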
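In the output layer, the softmax is applied to each example on its own. The sketch below assumes the layout used by the code later in this post: one example per column, so the output-layer logits Z have shape (n_classes, m). It normalizes over axis 0, and the predicted class is the row with the highest probability. The logit values are made up, and the name softmax_columns is hypothetical.

```python
import numpy as np

def softmax_columns(Z):
    """Column-wise softmax: each column of Z holds the logits of one example."""
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))   # stabilize each column
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

# Made-up logits: 3 classes (rows), 2 examples (columns)
Z = np.array([[2.0, -1.0],
              [0.5,  3.0],
              [0.1,  0.0]])

probs = softmax_columns(Z)
print(probs.sum(axis=0))           # each column sums to 1
pred = np.argmax(probs, axis=0)    # most probable class for each example
print(pred)                        # [0 1]
```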

III. How do we implement softmax?

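This post uses fashion_mnist as its example. (Why this dataset rather than the classic MNIST handwritten digits? Because my computer is a potato, of course o(╥﹏╥)o) Working in Python from start to finish, we build a two-layer neural network for multi-class classification that predicts the label of each image.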

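Dataset: fashion_mnist is one of the datasets bundled with TensorFlow. It contains 60,000 labeled 28x28 training images and 10,000 28x28 test images. The labels cover 10 classes of clothing items such as shoes and dresses. The figure below shows some of the training samples. Pretty crisp, right!!!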

Figure: sample fashion_mnist training images
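The model expects one flattened image per column (X of shape (784, m)) and one-hot labels (Y of shape (10, m)), which is what train_set_x and train_labels_onehot hold later in this post. Below is a sketch of that preprocessing. It assumes the raw arrays come from tf.keras.datasets.fashion_mnist.load_data(), which returns uint8 images of shape (m, 28, 28) and integer labels of shape (m,). Two things are my own assumptions and may differ from the author's repository: the scaling of pixels to [0, 1], and the helper names flatten_images and one_hot. Random arrays stand in for the real images, so the snippet runs without TensorFlow.

```python
import numpy as np

def flatten_images(images):
    """(m, 28, 28) uint8 images -> (784, m) float array scaled to [0, 1]."""
    m = images.shape[0]
    return images.reshape(m, -1).T.astype(np.float64) / 255.0

def one_hot(labels, num_classes=10):
    """Integer labels of shape (m,) -> one-hot matrix of shape (num_classes, m)."""
    labels = np.asarray(labels).reshape(-1)
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0   # set row `label` of each column
    return Y

# Stand-in data with the same dtype and shape as Fashion-MNIST (random, illustration only)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([9, 0, 0, 3, 2])

X = flatten_images(fake_images)
Y = one_hot(fake_labels)
print(X.shape, Y.shape)   # (784, 5) (10, 5)
```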

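Steps for building the softmax model:
Step 1: define the architecture, input layer → hidden layer → output layer (activation = softmax)
Step 2: initialize the parameters
Step 3: loop over forward propagation → compute the cost → backward propagation → update the parameters (gradient descent)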
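The equations below write out the three steps of the loop for this architecture. The notation follows common deep-learning conventions, and it is my assumption that the author's helpers use the same ones: one example per column, m examples, Y one-hot, α the learning rate, superscript [l] for the layer, and dW short for the gradient of the cost J with respect to W. Here dZ is the per-example gradient, so the 1/m factor appears in dW and db. The key simplification is that softmax combined with cross-entropy gives dZ^[2] = A^[2] - Y.

```latex
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \max\left(0, Z^{[1]}\right) \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \operatorname{softmax}\left(Z^{[2]}\right) \\
J &= -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} Y_{k i} \log A^{[2]}_{k i} \\
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} A^{[1]\top}, \qquad db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2]}_{:, i} \\
dA^{[1]} &= W^{[2]\top} dZ^{[2]}, \qquad dZ^{[1]} = dA^{[1]} \odot \mathbf{1}\{Z^{[1]} > 0\} \\
dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{\top}, \qquad db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1]}_{:, i} \\
W^{[l]} &\leftarrow W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \alpha\, db^{[l]}, \qquad l = 1, 2
\end{aligned}
```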

IV. Build the framework first, then the modules, then put it all together

1. Build the framework

Architecture: input layer (784 units, one per pixel) → hidden layer (128 units, ReLU) → output layer (10 units, softmax)
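The parameter shapes follow from layers_dims = (784, 128, 10), the values passed to two_layer_model later in this post, together with the one-example-per-column layout. The quick check below lists each shape, counts the trainable parameters, and pushes a dummy batch through the linear parts of the forward pass. It uses zero-filled arrays because it checks shapes only.

```python
import numpy as np

layers_dims = (784, 128, 10)   # (n_x, n_h, n_y) as used later in this post
n_x, n_h, n_y = layers_dims

# One example per column, so W[l] has shape (units in layer l, units in layer l-1)
shapes = {"W1": (n_h, n_x), "b1": (n_h, 1), "W2": (n_y, n_h), "b2": (n_y, 1)}
for name, shape in shapes.items():
    print(name, shape)

n_params = sum(int(np.prod(s)) for s in shapes.values())
print(n_params)                # 101770

# Shape check of the forward pass on a dummy batch of 5 flattened images
X = np.zeros((n_x, 5))
Z1 = np.zeros(shapes["W1"]) @ X + np.zeros(shapes["b1"])
Z2 = np.zeros(shapes["W2"]) @ Z1 + np.zeros(shapes["b2"])
print(Z1.shape, Z2.shape)      # (128, 5) (10, 5)
```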

2. Define the helper functions

There are quite a few helper functions, so I won't go through each one here; see my GitHub for the full code.

# Activation functions
relu(Z)      # ReLU activation
softmax(Z)   # softmax activation; exp(z) overflows easily, so subtract max(z) first

# Parameter initialization
initialize_parameters(n_x, n_h, n_y)

# Forward propagation
linear_forward(A, W, b)
linear_activation_forward(A_prev, W, b, activation)

# Cost
compute_cost(AL, Y)

# Backward propagation
linear_backward(dZ, cache)
relu_backward(dA, cache)
softmax_backward(Y, cache)
linear_activation_backward(dA, cache, activation)

# Parameter update
update_parameters(parameters, grads, learning_rate)
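The author's implementations live in the GitHub repository. The block below is only one possible minimal version of these helpers, written to match the signatures above. Several details are my assumptions and not taken from the repo: the cache layout, the choice to have relu and softmax return (activation, Z), the 0.01 initialization scale, and the 1e-12 inside the log. The softmax layer follows the call pattern used in two_layer_model, where Y is passed to the softmax backward step in place of dA. A short toy run at the end checks that the cost goes down.

```python
import numpy as np

# Activations: each returns (activation, cache) with cache = Z
def relu(Z):
    return np.maximum(0, Z), Z

def softmax(Z):
    # subtract the column-wise max so np.exp cannot overflow
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True), Z

def initialize_parameters(n_x, n_h, n_y):
    # small random weights (the 0.01 scale is an assumption), zero biases
    return {"W1": np.random.randn(n_h, n_x) * 0.01, "b1": np.zeros((n_h, 1)),
            "W2": np.random.randn(n_y, n_h) * 0.01, "b2": np.zeros((n_y, 1))}

# Forward propagation
def linear_forward(A, W, b):
    return W @ A + b, (A, W, b)                # Z and the linear cache

def linear_activation_forward(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = relu(Z) if activation == "relu" else softmax(Z)
    return A, (linear_cache, activation_cache)

def compute_cost(AL, Y):
    # categorical cross-entropy averaged over the m examples (columns)
    m = Y.shape[1]
    return -np.sum(Y * np.log(AL + 1e-12)) / m  # 1e-12 guards against log(0)

# Backward propagation
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    return W.T @ dZ, dW, db                    # dA_prev, dW, db

def relu_backward(dA, cache):
    return dA * (cache > 0)                    # gradient passes only where Z > 0

def softmax_backward(Y, cache):
    A, _ = softmax(cache)
    return A - Y                               # softmax + cross-entropy: dZ = A - Y

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:                                      # "softmax": the caller passes Y as dA
        dZ = softmax_backward(dA, activation_cache)
    return linear_backward(dZ, linear_cache)

def update_parameters(parameters, grads, learning_rate):
    for l in (1, 2):                           # one plain gradient-descent step
        parameters[f"W{l}"] = parameters[f"W{l}"] - learning_rate * grads[f"dW{l}"]
        parameters[f"b{l}"] = parameters[f"b{l}"] - learning_rate * grads[f"db{l}"]
    return parameters

# Toy check on a small learnable problem: the cost should go down
np.random.seed(0)
X = np.random.randn(4, 60)
labels = np.argmax(X[:3], axis=0)              # class = index of the largest of the first 3 features
Y = np.eye(3)[:, labels]                       # one-hot labels, shape (3, 60)
params = initialize_parameters(4, 8, 3)
costs = []
for i in range(300):
    A1, cache1 = linear_activation_forward(X, params["W1"], params["b1"], "relu")
    A2, cache2 = linear_activation_forward(A1, params["W2"], params["b2"], "softmax")
    costs.append(compute_cost(A2, Y))
    dA1, dW2, db2 = linear_activation_backward(Y, cache2, "softmax")
    _, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    params = update_parameters(params, grads, learning_rate=0.1)
print(costs[0], costs[-1])
```

With near-zero initial weights the first cost is close to ln(3), the cross-entropy of a uniform guess over 3 classes.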

3. Put the model together

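import numpy as np
import matplotlib.pyplot as plt

# Assumes the helper functions listed above (linear_activation_forward, etc.) are already defined.
def two_layer_model(X, Y, layers_dims, learning_rate = 0.1, num_iterations = 3000, print_cost=False):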
    """
    two-layer neural network: LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- one-hot true labels, of shape (n_y, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations 
    
    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    #np.random.seed(1)
    grads = {}
    costs = []                              # to keep track of the cost
    m = X.shape[1]                    # number of examples
    (n_x, n_h, n_y) = layers_dims
    
    # Initialize parameters dictionary
    parameters = initialize_parameters(n_x, n_h, n_y)
    
    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SOFTMAX. 
        A1, cache1 = linear_activation_forward(X, W1, b1, activation='relu')
        A2, cache2 = linear_activation_forward(A1, W2, b2, activation='softmax')

        # Compute cost
        cost = compute_cost(A2, Y)
        
        # Backward propagation (for the softmax layer, Y is passed in place of dA; see softmax_backward(Y, cache))
        dA1, dW2, db2 = linear_activation_backward(Y, cache2, activation='softmax')
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation='relu')
        
        # Store the gradients in grads: dW1, db1, dW2, db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2
        
        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        
        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
            costs.append(cost)
       
    # Plot the learning curve (one point recorded every 100 iterations)
    if print_cost:
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per hundreds)')
        plt.title("Learning rate = " + str(learning_rate))
        plt.show()
    
    return parameters

4. Test the model

# Train the model on the first 10,000 training examples
train_x = train_set_x[:,0:10000]
train_y = train_labels_onehot[:,0:10000]
parameters = two_layer_model(train_x, train_y, layers_dims = (784, 128, 10), num_iterations = 1000, print_cost=True)

Figure: performance on the training set (cost vs. iterations)

As the figure shows, the cost keeps going down, so training is working.

Next, let's see how the model does on the test set.

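def predict_labels(X, y, parameters):
    """
    Predict labels with the trained two-layer model and print the accuracy.

    Arguments:
    X -- data set of examples you would like to label, of shape (n_x, m)
    y -- true integer labels (0-9), of shape (m,)
    parameters -- parameters of the trained model
    
    Returns:
    predict_label -- predicted labels (0-9) for the given dataset X, of shape (m,)
    """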
    m = X.shape[1]
    
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # Forward propagation
    A1, _ = linear_activation_forward(X, W1, b1, activation='relu')
    probs, _ = linear_activation_forward(A1, W2, b2, activation='softmax')
    
    # convert probas to 0-9 predictions
    predict_label = np.argmax(probs, axis=0)
    
    print("Accuracy: "  + str(np.sum((predict_label == y)/float(m))))
        
    return predict_label

prediction = predict_labels(test_set_x, test_labels, parameters)
# output: Accuracy: 0.8132000000000001
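A note on the accuracy line in predict_labels. The expression np.sum((predict_label == y)/float(m)) is simply the mean of the element-wise matches, so np.mean gives the same number. It relies on y being a flat vector of integer labels with shape (m,). If y were a column vector of shape (m, 1), the comparison would broadcast to an (m, m) matrix and silently produce a wrong accuracy. Here is a small illustration with made-up labels:

```python
import numpy as np

predict_label = np.array([0, 2, 1, 1])   # made-up predictions, shape (m,)
y = np.array([0, 2, 2, 1])               # made-up true labels, shape (m,)
m = y.shape[0]

acc_sum = np.sum((predict_label == y) / float(m))   # the expression used in predict_labels
acc_mean = np.mean(predict_label == y)              # equivalent and simpler
print(acc_sum, acc_mean)                            # 0.75 0.75

# Pitfall: a column-vector y broadcasts against the (m,) predictions
y_col = y.reshape(-1, 1)                             # shape (m, 1)
print((predict_label == y_col).shape)                # (4, 4), not (4,)
acc_fixed = np.mean(predict_label == y_col.ravel())  # flatten y first
print(acc_fixed)                                     # 0.75
```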

Accuracy on the test set is 81.3%. That is decent for a model this simple and shows that it does work.

V. Directions for improvement

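We have now built a complete neural network model for supervised multi-class classification. This is only the most basic version, though, and there is plenty of room for improvement. The three main directions are:

Parameter initialization: this model uses random initialization; Xavier or He initialization are worth trying.
Optimization algorithm: this model uses plain (full-batch) gradient descent; stochastic/mini-batch gradient descent (SGD), RMSprop, Adam and others are alternatives.
Hyperparameter tuning: learning rate, number of hidden units, number of layers, and so on.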
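Here is a sketch of the first direction: He initialization for this network, which is the variant usually paired with ReLU. Each weight matrix is drawn from a Gaussian with variance 2 / fan_in, where fan_in is the number of inputs to the layer, instead of using a fixed 0.01 scale. This keeps the activation variance roughly constant from layer to layer. A common form of Xavier initialization uses 1 / fan_in (or 2 / (fan_in + fan_out)) instead. The function name initialize_parameters_he is hypothetical.

```python
import numpy as np

def initialize_parameters_he(n_x, n_h, n_y, seed=0):
    """He initialization: W ~ N(0, 2 / fan_in), biases start at zero."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * np.sqrt(2.0 / n_x),
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((n_y, n_h)) * np.sqrt(2.0 / n_h),
        "b2": np.zeros((n_y, 1)),
    }

params = initialize_parameters_he(784, 128, 10)
# The empirical std of each weight matrix should be close to sqrt(2 / fan_in)
print(params["W1"].std(), np.sqrt(2.0 / 784))   # target sqrt(2/784) ≈ 0.0505
print(params["W2"].std(), np.sqrt(2.0 / 128))   # target sqrt(2/128) = 0.125
```

It can be dropped in wherever initialize_parameters(n_x, n_h, n_y) is called, since it returns a dictionary with the same keys and shapes.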


How to Implement a Softmax Multi-Class Neural Network Model in Python
