My problem is that my NN is not training, and I can't understand why. Can someone help? Here is my code; if anything is unclear, I can describe it in more detail. Thanks!

First, I import the digits dataset and split it into a training and a test set.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
digits = load_digits()
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)
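For reference, each row of digits.data is a flattened 8x8 image, so 64 features per example. A quick shape check (just a sanity print, not part of the model):

print(x_train.shape, y_train.shape)   # (n_train, 64) and (n_train,)
print(x_test.shape, y_test.shape)     # (n_test, 64) and (n_test,)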

I want my labels to be vectors of size 10, but here each label is just the digit that the image represents. So I have to convert them, so that each column of my label matrix has a '1' at the coordinate corresponding to that example's digit. For instance, if example X_1 is a '3', the entry at row index 3 of the first column of the matrix will be '1' and every other entry of that column will be '0'.

y_train1 = np.zeros((10,y_train.shape[0]))

for i in range(0, y_train.shape[0]):
    y_train1[y_train[i], i] = 1

x_train1 = np.transpose(x_train)
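For comparison, the same one-hot encoding could be written in one line with np.eye (just a sketch of the intent; y_train1_alt is only a throwaway name for this check):

y_train1_alt = np.eye(10)[y_train].T           # shape (10, m), one column per example
print(np.array_equal(y_train1, y_train1_alt))  # True if the loop above built the same matrix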

Then I create the activation functions:

def sigmoid(Z):

   A = 1/(1+np.exp(-Z))
   cache = Z

   return A, cache

def relu(Z):

   A = np.maximum(0,Z)

   assert(A.shape == Z.shape)

   cache = Z 
   return A, cache
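Both return the activation together with a cache of Z, which I keep for back-propagation. A tiny example of what they return (the _toy names are only for this check):

A_toy, cache_toy = sigmoid(np.array([[0.0, 2.0]]))
print(A_toy)      # 0.5 and about 0.8808
print(cache_toy)  # [[0. 2.]]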

A function that initialises the parameters with the right dimensions:

def init (n_x, n_h, n_y):

   ## Random initialisation of the weights and biases for a network with 1 hidden layer ##
   W1 = np.random.randn(n_h,n_x)*0.01
   b1 = np.zeros((n_h,1))
   W2 = np.random.randn(n_y,n_h)*0.01
   b2 = np.zeros((n_y,1))

   assert(W1.shape == (n_h, n_x))
   assert(b1.shape == (n_h, 1))
   assert(W2.shape == (n_y, n_h))
   assert(b2.shape == (n_y, 1))


   parameters = {"W1": W1,
              "b1": b1,
              "W2": W2,
              "b2": b2}

   return parameters
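A quick check of the shapes this gives for my dimensions (64 inputs, 20 hidden units, 10 outputs); params_test is just a throwaway name:

params_test = init(64, 20, 10)
print(params_test["W1"].shape, params_test["b1"].shape)   # (20, 64) (20, 1)
print(params_test["W2"].shape, params_test["b2"].shape)   # (10, 20) (10, 1)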

Functions that perform the linear forward step and its activation:

def propa_avant(A,W,b):
   Z = np.dot(W,A) + b 
   assert(Z.shape == (W.shape[0], A.shape[1]))
   cache = (A,W,b)

   return Z, cache

def propa_avant_activ(A_prev, W, b, activation): 

   if activation =="sigmoid" : 
       Z, linear_cache = propa_avant(A_prev,W,b)
       A, activation_cache = sigmoid(Z)

   if activation =="relu":
       Z, linear_cache = propa_avant(A_prev,W,b)
       A, activation_cache = relu(Z)

   assert (A.shape == (W.shape[0], A_prev.shape[1]))

   cache = (linear_cache, activation_cache)

   return A, cache
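Together these are meant to compute Z = W·A_prev + b followed by A = g(Z). For example, one forward pass through the hidden layer with freshly initialised parameters (a sanity check only, not part of the model function; the _test names are mine):

params_test = init(x_train1.shape[0], 20, 10)
A1_test, _ = propa_avant_activ(x_train1, params_test["W1"], params_test["b1"], activation="relu")
print(A1_test.shape)   # (20, m), one column of hidden activations per training example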

The cross-entropy cost function:

def fonction_cout(AL,Y): 

   m = Y.shape[1]
   cost = (-1/m)*np.sum(Y*np.log(AL)+(1-Y)*np.log(1-AL))

   cost = np.squeeze(cost)      
   assert(cost.shape == ())

   return cost
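A tiny hand-made check of this function (the _toy arrays are just for this test): near-perfect predictions should give a small cost.

Y_toy = np.array([[1, 0], [0, 1]])
A_toy = np.array([[0.9, 0.1], [0.1, 0.9]])
print(fonction_cout(A_toy, Y_toy))   # about 0.21, i.e. 2*(-log(0.9))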

The derivatives of the activation functions:

def relu_backward(dA, cache):

   Z = cache
   dZ = np.array(dA, copy=True) 


   dZ[Z <= 0] = 0

   assert (dZ.shape == Z.shape)

   return dZ

def sigmoid_backward(dA, cache):

   Z = cache

   s = 1/(1+np.exp(-Z))
   dZ = dA * s * (1-s)

   assert (dZ.shape == Z.shape)

   return dZ
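These follow the usual derivatives, sigmoid'(z) = s*(1-s) and relu'(z) = 1 for z > 0 and 0 otherwise, applied elementwise to dA. A tiny check of relu_backward (toy values only):

dZ_toy = relu_backward(np.array([[1.0, 1.0, 1.0]]), np.array([[-1.0, 0.0, 2.0]]))
print(dZ_toy)   # [[0. 0. 1.]]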

Functions that perform the linear backward step and its activation:

def propa_arriere(dZ, cache):


   A_prev, W, b = cache
   m = A_prev.shape[1]


   dW = (1/2)*np.dot(dZ,A_prev.T)
   db = (1/2)*np.sum(dZ, axis=1, keepdims=True)
   dA_prev = np.dot(W.T,dZ)


   assert (dA_prev.shape == A_prev.shape)
   assert (dW.shape == W.shape)
   assert (db.shape == b.shape)

   return dA_prev, dW, db

def propa_arriere_activ(dA, cache, activation):

   linear_cache, activation_cache = cache

   if activation == "relu":

      dZ = relu_backward(dA, activation_cache)
      dA_prev, dW, db = propa_arriere(dZ, linear_cache)


   elif activation == "sigmoid":

      dZ = sigmoid_backward(dA, activation_cache)
      dA_prev, dW, db = propa_arriere(dZ, linear_cache)


   return dA_prev, dW, db

The function that updates the parameters:

def update_parameters(parameters, grads, learning_rate):


   L = len(parameters) // 2 


   for i in reversed(range(1, L-1)): 
       parameters["W"+str(i)]=parameters["W"+str(i)] - learning_rate*grads["W"+str(i)]
       parameters["b"+str(i)]=parameters["b"+str(i)] - learning_rate*grads["b"+str(i)]
       parameters["A"+str(i)]=parameters["A"+str(i)] - learning_rate*grads["A"+str(i)]




   return parameters

The neural network function:

def two_layer_model(X, Y, layers_dims, learning_rate = 0.05, num_iterations = 2000, print_cost=False):



   grads = {}
   costs = []                              # to keep track of the cost
   m = X.shape[1]                           # number of examples
   (n_x, n_h, n_y) = layers_dims


   parameters = init(n_x, n_h, n_y)



   W1 = parameters["W1"]
   b1 = parameters["b1"]
   W2 = parameters["W2"]
   b2 = parameters["b2"]



   for i in range(0, num_iterations):


       A1, cache1 = propa_avant_activ(X, W1, b1, activation = "relu")
       A2, cache2 = propa_avant_activ(A1, W2, b2, activation = "sigmoid")



       cost = fonction_cout(A2, Y)



       dA2 = (1/m)* (- (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2)))


       dA1, dW2, db2 = propa_arriere_activ(dA2, cache2, activation="sigmoid")
       dA0, dW1, db1 = propa_arriere_activ(dA1, cache1, activation="relu")



       grads['dW1'] = dW1
       grads['db1'] = db1
       grads['dW2'] = dW2
       grads['db2'] = db2


       parameters = update_parameters(parameters, grads, learning_rate)



       W1 = parameters["W1"]
       b1 = parameters["b1"]
       W2 = parameters["W2"]
       b2 = parameters["b2"]


       if print_cost and i % 100 == 0:
           print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
       if print_cost and i % 100 == 0:
           costs.append(cost)



   plt.plot(np.squeeze(costs))
   plt.ylabel('cost')
   plt.xlabel('iterations (per tens)')
   plt.title("Learning rate =" + str(learning_rate))
   plt.show()

   return parameters

Then I run this line of code to train a network with an input of size 64, one hidden layer of size 20 and an output of size 10:

parameters = two_layer_model(x_train1, y_train1, layers_dims = (x_train1.shape[0], 20, 10), num_iterations = 2000, print_cost=True)

And I get this:

Cost after iteration 0: 6.962808001140989
Cost after iteration 100: 6.962808001140989
Cost after iteration 200: 6.962808001140989
Cost after iteration 300: 6.962808001140989
Cost after iteration 400: 6.962808001140989
Cost after iteration 500: 6.962808001140989
Cost after iteration 600: 6.962808001140989
Cost after iteration 700: 6.962808001140989
Cost after iteration 800: 6.962808001140989
Cost after iteration 900: 6.962808001140989
Cost after iteration 1000: 6.962808001140989
Cost after iteration 1100: 6.962808001140989
Cost after iteration 1200: 6.962808001140989
Cost after iteration 1300: 6.962808001140989
Cost after iteration 1400: 6.962808001140989
Cost after iteration 1500: 6.962808001140989
Cost after iteration 1600: 6.962808001140989
Cost after iteration 1700: 6.962808001140989
Cost after iteration 1800: 6.962808001140989
Cost after iteration 1900: 6.962808001140989

And this plot:

[plot: "Cost after iteration" — the cost curve over the iterations]