I am trying to implement a neural network that is trained with backpropagation. As the title says, I wrote it in Python 3.6 and am trying to make it learn the XOR function.
When I run the code and test the network, the output is around 0.5 no matter what the input is, so clearly something goes wrong during training.
The output does seem to be affected by the learning-rate and momentum constants, but I have tried a large number of combinations and values that seem reasonable to me, without success.
If anyone can help me figure out what the problem is, I would be very grateful.
If you want to try the code, I think it is easiest to run it in Spyder, since you get a full listing of all the variables.
Here is the code:
First I create an array with all the possible inputs, together with an array containing the answers for those inputs.
import numpy as np
import matplotlib.pyplot as plt
Error = []     # (target - output) history, filled in during training
OutArray = []  # network output history, filled in during training
Xtrain = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
Ttrain = np.array([0, 1, 1, 0])
Then I create the variable "biasNr", which says how many biases there are per layer, and the variable "Layers", which defines how many neurons each layer has.
So here we have 2 inputs because of "Xtrain", but also a bias on the input layer and a bias in the hidden layer.
At the same time there are 2 neurons in the hidden layer and one neuron in the output layer.
This is modular, so if we wanted more neurons in the hidden layers we could write "Layers = np.array([3,2,1])" and "biasNr = np.array([1,0,1])", as sketched right after the next code block.
N = len(Xtrain)            # number of training samples
D = 0                      # input dimension, computed below (inputs + input-layer biases)
biasNr = np.array([1, 1])  # one bias on the input layer, one on the hidden layer
Layers = np.array([2, 1])  # 2 hidden neurons, 1 output neuron
L = len(Layers)
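For example, the deeper configuration quoted above would be set up by changing just these two lines (a sketch using the exact values from the text, as I read the convention):
Layers = np.array([3, 2, 1])  # two hidden layers of 3 and 2 neurons, then 1 output neuron
biasNr = np.array([1, 0, 1])  # a bias on the input layer and on the second hidden layer only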
Then I concatenate the biases onto the input layer, randomly initialize the weights, and also initialize the weight deltas that I use to add momentum while training the network.
ones = np.array([[1]*N]).T
for i in range(biasNr[0]):
    Xtrain = np.concatenate((Xtrain, ones), axis=1)  # append a column of 1s per input-layer bias
for i in range(N):
    if D < len(Xtrain[i]):
        D = len(Xtrain[i])                           # D = inputs + input-layer biases
w = [0] * L
dw = [0] * L
for j in range(L):
    w[j] = []
    dw[j] = []
    for i in range(Layers[j]):
        if j == 0:
            Rw = np.random.uniform(-1, 1, D)
        else:
            Rw = np.random.uniform(-1, 1, Layers[j-1] + biasNr[j])
        dRw = Rw - Rw*0.01
        if i == 0:
            w[j] = Rw
            dw[j] = dRw
        else:
            w[j] = np.vstack((w[j], Rw))
            dw[j] = np.vstack((dw[j], dRw))
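With the default configuration this should produce w[0] with shape (2, 3) (2 hidden neurons, each with 2 input weights + 1 bias weight) and w[1] with shape (3,) (a single output neuron with 2 hidden weights + 1 bias weight); a quick sanity check:
print([np.shape(x) for x in w])  # expected: [(2, 3), (3,)]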
Then we have the sigmoid function:
def sigmodal(Y):
    return 1 / (1 + np.exp(-Y))
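For reference, backpropagation also needs the derivative of the sigmoid; here is a minimal helper (the name sigmodal_prime is mine, not part of the code above). Note that if you already have the activated output a = sigmodal(Y), the derivative is simply a * (1 - a):
def sigmodal_prime(Y):
    # derivative of the sigmoid with respect to its pre-activation input Y
    s = sigmodal(Y)
    return s * (1 - s)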
And the function for a single neuron:
def nevron(inputs, weights):
    Y = inputs.dot(weights)  # weighted sum of the inputs
    Y = sigmodal(Y)          # sigmoid activation
    return Y
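As a usage sketch, one training row (with its bias column already appended) fed through the first hidden neuron, using the variables defined above:
print(nevron(Xtrain[0], w[0][0]))  # a single activation in (0, 1)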
And the function for the whole network. "inputs" comes in as a 3x1 vector, while "layers" says how many neurons each layer has and "bias" how many biases each layer has. "weights" comes in as a list of MxN matrices, where each MxN matrix holds the weights between two layers.
def MLP(inputs, layers, bias, weights):
    Y = 0
    S = []
    for j in range(len(layers)):
        R = []
        for i in range(layers[j]):
            Y = 0
            if j == 0:  # first layer reads the raw inputs
                if layers[j] == 1:
                    Y = nevron(inputs, weights[j])
                else:
                    Y = nevron(inputs, weights[j][i])
            else:       # later layers read the previous layer's outputs
                if layers[j] == 1:
                    Y = nevron(S[j-1], weights[j])
                else:
                    Y = nevron(S[j-1], weights[j][i])
            Yj = np.array([Y])
            if i == 0:
                R = np.array([Y])
            else:
                R = np.concatenate((R, Yj), axis=0)
        S.append(R)                     # store this layer's outputs
        if j < len(bias)-1:
            for i in range(bias[j+1]):  # append the bias inputs for the next layer
                S[j] = np.concatenate((S[j], np.array([1])), axis=0)
    return S
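A usage sketch of what MLP returns for one sample, based on my reading of the code: S[0] holds the two hidden activations plus the appended bias 1, and S[1] holds the final output:
S = MLP(Xtrain[0], Layers, biasNr, w)
print(S[0])  # [h1, h2, 1.0]: hidden outputs plus the bias
print(S[1])  # [y]: the network output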
Then I create a list that holds all the outputs from every neuron, named "sumN":
sumN = []
The function for training the network.
def trainMLP(trainIn, trainOut, layers, bias, learning_rate, momentum, cycles):
    global w
    global dw
    global sumN
    repeatLR = 1
    delta = []
    for x in range(len(layers)):
        delta.append(np.array([0.0]*layers[x]))
    for j in range(cycles):  # Loop 1: epochs
        if j > (cycles * 0.9) and repeatLR == 1:
            learning_rate = learning_rate * 0.1  # drop the learning rate for the last 10% of training
            repeatLR = 0
        for i in range(N):  # Loop 2: training samples
            sumN = []
            Y = 0
            sumN = MLP(trainIn[i], layers, bias, w)  # forward pass
            Y = sumN[len(layers)-1]
            OutArray.append(Y)
            if j % (cycles/10) == 0:
                print(str(j) + "," + str(i), "In:", trainIn[i], "Out: ", Y)
                if i == 3:
                    print("===============================================")
            for h in range(len(layers)-1, -1, -1):  # Loop 3: layers, from output back to input
                for g in range(layers[h]):  # Loop 4: neurons within layer h
                    sigDer = sigmodal(sumN[h][g])*(1 - sigmodal(sumN[h][g]))
                    if h == (len(layers)-1):
                        # output layer: delta straight from the error
                        delta[h] = (trainOut[i] - Y) * sigDer
                        dw[h] = learning_rate * delta[h] * sumN[h-1] + momentum * dw[h]
                        Error.append(trainOut[i] - Y)
                    else:
                        # hidden layer: delta backpropagated from the layer above
                        delA = np.ndarray((len(delta[h+1]), 1), buffer=delta[h+1], dtype=float)
                        if layers[h] == 1:
                            wA = np.ndarray((len(w[h]), 1), buffer=w[h], dtype=float)
                        else:
                            wA = np.ndarray((len(w[h][g]), 1), buffer=w[h][g], dtype=float)
                        delta[h] = delA.dot(wA.T) * sigDer
                        dw[h][g] = learning_rate * delta[h] * sumN[h-1] + momentum * dw[h][g]
                    if layers[h] == 1:
                        w[h] = w[h] + dw[h]
                    else:
                        w[h][g] = w[h][g] + dw[h][g]
The function for testing the network:
def testMLP(trainIn, weights):
    print("------------------------------------")
    for i in range(N):
        Y = MLP(trainIn[i], Layers, biasNr, w)
        Y0 = Y[len(Y)-1]
        print(str(i), "In:", trainIn[i], "Out: ", Y0)
    print("------------------------------------")
Then I call the main functions to run the whole process, with learning rate = 0.1 and momentum = 0.5:
trainMLP(Xtrain, Ttrain, Layers, biasNr, 0.1, 0.5, 10000)
testMLP(Xtrain, w)
plt.plot(Error)
plt.plot(OutArray)
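For comparison, here is a minimal, self-contained 2-2-1 sigmoid network trained on XOR with plain batch gradient descent, written independently of the code above (no momentum, biases kept as separate vectors, and all names are mine). Note that it computes the sigmoid derivative from the already-activated values as a*(1-a):
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

# 2 inputs -> 2 hidden -> 1 output
W1 = np.random.uniform(-1, 1, (2, 2)); b1 = np.random.uniform(-1, 1, 2)
W2 = np.random.uniform(-1, 1, (2, 1)); b2 = np.random.uniform(-1, 1, 1)

lr = 0.5
for epoch in range(10000):
    H = sig(X.dot(W1) + b1)          # hidden activations, shape (4, 2)
    Y = sig(H.dot(W2) + b2)          # outputs, shape (4, 1)
    dY = (T - Y) * Y * (1 - Y)       # output delta: error times sigmoid derivative
    dH = dY.dot(W2.T) * H * (1 - H)  # hidden delta, backpropagated through W2
    W2 += lr * H.T.dot(dY); b2 += lr * dY.sum(axis=0)
    W1 += lr * X.T.dot(dH); b1 += lr * dH.sum(axis=0)

print(sig(sig(X.dot(W1) + b1).dot(W2) + b2))  # should approach [0, 1, 1, 0]
With only 2 hidden units this can occasionally stall in a local minimum depending on the random initialization; rerunning it, or using 3 hidden units, typically gives outputs below 0.05 and above 0.95.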
EDIT:
This is the output I typically get. This is after 10'000 epochs:
0 In: [0 0 1] Out: [ 0.38433476]
1 In: [0 1 1] Out: [ 0.38330449]
2 In: [1 0 1] Out: [ 0.70006104]
3 In: [1 1 1] Out: [ 0.52599719]
And this is the kind of output I would want, since I am learning the XOR function. Of course it will not be exactly 0 or 1, but the values should be clearly closer to 0 or 1 than they are now:
0 In: [0 0 1] Out: [ 0.0]
1 In: [0 1 1] Out: [ 1.0]
2 In: [1 0 1] Out: [ 1.0]
3 In: [1 1 1] Out: [ 0.0]