I am trying to implement a neural network that is trained with backpropagation. As the title says, I did it in Python 3.6, and I am trying to get it to learn the XOR function.

When I run the code and test the network, the output is around 0.5 no matter what the input is, so something is clearly going wrong during training.

The output seems to be affected by the learning-rate and momentum constants, but I have tried a large number of different combinations and values that seemed reasonable to me, without success.

I would really appreciate it if someone could help me figure out what the problem is.

If you want to try the code, I think it is easier to use Spyder, since you get a full listing of all the variables.

Here is the code:

First, I create an array containing all possible inputs, along with an array containing the answers for those inputs.

import numpy as np
import matplotlib.pyplot as plt

Error = []
OutArray = []

Xtrain = np.array([
        [0,0], 
        [0,1], 
        [1,0], 
        [1,1]
    ])

Ttrain = np.array([0,1,1,0])
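As a quick sanity check (not part of the original code), the target vector matches NumPy's element-wise XOR of the two input columns:

```python
import numpy as np

Xtrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Ttrain = np.array([0, 1, 1, 0])

# XOR of the two input columns reproduces the target vector exactly
computed = np.logical_xor(Xtrain[:, 0], Xtrain[:, 1]).astype(int)
assert np.array_equal(computed, Ttrain)
```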

Then I create the variable "biasNr", which states how many biases each layer has. The variable "Layers" defines how many neurons each layer has.

So here we have 2 inputs because of "Xtrain", plus a bias in the input layer and a bias in the hidden layer.

Meanwhile, there are 2 neurons in the hidden layer and one neuron in the output layer.

This is modular, so if we want more neurons in the hidden layer we can write "Layers = np.array([3,2,1])" and "biasNr = np.array([1,0,1])".

N = len(Xtrain)
D = 0

biasNr = np.array([1,1])

Layers = np.array([2,1])

L = len(Layers)

Then I concatenate the biases onto the input layer, randomly initialize the weights, and initialize the weight deltas, which I use to add momentum when training the network.

ones = np.array([[1]*N]).T
for i in range(biasNr[0]):
    Xtrain = np.concatenate((Xtrain,ones), axis = 1)

for i in range(N):
    if D < len(Xtrain[i]):
        D = len(Xtrain[i])

w = [0] * L
dw = [0] * L

for j in range(L):
    w[j] = []
    dw[j] = []
    for i in range(Layers[j]):
        if j == 0:
            Rw = np.random.uniform(-1, 1, D)
        else:
            Rw = np.random.uniform(-1, 1, Layers[j-1] + biasNr[j])

        dRw = Rw - Rw*0.01

        if i == 0:
            w[j] = Rw
            dw[j] = dRw
        else:
            w[j] = np.vstack((w[j], Rw))
            dw[j] = np.vstack((dw[j], dRw))
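A quick way to check this initialization (a condensed re-run of the loop above, assuming the default biasNr = [1,1], Layers = [2,1], and D = 3) is to inspect the resulting shapes: the hidden layer gets a 2x3 weight matrix, while the single output neuron gets a flat vector of 3 weights.

```python
import numpy as np

biasNr = np.array([1, 1])
Layers = np.array([2, 1])
D = 3  # two inputs plus one input-layer bias
L = len(Layers)

w = [0] * L
for j in range(L):
    for i in range(Layers[j]):
        if j == 0:
            Rw = np.random.uniform(-1, 1, D)
        else:
            Rw = np.random.uniform(-1, 1, Layers[j - 1] + biasNr[j])
        # First neuron starts the array; later neurons are stacked as rows
        w[j] = Rw if i == 0 else np.vstack((w[j], Rw))

assert w[0].shape == (2, 3)  # hidden layer: 2 neurons x 3 inputs
assert w[1].shape == (3,)    # single output neuron: a 1-D weight vector
```

Note that a layer with a single neuron ends up as a 1-D array rather than a 1xN matrix, which is why the code below keeps branching on `layers[j] == 1`.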

Then we have the sigmoid function:

def sigmodal(Y):
    return 1 / (1 + np.exp(-Y))
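For reference, the sigmoid satisfies σ(0) = 0.5, and its derivative with respect to the pre-activation x is σ(x)·(1 − σ(x)); a small numerical check of both properties:

```python
import numpy as np

def sigmodal(Y):
    return 1 / (1 + np.exp(-Y))

# sigma(0) = 0.5 exactly
assert sigmodal(0.0) == 0.5

# The analytic derivative sigma(x) * (1 - sigma(x)) matches a
# central finite difference of sigmodal at several points
x = np.array([-2.0, 0.0, 2.0])
analytic = sigmodal(x) * (1 - sigmodal(x))
numeric = (sigmodal(x + 1e-6) - sigmodal(x - 1e-6)) / 2e-6
assert np.allclose(analytic, numeric, atol=1e-6)
```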

And the function for a single neuron:

def nevron(inputs, weights):
    Y = inputs.dot(weights)
    Y = sigmodal(Y)

    return Y
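For example, a neuron with all-zero weights always outputs exactly 0.5, since the weighted sum is 0 and σ(0) = 0.5 (this is also what the symptom in the question looks like):

```python
import numpy as np

def sigmodal(Y):
    return 1 / (1 + np.exp(-Y))

def nevron(inputs, weights):
    # Weighted sum of the inputs, squashed through the sigmoid
    return sigmodal(inputs.dot(weights))

# Zero weights -> weighted sum 0 -> sigmoid(0) = 0.5
assert nevron(np.array([0.0, 1.0, 1.0]), np.zeros(3)) == 0.5
```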

The function for the whole network. "inputs" comes in as a 3x1 vector, while "layers" states how many neurons each layer has and "bias" states how many biases each layer has. "weights" comes in as a list of MxN matrices, where each MxN matrix contains the weights between two layers.

def MLP(inputs, layers, bias, weights):
    Y = 0
    S = []

    for j in range(len(layers)):
        R = []

        for i in range(layers[j]):
            Y = 0

            if j == 0: # if 1

                if layers[j] == 1:
                    Y = nevron(inputs, weights[j])
                else:
                    Y = nevron(inputs, weights[j][i])

            else: # else 1

                if layers[j] == 1:
                    Y = nevron(S[j-1], weights[j])
                else:
                    Y = nevron(S[j-1], weights[j][i])

            Yj = np.array([Y])

            if i == 0:
                R = np.array([Y])
            else:
                R = np.concatenate((R, Yj), axis=0)


        S.append(R)

        if j < len(bias)-1:
            for i in range(bias[j+1]):
                S[j] = np.concatenate((S[j], np.array([1])), axis=0)

    return S
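To illustrate what MLP returns for the default 2-2-1 configuration, here is a condensed, vectorized reimplementation of the same forward pass (the name `forward` is mine, for illustration only): with zero weights and input [0, 0, 1], the hidden entry S[0] holds the 2 neuron outputs plus the appended bias 1, and S[1] holds the single output 0.5.

```python
import numpy as np

def sigmodal(Y):
    return 1 / (1 + np.exp(-Y))

def forward(inputs, layers, bias, weights):
    # Condensed version of MLP: builds the per-layer output list S
    S = []
    for j in range(len(layers)):
        prev = inputs if j == 0 else S[j - 1]
        # A single-neuron layer stores its weights as a 1-D vector
        W = weights[j] if layers[j] > 1 else weights[j].reshape(1, -1)
        S.append(sigmodal(W.dot(prev)))
        if j < len(bias) - 1:
            # Append the next layer's bias inputs as constant 1s
            S[j] = np.concatenate((S[j], np.ones(bias[j + 1])))
    return S

layers, bias = np.array([2, 1]), np.array([1, 1])
w = [np.zeros((2, 3)), np.zeros(3)]
S = forward(np.array([0.0, 0.0, 1.0]), layers, bias, w)

assert len(S[0]) == 3      # 2 hidden outputs plus 1 appended bias
assert len(S[1]) == 1      # a single output value
assert S[1][0] == 0.5      # zero weights -> sigmoid(0) = 0.5
```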

Then I create a list named "sumN" that holds all the outputs of every neuron:

sumN = []

The function for training the network.

def trainMLP(trainIn, trainOut, layers, bias, learning_rate, momentum, cycles):
    global w
    global dw
    global sumN
    repeatLR = 1

    delta = []

    for x in range(len(layers)):
        delta.append(np.array([0.0]*layers[x]))

    for j in range(cycles): # Loop 1

        if j > (cycles * 0.9) and repeatLR == 1:
            learning_rate = learning_rate * 0.1
            repeatLR = 0

        for i in range(N): # Loop 2
            sumN = []
            Y = 0

            sumN = MLP(trainIn[i], layers, bias, w)
            Y = sumN[len(layers)-1]

            OutArray.append(Y)

            if j % (cycles/10) == 0:
                print(str(j) + "," + str(i), "In:", trainIn[i], "Out: ", Y)

                if i == 3:
                    print("===============================================")

            for h in range(len(layers)-1, -1, -1): # Loop 3

                for g in range(layers[h]): # Loop 4
                    sigDer = sigmodal(sumN[h][g])*(1 - sigmodal(sumN[h][g]))

                    if h == (len(layers)-1):
                        delta[h] = (trainOut[i] - Y) * sigDer

                        dw[h] = learning_rate * delta[h] * sumN[h-1] + momentum * dw[h]
                        Error.append(trainOut[i] - Y)
                    else:

                        delA = np.ndarray((len(delta[h+1]), 1), buffer = delta[h+1], dtype = float)

                        if layers[h] == 1:
                            wA = np.ndarray((len(w[h]), 1), buffer = w[h],dtype = float)
                        else:
                            wA = np.ndarray((len(w[h][g]), 1), buffer = w[h][g],dtype = float)

                        delta[h] = delA.dot(wA.T) * sigDer

                        dw[h][g] = learning_rate * delta[h] * sumN[h-1] + momentum * dw[h][g]

                    if layers[h] == 1:
                        w[h] = w[h] + dw[h]
                    else:
                        w[h][g] = w[h][g] + dw[h][g]
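The update rule used above is the standard delta rule with momentum, dw_new = η·δ·input + μ·dw_old. A minimal illustration on one made-up weight vector (the numbers are arbitrary, chosen only to make the arithmetic easy to follow):

```python
import numpy as np

learning_rate, momentum = 0.1, 0.5
delta = 0.2                          # error term of the output neuron
inputs = np.array([0.5, 0.5, 1.0])   # hidden outputs plus the bias
dw_old = np.array([0.04, -0.02, 0.01])

# Delta rule with momentum: new step = lr * delta * input + momentum * old step
dw_new = learning_rate * delta * inputs + momentum * dw_old
assert np.allclose(dw_new, [0.03, 0.0, 0.025])
```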

The function for testing the network:

def testMLP(trainIn, weights):
    print("------------------------------------")
    for i in range(N):
        Y = MLP(trainIn[i], Layers, biasNr, w)
        Y0 = Y[len(Y)-1]

        print(str(i), "In:", trainIn[i], "Out: ", Y0)

    print("------------------------------------")

Then I call the main functions to run the whole process. The learning rate = 0.1, while the momentum = 0.5:

trainMLP(Xtrain, Ttrain, Layers, biasNr, 0.1, 0.5, 10000)

testMLP(Xtrain, w)

plt.plot(Error)
plt.plot(OutArray)

Edit:

This is the output I usually get. This is after 10,000 epochs:

0 In: [0 0 1] Out:  [ 0.38433476]
1 In: [0 1 1] Out:  [ 0.38330449]
2 In: [1 0 1] Out:  [ 0.70006104]
3 In: [1 1 1] Out:  [ 0.52599719]

This is the output I want, since I am trying to learn the XOR function. Of course it will never be exactly 0 or 1, but the values should be much closer to 0 or 1, and clearly distinguishable:

0 In: [0 0 1] Out:  [ 0.0]
1 In: [0 1 1] Out:  [ 1.0]
2 In: [1 0 1] Out:  [ 1.0]
3 In: [1 1 1] Out:  [ 0.0]