
MxNet with R: simple XOR neural network is not learning


I wanted to try out the MxNet library and build a simple neural network that learns the XOR function. The problem I am facing is that the model is not learning.

Here is the complete script:

library(mxnet)

train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4,
               ncol=3,
               byrow=TRUE)

train.x = train[,-3]
train.y = train[,3]

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=2)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=1)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")

mx.set.seed(0)
model <- mx.model.FeedForward.create(
  softmax,
  X = t(train.x),
  y = train.y,
  num.round = 10,
  array.layout = "columnmajor",
  learning.rate = 0.01,
  momentum = 0.4,
  eval.metric = mx.metric.accuracy,
  epoch.end.callback = mx.callback.log.train.metric(100))

predict(model,train.x,array.layout="rowmajor")

It produces this output:

Start training with 1 devices
[1] Train-accuracy=NaN
[2] Train-accuracy=0.5
[3] Train-accuracy=0.5
[4] Train-accuracy=0.5
[5] Train-accuracy=0.5
[6] Train-accuracy=0.5
[7] Train-accuracy=0.5
[8] Train-accuracy=0.5
[9] Train-accuracy=0.5
[10] Train-accuracy=0.5

> predict(model,train.x,array.layout="rowmajor")
[,1] [,2] [,3] [,4]
[1,]    1    1    1    1

How can I make this example work with mxnet?

Regards, vaka

2 Answers

  • 0 votes

    Usually an activation layer does not belong right after the input, because the activation should be applied once the first layer's computation is done. You can still mimic the XOR function with your old code, but it needs a couple of adjustments:

    • You need to initialize the weights. Which initial weights are best is a big discussion in the deep learning community, but from my practice Xavier weights work well

    • If you want to use softmax, you need to change the number of units in the last layer to 2, because you have 2 classes: 0 and 1

    After doing these two things, plus a few small optimizations such as removing the transpose of the matrix, we get the following code:

    library(mxnet)
    
    train = matrix(c(0,0,0,
                     0,1,1,
                     1,0,1,
                     1,1,0),
                   nrow=4,
                   ncol=3,
                   byrow=TRUE)
    
    train.x = train[,-3]
    train.y = train[,3]
    
    data <- mx.symbol.Variable("data")
    fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=2)
    act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
    fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
    act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
    fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=2)
    softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")
    
    mx.set.seed(0)
    model <- mx.model.FeedForward.create(
      softmax,
      X = train.x,
      y = train.y,
      num.round = 50,
      array.layout = "rowmajor",
      learning.rate = 0.1,
      momentum = 0.99,
      eval.metric = mx.metric.accuracy,
      initializer = mx.init.Xavier(rnd_type = "uniform", factor_type = "avg", magnitude = 3),
      epoch.end.callback = mx.callback.log.train.metric(100))
    
    predict(model,train.x,array.layout="rowmajor")
    

    We get the following result:

    Start training with 1 devices
    [1] Train-accuracy=NaN
    [2] Train-accuracy=0.75
    [3] Train-accuracy=0.5
    [4] Train-accuracy=0.5
    [5] Train-accuracy=0.5
    [6] Train-accuracy=0.5
    [7] Train-accuracy=0.5
    [8] Train-accuracy=0.5
    [9] Train-accuracy=0.5
    [10] Train-accuracy=0.75
    [11] Train-accuracy=0.75
    [12] Train-accuracy=0.75
    [13] Train-accuracy=0.75
    [14] Train-accuracy=0.75
    [15] Train-accuracy=0.75
    [16] Train-accuracy=0.75
    [17] Train-accuracy=0.75
    [18] Train-accuracy=0.75
    [19] Train-accuracy=0.75
    [20] Train-accuracy=0.75
    [21] Train-accuracy=0.75
    [22] Train-accuracy=0.5
    [23] Train-accuracy=0.5
    [24] Train-accuracy=0.5
    [25] Train-accuracy=0.75
    [26] Train-accuracy=0.75
    [27] Train-accuracy=0.75
    [28] Train-accuracy=0.75
    [29] Train-accuracy=0.75
    [30] Train-accuracy=0.75
    [31] Train-accuracy=0.75
    [32] Train-accuracy=0.75
    [33] Train-accuracy=0.75
    [34] Train-accuracy=0.75
    [35] Train-accuracy=0.75
    [36] Train-accuracy=0.75
    [37] Train-accuracy=0.75
    [38] Train-accuracy=0.75
    [39] Train-accuracy=1
    [40] Train-accuracy=1
    [41] Train-accuracy=1
    [42] Train-accuracy=1
    [43] Train-accuracy=1
    [44] Train-accuracy=1
    [45] Train-accuracy=1
    [46] Train-accuracy=1
    [47] Train-accuracy=1
    [48] Train-accuracy=1
    [49] Train-accuracy=1
    [50] Train-accuracy=1
    > 
    > predict(model,train.x,array.layout="rowmajor")
              [,1]         [,2]         [,3]         [,4]
    [1,] 0.9107883 2.618128e-06 6.384078e-07 0.9998743534
    [2,] 0.0892117 9.999974e-01 9.999994e-01 0.0001256234
    

    The output of softmax is interpreted as "the probability of belonging to a class": the values it produces after the usual math operations are not exact "0" or "1". The answer means the following (a short snippet for reading off the labels follows this list):

    • For "0 and 0": the probability of class "0" = 0.9107883 and of class "1" = 0.0892117, meaning it is 0

    • For "0 and 1": the probability of class "0" = 2.618128e-06 and of class "1" = 9.999974e-01, meaning it is 1 (the probability of 1 is much higher)

    • For "1 and 0": the probability of class "0" = 6.384078e-07 and of class "1" = 9.999994e-01, meaning it is 1 (the probability of 1 is much higher)

    • For "1 and 1": the probability of class "0" = 0.9998743534 and of class "1" = 0.0001256234, meaning it is 0.
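
    In other words, the predicted class is simply the row with the larger probability. Here is a minimal sketch for reading the labels off this probability matrix, assuming the `model` and `train.x` from the script above:

    # Convert the 2-row softmax probability matrix into hard 0/1 labels.
    # predict() returns one row per class and one column per sample, so we
    # take the row index of the larger probability and shift it to 0/1.
    preds <- predict(model, train.x, array.layout = "rowmajor")
    max.col(t(preds)) - 1   # expected: 0 1 1 0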

  • 1 vote

    Well, I experimented a bit and now I have a working XOR example with mxnet in R. The complicated part was not the mxnet API, but working with the neural network.

    So here is the working R code:

    library(mxnet)
    
    train = matrix(c(0,0,0,
                     0,1,1,
                     1,0,1,
                     1,1,0),
                   nrow=4,
                   ncol=3,
                   byrow=TRUE)
    
    train.x = t(train[,-3])
    train.y = t(train[,3])
    
    data <- mx.symbol.Variable("data")
    act0 <- mx.symbol.Activation(data, name="relu1", act_type="relu")
    fc1 <- mx.symbol.FullyConnected(act0, name="fc1", num_hidden=2)
    act1 <- mx.symbol.Activation(fc1, name="relu2", act_type="tanh")
    fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
    act2 <- mx.symbol.Activation(fc2, name="relu3", act_type="relu")
    fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=1)
    act3 <- mx.symbol.Activation(fc3, name="relu4", act_type="relu")
    softmax <- mx.symbol.LinearRegressionOutput(act3, name="sm")
    
    mx.set.seed(0)
    model <- mx.model.FeedForward.create(
      softmax,
      X = train.x,
      y = train.y,
      num.round = 10000,
      array.layout = "columnmajor",
      learning.rate = 10^-2,
      momentum = 0.95,
      eval.metric = mx.metric.rmse,
      epoch.end.callback = mx.callback.log.train.metric(10),
      lr_scheduler=mx.lr_scheduler.FactorScheduler(1000,factor=0.9),
      initializer=mx.init.uniform(0.5)
      )
    
    predict(model,train.x,array.layout="columnmajor")
    

    Some differences from the initial code:

    • I changed the layout of the neural network by putting another activation layer between the data and the first layer. I interpret it as weighting between the data and the input layer (is that correct?)

    • I changed the activation function of the hidden layer (the one with 3 neurons) to tanh, because I guess XOR needs negative weights

    • I changed SoftmaxOutput to LinearRegressionOutput, in order to optimize for squared loss (see the rounding snippet after the output below)

    • Fine-tuned the learning rate and momentum

    • Most importantly: I added a uniform initializer for the weights. I guess the default is to set the weights to zero. Learning really did speed up with random initial weights.

    Output:

    Start training with 1 devices
    [1] Train-rmse=NaN
    [2] Train-rmse=0.706823888574888
    [3] Train-rmse=0.705537411582449
    [4] Train-rmse=0.701298592443344
    [5] Train-rmse=0.691897326795625
    ...
    [9999] Train-rmse=1.07453801496744e-07
    [10000] Train-rmse=1.07453801496744e-07
    > predict(model,train.x,array.layout="columnmajor")
         [,1]      [,2] [,3] [,4]
    [1,]    0 0.9999998    1    0
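
    As noted in the list above, LinearRegressionOutput produces continuous values rather than class probabilities, so the predictions are only approximately 0 and 1. A minimal sketch for recovering the binary labels, assuming the `model` and `train.x` from this script:

    # Round the continuous regression outputs to the nearest integer to
    # obtain the binary XOR labels.
    preds <- predict(model, train.x, array.layout = "columnmajor")
    round(preds)   # expected: 0 1 1 0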
    
