
Theano: how to provide training data to a neural network

I am trying to create a simple multilayer perceptron (MLP) for "logical and" in Theano. There is one layer between input and output. The structure is:

2-value input -> multiply with weights, add bias -> softmax -> 1-value output

The change in dimensionality is caused by the weight matrix.
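
To make the shapes concrete, here is a small numpy sketch of that forward pass (my own illustration, not part of the original question):

import numpy

X = numpy.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # (4, 2) batch of inputs
W = numpy.zeros((2, 1))                                    # (n_in, n_out) weights
b = numpy.zeros(1)                                         # (n_out,) bias

logits = X.dot(W) + b                                      # (4, 1): W changes the width
# softmax over the output axis
probs = numpy.exp(logits) / numpy.exp(logits).sum(axis=1, keepdims=True)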

The implementation is based on this tutorial: http://deeplearning.net/tutorial/logreg.html

This is my Layer class:

class Layer():
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(
                value=numpy.zeros(
                        (n_in, n_out),
                        dtype=theano.config.floatX
                ),
                name="W",
                borrow=True
        )
        self.b = theano.shared(
                value=numpy.zeros((n_in, n_out),
                                  dtype=theano.config.floatX),
                name="b",
                borrow=True
        )

        self.output = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.params = (self.W, self.b)
        self.input = input

The class is meant to be modular. I want to be able to add more than one layer, not just a single one. Therefore the functions for prediction, cost and errors are outside the class (in contrast to the tutorial):

def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()
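
For reference, the advanced indexing in negative_log_likelihood picks, for each row, the log-probability of the correct class. A small numpy illustration of the same trick (my own example, not from the original post):

import numpy

output = numpy.array([[0.9, 0.1],   # softmax output for 3 samples, 2 classes
                      [0.2, 0.8],
                      [0.6, 0.4]])
y = numpy.array([0, 1, 0])          # correct class per sample

# [arange(n), y] selects output[0, 0], output[1, 1], output[2, 0]
nll = -numpy.log(output)[numpy.arange(y.shape[0]), y].mean()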

Logical and has 4 training cases:

  • [0,0] -> 0

  • [1,0] -> 0

  • [0,1] -> 0

  • [1,1] -> 1

Here is the setup of the classifier and the functions for training and evaluation:

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                         dtype=theano.config.floatX),
                         borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                         dtype=theano.config.floatX),
                         borrow=True),"int32")

x = T.vector("x",theano.config.floatX)  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.lscalar()

learning_rate = 0.15

updates = [
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
]

train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index],
            y: train_set_y[index]
        }
)
validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: train_set_x[index],
            y: train_set_y[index]
        }
)
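
For context, the givens mechanism replaces a symbolic input with a slice of shared data on every call. A minimal standalone illustration (my own, with made-up names):

import theano
import theano.tensor as T
import numpy

# the symbolic vector v gets replaced by one row of a shared matrix
data = theano.shared(numpy.arange(6, dtype=theano.config.floatX).reshape(3, 2))
i = T.lscalar()
v = T.vector("v")
row_sum = theano.function(inputs=[i], outputs=v.sum(),
                          givens={v: data[i]})
print(row_sum(0))  # 0 + 1 = 1.0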

I tried to follow the conventions. Each row in the data matrix is a training sample. Each training sample is matched with the correct output. Unfortunately the code breaks. I can't interpret the error message. What did I do wrong? The error:

TypeError: Cannot convert Type TensorType(int32, scalar) (of Variable Subtensor{int32}.0) into Type TensorType(int32, vector). You can try to manually convert Subtensor{int32}.0 into a TensorType(int32, vector).

This error occurs deep inside the Theano code. The conflicting line in my program is:

train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index],
            y: train_set_y[index]      # <---------------HERE
        }
)

Apparently there is a mismatch between the dimensions of y and the training data. My full code on pastebin: http://pastebin.com/U5jYitk2 The full error message on pastebin: http://pastebin.com/hUQJhfNM
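
To see the type mismatch in isolation: indexing a shared vector with a scalar yields a scalar subtensor, which cannot be substituted for the ivector y. A minimal reproduction of my own (not from the post):

import theano
import theano.tensor as T
import numpy

labels = T.cast(theano.shared(numpy.array([0., 0., 0., 1.])), "int32")
index = T.lscalar()
print(labels[index].type)            # TensorType(int32, scalar)
print(labels[index:index + 1].type)  # TensorType(int32, vector) -- matches an ivector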

Concise question: What is the correct way to provide training data to an MLP in Theano? Where is my error?

I copied most of the code from the tutorial. Notable changes (possible causes of the error) are:

  • The training data for y is not a matrix. I think this is correct, because the output of my network is just a scalar value.

  • The input to the first layer is a vector. This variable is named x.

  • Access to the training data does not use slicing. In the tutorial the training data is very elaborate and I found the data access code hard to read. I believe x should be one row of the data matrix. That is how I implemented it.

UPDATE: I used Amir's code. It looks very good, thank you.

But it produces an error, too. The final loop goes out of bounds:

/usr/bin/python3.4 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Line 113 is this one:

#train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)              # <-----------------HERE

I believe this is because the training data is indexed with index and index + 1. Why is that necessary? One row should be one training sample. And one row is train_set_x[index].

EDIT: I debugged the code. Without slicing it returns a 1d array, with slicing a 2d one. A 1d array is indeed incompatible with the matrix x.
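
The same behaviour can be reproduced with plain numpy indexing (my own demonstration, not from the post): a scalar index drops a dimension, a slice keeps it.

import numpy

data = numpy.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(data[0].shape)    # (2,)   -> 1d vector, incompatible with the matrix x
print(data[0:1].shape)  # (1, 2) -> 2d matrix with a single row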

But while doing that, I found another strange problem: I added this code to look at the effect of training:

print("before")
print(classifier.W.get_value())
print(classifier.b.get_value())

for i in range(3):
    train_model(i)

print("after")
print(classifier.W.get_value())
print(classifier.b.get_value())

before
[[ 0.]
 [ 0.]]
[ 0.]
after
[[ 0.]
 [ 0.]]
[ 0.]

This makes sense, since the correct output for the first three samples is 0. If I change the order and move the training sample (1,1),1 to the front, the program crashes.

before
[[ 0.]
 [ 0.]]
[ 0.]
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 121, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
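
As a sanity check of my own (not from the post): a softmax over a single output column is identically 1.0, which is consistent with both the frozen weights above and label 1 being out of range.

import numpy

logits = numpy.array([[0.3], [-2.0], [5.0]])  # one output column
softmax = numpy.exp(logits) / numpy.exp(logits).sum(axis=1, keepdims=True)
print(softmax)  # [[1.], [1.], [1.]] -- always class 0; class 1 does not exist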

UPDATE

I installed Theano on Python 2.7 and tried to run the code again. The same error occurs. I added verbose exception handling. This is the output:

/usr/bin/python2.7 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 485, in streamline_default_f
    raise_with_op(node, thunk)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 481, in streamline_default_f
    thunk()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/op.py", line 768, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 896, in perform
    nll[i] = -row[y_idx[i]] + m + numpy.log(sum_j)
IndexError: index 1 is out of bounds for axis 0 with size 1
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Subtensor{int32:int32:}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

Debugprint of the apply node: 
CrossentropySoftmaxArgmax1HotWithBias.0 [@A] <TensorType(float64, vector)> ''   
 |Dot22 [@B] <TensorType(float64, matrix)> ''   
 | |Subtensor{int32:int32:} [@C] <TensorType(float64, matrix)> ''   
 | | |<TensorType(float64, matrix)> [@D] <TensorType(float64, matrix)>
 | | |ScalarFromTensor [@E] <int32> ''   
 | | | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
 | | |ScalarFromTensor [@G] <int32> ''   
 | |   |Elemwise{add,no_inplace} [@H] <TensorType(int32, scalar)> ''   
 | |     |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
 | |     |TensorConstant{1} [@I] <TensorType(int8, scalar)>
 | |W [@J] <TensorType(float64, matrix)>
 |b [@K] <TensorType(float64, vector)>
 |Subtensor{int32:int32:} [@L] <TensorType(int32, vector)> ''   
   |Elemwise{Cast{int32}} [@M] <TensorType(int32, vector)> ''   
   | |<TensorType(float64, vector)> [@N] <TensorType(float64, vector)>
   |ScalarFromTensor [@E] <int32> ''   
   |ScalarFromTensor [@G] <int32> ''   
CrossentropySoftmaxArgmax1HotWithBias.1 [@A] <TensorType(float64, matrix)> ''   
CrossentropySoftmaxArgmax1HotWithBias.2 [@A] <TensorType(int32, vector)> ''   

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Process finished with exit code 1

UPDATE:

I looked at the training data again. Any sample with 1 as its label produces the above error:

data_y = numpy.array([1,
                      1,
                      1,
                      1])

With the sample labels above, train_model(i) crashes for every i in (0,1,2,3). Apparently there is interference between the index of a sample and its content.

UPDATE: As Amir pointed out, the problem was indeed the size of the output layer. I had the misconception that I could train the network to encode the output of the function "logical and" directly in a single output neuron. While that is certainly possible, this training approach uses the y-value as an index to select the output node that should have the highest value. After changing the output size to two, the code works. And with enough training, the errors for all cases do indeed become zero.
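
In code, the fix amounts to this one-line change against the softmax snippet from the answer below (a sketch of my understanding, not a tested drop-in):

# one output unit per class, so that the label y can index a softmax column
classifier = Layer(input=x, n_in=2, n_out=2)
# y = 1 now selects the second column instead of running past the end
# of a single-column output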

2 Answers

  • 1

    Here is working code for your problem. There were many small mistakes in your code. The one that caused the error you got was defining b as an n_in-by-n_out matrix instead of simply an n_out vector. The updates section also gets defined in brackets [], not parentheses ().

    Additionally, the index is defined as an int32 symbolic scalar (this is not very important). Another important change is defining the function with the correct indexing. The way you compiled the function with index would, for some reason, not let the function compile. You also declared the input as a vector; that way you will not be able to train the model with mini-batches or full batches, so it is safer to declare it as a symbolic matrix. To use a vector, you would have to store the input as a vector rather than a matrix in the shared variable to make the program run, so declaring it as a vector would be a headache. Finally, you compiled the validation function using classifier.errors(y) even though you had removed the errors function from the Layer class.

    import theano
    import theano.tensor as T
    import numpy
    
    
    class Layer(object):
        """
        this is a layer in the mlp
        it's not meant to predict the outcome hence it does not compute a loss.
        apply the functions for negative log likelihood = cost on the output of the last layer
        """
    
        def __init__(self, input, n_in, n_out):
            self.x = input
            self.W = theano.shared(
                    value=numpy.zeros(
                            (n_in, n_out),
                            dtype=theano.config.floatX
                    ),
                    name="W",
                    borrow=True
            )
            self.b = theano.shared(
                    value=numpy.zeros(n_out,
                                      dtype=theano.config.floatX),
                    name="b",
                    borrow=True
            )
    
            self.output = T.nnet.softmax(T.dot(self.x, self.W) + self.b)
            self.params = [self.W, self.b]
            self.input = input
    
    
    def y_pred(output):
        return T.argmax(output, axis=1)
    
    
    def negative_log_likelihood(output, y):
        return -T.mean(T.log(output)[T.arange(y.shape[0]), y])
    
    
    def errors(output, y):
        # check if y has same dimension of y_pred
        if y.ndim != y_pred(output).ndim:
            raise TypeError(
                    'y should have the same shape as self.y_pred',
                    ('y', y.type, 'y_pred', y_pred(output).type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(y_pred(output), y))
        else:
            raise NotImplementedError()
    
    data_x = numpy.matrix([[0, 0],
                           [1, 0],
                           [0, 1],
                           [1, 1]])
    
    data_y = numpy.array([0,
                          0,
                          0,
                          1])
    
    train_set_x = theano.shared(numpy.asarray(data_x,
                             dtype=theano.config.floatX),
                             borrow=True)
    
    train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                             dtype=theano.config.floatX),
                             borrow=True),"int32")
    
    x = T.matrix("x")  # data
    y = T.ivector("y")  # labels
    
    classifier = Layer(input=x, n_in=2, n_out=1)
    
    cost = negative_log_likelihood(classifier.output, y)
    
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
    index = T.iscalar()
    
    learning_rate = 0.15
    
    updates = [
        (classifier.W, classifier.W - learning_rate * g_W),
        (classifier.b, classifier.b - learning_rate * g_b)
    ]
    
    train_model = theano.function(
            inputs=[index],
            outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index:index + 1],
                y: train_set_y[index:index + 1]
            }
    )
    validate_model = theano.function(
            inputs=[index],
            outputs=errors(classifier.output, y),
            givens={
                x: train_set_x[index:index + 1],
                y: train_set_y[index:index + 1]
            }
    )
    
    #train the model
    for i in range(train_set_x.shape[0].eval()):
        train_model(i)
    

    Here's the updated code. Note that the main difference between the code above and the one below is that the latter works for binary problems, while the former only works if you have a multi-class problem, which is not the case here. The reason I put both code snippets here is for educational purposes. Please read the comments to find the problem with the code above and how I worked around it.

    import theano
    import theano.tensor as T
    import numpy
    
    
    class Layer(object):
        """
        this is a layer in the mlp
        it's not meant to predict the outcome hence it does not compute a loss.
        apply the functions for negative log likelihood = cost on the output of the last layer
        """
    
        def __init__(self, input, n_in, n_out):
            self.x = input
            self.W = theano.shared(
                    value=numpy.zeros(
                            (n_in, n_out),
                            dtype=theano.config.floatX
                    ),
                    name="W",
                    borrow=True
            )
            self.b = theano.shared(
                    value=numpy.zeros(n_out,
                                      dtype=theano.config.floatX),
                    name="b",
                    borrow=True
            )
    
            self.output = T.reshape(T.nnet.sigmoid(T.dot(self.x, self.W) + self.b), (input.shape[0],))
            self.params = [self.W, self.b]
            self.input = input
    
    
    def y_pred(output):
        return output
    
    
    def negative_log_likelihood(output, y):
        return T.mean(T.nnet.binary_crossentropy(output,y))
    
    
    def errors(output, y):
        # check if y has same dimension of y_pred
        if y.ndim != y_pred(output).ndim:
            raise TypeError(
                    'y should have the same shape as self.y_pred',
                    ('y', y.type, 'y_pred', y_pred(output).type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(y_pred(output), y))
        else:
            raise NotImplementedError()
    
    data_x = numpy.matrix([[0, 0],
                           [1, 0],
                           [0, 1],
                           [1, 1]])
    
    data_y = numpy.array([0,
                          0,
                          0,
                          1])
    
    train_set_x = theano.shared(numpy.asarray(data_x,
                             dtype=theano.config.floatX),
                             borrow=True)
    
    train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                             dtype=theano.config.floatX),
                             borrow=True),"int32")
    
    x = T.matrix("x")  # data
    y = T.ivector("y")  # labels
    
    classifier = Layer(input=x, n_in=2, n_out=1)
    
    cost = negative_log_likelihood(classifier.output, y)
    
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
    index = T.iscalar()
    
    learning_rate = 0.15
    
    updates = [
        (classifier.W, classifier.W - learning_rate * g_W),
        (classifier.b, classifier.b - learning_rate * g_b)
    ]
    
    train_model = theano.function(
            inputs=[index],
            outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index:index+1],
                y: train_set_y[index:index+1]
            }
    )
    validate_model = theano.function(
            inputs=[index],
            outputs=errors(classifier.output, y),
            givens={
                x: train_set_x[index:index + 1],
                y: train_set_y[index:index + 1]
            }
    )
    
    #train the model
    for i in range(train_set_x.shape[0].eval()):
        train_model(i)
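
    To inspect what the trained binary model predicts, one could compile a small prediction function over the whole training set (my addition on top of the snippet above, not part of the original answer):

    predict = theano.function(inputs=[], outputs=classifier.output,
                              givens={x: train_set_x})
    print(predict())         # sigmoid outputs, one per training row
    print(predict() > 0.5)   # thresholded predictions for "logical and"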
    
  • 0

    You can try my MLP class:

    A MultiLayer Perceptron (MLP) based on Lasagne/Theano that accepts sparse and dense input matrices and is very simple to use, with a scikit-learn-like API.

    It has configurable dropout / sparse input / can be turned into logistic regression; the cost function and l1/l2/elasticnet regularization are easy to change.

    The code is here
