
Keras: How to implement target replication for LSTMs?


Using the example from Lipton et al. (2016), target replication essentially computes a loss at every time step of the LSTM (or GRU) except the final one, averages those losses, and adds the result to the main loss during training. Mathematically, it is given by

$$\mathcal{L} = \alpha \cdot \frac{1}{T-1}\sum_{t=1}^{T-1} \ell\big(\hat{y}^{(t)},\, y\big) \;+\; (1-\alpha) \cdot \ell\big(\hat{y}^{(T)},\, y\big)$$

where $\ell$ is the per-step loss (binary cross-entropy here), $T$ is the sequence length, and $\alpha$ weights the replicated targets against the final one.
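
For example, with T = 5 and α = 0.6, this is 0.6 times the mean of the first four per-step losses plus 0.4 times the loss at the final step.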

Graphically, it can be represented as -

[figure: diagram of target replication across LSTM time steps]

So how can this be implemented in Keras? Say I have a binary classification task and a simple model -

model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
# note: class_weight belongs in fit(), not compile()
model.fit(x_train, y_train, class_weight={0: 0.5, 1: 4})
  • I assume y_train needs to be reshaped/tiled from (batch_size, 1) to (batch_size, time_steps)?

  • With return_sequences=True set, does the Dense layer need TimeDistributed to be applied to the LSTM output correctly?

  • How exactly do I implement the loss function given above? Does class_weights need to be modified?

  • Target replication happens only during training. How do I evaluate on the validation set using only the main loss?

  • How do I handle zero padding with target replication? My sequences are padded to a max_len of 15, with an average length of 7. Since the target replication loss averages over all steps, how do I make sure the padded words are not used when computing the loss? Basically, T should be assigned dynamically as the actual sequence length.

1 Answer


    Question 1:

    So, for the targets, you need them with shape (batch_size, time_steps, 1). Just use:

    # tile the (batch_size, 1) targets across every time step -> (batch_size, time_steps, 1)
    y_train = np.stack([y_train]*time_steps, axis=1)
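
    For example (array sizes here are just for illustration):

    import numpy as np

    time_steps = 15
    y_train = np.zeros((32, 1))                       # (batch_size, 1)
    y_train = np.stack([y_train]*time_steps, axis=1)  # -> (32, 15, 1)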
    

    Question 2:

    You are right, but TimeDistributed is optional in Keras 2, since Dense already applies to the last axis of 3D inputs.
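
    For illustration, a minimal sketch of such a model (layer size taken from the question; time_steps and features are assumed placeholder dimensions). Wrapping the Dense layer in TimeDistributed is equivalent here:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, TimeDistributed

    time_steps, features = 15, 100  # assumed dimensions, for illustration

    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(time_steps, features)))
    # in Keras 2 a plain Dense(1, activation='sigmoid') behaves the same here
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))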

    Question 3:

    I'm not sure how class weights would behave here, but a regular loss function would look like this:

    from keras import backend as K
    from keras.losses import binary_crossentropy

    def target_replication_loss(alpha):

        def inner_loss(true, pred):
            # binary_crossentropy reduces the last axis, giving shape (batch, time_steps)
            losses = binary_crossentropy(true, pred)

            # alpha-weighted mean of the intermediate losses plus the final-step loss
            return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])

        return inner_loss
    
    model.compile(......, loss = target_replication_loss(alpha), ...)
    

    Question 3a:

    Since the above doesn't play well with class weights, I created an alternative that builds the weights into the loss:

    def target_replication_loss(alpha, class_weights):

        def get_weights(x):
            # linear map: class 0 -> class_weights[0], class 1 -> class_weights[1]
            b = class_weights[0]
            a = class_weights[1] - b
            return (a*x) + b

        def inner_loss(true, pred):
            # this only works for binary classification (a single class, 0 or 1)
            # and only if the target is the same at every time step
            true_classes = true[:,-1,0]
            weights = get_weights(true_classes)
    
            losses = binary_crossentropy(true,pred)
    
            return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
    
        return inner_loss
    

    Question 4:

    To avoid complications, I'd say you should use an additional metric for validation:

    def last_step_BC(true, pred):
        # binary cross-entropy of the final time step only (the main loss)
        return binary_crossentropy(true[:,-1], pred[:,-1])
    
    model.compile(...., 
                  loss = target_replication_loss(alpha), 
                  metrics=[last_step_BC])
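
    With validation data, Keras will report this as val_last_step_BC, which is the main (final-step) loss on its own.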
    

    Question 5:

    This one is hard; I'd need to study it a bit...

    As an initial workaround, you could set up the model with an input shape of (None, features) and train each sequence individually.
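
    In the meantime, here is an untested sketch of how the loss itself could skip the padded steps. It assumes 'pre' padding (the pad_sequences default), so the final time step is always a real one, and that padded target positions are filled with a sentinel of -1 (both of these are my assumptions, not from the question):

    def masked_target_replication_loss(alpha):

        def inner_loss(true, pred):
            # 1. for real steps, 0. for padded steps (padded targets assumed to be -1)
            mask = K.cast(K.not_equal(true[:,:,0], -1.), K.floatx())

            # zero out the per-step losses at padded positions
            losses = binary_crossentropy(true, pred) * mask

            # true length T of each sequence
            seq_len = K.sum(mask, axis=-1)

            # with 'pre' padding the final step is real, so average the other
            # T-1 real losses (assumes every sequence has at least 2 real steps)
            mean_intermediate = K.sum(losses[:,:-1], axis=-1) / (seq_len - 1.)

            return (alpha*mean_intermediate) + ((1-alpha)*losses[:,-1])

        return inner_loss

    Note that this only masks the loss; the LSTM still processes the padded inputs, so combining it with a Masking layer on the inputs may also be worth trying.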


    Working example without class_weight:

    import numpy as np
    from keras.layers import Input, LSTM
    from keras.models import Model
    from keras.losses import binary_crossentropy
    from keras import backend as K

    def target_replication_loss(alpha):

        def inner_loss(true, pred):
            losses = binary_crossentropy(true, pred)
            #print(K.int_shape(losses))
            #print(K.int_shape(losses[:,:-1]))
            #print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
            #print(K.int_shape(losses[:,-1]))

            return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])

        return inner_loss

    alpha = 0.6

    i1 = Input((5,2))

    out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
    model = Model(i1, out)

    model.compile(optimizer='adam', loss=target_replication_loss(alpha))

    # dummy data, just to check that the shapes work out
    model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs=200)
    

    Working example with class weights:

    def target_replication_loss(alpha, class_weights):
    
        def get_weights(x):
            b = class_weights[0]
            a = class_weights[1] - b
            return (a*x) + b
    
        def inner_loss(true,pred):
            # this only works for binary classification (a single class, 0 or 1)
            # and only if the target is the same at every time step
            true_classes = true[:,-1,0]
            weights = get_weights(true_classes)
    
            losses = binary_crossentropy(true,pred)
            print(K.int_shape(losses))
            print(K.int_shape(losses[:,:-1]))
            print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
            print(K.int_shape(losses[:,-1]))
            print(K.int_shape(weights))
    
            return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
    
        return inner_loss
    
    alpha = 0.6
    class_weights={0: 0.5, 1:4.}
    
    i1 = Input(batch_shape=(3,5,2))
    
    out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
    model = Model(i1, out)
    
    model.compile(optimizer='adam', loss=target_replication_loss(alpha, class_weights))

    # dummy data, just to check that the shapes work out
    model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs=200)
    
