TensorFlow: Implementing a class-weighted cross-entropy loss?

Assume that, after performing median frequency balancing on the images used for segmentation, we have the following class weights:

class_weights = {0: 0.2595,
                 1: 0.1826,
                 2: 4.5640,
                 3: 0.1417,
                 4: 0.9051,
                 5: 0.3826,
                 6: 9.6446,
                 7: 1.8418,
                 8: 0.6823,
                 9: 6.2478,
                 10: 7.3614,
                 11: 0.0}
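
For context, median frequency balancing is commonly defined as weight_c = median(freq) / freq_c, where freq_c is the number of pixels of class c divided by the total number of pixels in the images that contain c. Below is a minimal NumPy sketch under that assumption. The helper name `median_frequency_weights` and the toy label maps are hypothetical, and the weights listed above were not necessarily produced by this exact code.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    """Hypothetical helper: weight_c = median(freq) / freq_c."""
    pixel_counts = np.zeros(num_classes)   # pixels of class c over the dataset
    image_pixels = np.zeros(num_classes)   # total pixels of images containing c
    for labels in label_maps:
        counts = np.bincount(labels.ravel(), minlength=num_classes)
        pixel_counts += counts
        image_pixels += (counts > 0) * labels.size
    present = pixel_counts > 0
    freq = np.zeros(num_classes)
    freq[present] = pixel_counts[present] / image_pixels[present]
    weights = np.zeros(num_classes)        # classes never seen keep weight 0
    weights[present] = np.median(freq[present]) / freq[present]
    return weights

# Toy data: two 2x4 integer label maps with three classes; class 2 is rare.
toy_labels = [np.array([[0, 0, 0, 1], [0, 0, 1, 1]]),
              np.array([[0, 0, 0, 0], [0, 1, 1, 2]])]
toy_weights = median_frequency_weights(toy_labels, num_classes=3)
```

In this toy case the rare class receives the largest weight and the most frequent class the smallest.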

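The idea is to create a weight_mask that can be multiplied with the cross-entropy output. To create this weight mask, we can broadcast the class weights based on either the ground_truth labels or the predictions. Some of the math in my implementation:

Both the labels and the logits have shape [batch_size, height, width, num_classes].
The weight mask has shape [batch_size, height, width, 1].
The weight mask is broadcast over the num_classes channels of the product between the softmax of the logits and the labels, giving an output of shape [batch_size, height, width, num_classes]. In this case, num_classes is 12.
Reduce sum over each example in the batch, then reduce mean over all examples in the batch to obtain a single scalar loss value.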
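
The steps above can be sketched in NumPy as follows. This is only an illustration: it assumes the mask is built from the ground-truth labels (one of the two options under discussion), it takes the log of the softmax as cross-entropy requires, and the function name `weighted_ce_from_labels` and the random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_ce_from_labels(logits, onehot_labels, class_weights):
    # logits, onehot_labels: [batch_size, height, width, num_classes]
    probs = softmax(logits)
    # Weight mask from the ground truth: [batch_size, height, width, 1]
    weight_mask = np.sum(onehot_labels * class_weights, axis=-1, keepdims=True)
    # The mask broadcasts over the num_classes channels
    per_channel = -weight_mask * onehot_labels * np.log(probs)
    # Reduce sum per example, then reduce mean over the batch
    per_example = per_channel.sum(axis=(1, 2, 3))
    return per_example.mean()

rng = np.random.default_rng(0)
demo_logits = rng.normal(size=(2, 4, 4, 3))                   # made-up logits
demo_labels = np.eye(3)[rng.integers(0, 3, size=(2, 4, 4))]   # one-hot labels
demo_weights = np.array([0.5, 1.0, 2.5])                      # made-up weights
demo_loss = weighted_ce_from_labels(demo_logits, demo_labels, demo_weights)
```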

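In this case, should we create the weight mask based on the predictions or the ground truth?

If we build it based on the ground_truth, then no matter which pixel labels are predicted, they are penalized according to the actual class labels, which does not seem to guide the training in a sensible way.

But if we build it based on the predictions, then for any logit predictions produced, if the predicted label (the argmax of the logits) is a dominant class, the logit values for that pixel will all be reduced by a significant amount.

This means the maximum logit is still the maximum, since the logits in all 12 channels are scaled by the same value. However, the final softmax probability of the predicted label (which is the same label before and after scaling) will be lower than before scaling (I did some simple math to estimate this). --> a lower loss is expected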
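
The effect of scaling can be checked numerically. Below is a small NumPy sketch with made-up logits for a single pixel and a made-up scale below 1. It compares the softmax output and the cross-entropy before and after scaling, both for the case where the label matches the predicted class and for the case where it does not.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

pixel_logits = np.array([4.0, 1.0, 0.5])   # hypothetical logits, class 0 predicted
scale = 0.2                                # hypothetical weight of a dominant class

p_before = softmax(pixel_logits)
p_after = softmax(scale * pixel_logits)

# Cross-entropy when the label is the predicted class (0)
loss_correct_before = -np.log(p_before[0])
loss_correct_after = -np.log(p_after[0])

# Cross-entropy when the label is a different class (1)
loss_wrong_before = -np.log(p_before[1])
loss_wrong_after = -np.log(p_after[1])
```

In this toy case the argmax is unchanged and the probability of the predicted class drops. The loss rises when the label agrees with the prediction and falls when it does not, so the direction of the change depends on whether the prediction is correct.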

But the problem is this: If a lower loss is predicted as a result of this weighting, then wouldn't it contradict the idea that predicting dominant labels should give you a greater loss?

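My overall impression of this method is:

Dominant labels are penalized and rewarded much less.
Less dominant labels are rewarded highly when the prediction is correct, but they are also penalized heavily for a wrong prediction.

So how does this help with the class-balancing problem? I don't quite get the logic here.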

IMPLEMENTATION

Here is my current implementation for computing the weighted cross-entropy loss, although I'm not sure whether it is correct.

def weighted_cross_entropy(logits, onehot_labels, class_weights):
    if not logits.dtype == tf.float32:
        logits = tf.cast(logits, tf.float32)

    if not onehot_labels.dtype == tf.float32:
        onehot_labels = tf.cast(onehot_labels, tf.float32)

    #Obtain the logit label predictions and form a skeleton weight mask with the same shape as it
    logit_predictions = tf.argmax(logits, -1) 
    weight_mask = tf.zeros_like(logit_predictions, dtype=tf.float32)

    #Obtain the number of class weights to add to the weight mask
    num_classes = logits.get_shape().as_list()[3]

    #Form the weight mask mapping for each pixel prediction
    for i in range(num_classes):
        binary_mask = tf.equal(logit_predictions, i) #Get only the positions for class i predicted in the logits prediction
        binary_mask = tf.cast(binary_mask, tf.float32) #Convert boolean to ones and zeros
        class_mask = tf.multiply(binary_mask, class_weights[i]) #Multiply only the ones in the binary mask with the specific class_weight
        weight_mask = tf.add(weight_mask, class_mask) #Add to the weight mask

    #Multiply the logits with the scaling based on the weight mask then perform cross entropy
    weight_mask = tf.expand_dims(weight_mask, 3) #Expand the fourth dimension to 1 for broadcasting
    logits_scaled = tf.multiply(logits, weight_mask)

    return tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits_scaled)

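Could anyone verify whether my concept of this weighted loss is correct, and whether my implementation is correct? This is my first time working with a dataset with imbalanced classes, so I would really appreciate it if anyone could verify this.

TESTING RESULTS: After doing some tests, I found that the implementation above results in a greater loss. Is this supposed to be the case? That is, would this make training harder but eventually produce a more accurate model?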

SIMILAR THREADS

Note that I have checked a similar thread here: How can I implement a weighted cross entropy loss in tensorflow using sparse_softmax_cross_entropy_with_logits

But it seems that TF only offers sample-wise weighting for the loss, not class-wise weighting.
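
One way to bridge the two, sketched here in NumPy with made-up values, is to turn the class weights into one weight per pixel by looking up each pixel's ground-truth class. The resulting per-sample weights give the same loss as class-wise weighting. As an assumption about how this would be wired up in TensorFlow, such per-pixel weights could then be passed to the `weights` argument of `tf.losses.softmax_cross_entropy`.

```python
import numpy as np

class_w = np.array([0.5, 1.0, 2.5])           # hypothetical class weights
labels = np.eye(3)[np.array([0, 2, 1, 2])]    # one-hot labels for 4 pixels
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.6, 0.3, 0.1]])           # made-up softmax outputs

# One weight per pixel, selected by its ground-truth class: shape [4]
sample_w = labels @ class_w

# Sample-wise weighting of the per-pixel cross-entropy ...
per_pixel_ce = -np.sum(labels * np.log(probs), axis=-1)
loss_sample_weighted = np.sum(sample_w * per_pixel_ce)

# ... matches class-wise weighting applied inside the sum
loss_class_weighted = -np.sum(labels * class_w * np.log(probs))
```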

Thank you all very much.

1 Answer

1
Here is my implementation in Keras using the TensorFlow backend:
```
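import pickle
import tensorflow as tf

def class_weighted_pixelwise_crossentropy(target, output):
    # Clip the probabilities so that log(0) never occurs
    output = tf.clip_by_value(output, 1e-7, 1. - 1e-7)
    # Per-class weights, indexed like the one-hot channels
    with open('class_weights.pickle', 'rb') as f:
        weight = pickle.load(f)
    return -tf.reduce_sum(target * weight * tf.math.log(output))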
```
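Here weight is just a standard Python list in which the index of each weight matches the index of the corresponding class in the one-hot vector. I store the weights in a pickle file to avoid recomputing them. The function is an adaptation of the Keras categorical_crossentropy loss function. The clipping step ensures that we never take the log of 0.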
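
As an illustration of the storage step, the class weights from the question could be ordered by class index and pickled as below. This is only a sketch: the temporary directory is an assumption made so the example is self-contained, and the variable names are hypothetical.

```python
import os
import pickle
import tempfile

class_weights = {0: 0.2595, 1: 0.1826, 2: 4.5640, 3: 0.1417,
                 4: 0.9051, 5: 0.3826, 6: 9.6446, 7: 1.8418,
                 8: 0.6823, 9: 6.2478, 10: 7.3614, 11: 0.0}

# Order the weights by class index so position i matches one-hot channel i
weight_list = [class_weights[i] for i in range(len(class_weights))]

# Temporary directory used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'class_weights.pickle')
with open(path, 'wb') as f:
    pickle.dump(weight_list, f)

# Reload the list the same way the loss function does
with open(path, 'rb') as f:
    restored = pickle.load(f)
```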

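I'm not sure why one would compute the weights from the predictions rather than the ground truth; if you provide further explanation, I can update my answer in response.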

Edit: Play around with this numpy code to understand how this works. Also review the definition of cross entropy.
```
import numpy as np

weights = [1,2]

target = np.array([ [[0.0,1.0],[1.0,0.0]],
                    [[0.0,1.0],[1.0,0.0]]])

output = np.array([ [[0.5,0.5],[0.9,0.1]],
                    [[0.9,0.1],[0.4,0.6]]])

crossentropy_matrix = -np.sum(target * np.log(output), axis=-1)
crossentropy = -np.sum(target * np.log(output))
```
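
The snippet above defines weights but computes only the unweighted cross-entropy. A possible weighted variant, offered as a sketch rather than as the answer's own code, multiplies the one-hot target by the class weights before summing:

```python
import numpy as np

weights = np.array([1, 2])

target = np.array([[[0.0, 1.0], [1.0, 0.0]],
                   [[0.0, 1.0], [1.0, 0.0]]])

output = np.array([[[0.5, 0.5], [0.9, 0.1]],
                   [[0.9, 0.1], [0.4, 0.6]]])

# weights broadcasts over the last (class) axis
weighted_matrix = -np.sum(target * weights * np.log(output), axis=-1)
weighted_crossentropy = -np.sum(target * weights * np.log(output))
```

Pixels whose true class is 1 contribute twice their unweighted cross-entropy, while pixels of class 0 are unchanged.
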
Answered 2024-04-28T00:32:24+08:00
