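I want to use a sampled softmax loss in tf.keras. I define my own model by subclassing the Keras Model class. In __init__ I specify the layers I need, including a final Dense projection layer. However, this Dense layer should not be called during training, because I want to use sampled softmax and only need its weights and bias. I then defined the loss function like this: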

import tensorflow as tf

class SampledSoftmax:
    def __init__(self,
                 num_sampled,
                 num_classes,
                 projection,
                 bias,
                 hidden_size):
        # Dense stores its kernel as (hidden_size, num_classes);
        # sampled_softmax_loss expects weights as (num_classes, hidden_size).
        self.weights = tf.transpose(projection)
        self.bias = bias
        self.num_classes = num_classes
        self.num_sampled = num_sampled
        self.hidden_size = hidden_size

    def __call__(self, y_true, inputs):
        """Reshape y_true and inputs so that they fit each other."""
        inputs = tf.reshape(inputs, (-1, self.hidden_size))
        y_true = tf.reshape(y_true, (-1, 1))

        return tf.nn.sampled_softmax_loss(
                   weights=self.weights,
                   biases=self.bias,
                   labels=y_true,
                   inputs=inputs,
                   num_sampled=self.num_sampled,
                   num_classes=self.num_classes,
                   partition_strategy='div')

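It takes the necessary parameters at initialization, and calling the instance gives the desired sampled softmax loss. The problem is that to pass the loss to model.compile I need the weights and bias of the final Dense layer. But 1) during training the Dense layer is not part of the model's output path, and 2) even if it were, a Dense layer only learns its input dimension (and so only creates its weights) once it is connected to an input, which happens when my custom model is called. In short, the weights are not available before the model is compiled. Can anyone point me in the right direction?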
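To make clear what I expect the loss to compute, here is a framework-free sketch of the idea for a single example. It is a simplification and not what TensorFlow does internally: the function name `toy_sampled_softmax_loss` and the uniform negative sampling are my own assumptions for illustration, and it leaves out the log-probability correction and accidental-hit removal that, as far as I understand, tf.nn.sampled_softmax_loss applies by default.

```python
import math
import random


def toy_sampled_softmax_loss(weights, biases, hidden, label, num_sampled, rng):
    """Simplified sampled softmax for ONE example (illustration only).

    weights: num_classes rows, each of length hidden_size
    biases:  num_classes floats
    hidden:  hidden_size floats (e.g. the LSTM output for one time step)
    label:   index of the true class
    """
    num_classes = len(weights)
    # Assumption: draw negatives uniformly from all classes except the true one.
    candidates = [c for c in range(num_classes) if c != label]
    negatives = rng.sample(candidates, num_sampled)
    classes = [label] + negatives  # the true class sits at position 0

    # Logits are computed only for the true class and the sampled negatives,
    # which is why the full Dense projection is never evaluated.
    logits = [
        sum(w * h for w, h in zip(weights[c], hidden)) + biases[c]
        for c in classes
    ]

    # Softmax cross-entropy with the target at position 0,
    # using log-sum-exp for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
biases = [0.0] * 50
hidden = [0.5, -0.2, 0.1, 0.3]
loss = toy_sampled_softmax_loss(weights, biases, hidden,
                                label=7, num_sampled=5, rng=rng)
print(loss)
```

The point is that the loss only needs the projection's weight rows and biases for a handful of classes, so it has to read them from the Dense layer without the layer itself being called.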

Here is the code that fails. I first subclass the model as follows:

class LanguageModel(tf.keras.Model):
    def __init__(self,
                 vocal_size=15003,
                 embedding_size=512,
                 input_len=64):
        super().__init__()
        self.embedding = Embedding(vocal_size, embedding_size,
                                   input_length=input_len)
        # hidden_size is assumed to be defined at module level
        # (the same value is passed to SampledSoftmax in the training code).
        self.lstm = LSTM(hidden_size, return_sequences=True)
        self.dense = Dense(vocal_size, activation='softmax')

    def call(self, inputs, training=False):
        emb_out = self.embedding(inputs)
        lstm_out = self.lstm(emb_out)
        res = self.dense(lstm_out)
        if training:
            # shouldn't use the last dense layer, as we want to do sampling
            return lstm_out
        return res

The part that trains the model is as follows:

sampled_loss = SampledSoftmax(num_sampled, vocal_size, 
                   model.dense.kernel, model.dense.bias,
                   hidden_size)

model.compile(optimizer=tf.train.RMSPropOptimizer(lr),
              loss=sampled_loss)

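However I tweak it, this fails because model.dense.kernel cannot be accessed: at the time the model is compiled, the Dense layer has not yet been built, since the call method has not run. The error message is as follows: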

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 72, in <module>
    train_main()
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 64, in train_main
    train_model.build_lm_model()
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 26, in build_lm_model
    self.model.dense.kernel,
AttributeError: 'Dense' object has no attribute 'kernel'
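My understanding of why the attribute is missing can be sketched without TensorFlow. The class below is a toy stand-in of my own (`LazyDense` is hypothetical, not Keras code) that mimics a layer creating its weights only on the first call, when the input dimension becomes known:

```python
class LazyDense:
    """Toy stand-in for a layer that creates its weights lazily (not Keras)."""

    def __init__(self, units):
        self.units = units
        self.built = False  # no kernel/bias attributes exist yet

    def build(self, input_dim):
        # The weights can only be shaped once the input dimension is known.
        self.kernel = [[0.0] * self.units for _ in range(input_dim)]
        self.bias = [0.0] * self.units
        self.built = True

    def __call__(self, inputs):
        if not self.built:
            self.build(len(inputs[0]))  # infer input_dim from the first call
        return [
            [sum(x * self.kernel[i][j] for i, x in enumerate(row)) + self.bias[j]
             for j in range(self.units)]
            for row in inputs
        ]


layer = LazyDense(units=3)
print(hasattr(layer, "kernel"))  # False: the situation I am in at compile time

layer([[1.0, 2.0]])              # the first call triggers build(input_dim=2)
print(hasattr(layer, "kernel"))  # True
print(len(layer.kernel), len(layer.kernel[0]))  # 2 3
```

Real Keras layers also have a build(input_shape) method that can be called explicitly, but I am not sure whether building the Dense layer by hand before constructing the loss is the intended way to handle this in a subclassed model.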

Incidentally, the loss defined above works in a small test case like the following.

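from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

x = Input(shape=(10,), name='input_x')
emb_out = Embedding(10000,200,input_length=10)(x)
lstm_out = LSTM(200, return_sequences=True)(emb_out)

# Calling the layer on lstm_out builds it, so dense.kernel exists below.
dense = Dense(10000, activation='sigmoid')
output = dense(lstm_out)

# The last argument is hidden_size (200, the LSTM width).
sl = SampledSoftmax(10, 10000, dense.kernel, dense.bias, 200)

model = Model(inputs=x, outputs=lstm_out)
model.compile(optimizer='adam', loss=sl)
model.summary()
# dataset is defined elsewhere
model.fit(dataset, epochs=20, steps_per_epoch=5)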