Going through the Udacity word2vec tutorial, it appears from the material that there are separate matrices for the input and output word vectors.

For example, take ['the', 'cat', 'sat', 'on', 'mat']. Here the input vectors $w_i$ of 'the', 'cat', 'on', and 'mat' predict the output vector $w_o$ of 'sat'. This is done via the sampled softmax shown below, where |context| is the number of context words (4 in this case).

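The equation image is no longer available; presumably it showed the full softmax that the sampled softmax approximates. Writing $v_w$ for the input vector of word $w$ (a row of `embeddings` in the code below) and $u_w$ for its output vector (a row of `softmax_weights`), the standard CBOW form is

$$p(w_o \mid \text{context}) = \frac{\exp(u_{w_o}^\top \bar{v})}{\sum_{w=1}^{V} \exp(u_w^\top \bar{v})}, \qquad \bar{v} = \frac{1}{|context|} \sum_{i=1}^{|context|} v_{w_i},$$

where $V$ is the vocabulary size. Roughly speaking, the sampled softmax replaces the sum over all $V$ words in the denominator with a sum over the true label plus num_sampled randomly drawn negatives.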

So once training is finished, there may be two vectors for 'sat': one used as an input vector and another used as an output vector. The question is why not have a single matrix. That would ensure the input and output vectors of the same word stay aligned.
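In the notation above, tying the two matrices means enforcing $u_w = v_w$ for every word $w$, i.e. reusing `embeddings` as the softmax weights instead of learning a separate `softmax_weights`.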

In case it helps, the TensorFlow code is attached below (why not set softmax_weights = embeddings and softmax_biases = 0):

import math
import tensorflow as tf

# Variables.
# Input word vectors, initialised uniformly in [-1, 1).
embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
# Separate output word vectors and biases for the sampled softmax.
softmax_weights = tf.Variable(tf.truncated_normal([vocabulary_size, embedding_size],
                                                  stddev=1.0 / math.sqrt(embedding_size)))
softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Model.
# Look up embeddings for inputs.
embed = tf.nn.embedding_lookup(embeddings, train_dataset)
# Compute the softmax loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                                                 train_labels, num_sampled, vocabulary_size))
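
For concreteness, here is a minimal sketch of the tied variant the question proposes, assuming the same placeholders (vocabulary_size, embedding_size, train_dataset, train_labels, num_sampled) and the same sampled_softmax_loss argument order as above:

import tensorflow as tf

# One matrix: the same rows serve as both input and output vectors.
embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
# Biases fixed at zero instead of being learned.
softmax_biases = tf.zeros([vocabulary_size])

embed = tf.nn.embedding_lookup(embeddings, train_dataset)
# Reuse the embedding matrix as the softmax weights.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(embeddings, softmax_biases, embed,
                                                 train_labels, num_sampled, vocabulary_size))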

Update:

I implemented it without a separate output matrix, and the results still look good: https://github.com/sachinruk/word2vec_alternate . I suppose the question now should be: is there a mathematical reason why the output matrix should be different?