Is no mask used in the encoder? And is the mask computation in self-attention incorrect? #10

wjx-git · 2019-12-03T09:57:04Z

In bert_model.py, line 92:
encoder_class = Encoder(self.d_model, self.d_k, self.d_v, self.sequence_length, self.h, self.batch_size,
self.num_layer, self.input_representation, self.input_representation,
dropout_keep_prob=self.dropout_keep_prob,
use_residual_conn=self.use_residual_conn)
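Why is the mask parameter not given a value? Does that mean no mask is used by default? Masking in the encoder should be required, though.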
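For reference, a padding mask is commonly derived from the input token ids before it is handed to the encoder. The sketch below is a minimal illustration, not code from this repository: the helper name `additive_padding_mask`, the padding id `0`, and the fill value `-1e9` are all assumptions.

```python
def additive_padding_mask(token_ids, pad_id=0, neg=-1e9):
    """Return 0.0 for real tokens and `neg` for padding tokens.

    token_ids: list of sequences (lists of ints), already padded to equal length.
    pad_id and neg are assumed values, not taken from the repository.
    """
    return [[neg if t == pad_id else 0.0 for t in seq] for seq in token_ids]


# Two padded sequences of length 6; 0 marks padding (assumption).
batch = [[101, 7, 8, 102, 0, 0],
         [101, 9, 102, 0, 0, 0]]
mask = additive_padding_mask(batch)
```

A mask of this form has shape [batch, sequence_length] and is meant to be added to attention scores before the softmax.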

In multi_head_attention.py, line 82:
mask = tf.expand_dims(self.mask, axis=-1) # [batch,sequence_length,1]
mask = tf.expand_dims(mask, axis=1) # [batch,1,sequence_length,1]
dot_product = dot_product + mask # [batch,h,sequence_length,1]

How can the mask operation be a direct addition?
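Addition can be a valid way to mask, provided the mask holds 0 at positions to keep and a large negative number at positions to hide, and provided it is added along the key axis before the softmax. The sketch below shows this for a single query position in plain Python; the function names and the value `-1e9` are assumptions for illustration, not the repository's code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores, additive_mask):
    """Add a per-key mask to the scores of one query, then normalise."""
    return softmax([s + m for s, m in zip(scores, additive_mask)])


scores = [2.0, 1.0, 0.5, 3.0]            # scores of one query against 4 keys
additive_mask = [0.0, 0.0, -1e9, -1e9]   # last two keys are padding (assumed)
weights = masked_attention_weights(scores, additive_mask)

# Adding the same constant to every key score leaves the softmax unchanged.
shifted = softmax([s - 5.0 for s in scores])
unshifted = softmax(scores)
```

Here the masked keys end up with zero weight, while `shifted` matches `unshifted`. That second point matters for the quoted snippet: if `dot_product` has shape [batch,h,sequence_length,sequence_length] and the softmax runs over the last axis (both assumptions), a mask of shape [batch,1,sequence_length,1] adds one constant per query row and so would not hide any key. A padding mask is more commonly expanded to [batch,1,1,sequence_length].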

brightmart · 2019-12-03T11:07:41Z

Could you submit a corrected version?
