
Confusion about loss calculation #134

Open
distillation-dcf opened this issue May 12, 2023 · 0 comments


distillation-dcf commented May 12, 2023

Hi!

In the forward() function of model.py, the text loss and image loss are computed by:

```python
labels = torch.cat((text[:, 1:], image_input_ids), dim=1).contiguous().long()  # shape: (bs, 127+1024=1151)
loss_text = F.cross_entropy(
    text_logits,
    labels[:, :self.text_seq_length])  # labels slice shape: (bs, 128)
loss_img = F.cross_entropy(
    image_logits,
    labels[:, self.text_seq_length:])  # labels slice shape: (bs, 1023)
```

Here text[:, 1:] drops the label for the first [BOS] text token, so only 128 - 1 = 127 text token labels remain in labels. But the cross-entropy call pairs text logits with seq_len = 128 against labels[:, :self.text_seq_length], which is also 128 long. So I suspect the very first image token (the one immediately after all the text tokens) is pulled into the text loss computation by mistake.
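
To make the off-by-one concrete, here is a minimal sketch with dummy tensors (the names and the lengths 128/1024 just follow the snippet above, the values are random) showing that the last element of labels[:, :text_seq_length] is in fact the first image token:

```python
import torch

# Dummy shapes matching the snippet above: 128 text tokens (incl. [BOS]) and 1024 image tokens.
bs, text_seq_length, image_seq_length = 2, 128, 1024

text = torch.randint(0, 100, (bs, text_seq_length))              # position 0 is the [BOS] token
image_input_ids = torch.randint(100, 200, (bs, image_seq_length))

labels = torch.cat((text[:, 1:], image_input_ids), dim=1)         # shape: (bs, 127 + 1024 = 1151)

text_labels = labels[:, :text_seq_length]                         # shape: (bs, 128)
print(text_labels.shape)                                          # torch.Size([2, 128])
print(torch.equal(text_labels[:, -1], image_input_ids[:, 0]))     # True: last "text" label is the first image token
```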

Am I understanding the code correctly? And does the number of text tokens used in the CE loss calculation affect the training process?
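
For reference, the label slicing I had expected looks like the sketch below. This is just my assumption, not the repository's code, and the logit tensors would of course also need sequence dims that match these slices:

```python
# Sketch of what I expected (my assumption, not the repo's code):
# text loss sees only the 127 shifted text labels, image loss sees all 1024 image labels.
text_labels  = labels[:, :self.text_seq_length - 1]   # shape: (bs, 127) — text tokens only
image_labels = labels[:, self.text_seq_length - 1:]   # shape: (bs, 1024) — image tokens only
```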
