Question about duplicate embedding_name in create_embedding_dict #66

Open
mmmmlz opened this issue Dec 15, 2021 · 3 comments
Labels
question Further information is requested

mmmmlz commented Dec 15, 2021

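Describe the question
The create_embedding_dict function stores each embedding using the feature's embedding_name as the key. If two features have the same embedding_name, the earlier embedding is overwritten. This does happen in practice: in run_sdm, for example, the variable-length feature in the user sequence and the item feature share the same embedding_name. If only one copy of the embedding is needed, what is the point of writing the function this way?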

def create_embedding_dict(sparse_feature_columns, varlen_sparse_feature_columns, seed, l2_reg,
                          prefix='sparse_', seq_mask_zero=True):
    sparse_embedding = {}
    for feat in sparse_feature_columns:
        emb = Embedding(feat.vocabulary_size, feat.embedding_dim,
                        embeddings_initializer=feat.embeddings_initializer,
                        embeddings_regularizer=l2(l2_reg),
                        name=prefix + '_emb_' + feat.embedding_name)
        emb.trainable = feat.trainable
        sparse_embedding[feat.embedding_name] = emb

    if varlen_sparse_feature_columns and len(varlen_sparse_feature_columns) > 0:
        for feat in varlen_sparse_feature_columns:
            # if feat.name not in sparse_embedding:
            emb = Embedding(feat.vocabulary_size, feat.embedding_dim,
                            embeddings_initializer=feat.embeddings_initializer,
                            embeddings_regularizer=l2(
                                l2_reg),
                            name=prefix + '_seq_emb_' + feat.name,
                            mask_zero=seq_mask_zero)
            emb.trainable = feat.trainable
            sparse_embedding[feat.embedding_name] = emb
    return sparse_embedding
mmmmlz added the question label Dec 15, 2021
@bbruceyuan

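My understanding is that it is written this way simply for convenience.

Reason: val_len_item and item both use the same embedding_name as the key, so one overwrite happens when embedding_dict is initialized. After that, whether you update the embedding of val_len_item or of item, you are updating the embedding stored under the same key. So it does not matter that the embedding keyed by embedding_name is overwritten once at creation time.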
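The overwrite described here can be reproduced without TensorFlow. The following is only a minimal sketch: `Feat`, `FakeEmbedding`, `build_embedding_dict`, and the feature names `item_id` and `hist_item_id` are hypothetical stand-ins, not the library's classes or the actual run_sdm configuration. It mirrors the two loops of create_embedding_dict and shows that both features end up resolving to the same object, namely the one created in the variable-length loop.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature column; only the two fields used here.
Feat = namedtuple("Feat", ["name", "embedding_name"])


class FakeEmbedding:
    """Stand-in for a Keras Embedding layer; it only records its arguments."""

    def __init__(self, name, mask_zero=False):
        self.name = name
        self.mask_zero = mask_zero


def build_embedding_dict(sparse_feats, varlen_feats, prefix="sparse_"):
    embedding_dict = {}
    for feat in sparse_feats:
        embedding_dict[feat.embedding_name] = FakeEmbedding(
            prefix + "_emb_" + feat.embedding_name)
    for feat in varlen_feats:
        # Same key as the item feature, so this replaces the entry set above.
        embedding_dict[feat.embedding_name] = FakeEmbedding(
            prefix + "_seq_emb_" + feat.name, mask_zero=True)
    return embedding_dict


# Assumed example: a history sequence feature that reuses the item's embedding_name.
item = Feat(name="item_id", embedding_name="item_id")
hist = Feat(name="hist_item_id", embedding_name="item_id")
emb_dict = build_embedding_dict([item], [hist])

item_layer = emb_dict[item.embedding_name]
hist_layer = emb_dict[hist.embedding_name]
print(len(emb_dict), item_layer is hist_layer, item_layer.name)
```

Because every later lookup goes through embedding_name, the layer created in the first loop is simply discarded and never used.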

haiming2019 commented Mar 16, 2022

cxyz1 commented Aug 12, 2022

It is for sharing the embedding.
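As a rough illustration of what sharing means here, the sketch below uses a plain Python list as a hypothetical embedding table (it is not the Keras layer). A single-ID item feature and a padded ID sequence both read from the same rows, so an update to a row is visible through both.

```python
# Hypothetical 4-row, 2-dimensional embedding table shared by two features.
table = [[0.0, 0.0], [0.5, 0.25], [0.25, 0.5], [0.75, 1.0]]


def lookup(ids):
    """Return the rows of the shared table for the given IDs."""
    return [table[i] for i in ids]


target_item = lookup([2])      # item feature: a single ID
history = lookup([1, 2, 0])    # sequence feature: padded with ID 0

# A pretend training update on row 2 changes what both features read.
table[2][0] += 1.0

print(target_item[0] is history[1], history[1])
```

With separate tables, the item in the history sequence and the same item as a target would be trained as two unrelated vectors.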
