How to load large embedding efficiently? #105

matthew-z · 2019-11-22T11:54:48Z

Describe the Question

I tried to load the 840B 300d GloVe embedding using mz.embedding.load_from_file. However, it uses more than 60 GB of memory, which looks abnormal.

from pathlib import Path
import matchzoo as mz


_glove_6B_embedding_url = "http://nlp.stanford.edu/data/glove.6B.zip"
_glove_840B_embedding_url = "http://nlp.stanford.edu/data/glove.840B.300d.zip"


def load_glove_embedding(dimension: int = 50, size: str = "6B") -> mz.embedding.Embedding:
    """
    Return the pretrained glove embedding.

    :param dimension: the size of embedding dimension, the value can only be
        50, 100, or 300.
    :param size: the GloVe corpus size, either "6B" or "840B".
    :return: The :class:`mz.embedding.Embedding` object.
    """
    file_name = 'glove.{}.{}d.txt'.format(size, dimension)
    file_path = (Path(mz.USER_DATA_DIR) / 'glove').joinpath(file_name)

    if not file_path.exists():
        if size == "6B":
            url = _glove_6B_embedding_url
        elif size == "840B":
            url = _glove_840B_embedding_url
        else:
            # size is a string, so format it with %s (%d would raise TypeError)
            raise ValueError("Incorrect Size for GloVe: %s" % size)

        mz.utils.get_file('glove_embedding',
                          url,
                          extract=True,
                          cache_dir=mz.USER_DATA_DIR,
                          cache_subdir='glove')

    return mz.embedding.load_from_file(file_path=str(file_path), mode='glove')

embedding = load_glove_embedding(300, "840B")

Describe your attempts

The TF version of MatchZoo uses pandas to read the GloVe file and requires much less memory.
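
For reference, the pandas route could look roughly like the sketch below. This is not MatchZoo's actual code: `read_glove_with_pandas` is a hypothetical helper name, and the sketch assumes every line has the form `token v1 ... vN` separated by single spaces. Reading the vectors as float32 rather than the default float64 halves the size of the numeric data. Lines whose token itself contains a space would need separate handling.

```python
import csv

import numpy as np
import pandas as pd


def read_glove_with_pandas(file_path: str, dimension: int) -> pd.DataFrame:
    """Read a GloVe text file into a float32 DataFrame indexed by token.

    Hypothetical helper for illustration; assumes one `token v1 ... vN`
    entry per line, separated by single spaces.
    """
    # Column 0 holds the token, columns 1..dimension hold the vector.
    dtypes = {0: str}
    dtypes.update({i: np.float32 for i in range(1, dimension + 1)})

    frame = pd.read_csv(
        file_path,
        sep=" ",
        header=None,
        dtype=dtypes,
        quoting=csv.QUOTE_NONE,  # a token such as '"' is data, not a quote
        na_filter=False,         # keep tokens such as 'null' as plain strings
    )
    # Use the token column as the index so rows can be looked up by word.
    return frame.set_index(0)
```

Usage would be along the lines of `vectors = read_glove_with_pandas(str(file_path), 300)`, after which `vectors.loc["the"]` returns the vector for a single token and `vectors.to_numpy()` gives the whole matrix as a float32 array.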


Chriskuei · 2019-11-22T13:31:21Z

Thanks for your feedback. We will fix it soon.

matthew-z added the "question" label on Nov 22, 2019
