
Framework-agnostic split_state_dict_into_shards helper #1938

Merged (19 commits) on Feb 22, 2024

Conversation

@Wauplin (Contributor) commented Dec 22, 2023

(feedback welcome 🙏)

PR based on an idea from @LysandreJik. The goal is to have a framework-agnostic helper to split a state dict into shards that can be reused in transformers, peft, accelerate, etc. The scope of this method is yet to be defined. At the moment, it takes a state_dict, a "max_size_per_shard", and a filename as input, and returns an index plus a "filename => tensors" mapping.

>>> import json
>>> import os
>>> from typing import Dict

>>> import torch
>>> from safetensors.torch import save_file as safe_save_file
>>> from huggingface_hub import split_torch_state_dict_into_shards

>>> def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str):
...     state_dict_split = split_torch_state_dict_into_shards(state_dict)
...     for filename, tensors in state_dict_split.filename_to_tensors.items():
...         shard = {name: state_dict[name] for name in tensors}
...         safe_save_file(
...             shard,
...             os.path.join(save_directory, filename),
...             metadata={"format": "pt"},
...         )
...     if state_dict_split.is_sharded:
...         index = {
...             "metadata": state_dict_split.metadata,
...             "weight_map": state_dict_split.tensor_to_filename,
...         }
...         with open(os.path.join(save_directory, "model.safetensors.index.json"), "w") as f:
...             f.write(json.dumps(index, indent=2))

Currently in PR:

  • take state_dict + threshold as input
  • take filename as an input pattern (e.g. "model{suffix}.safetensors", "tf_model{suffix}.h5", "pytorch_model{suffix}.bin")
  • group tensors into shards
  • respect storage id if tensors have to be saved together (still to be done)
  • build index with metadata (total size) + weight_map
  • return shards (a list of state_dicts) + index (a JSON-able dict)
  • support for torch, tensorflow, numpy
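The "build index" step above can be sketched in isolation. This is a hypothetical `build_index` helper (not the PR's actual code): it takes per-shard tensor sizes plus the filename pattern, and follows the transformers convention of dropping the numeric suffix when there is a single shard:

```python
from typing import Dict, List


def build_index(shards: List[Dict[str, int]], filename_pattern: str) -> dict:
    """Build a hypothetical `*.index.json` payload.

    `shards` is a list of dicts mapping tensor name -> size in bytes,
    one dict per shard. Illustrative sketch only.
    """
    weight_map: Dict[str, str] = {}
    total_size = 0
    n = len(shards)
    for i, shard in enumerate(shards, start=1):
        # Single shard: no "-00001-of-00001" suffix, per the transformers convention.
        suffix = "" if n == 1 else f"-{i:05d}-of-{n:05d}"
        filename = filename_pattern.format(suffix=suffix)
        for name, size in shard.items():
            weight_map[name] = filename
            total_size += size
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}


index = build_index(
    [{"embed.weight": 400}, {"lm_head.weight": 200}],
    "model{suffix}.safetensors",
)
print(index["weight_map"]["lm_head.weight"])  # model-00002-of-00002.safetensors
```

The resulting dict matches the shape written to `model.safetensors.index.json` in the example above.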

Currently not in PR:

  • add framework to index (e.g. "pt")
  • provide filename for the index => how?
  • save tensors to files => do we want to provide a helper for that? (Especially to save the index in the correct file + weights in a consistent way.)
  • deserialize/load sharded model (will most probably never be done in huggingface_hub)

The current implementation is inspired by the torch implementation (see here). It supports torch, tensorflow, and numpy. This PR is still in draft, so nothing is set in stone. In particular, depending on the scope we want, inputs and outputs can be adapted to be as user-friendly as possible (while staying flexible).
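One subtlety the torch-inspired implementation has to handle (the "respect storage id" item above) is that tensors sharing underlying memory, such as tied weights, must land in the same shard. A framework-neutral sketch of detecting such groups, with `get_storage_id` left as a caller-supplied hook (hypothetical names, not the PR's code):

```python
from typing import Any, Callable, Dict, List


def group_shared_storage(
    tensors: Dict[str, Any], get_storage_id: Callable[[Any], Any]
) -> List[List[str]]:
    """Group tensor names by storage id; groups with more than one name
    share memory and must be kept in the same shard."""
    groups: Dict[Any, List[str]] = {}
    for name, tensor in tensors.items():
        groups.setdefault(get_storage_id(tensor), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


# Toy example using plain lists and `id` as a stand-in storage id;
# a torch implementation would use the tensor's storage pointer instead.
weights = [1.0, 2.0]
state_dict = {"embed.weight": weights, "lm_head.weight": weights, "bias": [0.0]}
print(group_shared_storage(state_dict, id))  # [['embed.weight', 'lm_head.weight']]
```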

Ping @amyeroberts @ArthurZucker @muellerzr on this (and please ping others if relevant).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr (Contributor) left a comment

This will be great to have inside huggingface_hub to make it available for everyone rather than kept separate. I think for the most part the implementations look similar to what we have in accelerate here.

cc @SunMarc: could you check for any differences or things we may struggle with?

On our end, once this gets merged and we're sure it's 1-to-1 with what we're doing, we'll deprecate the util in Accelerate and rely on the Hub's version (especially relevant since huggingface_hub is now a required dependency of Accelerate).

@Wauplin (Contributor, Author) commented Jan 4, 2024

Linking internal slack thread discussing it (cc @pcuenca).

@LysandreJik (Member):

Pretty excited by this PR! The current skeleton looks good to me. I'm wondering if it wouldn't make sense to make the addition of TensorT custom classes simpler by defining it as a class with the necessary overrideable methods, namely:

  • getting the storage IDs
  • getting the tensor sizes
  • anything else that might be worthwhile

and then we'd start with the definition of torch/tf/numpy methods but adding extras will therefore be super simple as long as those three methods are implemented.
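Such a class-based design might look like the following sketch (hypothetical names; `TensorHandler` and the toy `BytesHandler` are not part of the PR):

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class TensorHandler(ABC):
    """Hypothetical base class: one subclass per framework, overriding
    only the hooks the sharding algorithm needs."""

    @abstractmethod
    def get_storage_id(self, tensor: Any) -> Tuple:
        """Id shared by tensors backed by the same memory."""

    @abstractmethod
    def get_tensor_size(self, tensor: Any) -> int:
        """Tensor size in bytes."""


class BytesHandler(TensorHandler):
    """Toy concrete handler treating `bytes` objects as tensors,
    just to show how little a subclass has to implement."""

    def get_storage_id(self, tensor: bytes) -> Tuple:
        return (id(tensor),)

    def get_tensor_size(self, tensor: bytes) -> int:
        return len(tensor)


print(BytesHandler().get_tensor_size(b"abcd"))  # 4
```

A torch or tensorflow subclass would override the same two methods with framework-specific logic (storage pointers, dtype-aware size computation).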

@Wauplin (Contributor, Author) commented Jan 24, 2024

Thanks for the review and the idea @LysandreJik! Will have a look at how I could make the framework-specific stuff simpler 👍

@Wauplin changed the title from "[RfC] Draft for a framework-agnostic split_state_dict_into_shards helper" to "Framework-agnostic split_state_dict_into_shards helper" on Feb 14, 2024
@Wauplin (Contributor, Author) commented Feb 14, 2024

@LysandreJik I switched to a more functional programming design (which should be easier to test and maintain, as you suggested). I realized that we don't need to test whether a tensor is from tensorflow, numpy, or torch each time we handle a new one. Instead, I'm defining one method per framework, and it's at the user's discretion to use the correct one. I don't see a situation where a user doesn't know which type of tensor they are using. WDYT of the current design?
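The functional design described here can be sketched as a shared core parameterized by a size callback, plus one thin entry point per framework (hypothetical function names, not the PR's actual API):

```python
from typing import Any, Callable, Dict, List


def _split_state_dict(
    state_dict: Dict[str, Any],
    get_size: Callable[[Any], int],
    max_shard_size: int,
) -> List[Dict[str, Any]]:
    """Shared core: greedily fill shards up to max_shard_size bytes."""
    shards: List[Dict[str, Any]] = []
    current: Dict[str, Any] = {}
    current_size = 0
    for name, tensor in state_dict.items():
        size = get_size(tensor)
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
    if current:
        shards.append(current)
    return shards


def split_numpy_state_dict(state_dict, max_shard_size):
    # numpy entry point: size is just `array.nbytes`
    return _split_state_dict(state_dict, lambda a: a.nbytes, max_shard_size)


def split_bytes_state_dict(state_dict, max_shard_size):
    # toy "framework" for illustration: tensors are raw bytes
    return _split_state_dict(state_dict, len, max_shard_size)
```

A real torch wrapper would additionally compute sizes from dtype and shape, and group tensors sharing storage before splitting.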

If that's fine, I'll clean this up, add some tests and document it a bit.

codecov bot commented Feb 14, 2024

Codecov Report

Attention: 48 lines in your changes are missing coverage. Please review.

Comparison is base (d01206d) 82.22% compared to head (586b8d8) 80.29%.
Report is 2 commits behind head on main.

❗ Current head 586b8d8 differs from pull request most recent head f9c5057. Consider uploading reports for the commit f9c5057 to get more accurate results.

Files Patch % Lines
src/huggingface_hub/serialization/_torch.py 30.76% 36 Missing ⚠️
src/huggingface_hub/serialization/_tensorflow.py 80.00% 4 Missing ⚠️
src/huggingface_hub/utils/_runtime.py 73.33% 4 Missing ⚠️
src/huggingface_hub/serialization/_base.py 96.96% 2 Missing ⚠️
src/huggingface_hub/serialization/_numpy.py 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1938      +/-   ##
==========================================
- Coverage   82.22%   80.29%   -1.93%     
==========================================
  Files          66       71       +5     
  Lines        8309     8461     +152     
==========================================
- Hits         6832     6794      -38     
- Misses       1477     1667     +190     


@LysandreJik (Member):

Yes, the current design looks great to me!

@Wauplin marked this pull request as ready for review on February 16, 2024 16:08
@Wauplin (Contributor, Author) commented Feb 16, 2024

Thanks @LysandreJik, PR is now ready to be reviewed. I have added an example on how to use it with torch. We could add a "save" method for each framework that saves the state dict to files but that will be done in a follow-up PR.

@pcuenca (Member) left a comment

Looks very clean to me :)

(Resolved review threads on src/huggingface_hub/serialization/_base.py, _numpy.py, and _tensorflow.py.)
@Wauplin (Contributor, Author) commented Feb 19, 2024

Thanks for the thorough review @pcuenca! I have addressed or replied to all of your comments :)

@LysandreJik (Member):

cc @mfuntowicz as well as we discussed it a while ago

@LysandreJik (Member) left a comment

Looks good to me! It would be awesome to have docs for this somewhere.

Left a few suggestions and comments.

(Resolved review threads on src/huggingface_hub/serialization/_base.py, _torch.py, and tests/test_serialization.py.)
@Wauplin (Contributor, Author) commented Feb 22, 2024

Thanks for the thorough review! Made the suggested changes and now waiting for the CI to complete before merging this stuff :)

Added them to the reference package under a "serialization" page that is meant to grow when adding the "save tensors" part. Let's start with that and reassess :) https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_1938/en/package_reference/serialization

@Wauplin merged commit ae3c4a0 into main on Feb 22, 2024
16 checks passed
@Wauplin deleted the add-helper-to-shard-model branch on February 22, 2024 15:24
@julien-c (Member) left a comment

I'm late to the party, but is the filename convention the same as the one we've been using in the existing sharding? (I don't remember whether the existing sharding is implemented in transformers or in safetensors.)

@Wauplin (Contributor, Author) commented Feb 23, 2024

It ends up with "model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors", etc., which is the convention defined in transformers (not in safetensors itself).
See https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/tree/main for example.
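The zero-padded naming can be reproduced with a simple format string (illustrative only):

```python
# Zero-padded shard names, matching the transformers convention shown above
n_shards = 2
filenames = [
    f"model-{i:05d}-of-{n_shards:05d}.safetensors" for i in range(1, n_shards + 1)
]
print(filenames)
# ['model-00001-of-00002.safetensors', 'model-00002-of-00002.safetensors']
```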
