AcceleratorState object has no attribute distributed_type. #2786

Open
evelinamorim opened this issue May 16, 2024 · 2 comments

@evelinamorim

System Info

accelerate-0.30.1
Google Colab
numpy-1.25.2
torch-2.2.1+cu121

Python 3.10.12

Regarding the accelerate configuration: I am using the Transformers Trainer, which employs accelerate internally, and I do not touch the configuration.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

The args.json file employed below is available to download at: https://drive.google.com/file/d/1H2MstSq_oz7Xv7spMZCppf39fHGs5rW0/view?usp=drive_link.

The dataset specified in args.json is available at: https://drive.google.com/file/d/18OVilNSqQQogSMiepe87vtNmzYpCalCs/view?usp=drive_link

In Google Colab, I ran:

!git clone https://github.com/evelinamorim/Seq2seqCoref.git
!pip install -U transformers accelerate

import sys
sys.path.insert(1, "Seq2seqCoref")

from transformers import HfArgumentParser, set_seed
from transformers import AutoModelForSeq2SeqLM, \
    DataCollatorForSeq2Seq, AutoConfig, AutoTokenizer
from transformers.integrations import TensorBoardCallback

from arguments import DataArguments, ModelArguments, CorefTrainingArguments \
    as TrainingArguments
from constants import SPEAKER_START, SPEAKER_END, MENTION_START, MENTION_END, \
    COPY, CLUSTER_NEW, CLUSTERS, SENTENCE_START, SENTENCE_END, SPECIAL_IDS, \
    NON_INT_SPECIAL_IDS, MARK_SPECIAL_IDS, MENTION_END_NON_INT_SPECIAL_IDS, \
    MENTION_ENDS
from data import CorefDataset
from trainer import CorefTrainer
import os

parser = HfArgumentParser(
        (ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_json_file(
        json_file=os.path.abspath("args.json"))

set_seed(training_args.seed)

# tokenizer setup
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)

num_new_tokens = tokenizer.add_tokens([SPEAKER_START, SPEAKER_END,
                                       MENTION_START, MENTION_END,
                                       COPY])
num_new_tokens += tokenizer.add_tokens([SENTENCE_START, SENTENCE_END])

# loading config and model
config = AutoConfig.from_pretrained(model_args.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(
        model_args.model_name_or_path, config=config)

# data objects
collator = DataCollatorForSeq2Seq(tokenizer, model=model)
train_set = CorefDataset(tokenizer, data_args, training_args, 'train')

tb_callback = TensorBoardCallback()
trainer = CorefTrainer(
        tokenizer=tokenizer,
        model=model,
        args=training_args,
        train_dataset=train_set,
        #        eval_dataset=dev_set,
        data_collator=collator,
        callbacks=[tb_callback]
    )

trainer.train()

The resulting traceback is:

AttributeError                            Traceback (most recent call last)
<ipython-input-16-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1857                 hf_hub_utils.enable_progress_bars()
   1858         else:
-> 1859             return inner_training_loop(
   1860                 args=args,
   1861                 resume_from_checkpoint=resume_from_checkpoint,

/content/Seq2seqCoref/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
    169         self._train_batch_size = batch_size
    170         # Data loader and number of training steps
--> 171         train_dataloader = self.get_train_dataloader()
    172 
    173         # Setting up training control variables:

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_train_dataloader(self)
    877             dataloader_params["prefetch_factor"] = self.args.dataloader_prefetch_factor
    878 
--> 879         return self.accelerator.prepare(DataLoader(train_dataset, **dataloader_params))
    880 
    881     def _get_eval_sampler(self, eval_dataset: Dataset) -> Optional[torch.utils.data.Sampler]:

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in prepare(self, device_placement, *args)
   1246                 )
   1247 
-> 1248         if self.distributed_type == DistributedType.DEEPSPEED:
   1249             model_count = 0
   1250             for obj in args:

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in distributed_type(self)
    527     @property
    528     def distributed_type(self):
--> 529         return self.state.distributed_type
    530 
    531     @property

/usr/local/lib/python3.10/dist-packages/accelerate/state.py in __getattr__(self, name)
   1074         # so we just modify the error message
   1075         if name in self._known_attrs:
-> 1076             raise AttributeError(
   1077                 f"`AcceleratorState` object has no attribute `{name}`. "
   1078                 "This happens if `AcceleratorState._reset_state()` was called and "

AttributeError: `AcceleratorState` object has no attribute `distributed_type`. This happens if `AcceleratorState._reset_state()` was called and an `Accelerator` or `PartialState` was not reinitialized.

Expected behavior

The model trains when trainer.train() is called at the end of the code.

@muellerzr
Collaborator

What is CorefTrainer? Does it make an AcceleratorState or PartialState somewhere? As the error hints, somewhere along the line the state was reset without an Accelerator or PartialState being initialized again.
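
For illustration, a minimal toy sketch of that failure mode (not code from your repo): once AcceleratorState._reset_state() runs, the shared singleton state is empty, and any existing Accelerator raises the same AttributeError until the state is rebuilt.

from accelerate import Accelerator
from accelerate.state import AcceleratorState

accelerator = Accelerator()
print(accelerator.distributed_type)  # works, e.g. DistributedType.NO on a single machine

AcceleratorState._reset_state()      # clears the shared singleton state

try:
    accelerator.distributed_type     # now raises the AttributeError from the traceback
except AttributeError as err:
    print(err)

Accelerator()                        # creating a new Accelerator repopulates the shared state
print(accelerator.distributed_type)  # works again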

@evelinamorim
Author

evelinamorim commented May 16, 2024

I am sorry, I did not specify CorefTrainer. I am using a custom trainer (you can check it in this link).

This custom trainer is a subclass of Seq2SeqTrainer. None of the functions implemented in the custom trainer reset AcceleratorState. I went through all the code of Seq2SeqTrainer and Trainer, and the only method I could identify that creates the accelerator for a trainer object is create_accelerator_and_postprocess. I do not know if I must provide some configuration to avoid this error.
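
As a quick check on my side, I could try rebuilding the state right before training, along these lines (just a guess based on the error message, untested):

from accelerate import Accelerator

Accelerator()    # repopulates the shared AcceleratorState if something cleared it
trainer.train()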
