Information

- One of the scripts in the examples/ folder of Accelerate, or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- My own task or dataset (give details below)
Reproduction
I followed the example on how to use Accelerate with FSDP via SPMD on TPU. When I did so, I ended up with an error raised from the dataloader:
Exception in thread Thread-3 (_loader_worker):
Traceback (most recent call last):
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/accelerate/data_loader.py", line 464, in __iter__
    next_batch = next(dataloader_iter)
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 621, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/torch_xla/distributed/parallel_loader.py", line 152, in _loader_worker
    _, data = next(data_iter)
  File "/home/amoran/Dev/venv/pt23/lib/python3.10/site-packages/accelerate/data_loader.py", line 472, in __iter__
    yield current_batch
UnboundLocalError: local variable 'current_batch' referenced before assignment

The run then stops immediately at step 0 with:

There seems to be not a single sample in your epoch_iterator, stopping training at step 0! This is expected if you're using an IterableDataset and set num_steps (100) higher than the number of available samples.
{'train_runtime': 0.0043, 'train_samples_per_second': 0.0, 'train_steps_per_second': 23093.844, 'train_loss': 0.0, 'epoch': 0}
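The UnboundLocalError in the second traceback comes from a prefetch-style iterator that only binds `current_batch` inside its loop, yet still yields it from the StopIteration handler. A minimal sketch of that pattern (hypothetical and simplified, not Accelerate's exact code):

```python
# Simplified sketch of the failing pattern: `current_batch` is only
# assigned when the underlying iterator yields at least once, but the
# StopIteration handler still tries to yield it.
def prefetch_iter(dataloader):
    dataloader_iter = iter(dataloader)
    try:
        current_batch = next(dataloader_iter)
        while True:
            # Prefetch the next batch so we know whether the current
            # one is the last, then yield the current one.
            next_batch = next(dataloader_iter)
            yield current_batch
            current_batch = next_batch
    except StopIteration:
        # With an empty dataloader the very first next() already raised,
        # so `current_batch` was never bound -> UnboundLocalError here.
        yield current_batch

# An empty dataloader reproduces the reported crash:
try:
    list(prefetch_iter([]))
except UnboundLocalError as exc:
    print(type(exc).__name__)  # prints "UnboundLocalError"
```

With a non-empty dataloader the generator works as intended, which is presumably why the bug only surfaces when the SPMD-sharded dataloader ends up with zero samples.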
For clarity, I used this script to reproduce the issue.
Any hint on how to fix this?
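In the meantime, a sanity check I can run (the helper name is my own, not an Accelerate API) is to confirm that the prepared dataloader yields at least one batch before handing it to the trainer, which at least distinguishes "the sharded dataloader came back empty" from other failures:

```python
# Hypothetical helper (not part of Accelerate): peek at the dataloader
# to confirm it produces at least one batch before training starts.
def yields_at_least_one_batch(dataloader):
    sentinel = object()
    return next(iter(dataloader), sentinel) is not sentinel

print(yields_at_least_one_batch([]))          # prints "False"
print(yields_at_least_one_batch([{"x": 1}]))  # prints "True"
```

Note that for a single-pass IterableDataset this peek consumes the first batch, so it is only safe on dataloaders that can be re-iterated.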
Expected behavior
I would expect the trainer to start tuning the model.