torchtext.datasets - requests.exceptions.ConnectionError #2196

Open
afurkank opened this issue Aug 4, 2023 · 2 comments


afurkank commented Aug 4, 2023

🐛 Bug

Description of the bug

When I try to use the Multi30k dataset, I get this error:

requests.exceptions.ConnectionError:
This exception is thrown by __iter__ of HTTPReaderIterDataPipe(skip_on_error=False, source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)

To Reproduce

from torchtext.datasets import Multi30k

SRC_LANGUAGE = 'de'
TGT_LANGUAGE = 'en'

train_iter = Multi30k(split='train', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))

next(iter(train_iter))

Expected behavior

Return a proper iterable where I can iterate over the dataset.
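
For comparison, here is a minimal sketch of the kind of iterable expected, built from local files rather than the failing download. This is only a possible workaround, not something confirmed in this thread: it assumes the archive has been obtained some other way and extracted to line-aligned files such as train.de and train.en. The helper name `iter_parallel` and the file paths are hypothetical.

```python
from typing import Iterator, Tuple


def iter_parallel(
    src_path: str, tgt_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files."""
    with open(src_path, encoding=encoding) as src, open(tgt_path, encoding=encoding) as tgt:
        # zip stops at the shorter file, so a length mismatch is silently truncated.
        for src_line, tgt_line in zip(src, tgt):
            yield src_line.rstrip("\n"), tgt_line.rstrip("\n")
```

With the files in place, `next(iter_parallel("train.de", "train.en"))` would return the first (German, English) pair, which is roughly what `next(iter(train_iter))` is expected to do.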

Environment

PyTorch version: 1.13.1+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: GeForce GTX 1650
Nvidia driver version: 442.23
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2592
DeviceID=CPU0
Family=198
L2CacheSize=1536
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2592
Name=Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] torch==1.13.1
[pip3] torchdata==0.5.1
[pip3] torchtext==0.14.1
[conda] Could not collect

Additional context

I've been running into issues with the Multi30k dataset for some time now. The earlier issue was resolved by installing the specific combination of torch library versions listed above. However, even that workaround no longer works. Can you please fix what's broken with this cursed dataset?

Thank you.

afurkank changed the title from "requests.exceptions.ConnectionError" to "torchtext.datasets - requests.exceptions.ConnectionError" on Aug 4, 2023

afurkank commented Aug 4, 2023

I also tried this:

from torchtext.datasets import Multi30k
from torch.utils.data import DataLoader

datapipe = Multi30k(split='train', language_pair=('de', 'en'))

loader = DataLoader(datapipe, drop_last=True, shuffle=False)

next(iter(loader))

Now I get a different error:

Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.
This exception is thrown by __iter__ of HTTPReaderIterDataPipe(source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)

The environment is the same. The same error occurs with DataLoader2 as well.
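
To tell a local network problem apart from an unreachable host, the URL from the error message can be probed directly, outside torchtext. The following is a rough sketch using only the standard library; the helper name `check_url` and the default timeout are illustrative assumptions, and it is not part of torchtext or torchdata.

```python
import http.client
import urllib.error
import urllib.request
from typing import Tuple


def check_url(url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Send a HEAD request and report whether the server answered successfully."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server was reached but returned an error status (for example 404).
        return False, f"HTTP {exc.code}: {exc.reason}"
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `check_url("http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz")` on the affected machine, and again with a URL known to work, would show whether the failure is specific to that host.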

YancyHuang123 commented

Network error. Check your Internet settings.
