t5_demo can't retrieve CNNDM from drive.google; how to use local copy? #2264

Open
rbelew opened this issue May 10, 2024 · 0 comments

rbelew commented May 10, 2024

🐛 Bug

Describe the bug

I am following the t5_demo tutorial, but it fails when it tries to download the CNN data from https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ

To Reproduce

Steps to reproduce the behavior:

  1. Get the t5_demo notebook.

  2. Try to run it. It gets as far as batch = next(iter(cnndm_dataloader)) (https://pytorch.org/text/stable/tutorials/t5_demo.html#generate-summaries), where cnndm_datapipe = CNNDM(split="test") (https://pytorch.org/text/stable/tutorials/t5_demo.html#datasets).

  3. Get an error like:

RuntimeError: Google drive link

https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ&confirm=t
internal error: headers don't contain content-disposition. This is
usually caused by using a sharing/viewing link instead of a download
link. Click 'Download' on the Google Drive page, which should
redirect you to a download page, and use the link of that page.

This exception is thrown by iter of
GDriveReaderDataPipe(skip_on_error=False,
source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)

Expected behavior

Other reports with similar error messages suggest there may be a timeout issue when retrieving from drive.google. So I downloaded cnn_stories.tgz and dailymail_stories.tgz myself and unpacked them:

.
├── CNNDM
│   ├── cnn
│   │   └── stories
│   └── dailymail
│       └── stories

How can I modify the calls so that the data is retrieved from my local copy instead?

Environment

% python collect_env.py
Collecting environment information...
PyTorch version: 2.1.0.post100
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.4.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.1.0.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:38:07) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-14.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.3
[pip3] torch==2.1.0.post100
[pip3] torchaudio==2.1.2
[pip3] torchdata==0.7.1
[pip3] torchtext==0.16.1
[pip3] torchvision==0.16.2
[conda] captum 0.7.0 0 pytorch
[conda] numpy 1.26.2 pypi_0 pypi
[conda] numpy-base 1.26.3 py311hfbfe69c_0
[conda] pytorch 2.1.0 gpu_mps_py311hf322ab5_100
[conda] torch 2.1.2 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchdata 0.7.1 pypi_0 pypi
[conda] torchtext 0.16.1 pypi_0 pypi
[conda] torchvision 0.16.2 pypi_0 pypi
