Wikitext-103 URL is down #2255

Open
albertz opened this issue Apr 9, 2024 · 3 comments

Comments

@albertz

albertz commented Apr 9, 2024

URL = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip"

All links to https://s3.amazonaws.com/research.metamind.io no longer work. I get "Access Denied".

@albertz

albertz commented Apr 9, 2024

For reference, one copy I found is via pardata:
https://github.com/CODAIT/pardata/blob/1d1600ad3eed6894da7dbddc451cd38aa03c770c/tests/schemata/datasets.yaml#L42C21-L42C99
It is not exactly the same file (a tar.gz instead of a zip), but it appears to have the same content (the files LICENSE.txt, README.txt, wiki.test.tokens, wiki.train.tokens, and wiki.valid.tokens).
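To confirm that a downloaded tar.gz copy contains the five files listed above, a minimal sketch using Python's standard `tarfile` module could look like the following. The function name `missing_wikitext_files` is hypothetical, and because the thread does not say how the archive is laid out internally, only base names are compared.

```python
import tarfile
from pathlib import PurePosixPath

# File names listed in this thread for the tar.gz copy.
EXPECTED_FILES = {
    "LICENSE.txt",
    "README.txt",
    "wiki.test.tokens",
    "wiki.train.tokens",
    "wiki.valid.tokens",
}


def missing_wikitext_files(archive_path: str) -> set[str]:
    """Return the expected file names that are absent from the archive.

    Hypothetical helper. Only base names of regular files are compared,
    so the files may sit under any top-level directory in the archive.
    """
    with tarfile.open(archive_path, mode="r:gz") as tar:
        present = {
            PurePosixPath(member.name).name
            for member in tar.getmembers()
            if member.isfile()
        }
    return EXPECTED_FILES - present
```

An empty result means every listed file is present; it says nothing about whether the file contents match the original zip.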

Another copy of the data is on HuggingFace in various forms, for example: https://huggingface.co/datasets/wikitext
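If code downstream expects the original `wiki.<split>.tokens` files, one option is to write the Hugging Face splits back out under those names. The sketch below assumes each Hugging Face row holds one line of text in a `text` column and that its `validation` split corresponds to `valid`; the helper `write_token_files` is hypothetical, and whether the output is byte-identical to the original files is not established here, so compare line counts or checksums before relying on it.

```python
from pathlib import Path
from typing import Iterable, Mapping


def write_token_files(
    splits: Mapping[str, Iterable[str]], out_dir: str
) -> dict[str, Path]:
    """Write each split's lines to wiki.<split>.tokens in out_dir.

    Hypothetical helper. `splits` maps a split name ("train", "valid",
    "test") to an iterable of text lines. Lines are written as-is; a
    newline is appended only when a line does not already end with one.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for split, lines in splits.items():
        path = out / f"wiki.{split}.tokens"
        with path.open("w", encoding="utf-8") as f:
            for line in lines:
                f.write(line if line.endswith("\n") else line + "\n")
        paths[split] = path
    return paths


# Possible usage (assumes the `datasets` package and network access;
# not executed here):
#   from datasets import load_dataset
#   ds = load_dataset("wikitext", "wikitext-103-v1")
#   write_token_files(
#       {"train": ds["train"]["text"],
#        "valid": ds["validation"]["text"],
#        "test": ds["test"]["text"]},
#       "wikitext-103",
#   )
```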

@codes1gn

Hi Albertz, I ran into exactly the same issue with torchtext 0.17.2. Have you found a neat solution? Datasets from other sources seem to need adapting one by one.

@albertz

albertz commented May 13, 2024

I did not find the zip files anywhere, so I used the tar.gz files linked above instead, which seem to contain the same content.
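For code that only needs the lines of each split, the tar.gz can be read directly without extracting it or recreating the zip. This is a stand-alone sketch, not a torchtext API: the function `iter_split_lines` is hypothetical, and it locates `wiki.<split>.tokens` by base name because the archive's internal directory layout is not stated in the thread.

```python
import io
import tarfile
from pathlib import PurePosixPath
from typing import Iterator


def iter_split_lines(archive_path: str, split: str) -> Iterator[str]:
    """Yield the lines of wiki.<split>.tokens straight from the tar.gz.

    Hypothetical helper. The member is found by base name, so any
    top-level directory in the archive is accepted. Lines keep their
    trailing newline. Raises FileNotFoundError if the split is absent.
    """
    target = f"wiki.{split}.tokens"
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and PurePosixPath(member.name).name == target:
                raw = tar.extractfile(member)
                # Decode the binary member stream as UTF-8 text.
                with io.TextIOWrapper(raw, encoding="utf-8") as text:
                    yield from text
                return
    raise FileNotFoundError(f"{target} not found in {archive_path}")
```

Streaming the member keeps memory use low for the large training split, at the cost of decompressing the archive again on every pass.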
