Fix online dump downloads for the various modes. Closes #869, #872
It seems that the way the Wikidata dumps are structured has changed, so the automatic download of dump files no longer works.
Here are the fixes implemented and the reasoning behind them:
DAILY: Updated the check for an available dump, as status.txt now indicates "done:all". See: https://dumps.wikimedia.org/other/incr/wikidatawiki/20240419/status.txt I used startsWith, but it might be preferable to check for equality with "done:all". I don't know the full set of possible values, but this version should work at least most of the time (compared to never right now).
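To make the startsWith-versus-equality question concrete, here is a minimal sketch of the two options. The class and method names are hypothetical and are not the ones used in this PR, and the "done" prefix is an assumption about which prefix the lenient check would use.

```java
// Hypothetical helper, not part of Wikidata Toolkit: compares a lenient and a
// strict way of deciding whether a daily dump is complete from status.txt.
final class DailyDumpStatus {
    private DailyDumpStatus() {
    }

    /** Lenient check: accepts any status that begins with "done" (assumed prefix). */
    static boolean isDoneLenient(String statusText) {
        return statusText != null && statusText.trim().startsWith("done");
    }

    /** Strict check: accepts only the exact value "done:all". */
    static boolean isDoneStrict(String statusText) {
        return statusText != null && statusText.trim().equals("done:all");
    }
}
```

The lenient variant keeps working if a suffix other than "all" ever appears, at the risk of accepting a dump that is only partially complete; the strict variant makes the opposite trade-off.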
JSON: The JSON dumps are currently downloaded from this folder: https://dumps.wikimedia.org/other/wikidata/ It is a bit of a mess, mixing full and (apparently) incremental dumps. I switched it to https://dumps.wikimedia.org/wikidatawiki/entities/ (which seems to be equivalent to https://dumps.wikimedia.org/other/wikibase/wikidatawiki/; I don't know if anyone has any insight into which link should be preferred). Moreover, not all listed dates include the full JSON dump, so I added a check that the file is actually there (I have not found a status.txt or equivalent for the entity dumps).
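The existence check described for the JSON mode could look roughly like the sketch below. It is an illustration and not the code in this PR: the class name is hypothetical, and the file name pattern `wikidata-<date>-all.json.gz` is an assumption about how files in the entities folder are named.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Wikidata Toolkit.
final class JsonDumpLocator {
    static final String BASE_URL = "https://dumps.wikimedia.org/wikidatawiki/entities/";

    private JsonDumpLocator() {
    }

    /** Builds the URL of the full JSON dump for a date stamp such as "20240415". */
    static String dumpFileUrl(String dateStamp) {
        // Assumed naming scheme: <date>/wikidata-<date>-all.json.gz
        return BASE_URL + dateStamp + "/wikidata-" + dateStamp + "-all.json.gz";
    }

    /** Returns true if a HEAD request for the URL answers with HTTP 200. */
    static boolean exists(String fileUrl) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(fileUrl).toURL().openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no download
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            try {
                return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                connection.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            // Treat network errors and malformed URLs as "not available".
            return false;
        }
    }
}
```

A HEAD request is used so that a date folder lacking the full dump can be skipped without downloading anything.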
SITES: No changes. I haven't tested it, but the file it looks for is still there, so it probably still works.
CURRENT / FULL: Both of these look for a -pages-meta-current.xml.bz2 / -pages-meta-history.xml.bz2 file inside a dump directory (such as https://dumps.wikimedia.org/wikidatawiki/20240401/). However, these files do not seem to be there. Parts of these files ARE present (for example https://dumps.wikimedia.org/wikidatawiki/20240401/wikidatawiki-20240401-pages-meta-current1.xml-p1p441397.bz2), but the combined file is not. If anyone knows why the combined file is missing, I could try to fix the issue.
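If the part files turn out to be the only option, one possible starting point is to recognise them by name. The sketch below is not implemented in this PR; the class is hypothetical and the pattern is inferred from the single example file name above, so other part files may follow a different scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Wikidata Toolkit.
final class DumpPartFilter {
    // Inferred from "wikidatawiki-20240401-pages-meta-current1.xml-p1p441397.bz2":
    // date stamp, dump kind, part number, then a page-id range.
    private static final Pattern PART_FILE = Pattern.compile(
            "wikidatawiki-\\d{8}-pages-meta-(current|history)\\d+\\.xml-p\\d+p\\d+\\.bz2");

    private DumpPartFilter() {
    }

    /** Returns true if the file name looks like one part of a split dump. */
    static boolean isPartFile(String fileName) {
        return PART_FILE.matcher(fileName).matches();
    }

    /** Keeps only the part files from a directory listing, preserving order. */
    static List<String> partFiles(List<String> fileNames) {
        List<String> parts = new ArrayList<>();
        for (String name : fileNames) {
            if (isPartFile(name)) {
                parts.add(name);
            }
        }
        return parts;
    }
}
```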