Power Users Can Use Redis to Filter Already Downloaded URLs, Use HashSet for URL Matching From File #1914
Category
This change is exactly one of the following (please change [ ] to [x]) to indicate which:
Description
This feature adds support for using Redis as the mechanism for skipping already downloaded URLs. If you use RipMe over a long period to download many galleries and albums, the url_history.txt file gets quite large, and an O(n) scan through the entire list for every URL in a job becomes very expensive. My own url_history.txt file is approaching 3 million lines and 130 MB. Using Redis speeds up the ripping process considerably and lets power users coordinate jobs running across multiple machines on a network.
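As a rough illustration of why a shared Redis set helps with multi-machine coordination, here is a minimal sketch that is not taken from this PR. It speaks the Redis wire protocol directly over a socket so that it needs no dependencies; a real integration would more likely use a client library such as Jedis. The class name `RedisUrlFilter`, the method `claim`, and the key `ripme:url_history` are hypothetical. The idea is that `SADD` returns 1 only for the first caller that adds a given URL, so the check and the mark happen in one atomic step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not RipMe code: tracks seen URLs in a Redis set.
public class RedisUrlFilter implements AutoCloseable {
    private final Socket socket;
    private final String setKey; // assumed key name, e.g. "ripme:url_history"

    public RedisUrlFilter(String host, int port, String setKey) throws IOException {
        this.socket = new Socket(host, port);
        this.setKey = setKey;
    }

    // Encode a command as a RESP array of bulk strings.
    public static byte[] encodeCommand(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeAscii(out, "*" + parts.length + "\r\n");
        for (String part : parts) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            writeAscii(out, "$" + bytes.length + "\r\n"); // length is in bytes
            out.write(bytes, 0, bytes.length);
            writeAscii(out, "\r\n");
        }
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    // Parse a RESP integer reply line such as ":1" (CRLF already stripped).
    public static long parseIntegerReply(String line) {
        if (line.isEmpty() || line.charAt(0) != ':') {
            throw new IllegalStateException("Unexpected Redis reply: " + line);
        }
        return Long.parseLong(line.substring(1).trim());
    }

    // Returns true if this call added the URL, false if it was already in the set.
    public synchronized boolean claim(String url) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(encodeCommand("SADD", setKey, url));
        out.flush();
        return parseIntegerReply(readLine(socket.getInputStream())) == 1;
    }

    // Read one reply line, dropping the trailing CRLF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') break;
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }
}
```

With a Redis server listening on localhost:6379, `new RedisUrlFilter("localhost", 6379, "ripme:url_history").claim(url)` would return true on only one machine for a given URL. One trade-off of this sketch is that a URL is marked as seen before its download finishes, so a failed download would need the entry removed again.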
Users can optionally add the following lines to the rip.properties file:
If users do not add this configuration, the URL matching algorithm now uses a HashSet. This is memory intensive, but performs faster than the sequential scan.
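The difference between the two lookups can be sketched as follows. This is an illustrative outline under assumptions, not the code from this PR: the class `UrlHistoryIndex` and its method names are hypothetical. It reads the history file once into a `HashSet`, after which each membership check takes expected constant time instead of a scan over every line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper, not RipMe code: in-memory index over url_history.txt.
public class UrlHistoryIndex {
    private final Set<String> seen = new HashSet<>();

    // Load every non-blank line of the history file into memory once.
    public static UrlHistoryIndex load(Path historyFile) throws IOException {
        UrlHistoryIndex index = new UrlHistoryIndex();
        if (Files.exists(historyFile)) {
            try (Stream<String> lines = Files.lines(historyFile, StandardCharsets.UTF_8)) {
                lines.map(String::trim)
                     .filter(line -> !line.isEmpty())
                     .forEach(index.seen::add);
            }
        }
        return index;
    }

    // Expected O(1) lookup, versus O(n) for re-reading the file per URL.
    public boolean alreadyDownloaded(String url) {
        return seen.contains(url.trim());
    }

    // Returns true if the URL was not in the set before this call.
    public boolean markDownloaded(String url) {
        return seen.add(url.trim());
    }

    public int size() {
        return seen.size();
    }
}
```

The memory cost mentioned above comes from holding every URL string in the set for the lifetime of the process; for a history of several million lines, that is the price of avoiding a full scan per URL.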
Note: RipMe will continue to append new lines to the url_history.txt file, since this operation does not seem to slow down the job (at least at the scales that I have encountered).
Note 2: The easiest way to run Redis locally is to use Docker, with something like:
docker run --name my-redis -d -p 6379:6379 redis
Alternatively, you could download and install Redis for your OS.
Testing
Required verification:
mvn test (there are no new failures or errors).
Optional but recommended: