Feature Request: Automatically rewrite URLs to use alternative frontends for difficult-to-archive sites (e.g. using benbusby/farside) #1319
Labels
expected: maybe someday
size: hard
status: idea-phase
Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet
touches: API/CLI/user interface
touches: configuration
touches: data/schema/architecture
type: enhancement
why: functionality
Intended to improve ArchiveBox functionality or features
What is the problem that your feature request solves
Sites like Facebook, Instagram, Twitter, TikTok, etc. are difficult to archive: they frequently block bot traffic or require a logged-in session just to view content.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Many alternative frontends exist that display social media content with less clutter and in a form that is easier to archive, e.g.:
twitter.com/ArchiveBoxApp -> nitter.net/ArchiveBoxApp
ArchiveBox should be configurable to rewrite URLs for sites the user chooses so that they point to alternative frontends.
Ideally this would be a general solution for URL rewriting and cleanup that can take over from URL_ALLOWLIST/URL_DENYLIST and also handle merging duplicate URLs.
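To make the idea concrete, here is a minimal Python sketch of host-based rewriting followed by duplicate merging. It is only an illustration under stated assumptions: `FRONTEND_RULES`, `rewrite_url`, and `rewrite_and_dedupe` are hypothetical names, not existing ArchiveBox options or functions, and the single rule is the Twitter/Nitter pair from the example above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule table (not an existing ArchiveBox setting):
# original host -> alternative frontend host.
FRONTEND_RULES = {
    "twitter.com": "nitter.net",
}


def rewrite_url(url, rules=FRONTEND_RULES):
    """Return url with its host swapped for a configured frontend, if any."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat "www.twitter.com" and "twitter.com" as the same site.
    if host.startswith("www."):
        host = host[4:]
    target = rules.get(host)
    if target is None:
        # No rule matched (this includes URLs without a scheme, which
        # urlsplit parses as having no host): leave the URL untouched.
        return url
    # Keep scheme, path, query and fragment; only the host changes.
    # Any port or credentials in the original URL are dropped.
    return urlunsplit(
        (parts.scheme, target, parts.path, parts.query, parts.fragment)
    )


def rewrite_and_dedupe(urls, rules=FRONTEND_RULES):
    """Rewrite every URL, then merge duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        rewritten = rewrite_url(url, rules)
        if rewritten not in seen:
            seen.add(rewritten)
            result.append(rewritten)
    return result
```

Rewriting before deduplicating means that `https://twitter.com/ArchiveBoxApp` and `https://www.twitter.com/ArchiveBoxApp` collapse into a single entry. A real implementation would likely need pattern-based rules as well, since not every frontend mirrors the original site's URL paths.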
What hacks or alternative solutions have you tried to solve the problem?
Manually replacing URL fragments before piping them into archivebox.
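A workaround of this kind could look roughly like the following Python filter. This is a hypothetical sketch, not the exact command used: the script name `rewrite_urls.py`, the `REPLACEMENTS` list, and the `replace_fragments` helper are all made up for illustration.

```python
import sys

# Hypothetical list of (fragment to find, replacement) pairs.
REPLACEMENTS = [
    ("twitter.com/", "nitter.net/"),
]


def replace_fragments(line, replacements=REPLACEMENTS):
    """Apply each plain substring replacement to one line of input."""
    # Naive on purpose: a plain substring match also hits hosts that merely
    # end in the fragment (e.g. "nottwitter.com/"), one reason a built-in,
    # host-aware solution would be preferable.
    for old, new in replacements:
        line = line.replace(old, new)
    return line


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read URLs line by line and write the rewritten versions out.
    for line in stdin:
        stdout.write(replace_fragments(line))


if __name__ == "__main__":
    main()
```

Assuming the script is saved as `rewrite_urls.py` and the URLs are in `urls.txt`, it could be used as `python rewrite_urls.py < urls.txt | archivebox add`.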
How badly do you want this new feature?