Tool for extracting external links of a URL from Internet Archive snapshots

omilab/internet-archive-link-extractor

Internet Archive Link Extractor

This tool extracts the external links found in a website's snapshots in the Internet Archive. The output can be used to perform link analysis on the website.

Preparations

  1. Download or clone the project.
  2. Install the requirements:
     pip install -r requirements.txt

Usage

Create a file containing a list of URLs, one URL per line. Then run the command:

python link_extractor.py -i filename
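
If the URL list comes from another script, the input file can be generated programmatically. The following is a minimal sketch, not part of the tool itself: the helper name `write_url_list` and the file name `urls.txt` are assumptions made for illustration. It writes one URL per line and drops blank entries and duplicates.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write the given URLs to `path`, one per line.

    Hypothetical helper (not part of link_extractor.py): strips
    surrounding whitespace, skips empty entries, and keeps only the
    first occurrence of each URL. Returns the list that was written.
    """
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    # Each URL goes on its own line, as the tool expects.
    Path(path).write_text(
        "".join(url + "\n" for url in cleaned), encoding="utf-8"
    )
    return cleaned


# Example usage (file name is an assumption):
#   write_url_list(["http://example.com", "http://example.org"], "urls.txt")
# followed by:
#   python link_extractor.py -i urls.txt
```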

To get help about the optional parameters run:

python link_extractor.py -h

Output Format

The output file is in JSON format. Each line in the output file is a JSON record for one URL from the input file, containing all the external links that were found in each of its snapshots.
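
Because each line is a separate JSON record, the output can be read line by line rather than parsed as a single document. The sketch below assumes only that layout; the function name `read_results` is hypothetical, and since the key names inside each record are not described here, the records are returned as parsed objects without being interpreted.

```python
import json


def read_results(path):
    """Yield one parsed JSON record per non-empty line of the output file.

    Hypothetical helper: assumes only that every line of the file is a
    standalone JSON value, as described above.
    """
    with open(path, encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {line_number} is not valid JSON: {exc}"
                ) from exc


# Example usage (the output file name is an assumption):
#   for record in read_results("output.json"):
#       print(record)
```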
