undupify

[BETA] Heuristic-based tool aiming to remove most unnecessary URLs from a file.


TL;DR

Undupify gets rid of most of the irrelevant, behaviorally identical URLs in a file. It fits neatly into a hacking workflow where you want to apply an additional layer of filtering to your URLs before sending them to a deep, time-consuming vulnerability scan.


Demo

(demo GIF)


Description

When searching for vulnerabilities at scale, a very common practice is to retrieve all URLs associated with a company, using tools such as waybackurls or gau, and then perform query-parameter-based filtering, looking for XSS, SQLi, SSRF, etc.

In this context, even after the retrieved URLs have been processed by a first layer of filtering, a bunch of URLs still remain, and many of them are completely irrelevant because they are merely subtle variations of others. Even though they may have different path names or different parameter values, they are processed by the exact same back-end function. When this happens, we of course don't want to deal with them multiple times, as they behave identically under fuzzing.
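For instance, all of the following hypothetical URLs would typically be handled by the same back-end code, so only the first one is worth keeping:

https://example.com/store/item?id=1
https://example.com/store/item?id=25          (same path and parameter, different value)
https://example.com/store/item/details?id=25  (same first two path segments and parameter)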

This is where Undupify becomes useful: based on heuristics, it attempts to efficiently distinguish which URLs are duplicates of others and removes them.

To detect whether an analyzed URL is a duplicate or unique, the tool currently relies on the following heuristics:

  • Heuristic 1 - If the analyzed URL has a hostname & port combination that has never been seen in previous URLs, then it should NOT be considered a duplicate, but unique.
  • Heuristic 2 - If the analyzed URL has the exact same path and parameters (but not necessarily the same parameter values) as a previously seen URL, then it should be considered a duplicate.
  • Heuristic 3 - If the analyzed URL has the exact same content in its first two path segments, delimited by /, and the same parameters as a previously seen URL, then it should be considered a duplicate.
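A minimal sketch of how these three heuristics could be implemented follows. This is illustrative Python, not Undupify's actual source; the function names and the set-based bookkeeping are assumptions.

from urllib.parse import urlparse, parse_qs

seen_hosts = set()       # (hostname, port) pairs already encountered
seen_signatures = set()  # fingerprints of previously kept URLs

def signatures(url):
    """Fingerprints used by heuristics 2 and 3 (parameter values are ignored)."""
    parsed = urlparse(url)
    host = (parsed.hostname, parsed.port)
    params = frozenset(parse_qs(parsed.query, keep_blank_values=True).keys())
    segments = [s for s in parsed.path.split("/") if s]
    full_sig = (host, "full", parsed.path, params)              # heuristic 2
    prefix_sig = (host, "prefix", tuple(segments[:2]), params)  # heuristic 3
    return host, full_sig, prefix_sig

def is_duplicate(url):
    host, full_sig, prefix_sig = signatures(url)
    if host not in seen_hosts:  # heuristic 1: brand-new hostname:port -> unique
        seen_hosts.add(host)
        seen_signatures.update((full_sig, prefix_sig))
        return False
    if full_sig in seen_signatures or prefix_sig in seen_signatures:
        return True             # heuristic 2 or 3 matched -> duplicate
    seen_signatures.update((full_sig, prefix_sig))
    return False

with open("URLs_to_filter.txt") as f:
    for line in f:
        url = line.strip()
        if url and not is_duplicate(url):
            print(url)

In this sketch, the fingerprints are keyed by (hostname, port), which keeps heuristic 1 consistent: identical paths on two different hosts are never treated as duplicates of each other.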

Usage

python3 undupify.py -h

This displays help for the tool.

usage: undupify.py [-h] [--file FILE] [--output]

options:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  file containing all URLs to clean
  --output, -o          output file path

Basic use:

python3 undupify.py -f URLs_to_filter.txt
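A typical workflow chains Undupify with a URL-harvesting tool such as waybackurls (mentioned above). The example below assumes -o takes a path, as its description in the help text indicates:

waybackurls example.com > URLs_to_filter.txt
python3 undupify.py -f URLs_to_filter.txt -o filtered_URLs.txt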

Installation

1 - Clone

git clone https://github.com/Th0h0/undupify.git

2 - Install requirements

cd undupify
pip install regex

License

Undupify is distributed under the MIT License.
