PaquitoelChocolatero/HFCrypterAnalysis

HFCrypterAnalysis

Scripts for crawling and scraping Hackforums, along with some of the analysis performed in the paper.

The steps below cover the setup and configuration required before running the crawler, and then how to run it.

Preparing the environment

Dependencies

Operating system

sudo pacman -S tor torsocks privoxy sqlite3 tesseract

Python

pip install -r requirements.txt

Playwright

# Once inside the virtual environment
playwright install

Adding Tor to the Python virtual environment

Add the following lines so that the tor and privoxy services start and stop together with the environment

# Add to the end of the activation file
sudo systemctl start tor
sudo systemctl start privoxy


# Add to the end of the deactivate function
sudo systemctl stop tor
sudo systemctl stop privoxy

Privoxy configuration to use Tor

# /etc/privoxy/config
forward-socks5t / 127.0.0.1:9050 .
keep-alive-timeout 600
default-server-timeout 600
socket-timeout 600

Tor configuration

tor --hash-password <PASS>

# In /etc/tor/torrc
ControlPort 9051
HashedControlPassword <GENERATEDHASH>
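
With ControlPort and HashedControlPassword set, a script can ask Tor for a fresh circuit over the control port. The following is a minimal sketch using only the standard library, not necessarily how the crawler in this repository does it. The function names are hypothetical, and the password is the plain-text one passed to tor --hash-password, not the generated hash.

```python
import socket


def quote_control_string(value: str) -> str:
    """Quote a value for the Tor control protocol (escape backslash and double quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_newnym_commands(password: str) -> bytes:
    """Build the commands that authenticate, request a new circuit, and disconnect."""
    return (
        f"AUTHENTICATE {quote_control_string(password)}\r\n"
        "SIGNAL NEWNYM\r\n"
        "QUIT\r\n"
    ).encode("utf-8")


def request_new_identity(password: str, host: str = "127.0.0.1",
                         port: int = 9051, timeout: float = 10.0) -> str:
    """Send the commands to the control port and return Tor's raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_newnym_commands(password))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # Tor closes the connection after QUIT or a failed login
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling request_new_identity("<PASS>") while tor is running should return one reply line per command; successful commands are answered with status code 250, while a wrong password is rejected and the connection is closed.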

Testing the Tor configuration

# Normal IP
curl http://ifconfig.me

# IP through Tor (first directly with torify, then via Privoxy on port 8118)
torify curl http://ifconfig.me
curl -x 127.0.0.1:8118 https://ifconfig.me
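
The same check can be done from Python, which is closer to how the crawler will reach the network. This is a rough sketch under a few assumptions: Privoxy listens on its default address 127.0.0.1:8118, the plain-text endpoint https://ifconfig.me/ip is available, and the helper names are made up for illustration.

```python
import urllib.request

PRIVOXY_URL = "http://127.0.0.1:8118"   # assumed Privoxy listen address
IP_ECHO_URL = "https://ifconfig.me/ip"  # assumed plain-text IP endpoint


def make_opener(proxy_url=None):
    """Return an opener that uses proxy_url, or no proxy at all when it is None."""
    if proxy_url is None:
        # An empty mapping disables any proxy picked up from the environment.
        handler = urllib.request.ProxyHandler({})
    else:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_ip(proxy_url=None, timeout=30.0):
    """Fetch the public IP address as seen by the echo service."""
    opener = make_opener(proxy_url)
    with opener.open(IP_ECHO_URL, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def is_routed_through_proxy(direct_ip, proxied_ip):
    """True when both lookups succeeded and returned different addresses."""
    return bool(direct_ip) and bool(proxied_ip) and direct_ip != proxied_ip
```

With tor and privoxy running, is_routed_through_proxy(fetch_ip(), fetch_ip(PRIVOXY_URL)) is expected to be True; if it is False, traffic through Privoxy is leaving with the normal IP.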

Stealthy Crawling using Scrapy, Tor and Privoxy
Tor installation and usage

Initialize the database

sqlite3 hackforums.db < create.sql
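
To confirm that create.sql was applied, the tables in the new database can be listed from Python. This is only a sketch: list_tables is a hypothetical helper, and the expected table names depend on create.sql, which is not shown here.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path="hackforums.db"):
    """Return the names of the user-defined tables in the SQLite file, sorted."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

An empty list from list_tables() means the schema was not created, for example because the command was run from a directory without create.sql.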

Run crawler

The spiders are located in HFCrypterAnalysis/hackforums/hackforums/spiders

Before running the crawler, create a ScrapeOps account and add the API key to HFCrypterAnalysis/hackforums/hackforums/settings.py

First, run the crawler for the marketplace thread list
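
One way to avoid committing the key is to read it from an environment variable inside settings.py. The fragment below is a sketch under assumptions: SCRAPEOPS_API_KEY is the setting name used by ScrapeOps' Scrapy integrations, but the name this project's settings.py actually expects should be checked in the file itself, and the environment variable name is chosen here for illustration.

```python
import os

# Assumed setting name; check settings.py for the one the project reads.
# The key is taken from the environment so it never lands in version control.
SCRAPEOPS_API_KEY = os.environ.get("SCRAPEOPS_API_KEY", "")
```

With this in place, the key would be exported in the shell before starting the crawler.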

# Inside HFCrypterAnalysis/hackforums
scrapy crawl hackforums --nolog

Antibot bypass for scrapy

Now you can run the scraper on the individual threads. This step requires an account.

# Inside HFCrypterAnalysis/hackforums
python3 hackforums/spiders/posts.py