This repository has been archived by the owner on Aug 10, 2020. It is now read-only.

trandoshan-io/crawler

crawler

Crawler is a program written in Go, designed to crawl websites.

features

  • uses the Tor SOCKS proxy to crawl hidden services
  • fast: built on valyala/fasthttp (up to 10x faster than net/http)
  • extracts both absolute and relative URLs
  • uses a scalable messaging system (NATS)
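
To illustrate what extracting both absolute and relative URLs involves, here is a minimal sketch using only the Go standard library. It is not the crawler's actual implementation: the function name extractURLs, the simple href pattern, and the example.onion addresses are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefPattern is a deliberately simple, hypothetical pattern that only
// matches double-quoted href attributes. A real crawler may use a
// different pattern or a proper HTML parser.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// extractURLs (hypothetical helper) returns every href found in body,
// resolved against base so that relative links become absolute URLs.
func extractURLs(base string, body string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, match := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(match[1])
		if err != nil {
			continue // skip links that cannot be parsed
		}
		// ResolveReference leaves absolute URLs unchanged and
		// resolves relative ones against the base URL.
		found = append(found, baseURL.ResolveReference(ref).String())
	}
	return found, nil
}

func main() {
	body := `<a href="/about">About</a> <a href="http://other.onion/page">Other</a>`
	urls, err := extractURLs("http://example.onion/index.html", body)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Resolving every link against the page URL means downstream consumers only ever see absolute URLs, whichever form appeared in the page.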

how it works

  • The crawler process connects to a NATS server (specified by the environment variable NATS_URI) and sets up a subscriber for messages with the subject todoSubject
  • When a URL is received, the crawler starts crawling it
  • When crawling is done, the crawler publishes the page content to the NATS server with the subject contentSubject and the URLs it found with the subject doneSubject
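
The flow above can be sketched without a real NATS server. The example below is only an illustration under stated assumptions: the publisher interface, the in-memory memoryBus, the crawlFunc type, and handleTodo are hypothetical names, the subject strings are assumed to match the names used above, and publishing one message per found URL is a guess about the message layout.

```go
package main

import "fmt"

// Subject names taken from the description above; the literal string
// values are an assumption.
const (
	todoSubject    = "todoSubject"
	contentSubject = "contentSubject"
	doneSubject    = "doneSubject"
)

// publisher is a hypothetical abstraction over the messaging client.
type publisher interface {
	Publish(subject string, data []byte) error
}

// memoryBus is an in-memory stand-in for a NATS connection.
type memoryBus struct {
	published map[string][]string
}

func (b *memoryBus) Publish(subject string, data []byte) error {
	if b.published == nil {
		b.published = make(map[string][]string)
	}
	b.published[subject] = append(b.published[subject], string(data))
	return nil
}

// crawlFunc is a hypothetical signature: fetch a URL and return the
// page content plus the URLs found in it.
type crawlFunc func(target string) (content string, links []string, err error)

// handleTodo processes one URL received on todoSubject: it crawls the
// URL, publishes the content, then publishes each found URL.
func handleTodo(pub publisher, crawl crawlFunc, target string) error {
	content, links, err := crawl(target)
	if err != nil {
		return fmt.Errorf("crawling %s: %w", target, err)
	}
	if err := pub.Publish(contentSubject, []byte(content)); err != nil {
		return err
	}
	for _, link := range links {
		// Assumption: one message per found URL.
		if err := pub.Publish(doneSubject, []byte(link)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	bus := &memoryBus{}
	// fakeCrawl replaces the real HTTP fetch for this sketch.
	fakeCrawl := func(target string) (string, []string, error) {
		return "<html>hello</html>", []string{target + "/a", target + "/b"}, nil
	}
	if err := handleTodo(bus, fakeCrawl, "http://example.onion"); err != nil {
		panic(err)
	}
	fmt.Println(len(bus.published[contentSubject]), len(bus.published[doneSubject]))
}
```

Keeping the messaging client behind a small interface lets the handler be exercised in isolation; in a real deployment the same handler would be registered as the callback of a subscription on todoSubject.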