robcolburn/console-crawler

Console Crawler

A Node app that crawls a given web site.

npm install -g console-crawler;
console-crawler http://en.wikipedia.org/ --legs=8
console-crawler http://en.wikipedia.org/ --legs=2 --phantom

Quick Set-Up for dev

  1. This is a Node app, so you'll need node/npm to run it.
  2. Clone the repo.
  3. Install the dependencies with npm install.
  4. Fire up the crawler.
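The steps above assume a few tools are already on your PATH. The following is a minimal POSIX sh sketch for checking that before you start; `check_tools` is a hypothetical helper name, not something console-crawler provides.

```shell
# Hypothetical helper (not part of console-crawler): report whether each
# named command is on PATH. Returns 1 if any of them is missing, else 0.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_tools git node npm` before cloning. If you plan to pass `--phantom`, which presumably relies on PhantomJS, `check_tools phantomjs` is a quick way to see whether that binary is reachable.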

Or, Copy-Paste


git clone https://github.com/robcolburn/console-crawler;
cd console-crawler;
npm install;
./console-crawler.js http://en.wikipedia.org/ --legs=8;

Notes

  1. On a Mac, you'll likely need the Xcode Command Line Tools installed.

  2. If you'd like to use PhantomJS (the --phantom option), you'll need to download and install PhantomJS separately, since it ships as its own binary.

  3. If you need to target a different host, you can usually just edit your hosts file. For instance, say I wanted to hit 5.5.5.5 using the host name example.com, which isn't live yet. I might add the following line to my hosts file:

5.5.5.5 example.com
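The hosts-file step can also be scripted so it is safe to repeat. This is a minimal POSIX sh sketch, assuming a Unix-like system where the hosts file is /etc/hosts; `add_hosts_entry` is a hypothetical helper, not part of console-crawler.

```shell
# Hypothetical helper (not part of console-crawler): append "IP HOST" to a
# hosts file unless that exact line is already there.
# Usage: add_hosts_entry FILE IP HOST
add_hosts_entry() {
  file=$1
  entry="$2 $3"
  # Only an identical line counts as present; a tab-separated or
  # commented-out variant of the same mapping will not be detected.
  if grep -qxF "$entry" "$file"; then
    echo "already present: $entry"
    return 0
  fi
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new entry starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$entry" >> "$file"
  echo "added: $entry"
}
```

Writing to /etc/hosts normally requires root, so you might load the function in a root shell (for example one opened with `sudo -s`) and run `add_hosts_entry /etc/hosts 5.5.5.5 example.com`. Remember to remove the line once example.com goes live.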

About

Uses the npm crawler module to crawl away
