
jurriaan/docparser


DocParser


DocParser is a web scraping/screen scraping tool.

You can use it to easily scrape information out of HTML documents.

The gem is called docparser. You can find the documentation here.

Features

  • XPath and CSS support through Nokogiri
  • Support for parallel processing of the documents
  • Six output formats:
    • CSV
    • XLSX
    • HTML
    • YAML
    • JSON
    • Screen (for debugging and development)
    • And more! (easy to extend)
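The "easy to extend" idea can be illustrated in plain Ruby. This is a minimal sketch, not DocParser's actual API: the class names RowOutput, JSONRowOutput, YAMLRowOutput and ScreenRowOutput are hypothetical. The sketch shows how a single row-collecting interface can back several formats, so that adding a new format only means writing one render method.

```ruby
require 'json'
require 'yaml'

# Hypothetical base class (NOT DocParser's real API): collects a header and
# rows, and leaves serialization to subclasses.
class RowOutput
  attr_reader :header, :rows

  def initialize(header)
    @header = header
    @rows = []
  end

  # Store one row; each row must have one value per header column.
  def add_row(*values)
    unless values.size == @header.size
      raise ArgumentError, "expected #{@header.size} values, got #{values.size}"
    end
    @rows << values
  end

  # Rows as hashes keyed by header names, handy for JSON/YAML.
  def records
    @rows.map { |row| @header.zip(row).to_h }
  end

  def render
    raise NotImplementedError, "#{self.class} must implement render"
  end
end

# JSON: an array of objects, one per row.
class JSONRowOutput < RowOutput
  def render
    JSON.pretty_generate(records)
  end
end

# YAML: a sequence of mappings, one per row.
class YAMLRowOutput < RowOutput
  def render
    records.to_yaml
  end
end

# A new format only needs a render method, e.g. a plain-text "screen" view
# for debugging.
class ScreenRowOutput < RowOutput
  def render
    ([@header] + @rows).map { |row| row.join(' | ') }.join("\n")
  end
end
```

For example, ScreenRowOutput.new(%w[title author]) followed by add_row('Hello', 'Alice') renders as two lines: "title | author" and "Hello | Alice".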

Installation

Add this line to your application's Gemfile:

gem 'docparser'

And then execute:

bundle

Or install it yourself using:

gem install docparser

Usage

See example.rb for a usage example.
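The general flow of processing many documents in parallel (hand each document to a worker, extract fields, keep results in input order) can be sketched with Ruby's standard library alone. This is a hypothetical helper, not how DocParser implements its parallel mode. It uses threads and a Queue. A real scraper would extract fields with Nokogiri's xpath or css methods; the regex below only keeps the sketch dependency-free and is suited to toy input, not to arbitrary HTML.

```ruby
# Hypothetical helper (NOT part of DocParser): runs the given block on every
# document using a fixed pool of worker threads and returns the results in
# the same order as the input.
def parse_in_parallel(documents, workers: 4, &block)
  queue = Queue.new
  documents.each_with_index { |doc, i| queue << [i, doc] }
  results = Array.new(documents.size)

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          i, doc = queue.pop(true)
        rescue ThreadError
          break
        end
        # Each worker writes to its own slot, so input order is preserved.
        results[i] = block.call(doc)
      end
    end
  end

  threads.each(&:join)
  results
end

# Toy extraction: grab the contents of the first <title> element, or nil.
def extract_title(html)
  html[%r{<title>(.*?)</title>}m, 1]
end
```

For example, parse_in_parallel(['<title>A</title>', '<title>B</title>'], workers: 2) { |html| extract_title(html) } returns ["A", "B"]. On MRI, threads mostly pay off for I/O-bound work such as reading or fetching documents; CPU-heavy parsing is limited by the global VM lock, so a process-based approach may scale better there.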

Todo

  • Better examples and documentation

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Contributors

Thanks