GitHub - zephyrfalcon/magicripper2: Extract Magic the Gathering card info from Gatherer.

.gitignore
BUGS
README
TODO
VERSIONS
cardinfo.py
gen_xml.py
grab_html.py
grab_images.py
sanity.py
scan_set.py
sets.py
special.py
symbols.py
tools.py
xmltools.py
zip-all

**NOTE**: I am no longer maintaining this. It doesn't seem to be worth the
trouble, because (1) the process is fraught with errors and problems (see
https://news.ycombinator.com/item?id=6300079 for other people's attempts, and
what they found) and (2) there is another project that does the same,
extracting to JSON, and it seems up-to-date: http://mtgjson.com/. I recommend
using that project instead.

-----

This is MagicRipper2, a collection of scripts to extract card info from
Gatherer, the official Magic: The Gathering card database.

Requirements:
- Python 2.5+ (but not 3.x)
- BeautifulSoup [http://www.crummy.com/software/BeautifulSoup/]

In short, MagicRipper2 works by extracting the "multiverse" ids of cards in a
certain set, retrieving the HTML for those cards (and storing it locally), and
generating XML based on card data found in those HTML files.

More documentation will be added later (famous last words...). For now, this
is a quick summary of how to use it:

(In the following, FOO is a code for an expansion set, e.g. ALA for Shards of
Alara, etc. These codes are used by Gatherer. See sets.py.)

$ python scan_set.py FOO

=> produces ids/FOO.txt, a text file with a list of multiverse ids
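
The id list can be read back with a few lines of Python. This is only a
sketch: it assumes the file holds one multiverse id per line, which the
summary above does not actually state, and `read_ids` is a hypothetical
helper, not a function from MagicRipper2. It uses the `with` statement, so it
needs Python 2.6 or later.

```python
import os

def read_ids(set_code, ids_dir="ids"):
    """Return the multiverse ids in <ids_dir>/<set_code>.txt as integers.

    Assumption: one id per line. Blank lines are skipped.
    """
    path = os.path.join(ids_dir, set_code + ".txt")
    ids = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                ids.append(int(line))
    return ids
```

For example, `read_ids("ALA")` would return the ids found in ids/ALA.txt.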

$ python grab_html.py FOO

=> reads ids/FOO.txt and grabs the HTML for those cards from Gatherer,
producing a directory html/FOO with two files for each card, one with the
original card data, one with updated "Oracle" data.
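
To illustrate why there are two files per card: Gatherer has served a card's
details page in two variants, the default one with Oracle text and one with
the text as originally printed. The sketch below builds both URLs for a
multiverse id. The URL pattern (Details.aspx with `multiverseid` and
`printed=true`) is an assumption about the Gatherer site as it used to work,
and `card_urls` is a hypothetical helper; grab_html.py may do this
differently.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Assumed base URL of Gatherer's card details page.
GATHERER_DETAILS = "http://gatherer.wizards.com/Pages/Card/Details.aspx"

def card_urls(multiverse_id):
    """Return (printed_url, oracle_url) for one card."""
    oracle = GATHERER_DETAILS + "?" + urlencode(
        [("multiverseid", multiverse_id)])
    printed = GATHERER_DETAILS + "?" + urlencode(
        [("printed", "true"), ("multiverseid", multiverse_id)])
    return printed, oracle
```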

$ python gen_xml.py FOO

=> reads the HTML in html/FOO, extracts the card data, and writes it to an XML
file (xml/FOO.xml).
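
The three steps can be chained in a small driver script. This is a sketch,
not part of MagicRipper2: `pipeline_commands` and `run_pipeline` are
hypothetical names, and it assumes that `python` on the PATH is a Python 2
interpreter and that the driver is started from the directory containing the
scripts.

```python
import subprocess

# The three scripts, in the order given in the summary above.
STEPS = ["scan_set.py", "grab_html.py", "gen_xml.py"]

def pipeline_commands(set_code, python="python"):
    """Return the command lines for one set, one per step."""
    return [[python, script, set_code] for script in STEPS]

def run_pipeline(set_code, python="python"):
    """Run all steps for one set; stop at the first step that fails."""
    for cmd in pipeline_commands(set_code, python):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so later steps never run on incomplete input.
        subprocess.check_call(cmd)
```

Calling `run_pipeline("ALA")` would then run scan_set.py, grab_html.py and
gen_xml.py for Shards of Alara, in that order.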