facebook_scrapper

Web scraper for facebook posts from groups and pages using graph API.

This is a port of facebook page post scrapper from minimaxir.

I've been using this script for a while and been patching things to fit my use case. It ended up beeing a mutant version of the original, so I've made a careless refactoring recently and decided to put on my github.

Also added some extras like e-mail notifications since I run this script overnight/ overweekend on aws. To use e-mail notifications you need a less secure apps option enabled gmail account.

Requires a developer API key and secret from facebook, edit in config.json with your own credentials. If you want to use the e-mail notification, edit in your gmail account and password in the same config file.

I've decided to leave the original comments as is, but added some extra information.

All credit belongs to minimaxir.

example use:

Scraping a single page.

python run.py -i Facebook

Feeding a file of group names seperated by new line.

python run.py -f group_names.txt -t group

see python run.py --help for more information on arguments:

python run.py --help
usage: run.py [-h] [-t {page,group}] [-i ID] [-f FILE] [-s STARTDATE]
              [-e ENDDATE] [-l LIMIT] [-o OUTPATH]

Scrapes posts from facebook pages or groups within a time delta. Required
arguments are: -i page or group id or -r id file url.

optional arguments:
  -h, --help            show this help message and exit
  -t {page,group}, --type {page,group}
                        Type of the target site, Default: page
  -i ID, --id ID        Target ID, string if page, decimal if group
  -f FILE, --file FILE  Read from a text file. Where target ID's are seperated
                        by new line.
  -s STARTDATE, --startDate STARTDATE
                        Starting date for the interval where posts will be
                        scraped in, formatted as YYYY-MM-DD. Default:
                        2016-02-24
  -e ENDDATE, --endDate ENDDATE
                        End date for the interval where posts will be scraped
                        in, formatted as YYYY-MM-DD. Default: datetime.now
  -l LIMIT, --limit LIMIT
                        Max number of statuses to parse per id. Needs to be in
                        intervals of 100 for ease of use. Default: 500,000
  -o OUTPATH, --outPath OUTPATH
                        Output directory to save the resulting csv. Default is
                        out/pages or out/groups.
  -n NOTIFICATIONTARGET, --notificationTarget NOTIFICATIONTARGET
                        Target e-mail address to notify when script finished
                        running.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
out		out
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.json		config.json
group_posts.py		group_posts.py
page_posts.py		page_posts.py
run.py		run.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

out

out

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

config.json

config.json

group_posts.py

group_posts.py

page_posts.py

page_posts.py

run.py

run.py

utils.py

utils.py

Repository files navigation

facebook_scrapper

About

Releases

Packages

Languages

License

umutto/facebook_scrapper

Folders and files

Latest commit

History

Repository files navigation

facebook_scrapper

About

Topics

Resources

License

Stars

Watchers

Forks

Languages