pimmer

Exploratory code for PDF image mining. A multi-page PDF is split into pages, which are converted to JPEG files and then mined for illustrations and images. Based on https://github.com/megloff1/image-mining, with added PDF splitting, a simple GUI and queue management.

Install

Make sure you have Git and Docker with docker-compose installed.
Get the latest version of this repository: git clone --depth 1 https://github.com/peterk/pimmer.git
Copy the env_example file to .env (cp env_example .env) and edit the settings.
Make sure you have a folder called data in the project root folder (jobs and resulting image files will end up here). You can map output to a different local folder for the worker in docker-compose.yml.
Run docker-compose up -d. Wait a minute until the queue and worker are up.

The service is now running on http://0.0.0.0:7777.
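Instead of waiting a fixed minute, you can poll the web service until it answers. This is a minimal sketch, assuming the web front end responds to a plain GET on its root URL; the helper name wait_for_pimmer and its parameters are hypothetical and not part of pimmer. It only shows that the web service is reachable, not that the queue or worker is ready.

```shell
# Hypothetical helper (not part of pimmer): poll URL until it answers or give up.
# Usage: wait_for_pimmer [URL] [TRIES] [SECONDS_BETWEEN_TRIES]
# Only checks the web front end; the queue/worker may still be starting.
wait_for_pimmer() {
  url="${1:-http://0.0.0.0:7777/}"
  tries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl exit non-zero on HTTP error responses (>= 400).
    if curl --silent --fail --output /dev/null "$url"; then
      echo "pimmer is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "pimmer did not respond after $tries attempts" >&2
  return 1
}
```

Run it right after docker-compose up -d, for example wait_for_pimmer http://0.0.0.0:7777/ 30 2.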

If you plan to process a large number of documents, you can start more workers with docker-compose up -d --scale worker=5 and then post files with curl to the /process/ endpoint:

curl -v --silent -F "file=@testdata/hat_catalog.pdf" http://0.0.0.0:7777/process/
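To post a whole folder of PDFs, a small loop around the same curl call is enough. This is a sketch; post_pdfs is a hypothetical helper name, and its default URL is the /process/ endpoint from the example above. It posts files one after another and leaves the distribution of jobs to pimmer's queue and the scaled workers.

```shell
# Hypothetical helper (not part of pimmer): post every *.pdf in DIR to /process/.
# Usage: post_pdfs [DIR] [URL]
post_pdfs() {
  dir="${1:-testdata}"
  url="${2:-http://0.0.0.0:7777/process/}"
  count=0
  for pdf in "$dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    # Same multipart form field ("file") as the single-file example.
    if curl --silent --show-error -F "file=@$pdf" "$url"; then
      count=$((count + 1))
    else
      echo "failed: $pdf" >&2
    fi
  done
  echo "posted $count file(s)"
}
```

For example, post_pdfs testdata http://0.0.0.0:7777/process/ uploads every PDF in the testdata folder.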

Please report bugs and feedback in the GitHub issue tracker.

Example result

A digitized hat catalog like this:

... can result in all the individual hat images:
