Support for PDF receipts #32

Open
bram-atmire opened this issue Oct 26, 2019 · 1 comment

Comments

@bram-atmire
Contributor

Not sure if this use case is shared by others: I use Scanbot to scan my receipts as multi-page PDFs. It would be great if this tool could work on these PDFs.

Scanbot does some OCR of its own, but it isn't very good: the output is too noisy. A receipt contains a lot of text, and I'm only interested in the articles and the price per article, so I can see how prices evolve over multiple weeks.

bram-atmire changed the title from "Support for PDF" to "Support for PDF receipts" on Oct 26, 2019
@mre
Member

mre commented Oct 26, 2019

Your use case makes a lot of sense to me.
We could use pdf2image as a preprocessor to convert each PDF page to an image before recognizing the text. I think that would be the easiest thing to try.
Alternatively, you could try OCRmyPDF to see if it works with your inputs out of the box.
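
A rough sketch of what the pdf2image preprocessing step might look like, assuming the existing text recognition can be called on one page image at a time. `convert_from_path` is pdf2image's own function; `pdf_to_page_images`, `recognize_pdf`, and the `recognize` callable are hypothetical names used only for illustration, not part of this project.

```python
from typing import Any, Callable, List


def pdf_to_page_images(pdf_path: str, dpi: int = 300) -> List[Any]:
    """Render every page of a PDF as a PIL image via pdf2image."""
    # Lazy import: pdf2image (and the poppler utilities it wraps) is only
    # needed when a PDF is actually rendered.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi)


def recognize_pdf(
    pdf_path: str,
    recognize: Callable[[Any], str],
    render: Callable[[str], List[Any]] = pdf_to_page_images,
) -> List[str]:
    """Return the recognized text of each page, in page order.

    `recognize` is a hypothetical stand-in for whatever image-to-text step
    the tool already runs on a single receipt image.
    """
    return [recognize(page) for page in render(pdf_path)]
```

One way to try it, assuming pytesseract and Tesseract are installed, would be `recognize_pdf("receipts.pdf", recognize=pytesseract.image_to_string)`, where `receipts.pdf` is a placeholder file name. Keeping the renderer as a parameter makes it easy to swap in a different PDF-to-image step if pdf2image turns out not to handle the Scanbot files well.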
