Support for PDF receipts #32

bram-atmire · 2019-10-26T13:39:23Z

Not sure if this use case is shared among others: I use Scanbot to scan my receipts as multi-page PDFs. It would be great if this tool could work on these PDFs.

Scanbot does a sort of OCR itself, but it doesn't seem to be that good, in the sense that it adds too much noise: a receipt contains so much text, and I'm only interested in the articles and the price per article, so I can see price evolution across multiple weeks.

mre · 2019-10-26T21:18:09Z

Your use-case makes a lot of sense to me.
We could use pdf2image as a preprocessor before recognizing the text. I think that would be the easiest thing to try.
Alternatively, you could try OCRmyPDF to see if it works with your inputs out of the box.
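
For anyone who wants to try the pdf2image route, here is a rough sketch of what the preprocessing step might look like. It is not code from this project: the function name `pdf_to_page_texts`, its `convert` and `ocr` parameters, and the use of pytesseract as the OCR step are all assumptions made for illustration. The real calls it relies on are `pdf2image.convert_from_path` (which needs poppler installed) and `pytesseract.image_to_string`.

```python
from typing import Any, Callable, Iterable, List, Optional


def pdf_to_page_texts(
    pdf_path: str,
    convert: Optional[Callable[[str], Iterable[Any]]] = None,
    ocr: Optional[Callable[[Any], str]] = None,
    dpi: int = 300,
) -> List[str]:
    """Rasterize each page of a PDF and run OCR on it, one string per page.

    `convert` and `ocr` are hypothetical hooks so the steps can be swapped
    or stubbed; by default they fall back to pdf2image and pytesseract.
    """
    if convert is None:
        # Imported lazily so the function can be used with a custom converter
        # even when pdf2image / poppler are not installed.
        from pdf2image import convert_from_path

        def convert(path: str) -> Iterable[Any]:
            # Returns a list of PIL images, one per PDF page.
            return convert_from_path(path, dpi=dpi)

    if ocr is None:
        # Assumption: Tesseract via pytesseract is an acceptable OCR step.
        import pytesseract

        ocr = pytesseract.image_to_string

    # A multi-page scan becomes a list of per-page text blobs.
    return [ocr(page) for page in convert(pdf_path)]
```

Calling `pdf_to_page_texts("receipts.pdf")` (the file name is just an example) would give one text blob per scanned page, which could then go through whatever parsing already exists for image receipts. Keeping the converter and OCR step as parameters makes it easy to compare this against OCRmyPDF or Scanbot's own text layer on the same input.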

bram-atmire changed the title ~~Support for PDF~~ Support for PDF receipts Oct 26, 2019

mre added the enhancement and help wanted labels Nov 8, 2020
