-
It is rather unlikely that we will be able to help you with this unless you provide a corresponding PDF file with an offending page for further analysis.
-
I have a fairly generic script that extracts text from the PDFs I receive. Recently, however, two PDFs failed this process. When I tried debugging it, none of the text that `page.extract_text()` returned matched what I could actually see in the document itself. Digging deeper, I discovered three base fonts, which were also embedded:
These fonts are different from the ones I've had to work with before, and I assume they are what caused the errors. However, I'm unable to find any references to these fonts online, and I'm not sure how to deal with this issue, as I'm not particularly well-versed in the PDF standard. What steps should I take to debug this further, and how could I resolve this issue?
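As a possible first debugging step, here is a minimal sketch that lists the fonts on each page and reports whether each one is embedded and whether it has a `/ToUnicode` map. It assumes the script uses pypdf (the post does not say which library `page.extract_text()` comes from), and the helper names `describe_font`, `fonts_on_page`, and `report` are made up for illustration. An embedded font without a `/ToUnicode` entry is a common, though not the only, reason extracted text does not match what is rendered.

```python
from typing import Any, Mapping


def _resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object has one (pypdf objects
    expose get_object(); plain dicts and lists are returned unchanged)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def describe_font(font: Mapping[str, Any]) -> dict:
    """Summarise one PDF font dictionary using key names from the PDF spec."""
    font = _resolve(font)
    # Composite (Type0) fonts keep the descriptor on their descendant font.
    holder = font
    if "/DescendantFonts" in font:
        holder = _resolve(_resolve(font["/DescendantFonts"])[0])
    descriptor = _resolve(holder["/FontDescriptor"]) if "/FontDescriptor" in holder else {}
    return {
        "base_font": str(font["/BaseFont"]) if "/BaseFont" in font else None,
        "subtype": str(font["/Subtype"]) if "/Subtype" in font else None,
        # Any of these three keys means a font program is embedded.
        "embedded": any(k in descriptor for k in ("/FontFile", "/FontFile2", "/FontFile3")),
        # Without /ToUnicode, mapping glyph codes back to text is guesswork.
        "has_to_unicode": "/ToUnicode" in font,
    }


def fonts_on_page(page: Mapping[str, Any]) -> dict:
    """Map each font resource name on a page to its summary.

    Simplification: resources inherited from a parent /Pages node are ignored.
    """
    if "/Resources" not in page:
        return {}
    resources = _resolve(page["/Resources"])
    if "/Font" not in resources:
        return {}
    fonts = _resolve(resources["/Font"])
    return {str(name): describe_font(fonts[name]) for name in fonts}


def report(path: str) -> None:
    """Print the font summary for every page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # assumption: pypdf is the library in use

    for index, page in enumerate(PdfReader(path).pages):
        for name, info in fonts_on_page(page).items():
            print(index, name, info)
```

Calling `report("problem.pdf")` (a placeholder file name) on one of the failing files should show whether the three fonts lack `/ToUnicode`. If they do, the text layer itself may not be recoverable, and OCR on the rendered page could be the fallback.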