Is your feature request related to a problem? Please describe.
When I OCR a PDF, I would like to be able to open the PDF and see the OCRed text as a hidden layer.
Describe the solution you'd like
I would like an option for the "partition" method to output a new PDF file consisting of the original plus a hidden text layer containing the OCR text.
Hi @punjabdhaputar - could you describe the use case you have in mind for this feature? And do I understand correctly that your proposed solution would output a new PDF rather than a list of Element objects?
Actually, I am thinking of another optional argument to the "partition" function, like the following:
from unstructured.partition.auto import partition
# path_for_ocr_pdf is the proposed (not yet existing) optional argument
elements = partition("my_pdf.pdf", path_for_ocr_pdf="ocr_pdf.pdf")
Where the partition function would write out a new PDF with the hidden text OCR layer to "ocr_pdf.pdf".
My use case is to view the PDF with the text layer and highlight specific text (e.g. a small phrase, or a subset of the previously generated chunks).
Thanks @punjabdhaputar! Definitely see the use case there. Writing to PDF is outside the scope of what we'd like to do within the partition functions themselves. If you wanted to contribute an elements_to_pdf similar to elements_to_json, though, we'd be happy to consider that, as long as it doesn't introduce new dependencies.
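For anyone picking this up: in PDF terms, a hidden text layer is usually ordinary text drawn with text rendering mode 3 (invisible), positioned over the page image. Below is a minimal, dependency-free sketch of just that piece, building the content-stream operators for one page. It is not part of unstructured. The function names, the (text, x, y_from_top) input shape, and the /F1 font resource name are all assumptions made for illustration. Embedding the stream into an actual PDF, and handling non-Latin text, are left out.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words, page_height, font_size=10.0):
    """Build content-stream operators that draw each word invisibly.

    `words` is a hypothetical list of (text, x, y_from_top) tuples in points,
    where y_from_top is the baseline's distance from the top of the page.
    """
    # "3 Tr" selects text rendering mode 3: neither fill nor stroke (invisible).
    # "/F1" is an assumed font resource name that the page would have to define.
    lines = ["BT", f"/F1 {font_size:g} Tf", "3 Tr"]
    for text, x, y_from_top in words:
        y = page_height - y_from_top  # PDF origin is the bottom-left corner
        lines.append(f"1 0 0 1 {x:g} {y:g} Tm")  # absolute text position
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


ops = hidden_text_ops([("Hello (world)", 72, 100)], page_height=792)
print(ops)
```

Because the text is still present in the page content, viewers can select, search, and highlight it even though nothing is painted.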
Additional context
Slack Thread: https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1715109355171469