[Question] OCR #397

xuzeyu91 · 2024-04-10T04:06:45Z

Context / Scenario

I followed this example and wrote an OCR implementation. When I import a scanned PDF or a PDF containing images, the OCR engine is never triggered. I'm not sure whether I did something wrong.

lecramr · 2024-04-12T11:37:46Z

Looks like this is currently not possible, see code:
https://github.com/microsoft/kernel-memory/blob/main/service/Core/DataFormats/Pdf/PdfDecoder.cs

That said, we already have IOcrEngine (https://github.com/microsoft/kernel-memory/blob/main/service/Abstractions/DataFormats/IOcrEngine.cs) in place, which would be enough for simple text extraction, and UglyToad.PdfPig can extract images as an experimental feature.

@dluc Wouldn't it be possible to extend "FileContent" with an array of the images found in the PDF, described by the GPT-4 Vision API if enabled?

marcominerva · 2024-04-12T11:47:40Z

I think you will be able to support this scenario once issue #379 is completed (there is currently a PR in preview).

With that, you will be able to inject a custom decoder for PDF files.

dluc · 2024-04-16T00:54:41Z

Given that custom content decoders can now be injected, I would first try creating one that replaces the default PDF decoder and internally does all the work of extracting both the regular text and the text contained in images. E.g. you can create a decoder that depends on the existing image decoder to parse images and returns all the text at the end, without the need to revisit the FileContent class (for now).
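
The composition described above could look roughly like the following. This is a minimal, language-agnostic sketch in Python, not Kernel Memory code: `Page`, `OcrPdfDecoder`, `parse_pdf` and `ocr` are hypothetical names, and the two callables stand in for whatever PDF parser and OCR engine you actually inject. A real decoder would be written in C# against the project's decoder and OCR interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for one parsed PDF page."""
    text: str                                          # text layer; empty for pure scans
    images: List[bytes] = field(default_factory=list)  # raw bytes of embedded images


class OcrPdfDecoder:
    """Hypothetical decoder: text-layer text plus OCR text from embedded images."""

    def __init__(self,
                 parse_pdf: Callable[[bytes], List[Page]],
                 ocr: Callable[[bytes], str]) -> None:
        # Both dependencies are injected, so the PDF parser and the OCR
        # engine can be swapped without touching the decoder itself.
        self._parse_pdf = parse_pdf
        self._ocr = ocr

    def decode(self, pdf_bytes: bytes) -> List[str]:
        """Return one text section per page, with OCR results appended."""
        sections = []
        for page in self._parse_pdf(pdf_bytes):
            parts = [page.text.strip()]
            parts += [self._ocr(image).strip() for image in page.images]
            # Skip empty pieces so scanned pages yield only their OCR text.
            sections.append("\n".join(p for p in parts if p))
        return sections


# Usage with fake dependencies, for illustration only.
def fake_parse(_: bytes) -> List[Page]:
    return [Page("Invoice 42", [b"img-1"]), Page("", [b"img-2"])]


def fake_ocr(image: bytes) -> str:
    return f"text from {image.decode()}"


decoder = OcrPdfDecoder(fake_parse, fake_ocr)
sections = decoder.decode(b"%PDF-")
```

Because the output is plain text per page, nothing about the returned content structure has to change; the images are consumed inside the decoder and only their extracted text is passed on.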

xuzeyu91 added the "question" label (Further information is requested) on Apr 10, 2024
