Recognize Japanese symbols in two screenshots #4102

superbonaci · 2023-07-19T09:52:34Z

Current Behavior

Tesseract does not recognize the symbols.

Expected Behavior

Recognize the symbols in these two screenshots.
Original pictures from Dragon Ball episode 1:

After some perspective correction (in case it helps):

Suggested Fix

Recognize the symbols.

tesseract -v

tesseract 5.3.2
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.11 : libwebp 1.3.1 : libopenjp2 2.5.0
Found NEON
Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
Found libcurl/7.88.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0

Operating System

macOS 13 Ventura

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

amitdo · 2024-03-16T22:33:21Z

Tesseract's layout analysis was designed to deal with simple layouts of books, magazines, newspapers and documents.

For any image that Tesseract fails to recognize completely, or in which it misses some areas, it is recommended to clean the image with a different tool first, so that the text is easier for Tesseract to recognize.

https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

In your case, you should give Tesseract just the letters without the frame around them.
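A minimal sketch of that cleanup in plain Python, assuming the screenshot has already been cropped so that the frame touches the image edge, and that the pixels have been loaded as a 2-D list of 0-255 gray values (for example with Pillow; loading and saving are not shown). The helpers `binarize`, `remove_edge_components`, `crop_with_margin`, and `to_gray` are hypothetical names, not part of Tesseract, and the fixed threshold of 128 and the margin size are assumptions to tune per image.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a 2-D list of 0-255 gray values to 1 (ink) or 0 (background)."""
    # Assumes dark characters on a lighter background.
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_edge_components(bw):
    """Erase every ink component connected to the image edge, e.g. a frame."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    queue = deque()
    # Seed the flood fill with every ink pixel on the outermost rows/columns.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and out[y][x]:
                out[y][x] = 0
                queue.append((y, x))
    # 4-connected flood fill: clear all ink reachable from the edge.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx]:
                out[ny][nx] = 0
                queue.append((ny, nx))
    return out


def crop_with_margin(bw, margin=10):
    """Crop to the bounding box of the remaining ink, then add a blank border."""
    ink = [(y, x) for y, row in enumerate(bw) for x, v in enumerate(row) if v]
    if not ink:
        return []
    top = min(y for y, _ in ink)
    bottom = max(y for y, _ in ink)
    left = min(x for _, x in ink)
    right = max(x for _, x in ink)
    width = right - left + 1 + 2 * margin
    body = [[0] * margin + bw[y][left:right + 1] + [0] * margin
            for y in range(top, bottom + 1)]
    return ([[0] * width for _ in range(margin)] + body
            + [[0] * width for _ in range(margin)])


def to_gray(bw):
    """Convert back to 0-255 values: black ink on a white background."""
    return [[0 if v else 255 for v in row] for row in bw]
```

The blank margin is kept on purpose, because characters cropped right up to the image edge can also give Tesseract trouble. If the characters on the sign are lighter than their background, the gray values would need to be inverted (255 - value) before `binarize`.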

superbonaci · 2024-03-17T12:03:22Z

No result either with the improved picture:

% tesseract -l jpn result.png result.txt
Empty page!!
Empty page!!
% tesseract -l script/Japanese result.png result.txt
Empty page!!
Empty page!!

amitdo · 2024-03-17T12:11:24Z

Did you try different psm values?
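A sketch of such a sweep: run Tesseract once per page segmentation mode and language, and collect the output. The selection of modes and the use of the `jpn_vert` model (Tesseract's model for vertical Japanese text) are assumptions, and both `jpn` and `jpn_vert` traineddata files must be installed. Passing `stdout` as the output base prints the text instead of writing a file; note that `tesseract result.png result.txt` actually writes `result.txt.txt`, because Tesseract appends the extension itself. `build_commands` and `sweep` are hypothetical helper names.

```python
import shutil
import subprocess

# Segmentation modes that plausibly fit a small sign (see `tesseract --help-psm`):
# 5 vertical block, 6 uniform block, 7 single line, 8 single word,
# 10 single character, 11 sparse text, 13 raw line.
PSM_VALUES = (5, 6, 7, 8, 10, 11, 13)
# Assumes these traineddata files are installed; jpn_vert targets vertical text.
LANGS = ("jpn", "jpn_vert")


def build_commands(image, langs=LANGS, psms=PSM_VALUES):
    """Return one tesseract argv per (language, psm) pair; text goes to stdout."""
    return [
        ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]
        for lang in langs
        for psm in psms
    ]


def sweep(image):
    """Run every command and map (lang, psm) to the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    results = {}
    for cmd in build_commands(image):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[(cmd[4], int(cmd[6]))] = proc.stdout.strip()
    return results
```

For example, iterating over `sweep("result.png").items()` and printing each key with its text (or a placeholder when it is empty) shows at a glance whether any combination produces output.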

superbonaci · 2024-03-17T15:56:18Z

Still no luck, but Google Lens finds it fine:
https://ja.wikipedia.org/wiki/%E5%80%92%E7%A6%8F
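The Google Lens link points to the Japanese Wikipedia article on 倒福, the custom of displaying the character 福 upside down. If the sign in the screenshots really is an inverted 福 (an assumption based only on that link), Tesseract's models are unlikely to match it as-is, so rotating the image by 180 degrees before OCR may be worth a try. A minimal sketch that does this on a 2-D list of gray values and writes a binary PGM file, a format Tesseract can read through Leptonica; `rotate_180`, `pgm_bytes`, and `write_pgm` are hypothetical names.

```python
def rotate_180(gray):
    """Rotate a 2-D pixel grid by 180 degrees (reverse rows and columns)."""
    return [list(reversed(row)) for row in reversed(gray)]


def pgm_bytes(gray):
    """Encode a 2-D list of 0-255 values as a binary PGM (P5) image."""
    h, w = len(gray), len(gray[0])
    header = f"P5\n{w} {h}\n255\n".encode("ascii")
    return header + bytes(v for row in gray for v in row)


def write_pgm(path, gray):
    """Write the grid to disk so it can be passed to tesseract."""
    with open(path, "wb") as f:
        f.write(pgm_bytes(gray))
```

The written file (for example a hypothetical `rotated.pgm`) can then be passed to `tesseract rotated.pgm stdout -l jpn --psm 10`, treating the image as a single character.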

amitdo closed this as completed Mar 16, 2024

amitdo added the layout analysis label Mar 16, 2024

amitdo reopened this Mar 17, 2024
