Recognize Japanese symbols in two screenshots #4102
Comments
Tesseract's layout analysis was designed for the simple layouts of books, magazines, newspapers and documents. If Tesseract fails to recognize an image, or some areas of it, use a different tool to clean up the image first so that the text is easier for Tesseract to recognize. See https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html. In your case, you should give Tesseract just the letters, without the frame around them.
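To make the "just the letters without the frame" suggestion concrete, here is a minimal sketch that computes a crop box excluding a border of fixed width. The function name `inner_crop_box` and the `margin_px` parameter are hypothetical, and the right margin has to be found by inspecting each screenshot; nothing here comes from Tesseract itself.

```python
def inner_crop_box(width, height, margin_px):
    """Return a (left, upper, right, lower) box that drops a border
    of margin_px pixels on every side of a width x height image.

    margin_px is an assumption: pick it by looking at how thick the
    frame around the letters is in the actual screenshot.
    """
    if margin_px < 0 or 2 * margin_px >= min(width, height):
        raise ValueError("margin_px leaves no image area to crop")
    return (margin_px, margin_px, width - margin_px, height - margin_px)


# Example: a 200x100 screenshot with a 12-pixel frame.
box = inner_crop_box(200, 100, 12)
print(box)  # (12, 12, 188, 88)
```

The tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so if Pillow is the tool of choice the box can be passed to it directly before saving the cropped image for Tesseract.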
Did you try different psm (page segmentation mode) values?
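For anyone who wants to try several psm values in one go, a rough sketch follows. It assumes the `tesseract` executable is on the PATH and that the `jpn` language data is installed; the helper names and the particular list of psm values are illustrative choices, not a recommendation from the maintainers.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm, lang="jpn"):
    """Build the command line for one run; 'stdout' makes Tesseract
    print the recognized text instead of writing an output file."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def try_psm_values(image_path, psm_values=(6, 7, 8, 10, 11, 13), lang="jpn"):
    """Run Tesseract once per psm value and collect the text by psm.

    The default psm list is an assumption: modes that treat the image
    as a block, a line, a word, a single character, or sparse text.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for psm in psm_values:
        proc = subprocess.run(
            build_tesseract_command(image_path, psm, lang),
            capture_output=True,
            text=True,
            encoding="utf-8",
        )
        results[psm] = proc.stdout.strip()
    return results
```

Calling `try_psm_values("screenshot.png")` (the file name is a placeholder) returns a dictionary mapping each psm value to whatever text Tesseract produced, which makes it easy to see whether any mode picks up the symbols.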
Still no luck, but Google Lens finds it fine:
Current Behavior
Recognize the symbols.
Expected Behavior
Recognize the symbols in these two screenshots.
Original pictures from Dragon Ball episode 1:
After some perspective correction (maybe helps?):
Suggested Fix
Recognize the symbols.
tesseract -v
tesseract 5.3.2
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.11 : libwebp 1.3.1 : libopenjp2 2.5.0
Found NEON
Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
Found libcurl/7.88.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0
Operating System
macOS 13 Ventura
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
No response