TED (TED Enhances Digitization) - Software to facilitate OCR on incunables.

janwieners/TED

TED (TED Enhances Digitization)

I developed TED for my Magister Artium thesis, submitted in 2008, "Zur Erweiterungsfähigkeit bestehender OCR Verfahren auf den Bereich extrem früher Drucke" ("On the extensibility of existing OCR methods to the domain of extremely early prints"). In the thesis I used TED to facilitate Optical Character Recognition (OCR) on digital images of incunables from the project "Verteilte Digitale Inkunabelbibliothek" ("Distributed Digital Incunabula Library").

The character recognition process is based on a self-organizing map (SOM, also known as a Kohonen map). It operates on digital images that have been extensively preprocessed by the following operations:

  • Image conversion
  • Binarization (several algorithms, ranging from simple fixed-threshold binarization to Otsu's method)
  • Median and kFill filtering
  • Automatic cropping and deskewing of the image
  • Edge detection
  • Object / glyph isolation and recognition
  • Clustering of isolated glyphs with a self-organizing map
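
To make two of these steps more concrete, here is a minimal Python sketch of Otsu's method for binarization and of a small self-organizing map for clustering glyph feature vectors. It is an illustration only and is not taken from TED's source code. All function names (`otsu_threshold`, `binarize`, `train_som`, `best_matching_unit`), the linear decay of learning rate and neighborhood radius, and the representation of a glyph as a flat list of numbers in the range 0 to 1 are assumptions made for this example.

```python
import math
import random


def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray values in 0..255.
    Pixels with value <= t form the dark class, all others the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (dark, assumed ink) or 0 (light, assumed paper)."""
    return [1 if p <= threshold else 0 for p in pixels]


def best_matching_unit(weights, vector):
    """Return the grid position whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}.

    Assumes every input vector has the same length and values in 0..1.
    Learning rate and neighborhood radius decay linearly (an assumption).
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        # Present the training vectors in a fresh random order each epoch.
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights
```

In such a setup, a page image would first be reduced to a flat list of gray values, thresholded with `binarize(pixels, otsu_threshold(pixels))`, and each isolated glyph would then be scaled to a fixed size and passed to `train_som` as one vector. After training, `best_matching_unit` assigns a glyph to a map node, so visually similar glyphs tend to land on the same or neighboring nodes.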
