Project details for The GIDOC prototype

Screenshot The GIDOC prototype 1.1

by nserrano - April 9, 2010, 12:57:00 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ]

view (3 today), download ( 0 today ), 0 subscriptions


Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. It might be carried out by first processing all document images off-line, and then manually supervising system transcriptions to edit incorrect parts. However, current techniques for automatic page layout analysis, text line detection and handwriting recognition are still far from perfect, and thus post-editing system output is not clearly better than simply ignoring it.

A more effective approach to transcribe old text documents is to follow an interactive- predictive paradigm in which both, the system is guided by the user, and the user is assisted by the system to complete the transcription task as efficiently as possible. Following this approach, a system prototype called GIDOC (Gimp-based Interactive transcription of old text DOCuments) has been developed to provide user-friendly, integrated support for interactive-predictive layout analysis, line detection and handwriting transcription.

GIDOC is designed to work with (large) collections of homogeneous documents, that is, of similar structure and writing styles. They are annotated sequentially, by (par- tially) supervising hypotheses drawn from statistical models that are constantly updated with an increasing number of available annotated documents. And this is done at different annotation levels. For instance, at the level of page layout analysis, GIDOC uses a novel text block detection method in which conventional, memoryless techniques are improved with a “history” model of text block positions. Similarly, at the level of text line image transcription, GIDOC includes a handwriting recognizer which is steadily improved with a growing number of (partially) supervised transcriptions.

Changes to previous version:

Updated version for mloss 2010

BibTeX Entry: Download
Corresponding Paper BibTeX Entry: Download
Supported Operating Systems: Linux, Mac Os X
Data Formats: Xcf
Tags: Machine Learning, Icml2010, Handwriting Recognition, Language Modelling, Pattern Recognition
Archive: download here


No one has posted any comments yet. Perhaps you'd like to be the first?

Leave a comment

You must be logged in to post comments.