<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>mloss.org The GIDOC prototype</title><link>http://mloss.org</link><description>Updates and additions to The GIDOC prototype</description><language>en</language><lastBuildDate>Fri, 09 Apr 2010 12:57:00 -0000</lastBuildDate><item><title>The GIDOC prototype 1.1</title><link>http://mloss.org/software/view/245/</link><description>&lt;html&gt;&lt;p&gt;Transcription of handwritten text in (old) documents is an important, time-consuming
   task for digital libraries. It might be carried out by first processing all document images
   off-line, and then manually supervising system transcriptions to edit incorrect parts.
   However, current techniques for automatic page layout analysis, text line detection and
   handwriting recognition are still far from perfect, and thus post-editing system
   output is not clearly better than simply ignoring it and transcribing manually.
&lt;/p&gt;
&lt;p&gt;A more effective approach to transcribing old text documents is to follow an interactive-predictive
   paradigm in which the system is guided by the user and the user is, in turn,
   assisted by the system to complete the transcription task as efficiently as possible.
   Following this approach, a system prototype called GIDOC (Gimp-based Interactive
   transcription of old text DOCuments) has been developed to provide user-friendly, integrated support for interactive-predictive layout analysis, line detection and handwriting
   transcription.
&lt;/p&gt;
&lt;p&gt;GIDOC is designed to work with (large) collections of homogeneous documents,
   that is, of similar structure and writing styles. They are annotated sequentially, by (partially)
   supervising hypotheses drawn from statistical models that are constantly updated
   as more annotated documents become available. This is done at different annotation levels. For instance, at the level of page layout analysis, GIDOC uses
   a novel text block detection method in which conventional, memoryless techniques are
   improved with a “history” model of text block positions. Similarly, at the level
   of text line image transcription, GIDOC includes a handwriting recognizer which is
   steadily improved with a growing number of (partially) supervised transcriptions.
&lt;/p&gt;&lt;/html&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Nicolas Serrano, Daniel Perez, Lionel Tarazon, Oriol Ramos, Adria Gimenez, Jesus Andres, Alfons Juan</dc:creator><pubDate>Fri, 09 Apr 2010 12:57:00 -0000</pubDate><comments>http://mloss.org/software/rss/comments/245</comments><guid>http://mloss.org/software/view/245/</guid><category>machine learning</category><category>icml2010</category><category>handwriting recognition</category><category>language modelling</category><category>pattern recognition</category></item></channel></rss>