ML for Reading Order Detection in Document Image Understanding 49
continues on the next page, it is suggested to look for a text, such as ‘continued
on next page’.
The usage of linguistic information has also been proposed by Aiello et al.
[10], who described a document analysis system for logical labelling and read-
ing order extraction of broad classes of documents. Each document object is
described by means of both attributes (i.e., aspect ratio, area ratio, font size
ratio, font style, content size, number of lines) and spatial relations (defined as
extensions of Allen’s interval relations [11]). Only objects labelled with some
logical labels (title and body) are considered for reading order. More precisely,
two distinct reading orders are first detected for the document object types
Title and Body, and then they are combined using a Title-Body connection
rule. This rule connects one Title with the left-most top-most Body object, sit-
uated below the Title. Each reading order is determined in two steps. Initially,
spatial information on the document objects is exploited by a spatial reasoner
which solves a constraint-satisfaction problem, where constraints correspond
to general document encoding rules (e.g., “in the Western-culture, documents
are usually read top-bottom and left-right”). The output of the spatial rea-
soner is a (cyclic) graph where edges represent instances of the partial ordering
relation BeforeInReading. A reading order is then defined as a full path in this
graph, and is determined by means of an extension of a standard topological
sort [12]. Due to the generality of the document encoding rule used by the
spatial reasoner, it is likely that one obtains more than one reading order, es-
pecially for complex documents with many blocks. For this reason, a natural
language processor is used in the second step of the proposed method. The
goal is that of disambiguating between different reading orders on the basis
of textual information of logical objects. This step works by computing prob-
abilities of sequences of words obtained by joining document objects which
are candidates to follow one another in reading. The main strength of this work
is its generality, which derives from the general nature of the knowledge used in
reasoning.
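The two-step procedure described above can be illustrated with a minimal sketch. It is not the implementation of [10]: it assumes the BeforeInReading graph is acyclic, it enumerates the consistent orders by brute force rather than with an extended topological sort, and it replaces the natural language processor with a toy bigram table. The names `all_reading_orders`, `junction_score`, `best_reading_order` and `bigram_prob` are hypothetical and introduced only for illustration.

```python
import math
from itertools import permutations


def all_reading_orders(blocks, before):
    """Step 1 (spatial): yield every total order of the blocks that is
    consistent with the BeforeInReading pairs (a, b), meaning a precedes b.
    Brute force over permutations; adequate only for a handful of blocks."""
    for perm in permutations(blocks):
        pos = {b: i for i, b in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in before):
            yield list(perm)


def junction_score(order, text, bigram_prob, floor=1e-6):
    """Step 2 (linguistic): sum of log-probabilities of the word pairs
    formed where one block ends and the next block begins.
    bigram_prob is an assumed toy table, not a real language model."""
    score = 0.0
    for a, b in zip(order, order[1:]):
        pair = (text[a].split()[-1].lower(), text[b].split()[0].lower())
        score += math.log(bigram_prob.get(pair, floor))
    return score


def best_reading_order(blocks, before, text, bigram_prob):
    """Pick the spatially admissible order with the highest junction score."""
    return max(
        all_reading_orders(blocks, before),
        key=lambda order: junction_score(order, text, bigram_prob),
    )


# Toy example: block A precedes B and C, but B and C are spatially unordered.
blocks = ["A", "B", "C"]
before = {("A", "B"), ("A", "C")}
text = {
    "A": "The reading order of a",
    "B": "page is often ambiguous",
    "C": "because layout varies",
}
bigram_prob = {("a", "page"): 0.2, ("ambiguous", "because"): 0.1}

candidates = list(all_reading_orders(blocks, before))
chosen = best_reading_order(blocks, before, text, bigram_prob)
```

In this toy example the spatial constraints alone leave two candidate orders, and the word pairs at the block junctions are what separate them. A realistic system would score longer word sequences with a trained language model.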
Topological sorting is also exploited in the approach proposed by Breuel
[13]. In particular, reading order is defined on the basis of text line segments,
which are pairwise compared on the basis of four simple rules in order to de-
termine a partial order. Then a topological sorting algorithm is applied to find
at least one global order consistent with this partial order. Columns, para-
graphs, and other layout features are determined on the basis of the spatial
arrangement of text line segments in reading order. For instance, paragraph
boundaries are indicated by relative indentation of consecutive text lines in
reading order.
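The idea of deriving a partial order from pairwise comparisons and then sorting it topologically can be sketched as follows. The two rules in `before` are illustrative assumptions and are not claimed to be the four rules used in [13]. The `Line` type and the function names are likewise hypothetical. The sketch relies on Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later), which raises `CycleError` if the pairwise rules produce a cycle.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Line:
    """Bounding box of a text line segment; y grows downward."""
    name: str
    x0: float
    x1: float
    y0: float
    y1: float


def x_overlap(a, b):
    """True if the horizontal extents of a and b overlap."""
    return a.x0 < b.x1 and b.x0 < a.x1


def before(a, b, lines):
    """Assumed pairwise rules (illustrative only)."""
    # Rule A: a and b overlap horizontally and a lies above b.
    if x_overlap(a, b) and a.y0 < b.y0:
        return True
    # Rule B: a lies entirely to the left of b, and no third line sits
    # vertically between them while overlapping both horizontally.
    if a.x1 <= b.x0:
        lo, hi = min(a.y0, b.y0), max(a.y0, b.y0)
        return not any(
            c is not a and c is not b
            and lo < c.y0 < hi
            and x_overlap(c, a) and x_overlap(c, b)
            for c in lines
        )
    return False


def reading_order(lines):
    """Return one global order consistent with the pairwise partial order."""
    sorter = TopologicalSorter()
    for b in lines:
        predecessors = [a for a in lines if a is not b and before(a, b, lines)]
        sorter.add(b, *predecessors)
    return [line.name for line in sorter.static_order()]


# A full-width title above two columns, given in scrambled input order.
page = [
    Line("R2", 60, 100, 40, 50),
    Line("L1", 0, 40, 20, 30),
    Line("T", 0, 100, 0, 10),
    Line("R1", 60, 100, 20, 30),
    Line("L2", 0, 40, 40, 50),
]
order = reading_order(page)
```

For this layout the pairwise relations determine a single order: the title first, then the left column from top to bottom, then the right column. In general several orders may be consistent with the partial order, and the topological sort returns just one of them.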
All approaches reported above reflect a clear domain specificity. For in-
stance, the classification of blocks as “title” and “body” is appropriate for
magazine articles, but not for administrative documents. Moreover, the document
encoding rules appropriate for Western-style documents differ from those
appropriate for Japanese papers. Surprisingly, there is no work, to the best of our
knowledge, that handles the reading order problem by resorting to machine learning