A Learning-Based Fixed Point Model: A New Method for Table Recognition in Document Images

Table Extraction from Document Images using Fixed Point Model

Anukriti Bansal∗

IIT Delhi

anukriti1107@gmail.com

Gaurav Harit

IIT Jodhpur

gharit@iitj.ac.in

Sumantra Dutta Roy

IIT Delhi

sumantra@ee.iitd.ac.in

ABSTRACT

The paper presents a novel learning-based framework to identify tables from scanned document images. The approach is designed as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell and non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed point model which learns the inter-relationship between the blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images picked from the UW-III (University of Washington) dataset, the UNLV dataset and our own dataset of document images with multi-column page layout show the applicability of our algorithm in layout analysis and table detection.

Keywords

Table recognition, Fixed Point Model, Structured labeling, Conditional Random Fields, Layout analysis

∗ Corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

ICVGIP ’14, December 14-18 2014, Bangalore, India

http://dx.doi.org/10.1145/2683483.2683550

1. INTRODUCTION

Tables present in documents are often used to compactly communicate important information in rows and columns. To automatically extract this information by digitization of paper documents, the tabular structures need to be identified and the layout and inter-relationship between the table elements need to be preserved for subsequent analysis. The problem of table detection is challenging due to a wide range of layouts and random positioning of table elements. Algorithms for table detection have been proposed by authors in the past, but the problem of correctly localizing the tabular structure from a wide variety of documents remains a challenging task.

In this work we learn the layout of a document image by extracting the attributes of foreground and background regions and modeling the correlations between them. Using these attributes, a fixed point model captures the context and learns the inter-relationships between different foreground and background document entities to assign each of them a unique label, which can be table header, table trailer, table cell or non-table region. Regions which get table-related labels are clustered together to extract a table.
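
The clustering step can be illustrated with a small Python sketch. The excerpt does not say how labeled regions are grouped, so everything here is an assumption made for illustration: the `Block` type, the label strings, the `extract_tables` helper and the `max_gap` threshold are hypothetical, and the grouping rule (merge table-labeled blocks whose vertical gap is small) is a simple stand-in rather than the authors' procedure.

```python
from dataclasses import dataclass

# Hypothetical label names for the three table-related classes.
TABLE_LABELS = {"table_header", "table_cell", "table_trailer"}


@dataclass(frozen=True)
class Block:
    """A labeled foreground block with its bounding box in pixels."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: str


def extract_tables(blocks, max_gap=20):
    """Group table-labeled blocks separated by at most max_gap pixels
    vertically and return one bounding box (x0, y0, x1, y1) per group."""
    table_blocks = sorted(
        (b for b in blocks if b.label in TABLE_LABELS), key=lambda b: b.y0
    )
    tables = []
    for b in table_blocks:
        if tables and b.y0 - tables[-1][3] <= max_gap:
            # Close enough to the previous group: grow its bounding box.
            x0, y0, x1, y1 = tables[-1]
            tables[-1] = (min(x0, b.x0), y0, max(x1, b.x1), max(y1, b.y1))
        else:
            # Too far below the previous group: start a new table.
            tables.append((b.x0, b.y0, b.x1, b.y1))
    return tables
```

This rule only looks at vertical gaps, so it would merge side-by-side tables in a multi-column page; a practical version would also need a horizontal overlap check.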

The Fixed Point Model, as proposed by Li et al. [18], has been used for the task of structured labeling by capturing the correlation between the observed data. The structured input is denoted as a graph with nodes and edges. The objective of the structured labeling task is to jointly assign labels to all the nodes of a graph. In computer vision, the structured input comprises the set of inputs of all the pixels and the structured output constitutes the set of labels assigned to those pixels. Edges between the nodes are used to model the correlations among the nodes. The fixed point model captures the neighborhood information and models the correlation between the different nodes to predict the label of each node. Markov random fields (MRF) [10] and conditional random fields (CRF) [17] are also used to model the inter-relationships of structural labels. However, due to the heavy computational burden in the training and testing stages, MRF and CRF are often modeled to capture only a few neighborhood interactions, limiting their modeling capabilities. The motivation to use the fixed point model for the problem of table detection arises from the need to model the spatial inter-dependencies of different elements of a document image. The fixed point model utilizes the context information and attains a contraction mapping to assign a unique label to each element of the document image. The final labeling helps extract the table regions. This can facilitate applications such as searching, indexing and information retrieval. A subset of the authors have previously used a fixed point model for article extraction [1].
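
To make the idea of labeling by a fixed point concrete, the following Python sketch iterates a contextual update q = f(x, q) on a small graph until the label beliefs stop changing. It is a toy under stated assumptions, not the model of [18] or the one trained in this paper: the contextual function here is a hand-written softmax over each node's own scores plus the average belief of its neighbors, whereas the paper learns that function from features. The names `fixed_point_labels`, `unary`, `neighbors` and `weight` are hypothetical.

```python
import math

LABELS = ["table_header", "table_trailer", "table_cell", "non_table"]


def softmax(scores):
    """Turn raw scores into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_labels(unary, neighbors, weight=0.5, tol=1e-8, max_iter=200):
    """Iterate q = f(x, q) until the beliefs change by less than tol.

    unary[i]     : per-label scores of node i from its own features
    neighbors[i] : indices of the nodes adjacent to node i
    weight       : strength of the neighborhood term (assumed, not learned)
    """
    n, k = len(unary), len(LABELS)
    q = [[1.0 / k] * k for _ in range(n)]  # start from uniform beliefs
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            nb = neighbors[i]
            # Context: average belief of the neighbors for each label.
            context = [
                sum(q[j][c] for j in nb) / len(nb) if nb else 0.0
                for c in range(k)
            ]
            new_q.append(
                softmax([unary[i][c] + weight * context[c] for c in range(k)])
            )
        delta = max(
            abs(a - b) for ra, rb in zip(new_q, q) for a, b in zip(ra, rb)
        )
        q = new_q
        if delta < tol:
            break
    # Each node takes the label with the highest belief.
    return [LABELS[max(range(k), key=row.__getitem__)] for row in q]
```

In this toy setting a node with uninformative scores of its own takes the label favored by its neighbors, which is the kind of contextual effect the paragraph above describes. Keeping `weight` small limits how strongly one sweep can change the beliefs, which helps the loop settle; it does not by itself reproduce the contraction property that the paper attributes to the learned model.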

1.1 Related Work

Several interesting survey papers [11] [7] [23] [20] [28] [36] [27] have been published on table structure analysis and layout analysis related work in the last two decades. Layout analysis is a major step in identifying any physical or logical document entity. In this section, we review the literature related to the use of machine learning-based methods for layout analysis, specifically for extracting tables. Table extraction has been attempted on scanned images [13] [32]
