A Learning-Based Fixed Point Model: A New Method for Table Recognition in Document Images
Updated on 2024-09-09
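This document describes a learning-based parsing algorithm for recognizing tables in scanned document images. The method treats table detection as a structured labeling problem and labels document entities as table header, table trailer, table cell, or non-table region. The authors propose a solution based on the Fixed Point Model, which considers both the characteristics of foreground blocks and their contextual information.
The core of the design is a set of features that encode the characteristics of each block and its relationship to other blocks in the page layout. By learning the inter-relationships between blocks, the model attains a contraction mapping and assigns a unique label to each block. Compared with Conditional Random Fields (CRFs), the approach is described as capturing the context of the neighbourhood layout more efficiently.
For the experiments, the authors use images from the UW-III (University of Washington) dataset, the UNLV dataset, and their own dataset of document images with multi-column page layouts. The results are compared against CRFs and are presented as evidence that the algorithm is applicable to layout analysis and table detection.
In short, the document presents a table extraction strategy that combines a fixed point model with learned layout features. The approach is relevant to electronic document management, information retrieval, and optical character recognition.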
Table Extraction from Document Images using Fixed Point Model
Anukriti Bansal∗
IIT Delhi
anukriti1107@gmail.com
Gaurav Harit
IIT Jodhpur
gharit@iitj.ac.in
Sumantra Dutta Roy
IIT Delhi
sumantra@ee.iitd.ac.in
ABSTRACT
The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is designed as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell and non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed point model which learns the inter-relationship between the blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images picked from the UW-III (University of Washington) dataset, the UNLV dataset and our own dataset of document images with multi-column page layout show the applicability of our algorithm in layout analysis and table detection.
Keywords
Table recognition, Fixed Point Model, Structured labeling,
Conditional Random Fields, Layout analysis
∗ Corresponding author
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICVGIP ’14, December 14-18 2014, Bangalore, India
Copyright 2014 ACM 978-1-4503-3061-9/14/12...$15.00
http://dx.doi.org/10.1145/2683483.2683550
1. INTRODUCTION
Tables present in documents are often used to compactly communicate important information in rows and columns. To automatically extract this information by digitization of paper documents, the tabular structures need to be identified and the layout and inter-relationship between the table elements need to be preserved for subsequent analysis. The problem of table detection is challenging due to a wide range of layouts and random positioning of table elements. Algorithms for table detection have been proposed in the past, but correctly localizing the tabular structure across a wide variety of documents remains a challenging task.
In this work we learn the layout of a document image by extracting the attributes of foreground and background regions and modeling the correlations between them. Using these attributes, a fixed point model captures the context and learns the inter-relationships between different foreground and background document entities to assign each of them a unique label: table header, table trailer, table cell or non-table region. Regions which receive table-related labels are clustered together to extract a table.
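The final clustering step can be illustrated with a minimal sketch. The excerpt does not say how the clustering is done, so everything here is an assumption for illustration: the `Block` type, the label strings, the `gap` threshold in pixels, and the choice of grouping by bounding-box proximity are all hypothetical and are not the authors' procedure.

```python
from dataclasses import dataclass

# Hypothetical label names for the three table-related classes.
TABLE_LABELS = {"table_header", "table_cell", "table_trailer"}


@dataclass(frozen=True)
class Block:
    """A labelled foreground block with an axis-aligned bounding box."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: str


def is_close(a, b, gap):
    """True if the two boxes are within `gap` pixels of each other on both axes."""
    dx = max(a.x0, b.x0) - min(a.x1, b.x1)  # negative when the boxes overlap in x
    dy = max(a.y0, b.y0) - min(a.y1, b.y1)  # negative when the boxes overlap in y
    return dx <= gap and dy <= gap


def extract_tables(blocks, gap=20):
    """Group table-labelled blocks into connected clusters.

    Returns one bounding box (x0, y0, x1, y1) per cluster, sorted.
    """
    table_blocks = [b for b in blocks if b.label in TABLE_LABELS]
    unvisited = set(range(len(table_blocks)))
    boxes = []
    while unvisited:
        stack = [unvisited.pop()]
        members = []
        while stack:  # depth-first walk over the proximity graph
            i = stack.pop()
            members.append(table_blocks[i])
            near = [j for j in unvisited
                    if is_close(table_blocks[i], table_blocks[j], gap)]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
        boxes.append((min(m.x0 for m in members), min(m.y0 for m in members),
                      max(m.x1 for m in members), max(m.y1 for m in members)))
    return sorted(boxes)
```

Blocks labelled as non-table region are ignored, and each connected group of table-labelled blocks yields one candidate table rectangle. A real system would likely need a layout-aware notion of adjacency rather than a single pixel threshold.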
The Fixed Point Model, as proposed by Li et al. [18], has been used for structured labeling by capturing the correlation between the observed data. The structured input is denoted as a graph with nodes and edges. The objective of the structured labeling task is to jointly assign labels to all the nodes of the graph. In computer vision, the structured input comprises the set of inputs of all the pixels and the structured output constitutes the set of labels assigned to those pixels. Edges between the nodes are used to model the correlations among the nodes. The fixed point model captures the neighborhood information and models the correlation between the different nodes to predict the label of each node. Markov random fields (MRF) [10] and conditional random fields (CRF) [17] are also used to model the inter-relationships of structural labels. However, due to the heavy computational burden in the training and testing stages, MRF and CRF are often modeled to capture only a few neighborhood interactions, which limits their modeling capabilities. The motivation to use a fixed point model for the problem of table detection arises from the need to model the spatial inter-dependencies of the different elements of a document image. The fixed point model utilizes the context information and attains a contraction mapping to assign a unique label to each element of the document image. The final labeling helps extract the table regions. This can facilitate applications such as searching, indexing and information retrieval. A subset of the authors have previously used a fixed point model for article extraction [1].
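The idea of labeling by iterating a contraction mapping to its fixed point can be sketched as follows. This is an illustration under stated assumptions, not the model of [18] or the authors' implementation: the update function here is a hand-set softmax over local scores plus a weighted average of the neighbours' current label distributions, whereas the paper uses a learned function. The names `fixed_point_labels`, `local_scores`, `neighbours` and `weight` are hypothetical. The weight is kept small so that the update is a contraction, which means the iteration reaches the same label distributions regardless of the starting point.

```python
import math

# Hypothetical label names for the four classes mentioned in the text.
LABELS = ["table_header", "table_cell", "table_trailer", "non_table"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_labels(local_scores, neighbours, weight=0.5,
                       tol=1e-9, max_iter=200, init=None):
    """Iterate q_i = f(x_i, q_N(i)) until the label distributions stop changing.

    local_scores[i][c] is the per-block evidence for label c (stands in for
    the block features); neighbours[i] lists the indices of the blocks
    adjacent to block i.
    """
    n, k = len(local_scores), len(LABELS)
    if init is not None:
        q = [list(row) for row in init]
    else:
        q = [[1.0 / k] * k for _ in range(n)]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_q = []
        for i in range(n):
            nbrs = neighbours[i]
            # Context: average label distribution of the neighbouring blocks.
            context = [sum(q[j][c] for j in nbrs) / len(nbrs) if nbrs else 0.0
                       for c in range(k)]
            new_q.append(softmax([local_scores[i][c] + weight * context[c]
                                  for c in range(k)]))
        change = max((abs(a - b) for old, new in zip(q, new_q)
                      for a, b in zip(old, new)), default=0.0)
        q = new_q
        if change < tol:  # reached the fixed point within tolerance
            break
    labels = [LABELS[max(range(k), key=lambda c: row[c])] for row in q]
    return labels, q, iteration
```

For example, with three blocks in a column where the outer two have strong table-cell evidence and the middle one has none, the middle block inherits the table-cell label from its context. Unlike a CRF, nothing here requires inference over a joint distribution; each pass only evaluates the update function once per block.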
1.1 Related Work
Several interesting survey papers [11] [7] [23] [20] [28] [36] [27] on table structure analysis and layout analysis have been published in the last two decades. Layout analysis is a major step in identifying any physical or logical document entity. In this section, we review the literature related to the use of machine learning-based methods for layout analysis, specifically for extracting tables. Table extraction has been attempted on scanned images [13] [32]