Improvement and Application of Semantic Role Labeling in Patent Text


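This paper studies the application of Semantic Role Labeling (SRL) to patent knowledge extraction. SRL is a key task whose goal is to identify the arguments of a verbal predicate and assign them semantically meaningful labels, which is crucial for information extraction, question answering, and machine translation. On patent text, however, existing SRL tools tend to underperform because the sentences in patent abstracts are long. To address this, the researchers propose a new method: first decompose each sentence in a patent abstract into simpler structures, then perform semantic role labeling on the simplified sentences. This helps improve the system's processing efficiency and accuracy. The authors emphasize that the method makes it possible to better understand and parse the complex relations in patent text and thus to extract the patent knowledge it contains more effectively.

In terms of implementation, the paper builds an SRL model tailored to the characteristics of patent text, which uses the semantic information and frames of frequently used words to assist role labeling. This includes identifying technical terms, entity relations, and key actions in patents, such as the purpose of the invention, the technical solution, and the effects. With this method, the authors show that their strategy can significantly improve SRL performance in the patent domain, providing strong support for patent retrieval, analysis, and intelligent decision-making.

In summary, the paper explores in depth how to combine SRL techniques with the characteristics of patent text, optimizing sentence-structure processing and exploiting lexical semantics to extract patent knowledge efficiently. This is significant for advancing the automated processing and understanding of patent information and for helping researchers quickly obtain valuable technical information. Future research may further explore how to extend these methods to full patent texts in order to mine deeper patent knowledge.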

Research of Semantic Role Labeling and Application in Patent Knowledge Extraction

Ling’en Meng
Institute of Scientific and Technical Information of China, Beijing
mengle2013@istic.ac.cn

Yanqing He
Institute of Scientific and Technical Information of China, Beijing
heyq@istic.ac.cn

Ying Li
Institute of Scientific and Technical Information of China, Beijing
liying@istic.ac.cn

ABSTRACT

Semantic Role Labeling (SRL) is the task of identifying the arguments of a predicate and assigning semantically meaningful labels to them. SRL is crucial to information extraction, question answering, and machine translation. When applied to patent text, existing SRL tools perform unsatisfactorily because of the long sentences. To improve the performance of patent SRL systems, this study separates each sentence in patent abstracts into a simpler structure and then labels semantic roles for the simplified sentences. Finally, semantic information and a semantic framework for frequently used words are used to extract patent knowledge. Our work demonstrates that this method can improve the performance of an SRL system and obtain useful knowledge from patents.

Categories and Subject Descriptors

I.2.7 [Computing Methodologies]: Language Constructs and Features – Language parsing and understanding, Text analysis.

General Terms

Algorithms, Experimentation, Languages

Keywords

Semantic role labeling, Patent text, Patent knowledge extraction

1. INTRODUCTION

Semantic Role Labeling is the process of annotating the predicate-argument structure in text with semantic labels. SRL includes two sub-tasks: identifying the syntactic constituents that are likely to be semantic roles, and labeling those constituents with the correct semantic role [1]. Most current research on SRL focuses on supervised learning methods, including generative and discriminative models. Generative models were the first to be used for SRL classification. They train quickly and do not depend strongly on the training corpus, but their limited descriptive power and strong assumption of feature independence lead to unsatisfactory performance. Discriminative models directly estimate the final optimization target, the conditional probability. Estimation is usually performed by iterative methods that search for optimized coefficients. Discriminative models generally include linear interpolation, SVM [2], Perceptron [3], SNoW (Sparse Network of Winnows) [4], Boosting [5], Maximum Entropy, decision trees, and random forests [6]. Combining the results produced by multiple classifiers is a further direction of development and can obtain better results than any single classifier. The supervised learning methods above often depend on the quality of syntactic parsing and on accurate SRL annotation. SRL is widely used in information extraction, question answering, and machine translation.
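As an illustration of combining classifier outputs, the sketch below uses simple majority voting over the role labels predicted for each candidate argument. This is only one possible combination scheme and is not claimed to be the one used in the cited systems; the function name `majority_vote` and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine role labels from several classifiers by majority vote.

    predictions: list of label sequences, one per classifier, all of the
    same length (one label per candidate argument).
    Ties are broken in favour of the earliest classifier in the list
    (an assumption of this sketch).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same arguments")

    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        combined.append(next(l for l in labels if counts[l] == best))
    return combined


# Hypothetical outputs of three classifiers for three candidate arguments.
outputs = [
    ["ARG0", "ARGM-TMP", "ARG1"],
    ["ARG0", "ARGM-LOC", "ARG1"],
    ["ARG1", "ARGM-LOC", "ARG1"],
]
print(majority_vote(outputs))
```

In this toy input a single classifier's ARGM-TMP is outvoted by two classifiers predicting ARGM-LOC, which is the kind of correction an ensemble can provide.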

SRL is of vital significance for the shallow semantic parsing of text, especially patent texts. Patent texts contain useful information about technologies. Analyzing them makes it possible to understand the current state of patents, identify hotspots in a timely manner, and grasp technology trends. The existing patent platforms Patsnap (http://cn.patsnap.com/) and TechGlory (a patent risk control and competitive intelligence analysis system, http://www.tek-glory.cn/), as well as Wang Xuefeng [7], use manually annotated corpora, which makes them costly and slow. Researchers have also adopted automatic extraction methods to obtain key information from patent texts. Jiang Caihong [8] constructs an ontology and writes rules for patent knowledge extraction. Zhai Dongsheng [9] uses ontology knowledge and a semantic inference measure to construct a patent reference network.

This article combines SRL information with semantic framework rules to extract the technical topic of a patent from its abstract. Patent text is typically characterized by long sentences with complex structures. When SRL systems are ported to patent texts, they produce poor results, which reduces the effectiveness of semantic analysis and knowledge extraction. Compare the following examples:

Long sentence:

A plurality of resonance units are arranged [ARGM-TMP in the shell], wherein one end of each resonance unit is fixed on the inner wall at one side of the shell.

Simplified sentence:

A plurality of resonance units are arranged [ARGM-LOC in the shell]

one end of each resonance unit is fixed on the inner wall at one side of the shell.

It is obvious that the semantic tag ARGM-TMP (which denotes time; see Section 2.2 for details) in the long sentence is wrong. The correct tag, ARGM-LOC (which denotes location), appears in the simplified sentence. To resolve this problem, our approach first separates each long, complicated sentence in patent abstracts into a simpler structure, then labels semantic roles for the simplified sentences, and lastly synthesizes all the semantic labels and the semantic framework to extract the patent topic. Finally,

Copying permitted for private and academic purposes.

This volume is published and copyrighted by its editors.

Published at Ceur-ws.org

Proceedings of the First International Workshop on Patent Mining and Its Applications (IPAMIN) 2014. Hildesheim. Oct. 7th. 2014. At KONVENS'14, October 8-10, 2014, Hildesheim, Germany.
