Applying Neural Networks to Semantic Integration in Heterogeneous Databases


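"Semantic Integration in Heterogeneous Databases Using Neural Networks" is a research paper on using neural networks to achieve semantic integration across heterogeneous databases. The method addresses the problem of matching equivalent attributes in different databases: it extracts and represents the semantics of the information, treats those semantics as metadata, and identifies semantically equivalent data elements. The paper proposes a procedure that combines a classifier with a neural network so that the knowledge needed to match equivalent data elements is discovered automatically rather than programmed in advance.

In the paper, the authors Wen-Syan Li and Chris Clifton examine a key challenge in integrating heterogeneous databases: semantic integration. Heterogeneous databases usually hold data from different sources with differing structures, so determining which fields refer to the same information is a critical task. The meaning of information may be embodied in the database model, the conceptual schema, the application programs, or the data contents. Integration first extracts semantics from these sources and then expresses them as metadata, that is, data that describes data and helps in understanding and managing what the information means.

Next, the paper uses a classifier to categorize attributes according to their field specifications and data values, which helps characterize each attribute and its context. A neural network is then used to identify similar attributes. A neural network is a computational model loosely modeled on the workings of the human brain that can learn and recognize patterns. Here it is trained to learn the rules for matching equivalent data elements that are "discovered" from the metadata, instead of relying on preset matching logic. This self-learning approach makes the system more adaptable and flexible, so it can handle more complex semantic relationships.

1. Introduction: a major problem in federated database development is semantic integration, that is, identifying which fields are equivalent. Traditional solutions often require matching rules to be defined by hand, which is time-consuming and copes poorly with data environments that keep changing and growing.
2. Methodology: the proposed method has two stages, attribute classification and neural network training. The classification stage uses features to distinguish attributes, and the training stage learns patterns in the metadata to identify potential matches automatically.
3. Application scenarios may include cross-organization data sharing, big data analysis, and information fusion, all of which require data from multiple heterogeneous sources to be integrated and understood effectively.
4. The paper may also cover experimental results and evaluation, showing how the neural network performs on real data sets and how it compares with traditional methods.

In this way, neural networks offer an automated and adaptive approach to semantic integration that reduces the need for manual intervention and improves the efficiency and accuracy of data integration. The approach is especially valuable for large, complex, and dynamic heterogeneous data environments, where it can support more effective data management and decision making.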

ing a lexicon of synonyms. It is assumed that some classes or at least some of their attributes and/or relationships are assigned with meaningful names in a pre-integration phase. Therefore, the knowledge about the terminological relationship between the names can be used as an indicator of the real world correspondence between the objects. In pre-integration, object equivalence (or degree of similarity) is calculated by comparing the aspects of each object and computing a weighted probability of similarity and dissimilarity. Sheth and Larson [SL90] noted that comparison of the schema objects is difficult unless the related information is represented in a similar form in different schemas.

2.1 Existing Approaches

In [DKM+93] it is noted that semantics are embodied in four places: the database model, conceptual schema, application programs, and minds of users. An automatic semantic integration procedure can only make use of information contained in the first two. We further break this into three parts: the names of attributes (obtained from the schema); attribute values and domains (obtained from the data contents); and field specifications (from the schema, or in some cases from automated inspection of the data). We detail these approaches below.

2.1.1 Comparing attribute names

Systems have been developed to automate database integration. One that has addressed the problem of attribute equivalence is MUVIS (Multi-User View Integration System) [HR90]. MUVIS is a knowledge based system for view integration. It assists database designers in representing user views and integrating these views into a global conceptual view. MUVIS determines the degree of similarity and dissimilarity of two objects during a pre-integration phase¹. The similarity and dissimilarity in MUVIS is primarily based on comparing the field names of the attributes. Object equivalence is determined by comparing the aspects of each (such as class names, member names, and attribute names) and computing a weighted value for similarity and dissimilarity. A recommendation is then produced as to how the integration should be performed.
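
A weighted comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions, not MUVIS's actual procedure: the aspect weights, the use of `difflib.SequenceMatcher` as the name-comparison measure, the helper names, and the choice to define dissimilarity as one minus similarity are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical aspect weights (they sum to 1); the source gives no actual values.
WEIGHTS = {"class_name": 0.5, "attribute_names": 0.5}


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def set_similarity(names_a, names_b):
    """Average, over names_a, of the best similarity found in names_b."""
    if not names_a or not names_b:
        return 0.0
    best = [max(name_similarity(a, b) for b in names_b) for a in names_a]
    return sum(best) / len(best)


def object_similarity(obj_a, obj_b):
    """Return (similarity, dissimilarity), each in [0, 1].

    Assumption: dissimilarity is simply 1 - similarity.
    """
    scores = {
        "class_name": name_similarity(obj_a["class_name"], obj_b["class_name"]),
        "attribute_names": set_similarity(
            obj_a["attribute_names"], obj_b["attribute_names"]
        ),
    }
    similarity = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return similarity, 1.0 - similarity


# Made-up objects for illustration only.
employee = {"class_name": "Employee", "attribute_names": ["name", "salary"]}
staff = {"class_name": "Staff", "attribute_names": ["full_name", "wage"]}
print(object_similarity(employee, staff))
```

Because the score rests entirely on how names are spelled, a pair such as "salary" and "wage" scores low even though the fields mean the same thing.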

Most automated tools developed to assist designers in establishing object correspondences by comparing attribute names work well for homonyms (same name for different data), as users are shown the false match. However, different objects can have different synonyms that are not easily detected by inspection. This shifts the problem to building the synonym lexicon. Even a synonym lexicon has limitations because it is difficult for database designers to define a field name by using only the words that can be found in a dictionary or abbreviations carrying unambiguous meanings and in some cases, it is difficult to use a single word rather than a phrase to name a field. These reasons make it expensive to build a system of this approach.

¹ Since, in the real world, semantics of terms may vary, the relationship between two attributes is usually fuzzy. Therefore, a degree of similarity and dissimilarity has a strength of [0,1].

Sheth and Larson [SL90] also pointed out that completely automatic determination of attribute relationships through searching a synonym lexicon is not possible because it would require that all of the semantics of schema be completely specified. Also, current semantic (or other) data models are not able to capture a real-world state completely and interpretations of real-world state change over time.
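
The limits of a synonym lexicon show up even in a very small sketch. The lexicon contents, the underscore-splitting rule, and the function names below are assumptions made for illustration; they do not come from any of the systems cited.

```python
# Hypothetical lexicon: each set groups terms that are treated as synonyms.
SYNONYM_GROUPS = [
    {"salary", "wage", "pay"},
    {"employee", "staff", "worker"},
]


def canonical(term):
    """Map a term to one representative of its synonym group, or to itself."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return min(group)  # any stable representative would do
    return term


def tokens(field_name):
    """Split a field name such as 'Staff_Wage' on underscores."""
    return {canonical(t) for t in field_name.split("_") if t}


def names_match(field_a, field_b):
    """Two field names match when their canonical token sets are identical."""
    return tokens(field_a) == tokens(field_b)


print(names_match("employee_salary", "Staff_Wage"))  # True
print(names_match("employee_salary", "emp_sal"))     # False
```

The second call fails because the abbreviations "emp" and "sal" are not in the lexicon, which is the maintenance cost described above: every abbreviation and phrase that designers use has to be anticipated.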

2.1.2 Comparing attribute values and domains using data contents

Another approach of determining attribute equivalence is comparing attribute domains. Larson et al. [LNE89, NB86] and Sheth et al. [SLCN88] discussed how relationships and entity sets can be integrated primarily based on their domain relationships: EQUAL, CONTAINS, OVERLAP, CONTAINED-IN, and DISJOINT. Determining such relationships can be time consuming and tedious [SL90]. If each schema has 100 entity types, and an average of five attributes per entity type, then 250,000 pairs of attributes must be considered (for each attribute in one schema, a potential relationship with each attribute in other schemas should be considered). Another problem with their approach is poor tolerance of faults. Small amounts of incorrect data may lead the system to draw a wrong conclusion on domain relationships.
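
The pair count follows from simple arithmetic: 100 entity types with five attributes each give 500 attributes per schema, and comparing every attribute of one schema with every attribute of another gives 500 × 500 pairs. A short check, with a hypothetical function name and assuming exactly two schemas of equal size:

```python
def attribute_pairs(entity_types_per_schema, attributes_per_entity):
    """Cross-schema attribute pairs for two schemas of the same size."""
    attributes_per_schema = entity_types_per_schema * attributes_per_entity
    return attributes_per_schema * attributes_per_schema


print(attribute_pairs(100, 5))  # 500 * 500 = 250000
```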

In the tool developed to perform schema integration described in [SLCN88], a heuristic algorithm is given to identify pairs of entity types and relationship types that are related by EQUAL, CONTAINS, OVERLAP, and CONTAINED-IN domain relationships. Sheth and Gala [SG89] also argued that this task cannot be automated, and hence we may need to depend on heuristics to identify a small number of attribute pairs that may be potentially related by a relationship other than DISJOINT.
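
The five domain relationships can be illustrated with plain set comparisons over the distinct values of two attributes. This is a minimal sketch under assumptions, not the heuristic algorithm of [SLCN88]: the function names and sample data are made up, and the comparison is exhaustive rather than heuristic.

```python
def domain_relationship(values_a, values_b):
    """Classify two attribute domains by comparing their distinct values."""
    a, b = set(values_a), set(values_b)
    if a == b:
        return "EQUAL"
    if a > b:  # a is a proper superset of b
        return "CONTAINS"
    if a < b:  # a is a proper subset of b
        return "CONTAINED-IN"
    if a & b:  # some values are shared, but neither set contains the other
        return "OVERLAP"
    return "DISJOINT"


def candidate_pairs(schema_a, schema_b):
    """Return (attr_a, attr_b, relationship) for every pair that is not DISJOINT."""
    pairs = []
    for name_a, vals_a in schema_a.items():
        for name_b, vals_b in schema_b.items():
            rel = domain_relationship(vals_a, vals_b)
            if rel != "DISJOINT":
                pairs.append((name_a, name_b, rel))
    return pairs


# Made-up sample data for illustration only.
clean = ["CA", "NY", "TX"]
dirty = ["CA", "NY", "TXX"]  # one mistyped value
print(domain_relationship(clean, clean))  # EQUAL
print(domain_relationship(clean, dirty))  # OVERLAP
```

The last two lines show the fault sensitivity of value-based comparison: a single mistyped value is enough to turn an EQUAL relationship into OVERLAP.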

2.1.3 Comparing field specifications

In [NB86] the characteristics of attributes discussed are uniqueness, cardinality, domain, semantic integrity constraints, security constraints, allowable operations, and scale. In our prior work [LC93], we presented a technique which utilizes these field specifications to determine the similarity and dissimilarity of a pair of
