A Study of Metadata Extraction Methods for Open Access Resources
Abstract: To extract open access resource metadata more effectively, we survey and analyze a set
of expert-selected open access resources and find that their metadata are described at a fine
granularity, are complex in structure, and place strong emphasis on data quality. We then analyze
the characteristics of current information resource extraction methods and systems, and how they
perform when applied to open access resources, and identify two main problems: (1) limitations
when applied to open access resources; (2) incomplete data collection. Finally, we propose an open
access resource metadata extraction framework based on a page-structure checking mechanism, and
practice shows that the framework effectively meets the requirements of open access resource
extraction.
Keywords: Open Access Resources; Metadata Extraction; Web Information Extraction