Example: NLP Entity Extraction with Python's NLTK Library
Time: 2023-11-05 08:23:02
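Below is a simple example of NLP entity extraction with the NLTK library. It extracts entities such as person names, organizations, and geographic locations from a piece of English text: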
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import ne_chunk
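# On first use, download the tokenizer, tagger and NE-chunker models, e.g.
# nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('maxent_ne_chunker'), nltk.download('words')
# (resource names differ in newer NLTK releases, e.g. 'punkt_tab').

# Input text
text = "John is the CEO of Apple Corp. located in California, USA."

# Tokenization and part-of-speech tagging
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)

# Named entity recognition: returns an nltk.Tree
tree = ne_chunk(pos_tags)

# Walk the named-entity tree and extract the entities
entities = []
for subtree in tree.subtrees():
    if subtree.label() in ['ORGANIZATION', 'PERSON', 'GPE']:
        # Each leaf is a (word, tag) tuple; keep only the word
        entity = ' '.join([leaf[0] for leaf in subtree.leaves()])
        entities.append((entity, subtree.label()))

# Print the extracted entities and their types
print(entities)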
```
The output is as follows (the exact result may depend on the NLTK version and its pretrained models):
```
[('John', 'PERSON'), ('Apple Corp.', 'ORGANIZATION'), ('California', 'GPE'), ('USA', 'GPE')]
```
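Here, `word_tokenize` and `pos_tag` perform tokenization and part-of-speech tagging, and `ne_chunk` performs named entity recognition, returning a tree in which each recognized entity is a subtree whose leaves are `(word, tag)` pairs. `subtree.label()` returns the entity type. Finally, the code walks the named-entity tree and collects each entity's text together with its type.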
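To see what that traversal operates on without downloading any models, the sketch below hand-builds an `nltk.Tree` with the shape `ne_chunk` is assumed to return for the sample sentence: a root labeled `S`, one child subtree per entity labeled with its type, and `(word, tag)` tuples as leaves. The tree and its POS tags are illustrative assumptions that mirror the output shown above, not real model output, and the helper names `extract_entities` and `group_by_type` are hypothetical. The second helper groups the extracted entities by type.

```python
from collections import defaultdict
from nltk.tree import Tree

# Hypothetical hand-built tree mimicking ne_chunk output (not real model output):
# the root is labeled 'S', each named entity is a child Tree labeled with its
# entity type, and every leaf is a (word, POS tag) tuple.
tree = Tree('S', [
    Tree('PERSON', [('John', 'NNP')]),
    ('is', 'VBZ'), ('the', 'DT'), ('CEO', 'NN'), ('of', 'IN'),
    Tree('ORGANIZATION', [('Apple', 'NNP'), ('Corp.', 'NNP')]),
    ('located', 'VBN'), ('in', 'IN'),
    Tree('GPE', [('California', 'NNP')]),
    (',', ','),
    Tree('GPE', [('USA', 'NNP')]),
    ('.', '.'),
])

ENTITY_TYPES = {'PERSON', 'ORGANIZATION', 'GPE'}


def extract_entities(chunked):
    """Return (entity text, entity type) pairs from an ne_chunk-style tree."""
    entities = []
    # subtrees() yields Tree nodes only (root included), never the leaf tuples;
    # the filter keeps just the entity types we care about.
    for subtree in chunked.subtrees(filter=lambda t: t.label() in ENTITY_TYPES):
        words = [word for word, tag in subtree.leaves()]
        entities.append((' '.join(words), subtree.label()))
    return entities


def group_by_type(entities):
    """Group entity strings by type, keeping the order in which they appear."""
    groups = defaultdict(list)
    for text, label in entities:
        groups[label].append(text)
    return dict(groups)


entities = extract_entities(tree)
print(entities)
print(group_by_type(entities))
```

Passing a `filter` to `subtrees()` is equivalent to the `if` check in the original loop; it only moves the label test into the traversal. Grouping by type is a convenience step for when you want, say, all locations at once.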