Example Code for Building a Knowledge Graph from Crawled Data in Python
Date: 2024-01-02 07:16:27
The following Python example crawls a set of pages and then builds a knowledge graph from the results:
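```
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from py2neo import Graph, Node, Relationship

BASE_URL = 'https://www.zhihu.com'
# Assumption: a browser-like User-Agent; the site may still require a login
# or render results with JavaScript, in which case the lists below stay empty.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

# Fetch the Zhihu search results page for Python-related questions
response = requests.get('https://www.zhihu.com/search?type=content&q=python',
                        headers=HEADERS, timeout=10)
soup = BeautifulSoup(response.content, 'html.parser')
question_tags = soup.find_all('div', class_='ContentItem-head')

# Collect each question's title and link
questions = []
for tag in question_tags:
    a_tag = tag.find('a')
    if a_tag is None or not a_tag.get('href'):
        continue  # skip result entries that carry no link
    questions.append({
        'title': a_tag.get_text().strip(),
        # urljoin copes with relative, protocol-relative, and absolute hrefs
        'link': urljoin(BASE_URL, a_tag['href']),
    })

# Build the knowledge graph
# Assumption: Bolt URI with auth=(user, password), as in recent py2neo releases;
# adjust the URI and credentials to match your Neo4j instance.
graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))
for question in questions:
    # Create the question node
    question_node = Node('Question', title=question['title'], link=question['link'])
    graph.create(question_node)

    # Fetch the question page and extract its description and answers
    response = requests.get(question['link'], headers=HEADERS, timeout=10)
    soup = BeautifulSoup(response.content, 'html.parser')

    description_tag = soup.find('div', class_='QuestionDetail-main')
    if description_tag:
        description = description_tag.get_text().strip()
        if description:
            description_node = Node('Description', content=description)
            graph.create(description_node)
            graph.create(Relationship(question_node, 'HAS_DESCRIPTION', description_node))

    # The CSS selector matches divs carrying both classes, in any order
    for tag in soup.select('div.ContentItem.AnswerItem'):
        content_tag = tag.find('div', class_='RichContent-inner')
        if content_tag is None:
            continue  # skip answers without a rendered body
        answer_node = Node('Answer', content=content_tag.get_text().strip())
        graph.create(answer_node)
        graph.create(Relationship(question_node, 'HAS_ANSWER', answer_node))
```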
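Note: this example crawls Zhihu questions and requires the requests, BeautifulSoup (beautifulsoup4), and py2neo libraries. You also need to install and start a Neo4j database, and change the connection details in the code to match it.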
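Because `graph.create` always inserts new nodes, running the script twice stores every question twice. One way around this is to express the same graph model (Question, Description, Answer, HAS_DESCRIPTION, HAS_ANSWER) as parameterized Cypher `MERGE` statements. The sketch below is only an illustration: the helper name `question_statements` is hypothetical, the sample link is a placeholder, and it assumes that `link` uniquely identifies a question and that identical text means the same description or answer.

```python
from typing import Dict, List, Tuple

# A Cypher statement paired with its parameters.
Statement = Tuple[str, Dict[str, str]]


def question_statements(question: Dict[str, str],
                        description: str,
                        answers: List[str]) -> List[Statement]:
    """Return (cypher, parameters) pairs for one scraped question.

    Hypothetical helper: MERGE matches an existing node or relationship
    and only creates it when none exists, so re-runs add no duplicates.
    """
    link = question["link"]
    statements: List[Statement] = [(
        "MERGE (q:Question {link: $link}) SET q.title = $title",
        {"link": link, "title": question["title"]},
    )]
    if description:
        statements.append((
            "MATCH (q:Question {link: $link}) "
            "MERGE (d:Description {content: $content}) "
            "MERGE (q)-[:HAS_DESCRIPTION]->(d)",
            {"link": link, "content": description},
        ))
    for answer in answers:
        statements.append((
            "MATCH (q:Question {link: $link}) "
            "MERGE (a:Answer {content: $content}) "
            "MERGE (q)-[:HAS_ANSWER]->(a)",
            {"link": link, "content": answer},
        ))
    return statements


if __name__ == "__main__":
    # Placeholder data standing in for one scraped question.
    sample = {"title": "How to learn Python?",
              "link": "https://www.zhihu.com/question/1"}
    for cypher, params in question_statements(sample, "", ["Read the docs."]):
        print(cypher, params)
```

With py2neo, each pair could then be executed as `graph.run(cypher, params)`. Keeping the statement building separate from the database call also lets you check the graph model without a running Neo4j instance.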