Scraping Zhihu Q&A with Python
Date: 2023-10-06 10:05:54
To scrape a Zhihu question and its answers, follow these steps:
1. Install the `requests` and `beautifulsoup4` libraries. Open a command line and run:
```
pip install requests beautifulsoup4
```
2. Open the Zhihu website and find the question page you want to scrape, for example: https://www.zhihu.com/question/123456789
3. Use the `requests` library to fetch the page's HTML:
```python
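import requests

url = 'https://www.zhihu.com/question/123456789'
# Send a browser-like User-Agent; the default one sent by requests may be rejected.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
html = response.text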
```
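The request in step 3 can fail for reasons unrelated to the code, such as a network error or the site refusing the request. A minimal sketch of one way to handle that is shown here; `fetch_html` and `DEFAULT_HEADERS` are hypothetical names introduced for illustration, and the 10-second timeout is an assumed value.

```python
import requests

# Assumed browser-like headers, same User-Agent string as in the step above.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    ),
}


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails (hypothetical helper)."""
    try:
        response = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx responses
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP error statuses.
        print(f"Request failed: {exc}")
        return None
    return response.text
```

Returning `None` instead of raising lets the caller decide whether to retry or skip the page, for example `html = fetch_html(url)` followed by an `if html is None` check.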
4. Use `beautifulsoup4` to parse the HTML and extract the question's title and description:
```python
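from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# find() returns None when nothing matches, so check before reading .text.
title_tag = soup.find('h1', class_='QuestionHeader-title')
content_tag = soup.find('div', class_='QuestionRichText')
title = title_tag.text.strip() if title_tag is not None else ''
content = content_tag.text.strip() if content_tag is not None else ''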
```
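To try the parsing step without touching the network, the same lookups can be run against a small inline HTML string. This is a sketch under assumptions: `sample_html` is a made-up fragment that only mimics the class names used above, and `text_or_default` is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the class names used in the step above.
sample_html = """
<h1 class="QuestionHeader-title"> How do I learn Python? </h1>
<div class="QuestionRichText">I am a complete beginner.</div>
"""


def text_or_default(soup, tag, class_name, default=""):
    """Return the stripped text of the first matching tag, or `default`."""
    node = soup.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else default


soup = BeautifulSoup(sample_html, "html.parser")
title = text_or_default(soup, "h1", "QuestionHeader-title")
content = text_or_default(soup, "div", "QuestionRichText")
# A selector that matches nothing falls back to the default instead of raising.
missing = text_or_default(soup, "div", "NoSuchClass", default="(not found)")

print(title)
print(content)
print(missing)
```

If the real page ever stops using these class names, the helper returns the default value rather than failing with `AttributeError`, which makes a markup change easier to notice and debug.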
5. Extract the text of every answer:
```python
answers = []
for answer in soup.find_all('div', class_='List-item'):
    # Skip list items that do not contain an answer body.
    body = answer.find('div', class_='RichContent-inner')
    if body is not None:
        answers.append(body.text.strip())
```
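The answer loop can be checked the same way on a made-up fragment. In this sketch, `sample_html` is an assumption that reuses the `List-item` and `RichContent-inner` class names from the step above; the middle item deliberately has no answer body to show why the `None` check matters.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: three list items, one of them without an answer body.
sample_html = """
<div class="List-item"><div class="RichContent-inner">First answer.</div></div>
<div class="List-item"><div class="Other">No answer body here.</div></div>
<div class="List-item"><div class="RichContent-inner"> Second answer. </div></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
answers = []
for item in soup.find_all("div", class_="List-item"):
    body = item.find("div", class_="RichContent-inner")
    if body is None:
        # Without this check, reading .text on None raises AttributeError.
        continue
    answers.append(body.get_text(strip=True))

print(answers)
```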
Complete code example:
```python
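import requests
from bs4 import BeautifulSoup

url = 'https://www.zhihu.com/question/123456789'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
}

# Fetch the question page.
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
html = response.text

# Parse the question title and description; find() returns None if nothing matches.
soup = BeautifulSoup(html, 'html.parser')
title_tag = soup.find('h1', class_='QuestionHeader-title')
content_tag = soup.find('div', class_='QuestionRichText')
title = title_tag.text.strip() if title_tag is not None else ''
content = content_tag.text.strip() if content_tag is not None else ''

# Collect the text of every answer on the page.
answers = []
for answer in soup.find_all('div', class_='List-item'):
    body = answer.find('div', class_='RichContent-inner')
    if body is not None:
        answers.append(body.text.strip())

print(title)
print(content)
print(answers)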
```
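One possible refinement is to separate parsing from fetching, so the parsing logic can be exercised on any HTML string. The sketch below assumes the same class names as the complete example; `parse_question` and `sample_html` are hypothetical, and the JSON output is only one convenient way to inspect the result.

```python
import json

from bs4 import BeautifulSoup


def parse_question(html):
    """Parse question HTML into a dict (hypothetical helper, not from the original)."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1", class_="QuestionHeader-title")
    # CSS selector: answer bodies nested anywhere inside a list item.
    bodies = soup.select("div.List-item div.RichContent-inner")
    return {
        "title": title_tag.get_text(strip=True) if title_tag is not None else "",
        "answers": [body.get_text(strip=True) for body in bodies],
    }


# Hypothetical fragment using the class names from the steps above.
sample_html = """
<h1 class="QuestionHeader-title">示例问题</h1>
<div class="List-item"><div class="RichContent-inner">回答一</div></div>
<div class="List-item"><div class="RichContent-inner">回答二</div></div>
"""

result = parse_question(sample_html)
# ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
print(json.dumps(result, ensure_ascii=False, indent=2))
```

With this split, the network call only has to supply a string, for example `parse_question(response.text)`.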