Scraping web page news with a Python crawler
Date: 2023-12-22 08:30:16
The basic steps for scraping news from a web page with a Python crawler are as follows:
1. Import the required libraries:
```python
import requests
from bs4 import BeautifulSoup
import csv
```
2. Send an HTTP request and fetch the page content:
```python
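url = "PAGE_URL"  # placeholder: replace with the address of the news page
response = requests.get(url, timeout=10)  # a timeout keeps the request from hanging forever
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
response.encoding = response.apparent_encoding  # guess the encoding to avoid garbled text
html_content = response.text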
```
3. Parse the page content with BeautifulSoup:
```python
soup = BeautifulSoup(html_content, 'html.parser')
```
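4. Use a CSS selector to locate the HTML elements that hold the news items (BeautifulSoup's select() accepts CSS selectors, not regular expressions or XPath):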
```python
news_elements = soup.select("selector")  # placeholder: CSS selector matching each news item
```
5. Extract the news fields and save them to a CSV file:
```python
with open('news.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    # Header row
    writer.writerow(['Title', 'Published', 'Link', 'Views', 'Source'])
    for element in news_elements:
        # Each quoted placeholder below stands for a CSS selector
        # relative to the current news item.
        title = element.select_one("selector").text
        publish_time = element.select_one("selector").text
        news_link = element.select_one("selector")['href']
        read_count = element.select_one("selector").text
        news_source = element.select_one("selector").text
        writer.writerow([title, publish_time, news_link, read_count, news_source])
```
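Note that each selector placeholder string in the code above must be replaced with a CSS selector that matches the HTML structure of the target page.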
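
To make the selector step more concrete, here is a minimal sketch that runs the parsing and CSV steps against a small inline HTML sample instead of a live page. The markup, the class names (`news-item`, `title`, `time`, `views`, `source`), the base URL `https://example.com/`, and the helper names `text_or_empty`, `extract_rows`, and `write_csv` are all assumptions made up for illustration; a real site will need its own selectors. The sketch also guards against `select_one` returning `None` when a field is missing, and uses `urljoin` to turn relative links into absolute ones.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a real news page.
SAMPLE_HTML = """
<ul>
  <li class="news-item">
    <a class="title" href="/news/1.html">First headline</a>
    <span class="time">2023-12-20</span>
    <span class="views">120</span>
    <span class="source">Example Daily</span>
  </li>
  <li class="news-item">
    <a class="title" href="/news/2.html">Second headline</a>
    <span class="time">2023-12-21</span>
    <span class="source">Example Daily</span>
  </li>
</ul>
"""


def text_or_empty(parent, selector):
    """Return the stripped text of the first match, or '' if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else ""


def extract_rows(html, base_url):
    """Return one [title, time, link, views, source] row per news item."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.news-item"):
        link_node = item.select_one("a.title")
        href = link_node.get("href") if link_node is not None else None
        # urljoin resolves a relative href against the page's base URL.
        link = urljoin(base_url, href) if href else ""
        rows.append([
            text_or_empty(item, "a.title"),
            text_or_empty(item, "span.time"),
            link,
            text_or_empty(item, "span.views"),
            text_or_empty(item, "span.source"),
        ])
    return rows


def write_csv(rows, fileobj):
    """Write a header row followed by the extracted rows."""
    writer = csv.writer(fileobj)
    writer.writerow(["Title", "Published", "Link", "Views", "Source"])
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML, "https://example.com/")
buffer = io.StringIO()  # in-memory stand-in for open('news.csv', 'w', ...)
write_csv(rows, buffer)
print(buffer.getvalue())
```

To use this with a real page, pass `response.text` from the request step as `html`, pass the page URL as `base_url`, and replace `buffer` with a file opened using `newline=''` and `encoding='utf-8'`. The second sample item deliberately lacks a view count, so its row gets an empty string in that column rather than raising an error.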