Python web scraper source code for fetching page data
Time: 2024-05-20 13:08:48
A Python scraper that fetches web page data can be broken into the following steps:
1. Import the required libraries
```
# Third-party packages: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup
```
2. Send the request and get the response
```
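url = 'http://example.com'
response = requests.get(url, timeout=10)  # without a timeout, a stalled server can hang the script
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page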
```
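For more than a single page, it usually pays to reuse one connection pool, send an identifying `User-Agent`, and retry transient failures. The following is a minimal sketch of that idea: `make_session` and `fetch` are hypothetical helper names, the `User-Agent` string is a placeholder assumption, and the retry policy (3 attempts, back-off, retry on 429/5xx) is one reasonable choice rather than a requirement. It uses `requests.Session`, `requests.adapters.HTTPAdapter`, and `urllib3`'s `Retry`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical User-Agent; replace it with one that identifies your own crawler.
DEFAULT_HEADERS = {'User-Agent': 'my-crawler/0.1 (+contact@example.com)'}


def make_session(retries=3):
    """Build a Session that reuses connections, sends default headers,
    and retries connection errors and 429/5xx responses with back-off."""
    session = requests.Session()
    session.headers.update(DEFAULT_HEADERS)
    retry = Retry(total=retries, backoff_factor=0.5,
                  status_forcelist=[429, 500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


def fetch(session, url, timeout=10):
    """GET a page and return its decoded HTML text."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    # If the server did not declare a charset, guess one from the body;
    # this helps with non-ASCII pages that would otherwise be mis-decoded.
    if 'charset' not in response.headers.get('Content-Type', '').lower():
        response.encoding = response.apparent_encoding
    return response.text
```

Usage would look like `html = fetch(make_session(), 'http://example.com')`, after which `html` can be handed to `BeautifulSoup` exactly as in step 3.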
3. Parse the HTML page
```
soup = BeautifulSoup(response.text, 'html.parser')
```
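Parsing does not need the network, so selectors can be developed and checked against a saved or hand-written HTML sample before pointing the scraper at a live site. In this minimal sketch the page content is made up; it shows that BeautifulSoup's `find_all()` and its CSS-selector method `select()` reach the same elements, and that `get_text(strip=True)` trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<html><head><title>Example list</title></head>
<body>
  <div class="item"><h2 class="title"> First </h2><a href="/a/1">more</a><span class="date">2024-05-01</span></div>
  <div class="item"><h2 class="title">Second</h2><a href="/a/2">more</a><span class="date">2024-05-02</span></div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# The page <title>, with surrounding whitespace stripped.
page_title = soup.title.get_text(strip=True)

# Same nodes, two query styles: tag/attribute search vs. CSS selector.
titles_find = [h.get_text(strip=True) for h in soup.find_all('h2', class_='title')]
titles_css = [h.get_text(strip=True) for h in soup.select('div.item > h2.title')]
```

`html.parser` ships with Python; if the optional `lxml` package is installed, passing `'lxml'` instead is typically faster on large pages.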
4. Extract the data you need
```
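data = []
# Assumes each record is a <div class="item"> holding an <h2 class="title">, an <a href>, and a <span class="date">
for item in soup.find_all('div', {'class': 'item'}):
    title_tag = item.find('h2', {'class': 'title'})
    link_tag = item.find('a', href=True)
    date_tag = item.find('span', {'class': 'date'})
    if not (title_tag and link_tag and date_tag):
        continue  # skip incomplete items instead of crashing on None
    data.append({
        'title': title_tag.get_text(strip=True),
        'link': link_tag['href'],
        'date': date_tag.get_text(strip=True),
    })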
```
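Links taken from `href` attributes are often relative (for example `/a/1`), which makes them useless once they leave the page. A minimal sketch, assuming the same `div.item` / `h2.title` / `span.date` markup as in step 4 and a hypothetical helper name `extract_items`, resolves them against the page URL with the standard library's `urllib.parse.urljoin`:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_items(html, base_url):
    """Return a list of {'title', 'link', 'date'} dicts from one page.

    Items missing any of the three fields are skipped."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for item in soup.select('div.item'):
        title = item.select_one('h2.title')
        link = item.select_one('a[href]')
        date = item.select_one('span.date')
        if title is None or link is None or date is None:
            continue  # incomplete record
        items.append({
            'title': title.get_text(strip=True),
            # Turn relative hrefs such as "/a/1" into absolute URLs.
            'link': urljoin(base_url, link['href']),
            'date': date.get_text(strip=True),
        })
    return items
```

Passing the URL the page was fetched from as `base_url` keeps every stored link usable on its own.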
5. Store the data in a file, a database, or another destination
```
import csv

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'link', 'date'])
    writer.writeheader()
    for d in data:
        writer.writerow(d)
```
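When the destination is a database rather than a file, the standard library's `sqlite3` is enough for small crawls. In this minimal sketch the table name `items`, its schema, and the helper name `save_to_sqlite` are illustrative assumptions. Making `link` the primary key and using `INSERT OR IGNORE` means that re-running the scraper over the same pages does not create duplicate rows.

```python
import sqlite3


def save_to_sqlite(conn, rows):
    """Insert scraped rows (dicts with 'title', 'link', 'date') into an
    'items' table, creating the table if it does not exist yet."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            'CREATE TABLE IF NOT EXISTS items ('
            'link TEXT PRIMARY KEY, title TEXT, date TEXT)'
        )
        # Named placeholders are filled from each dict's keys.
        conn.executemany(
            'INSERT OR IGNORE INTO items (link, title, date) '
            'VALUES (:link, :title, :date)',
            rows,
        )
```

Typical use: `conn = sqlite3.connect('data.db')`, then `save_to_sqlite(conn, data)`, then `conn.close()`.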