
Python web crawler source code for scraping web page data

Time: 2024-05-20 10:08:48 Views: 142

The source code for a Python crawler that scrapes web page data can be broken into the following steps:

Import the required libraries

import requests
from bs4 import BeautifulSoup

Send the request and get the response data

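url = 'http://example.com'
response = requests.get(url, timeout=10)  # a timeout keeps a slow server from blocking forever
response.raise_for_status()  # stop early on 4xx/5xx responses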
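
A slightly more defensive version of the request step can be sketched as follows. The function name `fetch_html`, the `HEADERS` dictionary, and the User-Agent string are hypothetical and used only for illustration; the 10-second timeout is an assumed default, not a value from the original.

```python
import requests

# Hypothetical User-Agent value, used only for illustration.
HEADERS = {"User-Agent": "example-crawler/0.1"}


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx status codes
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, and bad status codes.
        print(f"Request to {url} failed: {exc}")
        return None
    return response.text
```

Calling `fetch_html('http://example.com')` would return the HTML text on success. Returning `None` on failure is one possible design choice; re-raising the exception may fit better when the caller needs to distinguish error types.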

Parse the HTML page data

soup = BeautifulSoup(response.text, 'html.parser')
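
To see what the parsed object offers without touching the network, the parsing step can be tried on an inline string. The markup below is a hypothetical stand-in for `response.text`; its structure is an assumption chosen to match the class names used in this walkthrough.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for response.text.
html = """
<html><head><title>Demo page</title></head>
<body>
  <div class="item"><h2 class="title"> First post </h2></div>
  <div class="item"><h2 class="title">Second post</h2></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The <title> element is reachable as an attribute of the soup.
page_title = soup.title.string

# find_all returns every matching tag; get_text(strip=True) trims whitespace.
titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="title")]
```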

Extract the required data

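data = []
for item in soup.find_all('div', class_='item'):
    title_tag = item.find('h2', class_='title')
    link_tag = item.find('a', href=True)
    date_tag = item.find('span', class_='date')
    # find() returns None when nothing matches, so skip incomplete entries
    if title_tag is None or link_tag is None or date_tag is None:
        continue
    data.append({'title': title_tag.text.strip(), 'link': link_tag['href'],
                 'date': date_tag.text.strip()})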
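
The extraction loop can also be wrapped in a function and exercised on sample markup. This is a minimal sketch: `extract_items`, the sample HTML, and the base URL are hypothetical, and resolving relative links with `urljoin` is an addition that the original loop does not perform.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_items(html, base_url):
    """Collect title, absolute link, and date from each div.item block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item"):
        title_tag = item.find("h2", class_="title")
        link_tag = item.find("a", href=True)
        date_tag = item.find("span", class_="date")
        # Skip blocks that lack any of the three expected elements.
        if title_tag is None or link_tag is None or date_tag is None:
            continue
        rows.append({
            "title": title_tag.get_text(strip=True),
            "link": urljoin(base_url, link_tag["href"]),  # make relative links absolute
            "date": date_tag.get_text(strip=True),
        })
    return rows


# Hypothetical sample: the second item has no date and is skipped.
sample = """
<div class="item">
  <h2 class="title">First post</h2>
  <a href="/posts/1">read</a>
  <span class="date">2024-05-20</span>
</div>
<div class="item">
  <h2 class="title">Second post</h2>
  <a href="/posts/2">read</a>
</div>
"""

rows = extract_items(sample, "http://example.com/")
```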

Store the data in a file, a database, or another destination

```
import csv

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'link', 'date'])
    writer.writeheader()
    for d in data:
        writer.writerow(d)
```
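
One way to check the storage step is to write a few rows and read them back. The helper names `save_rows` and `load_rows`, the sample row, and the temporary directory are all hypothetical; this sketch only illustrates a CSV round trip with the standard `csv` module.

```python
import csv
import os
import tempfile

FIELDNAMES = ["title", "link", "date"]


def save_rows(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read the CSV file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical sample row, used only to demonstrate the round trip.
rows = [{"title": "First post",
         "link": "http://example.com/posts/1",
         "date": "2024-05-20"}]

path = os.path.join(tempfile.mkdtemp(), "data.csv")
save_rows(rows, path)
restored = load_rows(path)
```

Note that `csv.DictReader` returns every value as a string, so numeric fields would need explicit conversion after loading.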
