Python Web Scraper Source Code for Fetching Web Page Data
Date: 2024-05-20 10:08:48
A Python scraper that fetches web page data can be broken into the following steps:
Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
```
Send the request and get the response
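```python
url = 'http://example.com'
response = requests.get(url, timeout=10)  # don't hang forever on an unresponsive server
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
```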
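In practice the request step is often wrapped in a small helper. The sketch below is one possible version, not the article's own code: it assumes a custom `User-Agent` header (the string is a placeholder), a 10-second timeout, and falls back to `response.apparent_encoding` when the server does not declare a charset, which helps with pages served in encodings such as GBK. `fetch_html` and `HEADERS` are hypothetical names.

```python
import requests

# Hypothetical User-Agent; some sites reject the default python-requests one
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; example-scraper/0.1)'}


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its decoded HTML text."""
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=HEADERS, timeout=timeout)
    # Turn 4xx/5xx responses into requests.HTTPError
    response.raise_for_status()
    # No charset in Content-Type: let requests guess it from the body bytes
    if 'charset' not in response.headers.get('Content-Type', '').lower():
        response.encoding = response.apparent_encoding
    return response.text
```

Passing one shared `requests.Session` across calls reuses the underlying connection, which is cheaper when scraping many pages from the same host.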
Parse the HTML page
```python
# 'html.parser' is Python's built-in parser, so no extra package is needed
soup = BeautifulSoup(response.text, 'html.parser')
```
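To check parsing and extraction logic without hitting a live site, BeautifulSoup can be fed an inline HTML string. The markup below is hypothetical sample data shaped like the `div.item` / `h2.title` / `span.date` structure this article assumes. It also shows that CSS selectors via `select` / `select_one` are an alternative to `find_all` / `find`. If `lxml` is installed, `'lxml'` can be passed instead of `'html.parser'` as a faster parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the structure the extraction step expects
SAMPLE_HTML = """
<div class="item">
  <h2 class="title"> First post </h2>
  <a href="/posts/1">read</a>
  <span class="date">2024-05-20</span>
</div>
<div class="item">
  <h2 class="title">Second post</h2>
  <a href="/posts/2">read</a>
  <span class="date">2024-05-21</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# find_all with a class filter and the equivalent CSS selector
by_find = soup.find_all('div', class_='item')
by_select = soup.select('div.item')

# select_one returns the first match (or None), like find
first_title = soup.select_one('div.item h2.title').get_text(strip=True)
print(len(by_select), first_title)  # → 2 First post
```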
Extract the required data
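```python
data = []
for item in soup.find_all('div', class_='item'):
    title = item.find('h2', class_='title')
    link = item.find('a', href=True)
    date = item.find('span', class_='date')
    # Skip items missing a field rather than crashing with AttributeError on None
    if not (title and link and date):
        continue
    data.append({'title': title.get_text(strip=True),
                 'link': link['href'],
                 'date': date.get_text(strip=True)})
```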
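The `link` values extracted above come straight from `href` attributes and are often relative (for example `/posts/1`). A minimal sketch using the standard library's `urllib.parse.urljoin` to resolve them against the page URL. `absolutize_links` is a hypothetical helper; in the scraper the base would typically be `response.url`, the final URL after any redirects.

```python
from urllib.parse import urljoin


def absolutize_links(records, base_url):
    """Return copies of the records with each 'link' resolved against base_url."""
    # urljoin resolves relative hrefs the way a browser would and
    # leaves already-absolute URLs unchanged
    return [{**r, 'link': urljoin(base_url, r['link'])} for r in records]


# Hypothetical records in the same shape as `data`
records = [
    {'title': 'First post', 'link': '/posts/1', 'date': '2024-05-20'},
    {'title': 'Second post', 'link': 'detail?id=3', 'date': '2024-05-21'},
    {'title': 'Third post', 'link': 'https://cdn.example.org/x', 'date': '2024-05-22'},
]
for r in absolutize_links(records, 'http://example.com/list/'):
    print(r['link'])
# Prints:
# http://example.com/posts/1
# http://example.com/list/detail?id=3
# https://cdn.example.org/x
```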
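Store the data in a file, a database, or another location
```python
import csv

# The csv docs require newline='' so the writer controls line endings itself
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'link', 'date'])
    writer.writeheader()
    writer.writerows(data)
```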
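The storage step mentions databases as an alternative to CSV. As one possibility, here is a minimal sketch with Python's built-in `sqlite3` module. The table name `items`, its schema, and using `link` as the primary key to skip duplicates on re-runs are all assumptions, and `save_to_sqlite` is a hypothetical helper. In the scraper it would be called as `save_to_sqlite(sqlite3.connect('data.db'), data)`.

```python
import sqlite3


def save_to_sqlite(conn, records):
    """Insert scraped records into an 'items' table; return the table's row count."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            'CREATE TABLE IF NOT EXISTS items '
            '(link TEXT PRIMARY KEY, title TEXT, date TEXT)'
        )
        # Named placeholders read values from each record dict by key;
        # OR IGNORE skips rows whose link is already stored
        conn.executemany(
            'INSERT OR IGNORE INTO items (link, title, date) '
            'VALUES (:link, :title, :date)',
            records,
        )
    return conn.execute('SELECT COUNT(*) FROM items').fetchone()[0]


# Demo with an in-memory database and hypothetical records
conn = sqlite3.connect(':memory:')
sample = [
    {'title': 'First post', 'link': '/posts/1', 'date': '2024-05-20'},
    {'title': 'Second post', 'link': '/posts/2', 'date': '2024-05-21'},
]
print(save_to_sqlite(conn, sample))  # → 2
print(save_to_sqlite(conn, sample))  # → 2 (duplicates ignored)
```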