Python3.9通过xpath获取中关村网站平板电脑各项数据参数并保存到数据库中
时间: 2024-03-29 21:41:52 浏览: 112
简单应用Xpath来爬取动态网页某氪的实战例子
好的,这是一道比较复杂的问题,需要分步骤来回答。首先,我们需要准备以下工具:
1. Python 3.9
2. 数据库(如MySQL)
3. 网页爬虫工具(如requests、BeautifulSoup、lxml)
接下来,我们需要编写以下步骤的代码:
1. 发送请求获取网页数据
使用requests库发送HTTP请求,获取中关村网站平板电脑数据页面的HTML源码。
```python
import requests
url = 'http://detail.zol.com.cn/tablepc/'
response = requests.get(url)
html = response.text
```
2. 解析HTML源码
使用BeautifulSoup库解析HTML源码,获取需要的数据。
```python
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
```
3. 使用xpath定位数据
使用xpath定位需要的数据,并将其存储到一个数据列表中。
```python
data_list = []
# 型号
models = soup.xpath('//h3[@class="product-model"]/a/text()')
data_list.append(models)
# 价格
prices = soup.xpath('//b[@class="price-type"]/text()')
data_list.append(prices)
# CPU
cpus = soup.xpath('//ul[@class="product-detail"]/li[1]/span[2]/text()')
data_list.append(cpus)
# 内存
memories = soup.xpath('//ul[@class="product-detail"]/li[2]/span[2]/text()')
data_list.append(memories)
# 存储
storages = soup.xpath('//ul[@class="product-detail"]/li[3]/span[2]/text()')
data_list.append(storages)
# 屏幕尺寸
screens = soup.xpath('//ul[@class="product-detail"]/li[4]/span[2]/text()')
data_list.append(screens)
# 操作系统
oss = soup.xpath('//ul[@class="product-detail"]/li[5]/span[2]/text()')
data_list.append(oss)
# 重量
weights = soup.xpath('//ul[@class="product-detail"]/li[6]/span[2]/text()')
data_list.append(weights)
```
4. 存储数据到数据库中
使用MySQL数据库,将数据存储到表中。
```python
import pymysql
db = pymysql.connect(host='localhost', user='root', password='password', database='test', charset='utf8')
cursor = db.cursor()
for i in range(len(models)):
sql = """
insert into tablet (model, price, cpu, memory, storage, screen, os, weight)
values (%s, %s, %s, %s, %s, %s, %s, %s)
"""
cursor.execute(sql, (models[i], prices[i], cpus[i], memories[i], storages[i], screens[i], oss[i], weights[i]))
db.commit()
db.close()
```
完整代码如下:
```python
import requests
from bs4 import BeautifulSoup
import pymysql
url = 'http://detail.zol.com.cn/tablepc/'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'lxml')
data_list = []
# 型号
models = soup.xpath('//h3[@class="product-model"]/a/text()')
data_list.append(models)
# 价格
prices = soup.xpath('//b[@class="price-type"]/text()')
data_list.append(prices)
# CPU
cpus = soup.xpath('//ul[@class="product-detail"]/li[1]/span[2]/text()')
data_list.append(cpus)
# 内存
memories = soup.xpath('//ul[@class="product-detail"]/li[2]/span[2]/text()')
data_list.append(memories)
# 存储
storages = soup.xpath('//ul[@class="product-detail"]/li[3]/span[2]/text()')
data_list.append(storages)
# 屏幕尺寸
screens = soup.xpath('//ul[@class="product-detail"]/li[4]/span[2]/text()')
data_list.append(screens)
# 操作系统
oss = soup.xpath('//ul[@class="product-detail"]/li[5]/span[2]/text()')
data_list.append(oss)
# 重量
weights = soup.xpath('//ul[@class="product-detail"]/li[6]/span[2]/text()')
data_list.append(weights)
db = pymysql.connect(host='localhost', user='root', password='password', database='test', charset='utf8')
cursor = db.cursor()
for i in range(len(models)):
sql = """
insert into tablet (model, price, cpu, memory, storage, screen, os, weight)
values (%s, %s, %s, %s, %s, %s, %s, %s)
"""
cursor.execute(sql, (models[i], prices[i], cpus[i], memories[i], storages[i], screens[i], oss[i], weights[i]))
db.commit()
db.close()
```
以上就是通过Python3.9使用xpath获取中关村网站平板电脑各项数据参数并保存到数据库中的步骤。
阅读全文