Python web scraping of a food website: parsing with XPath, saving the food data to a CSV file, and analyzing and visualizing it by popularity
Date: 2024-12-22 14:16:41
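A Python scraper combined with XPath parsing is a common way to pull specific fields out of a web page's HTML structure. For example, you can fetch a food website's HTML with the `requests` library and then evaluate XPath expressions with `lxml` to locate each dish's name, image, rating and review count. Note that `BeautifulSoup` does not evaluate XPath: its `select()` method takes CSS selectors, so XPath expressions must be run through `lxml`.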
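To check how XPath expressions like these behave before pointing them at a live site, here is a minimal sketch that runs them with `lxml.html` against a small inline HTML snippet. The markup and the dish name are hypothetical; they only mirror the class names used in the scraper code.

```python
from lxml import html

# A tiny inline page standing in for a real food site (hypothetical markup)
SAMPLE = """
<html><body>
  <div class="food-name"><h2>Mapo Tofu</h2></div>
  <div class="food-image"><img src="/img/mapo.jpg"></div>
  <div class="rating"><span class="score">4.8</span></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
# text() selects text nodes and @src selects attribute values; xpath() returns a list of strings
names = tree.xpath('//div[@class="food-name"]/h2/text()')
images = tree.xpath('//div[@class="food-image"]//img/@src')
scores = tree.xpath('//div[@class="rating"]/span[@class="score"]/text()')
print(names, images, scores)
```

Each query returns a list with one entry per match. On a real listing page, each list has one entry per dish, and the lists can be zipped together row by row.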
The basic steps are:
1. **Install the required libraries**:
```
pip install requests beautifulsoup4 pandas lxml matplotlib
```
2. **Write the scraper code**:
```python
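import csv

import matplotlib.pyplot as plt
import pandas as pd
import requests
from lxml import html

# Fetch the page (replace the placeholder with the URL of the site you want to scrape)
url = "https://example.com/foods"  # placeholder URL
headers = {"User-Agent": "Mozilla/5.0"}  # many sites reject the default requests User-Agent
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding  # guess the real charset to avoid garbled Chinese text
tree = html.fromstring(response.text)

# XPath expressions for each field (the class names are examples; adjust them to the real page)
xpath_rules = {
    'name': '//div[@class="food-name"]/h2/text()',
    'image_url': '//div[@class="food-image"]//img/@src',
    'rating': '//div[@class="rating"]/span[@class="score"]/text()',
    'reviews': '//div[@class="reviews-count"]/span/text()',
}

# Each XPath query returns one list per field; zip them into one row per dish.
# zip() stops at the shortest list, so the fields must line up one-to-one.
columns = {field: [s.strip() for s in tree.xpath(expr)] for field, expr in xpath_rules.items()}
rows = list(zip(*columns.values()))

# Save to CSV (utf-8-sig so that Excel displays Chinese text correctly)
with open('food_data.csv', 'w', newline='', encoding='utf-8-sig') as file:
    writer = csv.writer(file)
    writer.writerow(list(xpath_rules.keys()))  # header row
    writer.writerows(rows)

# Analysis and visualization
df = pd.read_csv('food_data.csv', encoding='utf-8-sig')
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
# Strip non-digits such as "条评论" ("reviews") or thousands separators before converting
df['reviews'] = pd.to_numeric(df['reviews'].astype(str).str.replace(r'\D', '', regex=True), errors='coerce')
df['popularity'] = df['rating'] * df['reviews']
df_sorted = df.dropna(subset=['popularity']).sort_values(by='popularity', ascending=False)
df_sorted.plot(kind='bar', x='name', y='popularity', title='Dishes sorted by popularity', legend=False)
plt.tight_layout()
plt.show()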
```
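Scraped ratings and review counts usually arrive as text, for example "1,200 reviews". That is why the popularity column (rating × review count) fails if you multiply the raw strings. The sketch below does the same cleanup and ranking in plain Python so the logic can be tested without a network or pandas. The helper names `to_number` and `rank_by_popularity` and the sample values are hypothetical.

```python
import re

def to_number(text):
    """Extract the first number from a scraped string such as '1,234 reviews'; return None if absent."""
    match = re.search(r'\d+(?:\.\d+)?', text.replace(',', ''))
    return float(match.group()) if match else None

def rank_by_popularity(names, ratings, reviews):
    """Return (name, popularity) pairs sorted high to low, where popularity = rating * review count."""
    ranked = []
    for name, rating, review in zip(names, ratings, reviews):
        r, n = to_number(rating), to_number(review)
        if r is None or n is None:
            continue  # skip dishes whose rating or review count could not be parsed
        ranked.append((name, r * n))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical scraped values, as they might come back from the XPath queries
result = rank_by_popularity(
    ["Mapo Tofu", "Kung Pao Chicken", "Dumplings"],
    ["4.8", "4.5", "n/a"],
    ["1,200 reviews", "3000", "500"],
)
print(result)
```

Dishes with an unparseable field are dropped rather than ranked as zero. This mirrors the `dropna` call in the pandas version.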