I want to use Python to scrape https://www.fliggy.com/travel/destination?city=上海 and extract the title, title link, price, rating, review time, address, and similar fields, then save the data into a new Excel file under "D:\X17\X17\新建文件夹". Please give me the complete code.
Time: 2024-12-08 20:14:21 Views: 10
In Python, we can use the `requests` library to fetch the page content, then use `BeautifulSoup` to parse the HTML and extract the desired fields. Finally, we use `pandas` to save the data to an Excel spreadsheet. Make sure all three libraries are installed; if not, you can install them with:
```bash
pip install requests beautifulsoup4 pandas
```
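Before running anything against the live site, it helps to confirm the parsing approach on a small, known HTML snippet. The sketch below uses made-up class names purely for the demo; the real class names on Fliggy's page must be found by inspecting its HTML:

```python
from bs4 import BeautifulSoup

# A tiny HTML fragment mimicking a product listing card.
# The class names here are illustrative, not Fliggy's real ones.
html = """
<div class="card">
  <h3 class="product-title">外滩一日游</h3>
  <a class="product-name__link" href="/item/1">详情</a>
  <span class="price">¥199</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all(tag, class_=...) returns every matching element
titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3", class_="product-title")]
links = [a["href"] for a in soup.find_all("a", class_="product-name__link")]
prices = [s.get_text(strip=True) for s in soup.find_all("span", class_="price")]

print(titles, links, prices)  # ['外滩一日游'] ['/item/1'] ['¥199']
```

Once each selector returns the expected values on a sample like this, the same pattern can be pointed at the downloaded page.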
Here is a basic example that scrapes the requested fields from the Fliggy page and saves them to the specified Excel file:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Target URL
url = "https://www.fliggy.com/travel/destination?city=上海"

# Send a GET request; a browser-like User-Agent makes the
# request less likely to be rejected outright
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the desired fields. The class names below are placeholders:
    # inspect the page's actual HTML structure to find the correct tags.
    titles = [title.text for title in soup.find_all('h3', class_='product-title')]
    links = [a['href'] for a in soup.find_all('a', class_='product-name__link')]
    prices = [span.text for span in soup.find_all('span', class_='price')]
    ratings = [div.text for div in soup.find_all('div', class_='rating')]
    timestamps = [t.text for t in soup.find_all('time', class_='time')]
    locations = [div.text for div in soup.find_all('div', class_='location')]

    # Combine the columns row by row
    data_list = list(zip(titles, links, prices, ratings, timestamps, locations))

    # Convert the data to a DataFrame
    df = pd.DataFrame(data_list, columns=['标题', '链接', '价格', '评分', '评论时间', '地址'])

    # Save to Excel
    file_path = r"D:\X17\X17\新建文件夹\fliggy_data.xlsx"
    df.to_excel(file_path, index=False)
else:
    print(f"Request failed with status code {response.status_code}")
```

Note that if every extracted list comes back empty, the page is probably rendering its content with JavaScript; in that case a plain `requests` fetch will not see the data, and a browser-automation tool such as Selenium would be needed instead.
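One pitfall in the approach above: `zip()` silently truncates to the shortest input, so if one selector misses elements on some cards, the columns become misaligned without any error. A minimal sketch of a length check before building the DataFrame (the `build_frame` helper and its field names are illustrative, not part of any library):

```python
import pandas as pd

def build_frame(**columns):
    # Refuse to build the DataFrame when the scraped lists disagree in
    # length, which usually means one selector failed on some cards.
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Mismatched column lengths: {lengths}")
    return pd.DataFrame(columns)

# Well-formed input: both columns have two rows
df = build_frame(title=["外滩一日游", "迪士尼门票"], price=["¥199", "¥399"])
print(df.shape)  # (2, 2)

# Mismatched input raises instead of silently dropping a row
try:
    build_frame(title=["外滩一日游"], price=["¥199", "¥399"])
except ValueError as e:
    print("caught:", e)
```

Failing loudly here is usually preferable to saving a spreadsheet whose rows quietly mix data from different listings.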