Generate Python code for scraping hotel website reviews
Date: 2024-10-27 19:12:11
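A Python script for scraping reviews from a hotel website typically involves three steps: sending network requests, parsing HTML, and storing the data. Here is a simple example that uses the `requests` library to fetch page content, `BeautifulSoup` to parse the HTML, and `pandas` to manage the data. Suppose we want to extract reviews from a list of web pages: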
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# List of URLs to scrape
urls = [
    'https://www.example.com/hotel_reviews_1',
    'https://www.example.com/hotel_reviews_2',
    # ...
]


def fetch_reviews(url):
    # Download the page; fail fast on timeouts and HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the HTML tags that hold the reviews; this depends on the site's structure
    review_elements = soup.find_all('div', class_='review') or soup.find_all('p', class_='comment')

    reviews = []
    for element in review_elements:
        # Look up each field once; a missing field falls back to '' or None
        title = element.find('span', class_='title')
        content = element.find('span', class_='content')
        rating = element.find('span', class_='rating')
        reviews.append({
            'Title': title.text if title else '',
            'Content': content.text if content else '',
            'Rating': rating.text if rating else None,
        })
    return reviews


all_reviews = []
for url in urls:
    all_reviews.extend(fetch_reviews(url))

# Combine all reviews into a single DataFrame
comments_df = pd.DataFrame(all_reviews)

# Store the data, for example as a CSV file
comments_df.to_csv('hotel_reviews.csv', index=False)
```
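Because the tag and class names depend entirely on the target site's markup, it can help to try the parsing logic on a small inline HTML string before sending any real requests. The sketch below is only an illustration under stated assumptions: the `SAMPLE_HTML` markup, the `parse_reviews` and `to_float` helpers, and the idea that ratings are plain numbers such as `4.5` are all hypothetical and would need adapting to the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the scraper expects
SAMPLE_HTML = """
<div class="review">
  <span class="title">Great stay</span>
  <span class="content">Clean room and friendly staff.</span>
  <span class="rating">4.5</span>
</div>
<div class="review">
  <span class="title">Too noisy</span>
  <span class="content">Street noise all night.</span>
</div>
"""


def to_float(text):
    """Convert a rating string to float; return None if it is not a plain number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def parse_reviews(html):
    """Extract title, content and rating from each review block (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []
    for element in soup.find_all('div', class_='review'):
        title = element.find('span', class_='title')
        content = element.find('span', class_='content')
        rating = element.find('span', class_='rating')
        reviews.append({
            # get_text(strip=True) removes surrounding whitespace and newlines
            'Title': title.get_text(strip=True) if title else '',
            'Content': content.get_text(strip=True) if content else '',
            'Rating': to_float(rating.get_text(strip=True)) if rating else None,
        })
    return reviews


sample_reviews = parse_reviews(SAMPLE_HTML)
for review in sample_reviews:
    print(review)
```

Once the printed dictionaries look right for a saved copy of the real page, the same selectors can be moved into `fetch_reviews`. Keeping parsing separate from downloading also makes it easier to adjust when the site changes its layout.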