Generate Python code for scraping hotel website reviews

Posted: 2024-10-27 19:12:11

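Writing a Python script that scrapes reviews from a hotel website usually involves three steps: sending HTTP requests, parsing the HTML, and storing the data. Below is a simple example that uses the `requests` library to fetch page content, `BeautifulSoup` (from the `bs4` package) to parse the HTML, and `pandas` to manage the data. Suppose we want to extract reviews from a list of web pages:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URLs to scrape (placeholders; replace them with real review pages)
urls = [
    'https://www.example.com/hotel_reviews_1',
    'https://www.example.com/hotel_reviews_2',
    # ...
]

# Some sites reject the default requests User-Agent, so send a browser-like one
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def extract_text(element, class_name, default=''):
    """Return the stripped text of the first <span> with the given class, or a default."""
    found = element.find('span', class_=class_name)
    return found.get_text(strip=True) if found else default

def fetch_reviews(url):
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the HTML elements that hold the reviews; this depends on the site's structure
    review_elements = soup.find_all('div', class_='review') or soup.find_all('p', class_='comment')
    reviews = []
    for element in review_elements:
        reviews.append({
            'Title': extract_text(element, 'title'),
            'Content': extract_text(element, 'content'),
            'Rating': extract_text(element, 'rating', default=None),
        })
    return reviews

all_reviews = []
for url in urls:
    all_reviews.extend(fetch_reviews(url))

# Merge all reviews into a single DataFrame
comments_df = pd.DataFrame(all_reviews)

# Save the data, e.g. to a CSV file
comments_df.to_csv('hotel_reviews.csv', index=False)
```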
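Because the tag and class names depend entirely on each site's markup, it can help to check the parsing step offline before sending any requests. The minimal sketch below applies the same selectors as `fetch_reviews` to a small inline HTML string. `SAMPLE_HTML`, `parse_reviews`, `text_of`, and the markup itself are hypothetical; in practice you would paste in a snippet saved from the real page you are targeting.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; real hotel sites use their own tags and class names
SAMPLE_HTML = """
<div class="review">
  <span class="title">Great stay</span>
  <span class="content">Clean room, friendly staff.</span>
  <span class="rating">4.5</span>
</div>
<div class="review">
  <span class="content">No title on this one.</span>
</div>
"""

def text_of(element, class_name, default=''):
    """Return the stripped text of the first <span> with the given class, or a default."""
    found = element.find('span', class_=class_name)
    return found.get_text(strip=True) if found else default

def parse_reviews(html):
    """Extract reviews from an HTML string using the same selectors as fetch_reviews."""
    soup = BeautifulSoup(html, 'html.parser')
    # Fall back to <p class="comment"> when no <div class="review"> is present
    review_elements = soup.find_all('div', class_='review') or soup.find_all('p', class_='comment')
    return [
        {
            'Title': text_of(element, 'title'),
            'Content': text_of(element, 'content'),
            'Rating': text_of(element, 'rating', default=None),
        }
        for element in review_elements
    ]

for review in parse_reviews(SAMPLE_HTML):
    print(review)
# {'Title': 'Great stay', 'Content': 'Clean room, friendly staff.', 'Rating': '4.5'}
# {'Title': '', 'Content': 'No title on this one.', 'Rating': None}
```

Keeping parsing separate from fetching lets you adjust the selectors without repeatedly requesting pages from the site; `fetch_reviews` could then simply return `parse_reviews(response.text)`.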

阅读全文