On the Ctrip website, use Python to scrape the reviews (text) of the Yi'ai Lake (遗爱湖) scenic area. URL: https://you.ctrip.com/sight/huanggang859/127001.html?renderPlatform= Use Beautiful Soup and requests.

Time: 2024-12-16 07:22:51 Views: 52

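In Python, you can use the `requests` library to send an HTTP request and fetch the page content, then use `BeautifulSoup` to parse the HTML document and extract the information you need. The basic steps are:

1. Install the required libraries (if you have not installed them yet). The parsing step below uses the `lxml` parser, so install it as well:

```bash
pip install requests beautifulsoup4 lxml
```

2. Import the libraries and send a GET request to fetch the page source:

```python
import requests
from bs4 import BeautifulSoup

url = "https://you.ctrip.com/sight/huanggang859/127001.html?renderPlatform="
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html_content = response.text
```

3. Parse the HTML with BeautifulSoup and locate the tag that contains the reviews. This is usually done by looking up a specific class name, id, or data attribute. The attribute `data-comment-list` used here is only an example; inspect the page to find the one it actually uses:

```python
soup = BeautifulSoup(html_content, 'lxml')
comment_list = soup.find('div', {'data-comment-list': True})
```

4. Separate and clean the reviews. This may require iterating over child elements (for example `<p>` tags) and taking their text content:

```python
comments = []
if comment_list is not None:  # find() returns None when nothing matches
    for comment in comment_list.find_all('p'):
        cleaned_text = comment.get_text().strip()  # strip surrounding whitespace and newlines
        if cleaned_text:  # skip empty strings
            comments.append(cleaned_text)
```

5. Finally, print or save the list of reviews:

```python
for i, comment in enumerate(comments, start=1):
    print(f"Review {i}: {comment}")
```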
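The last step mentions saving the reviews but only shows printing. The following is a minimal sketch of one way to save them, assuming a CSV file is an acceptable format. The helper names `save_comments` and `load_comments`, the file name `comments.csv`, and the sample strings are all hypothetical and only serve the illustration. It uses nothing beyond the standard library.

```python
import csv
from pathlib import Path


def save_comments(comments, path):
    """Write comments to a UTF-8 CSV file with a 1-based index column."""
    path = Path(path)
    # "utf-8-sig" writes a BOM so spreadsheet tools detect UTF-8 for Chinese text;
    # newline="" lets the csv module handle line endings itself.
    with path.open("w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "comment"])
        for i, text in enumerate(comments, start=1):
            writer.writerow([i, text])
    return path


def load_comments(path):
    """Read the comments back, skipping the header row."""
    with Path(path).open(encoding="utf-8-sig", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row if present
        return [row[1] for row in reader]


if __name__ == "__main__":
    # Placeholder strings, not real reviews from the page.
    sample = ["placeholder review one", "placeholder review, with a comma"]
    out = save_comments(sample, "comments.csv")
    print(load_comments(out))
```

In practice you would pass the `comments` list built in step 4 to `save_comments`. The `csv` module quotes commas, quotation marks, and embedded newlines, so review text survives a round trip unchanged.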
