Scraping and cleaning JD.com reviews in PyCharm
Date: 2023-12-16 18:28:09
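To scrape JD.com product reviews and clean them, follow these steps:
1. First, install the two Python libraries requests and beautifulsoup4. They can be installed with the following commands: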
```shell
pip install requests
pip install beautifulsoup4
```
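2. Next, write the Python code that fetches the reviews. The following code can serve as a reference. It assumes the review elements are present in the HTML returned for the URL; if the page loads its reviews via JavaScript, find_all returns an empty list: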
```python
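import requests
from bs4 import BeautifulSoup


def get_comments(url):
    """Fetch a page and return its review elements (div.comment-item)."""
    # A browser-like User-Agent; some sites reject the default requests one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    # The timeout keeps the call from hanging on an unresponsive server.
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # The class name depends on the page markup and may need adjusting.
    comments = soup.find_all('div', class_='comment-item')
    return comments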
```
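Before pointing the scraper at a live page, it can help to check the selector logic offline. The snippet below is a minimal sketch that runs the same `find_all` call on a hand-written HTML fragment. The markup in `SAMPLE_HTML` and the helper name `parse_comments` are hypothetical and only mirror the class names used above; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from a real page.
SAMPLE_HTML = """
<div id="comment">
  <div class="comment-item"><div class="comment-content"> Fast delivery </div></div>
  <div class="comment-item highlighted"><div class="comment-content">Works well</div></div>
  <div class="other">not a review</div>
</div>
"""


def parse_comments(html):
    """Return every div that has 'comment-item' among its CSS classes."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find_all('div', class_='comment-item')


items = parse_comments(SAMPLE_HTML)
# Pull the stripped text out of each matched review element.
texts = [item.find('div', class_='comment-content').get_text().strip()
         for item in items]
print(len(items), texts)
```

Because `class_` matches against each individual CSS class, the second element is found even though it carries an extra class.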
3. Then, write the code that cleans the review data. The following code can serve as a reference:
```python
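def clean_comments(comments):
    """Extract (content, star) tuples from the review elements."""
    cleaned_comments = []
    for comment in comments:
        try:
            content = comment.find('div', class_='comment-content').get_text().strip()
            # The rating is read from the 8th character of the span's second
            # CSS class; this depends on the page markup.
            star = comment.find('div', class_='comment-star').find('span')['class'][1][7]
            cleaned_comments.append((content, star))
        except (AttributeError, TypeError, KeyError, IndexError):
            # Skip reviews that lack the expected elements or classes.
            continue
    return cleaned_comments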
```
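The extraction step above only pulls out the text and rating. If further cleaning is wanted, one possible follow-up is to normalize the text and drop empty or duplicate reviews. The sketch below uses only the standard library; `normalize_text` and `dedupe_comments` are hypothetical helper names, and the sample tuples are made up for illustration.

```python
import re
import unicodedata


def normalize_text(text):
    """Fold full-width characters and collapse runs of whitespace."""
    # NFKC maps full-width letters, digits and some punctuation to their
    # half-width forms, and the ideographic space to a plain space.
    text = unicodedata.normalize('NFKC', text)
    return re.sub(r'\s+', ' ', text).strip()


def dedupe_comments(cleaned_comments):
    """Normalize each review and keep the first copy of each distinct text."""
    seen = set()
    result = []
    for content, star in cleaned_comments:
        content = normalize_text(content)
        if not content or content in seen:
            continue  # drop empty reviews and repeats
        seen.add(content)
        result.append((content, star))
    return result


# Made-up sample data in the (content, star) shape produced above.
sample = [
    ('  Fast\ndelivery ', '5'),
    ('Fast delivery', '5'),
    ('', '1'),
    ('Ｇｏｏｄ　ｖａｌｕｅ', '4'),
]
print(dedupe_comments(sample))
```

Deduplicating on the normalized text rather than the raw string means reviews that differ only in spacing or character width count as the same review.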
4. Finally, save the cleaned reviews to a CSV file. The following code can serve as a reference:
```python
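import csv


def save_comments_to_csv(comments, filename):
    """Write (content, star) tuples to a CSV file with a header row."""
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Review content', 'Rating'])
        writer.writerows(comments)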
```
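To confirm that the file can be read back intact, a round trip through the csv module is a quick check. The sketch below is a self-contained variant, not the code above: it assumes the `utf-8-sig` encoding, which writes a byte order mark that spreadsheet programs such as Excel use to detect UTF-8. The `load_comments_from_csv` helper, the header names and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def save_comments_to_csv(comments, filename, encoding='utf-8-sig'):
    """Write (content, rating) tuples under a header row."""
    with open(filename, 'w', newline='', encoding=encoding) as f:
        writer = csv.writer(f)
        writer.writerow(['content', 'rating'])
        writer.writerows(comments)


def load_comments_from_csv(filename, encoding='utf-8-sig'):
    """Read the rows back as tuples, skipping the header row."""
    with open(filename, newline='', encoding=encoding) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header
        return [tuple(row) for row in reader]


# Made-up rows containing a comma and an embedded newline, which the
# csv module quotes automatically.
rows = [('Fast delivery, works well', '5'), ('Box arrived\ndamaged', '2')]
path = os.path.join(tempfile.mkdtemp(), 'comments.csv')
save_comments_to_csv(rows, path)
loaded = load_comments_from_csv(path)
print(loaded == rows)
```

Opening the file with `newline=''` for both writing and reading is what keeps the embedded newline inside the second review from being altered.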