The code above did not scrape the information correctly
Time: 2023-07-23 11:11:15 Views: 84
My apologies, I made a mistake when modifying the code. Please use the revised code below to scrape the review information and store it in comments.xlsx:
```python
import time
from urllib.parse import urljoin

import openpyxl
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def is_element_present(edge, how, what):
    """Return True if an element located by (how, what) exists on the current page."""
    try:
        edge.find_element(by=how, value=what)
    except NoSuchElementException:
        return False
    return True


def first_text(soup, selector):
    """Return the text of the first element matching selector, or "" if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else ""


website = "https://www.mafengwo.cn"
url = "https://www.mafengwo.cn/jd/10783/gonglve.html"
edge = webdriver.Edge()
edge.get(url)

pages = []
comments_data = []

# Collect the detail-page links from every page of the attraction list.
while True:
    # Parse the current page before checking for a next page,
    # so the links on the last page are not skipped.
    soup = BeautifulSoup(edge.page_source, "html.parser")
    for a in soup.select("ul.scenic-list > li > a"):
        href = a.get("href")
        if href:
            # urljoin handles both relative and absolute hrefs.
            pages.append(urljoin(website, href))
            print(a.get("title"))

    # Stop when there is no "next page" link left.
    if not is_element_present(edge, By.CSS_SELECTOR, "a.pg-next"):
        break
    next_page = edge.find_element(By.CSS_SELECTOR, "a.pg-next")
    edge.execute_script("arguments[0].click()", next_page)
    time.sleep(5)  # give the next page time to load

# Visit each detail page and read the review counts.
for x in pages:
    edge.get(x)
    print(x)
    # Wait before reading the page source, in case the review block
    # is rendered by JavaScript after the initial load.
    time.sleep(10)
    soup = BeautifulSoup(edge.page_source, "html.parser")

    # Total number of reviews
    total_comments = first_text(soup, "p.comment-total span")
    print("Total reviews:", total_comments)
    # Number of positive reviews
    good_comments = first_text(soup, "div.score-legend > span:nth-of-type(1) > strong")
    print("Positive reviews:", good_comments)
    # Number of neutral reviews
    medium_comments = first_text(soup, "div.score-legend > span:nth-of-type(2) > strong")
    print("Neutral reviews:", medium_comments)
    # Number of negative reviews
    bad_comments = first_text(soup, "div.score-legend > span:nth-of-type(3) > strong")
    print("Negative reviews:", bad_comments)

    comments_data.append([total_comments, good_comments, medium_comments, bad_comments])

edge.quit()

# Save the scraped data to comments.xlsx.
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Total reviews", "Positive reviews", "Neutral reviews", "Negative reviews"])
for data in comments_data:
    ws.append(data)
wb.save("comments.xlsx")
wb.close()
```
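With these changes, the code should scrape the review information correctly and store it in comments.xlsx. If you have any other questions, feel free to ask. Apologies again for the trouble.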
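
One more note: the script writes the counts to comments.xlsx as the raw strings returned by `get_text()`. If numeric cells are preferred, one option is to convert the text first. The following is a minimal sketch, assuming each string contains a single number that may be surrounded by other characters. `parse_count` is a hypothetical helper that is not part of the code above, and the sample strings are invented for illustration.

```python
import re


def parse_count(text):
    """Return the first integer found in text, or None if there is none."""
    # Match a run of digits, allowing thousands separators such as "1,234".
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Hypothetical sample strings; the real page text may look different.
samples = ["1234", "(1,234)", " 56条 ", ""]
print([parse_count(s) for s in samples])
```

It could be applied just before a row is stored, for example `comments_data.append([parse_count(t) for t in (total_comments, good_comments, medium_comments, bad_comments)])`. A `None` value is left as an empty cell by openpyxl, which makes pages where a selector matched nothing easy to spot.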