Scraping detailed comments from a stock forum (guba)
Posted: 2023-09-11 13:03:19
Sure. If you want to scrape the more detailed comments inside a stock forum (guba), you can use Selenium to drive a browser, collect the link of each post, and then visit each post to scrape its full content. Here is an example:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Stock code
stock_code = '600519'
# Guba list-page URL
url = f'https://guba.eastmoney.com/list,{stock_code}.html'

# Browser options
options = Options()
options.add_argument('--headless')     # headless mode
options.add_argument('--disable-gpu')  # disable GPU acceleration

# Initialise the driver (Selenium 4: executable_path is deprecated, use Service)
service = Service('chromedriver.exe')  # path to the Chrome driver
driver = webdriver.Chrome(service=service, options=options)

# Load the list page
driver.get(url)

# Wait for the post list to finish loading
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'articleh')))

# Parse the list page and collect the link of each post.
# (Note: BeautifulSoup tags are plain parsed HTML, not live WebElements,
# so they cannot be passed to driver.execute_script for clicking; instead
# we extract each href and navigate to it directly.)
soup = BeautifulSoup(driver.page_source, 'html.parser')
links = []
for row in soup.find_all('div', {'class': 'articleh'}):
    a = row.find('a')
    if a and a.get('href'):
        links.append(urljoin(url, a['href']))

# Visit each post and scrape its full content
for link in links:
    driver.get(link)
    try:
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'stockcodec')))
    except TimeoutException:
        continue  # skip posts that fail to load
    post_soup = BeautifulSoup(driver.page_source, 'html.parser')
    content = post_soup.find('div', {'class': 'stockcodec'})
    if content:
        print(content.text.strip())

# Close the driver
driver.quit()
```
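To keep the scraped text instead of only printing it, each post can be appended to a CSV file. A minimal sketch using the standard-library `csv` module (the `posts.csv` filename and the sample row are illustrative placeholders, not real scraped data):

```python
import csv

# Illustrative placeholder data standing in for the scraped (link, content) pairs
scraped = [
    ('https://guba.eastmoney.com/list,600519.html', 'example post text'),
]

# utf-8-sig adds a BOM so the file opens cleanly in Excel with Chinese text
with open('posts.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['link', 'content'])  # header row
    writer.writerows(scraped)
```

In the scraping loop above, you would build `scraped` by appending `(link, content.text.strip())` for each post.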
The code above prints the full content of every post linked from the forum's list page; you can modify and extend it as needed. When scraping a website, please comply with the relevant laws and the site's own rules, and avoid aggressive crawling.
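If you need more than the first list page, guba list URLs commonly follow a `list,{code}_{page}.html` pattern (this suffix pattern is an assumption and should be verified against the live site). A minimal sketch that builds the page URLs and leaves a polite delay between page loads:

```python
import time

def page_urls(stock_code, pages):
    """Build list-page URLs. The _<n> suffix for pages 2+ is an assumed
    pattern based on typical guba URLs; verify it on the live site."""
    base = f'https://guba.eastmoney.com/list,{stock_code}'
    urls = [f'{base}.html']  # page 1 has no suffix
    urls += [f'{base}_{n}.html' for n in range(2, pages + 1)]
    return urls

for page in page_urls('600519', 3):
    print(page)       # in the real scraper, driver.get(page) goes here
    time.sleep(2)     # polite delay between page loads
```

Each page would then go through the same "collect links, visit each post" loop shown above.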