
Code for scraping 10,000 Weibo news comments

Date: 2023-06-27 14:04:38 Views: 118

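Weibo does not expose the full comment list directly, so comments have to be collected by simulating user actions. The following example uses Python and the Selenium library to scrape up to 10,000 comments from a Weibo news article:

```python
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# Configure ChromeOptions
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--headless')

# Start the Chrome browser
browser = webdriver.Chrome(options=options)

# Open the Weibo news page
browser.get('https://weibo.com/ttarticle/p/show?id=2309404658191947859681')

# Wait for the page to finish loading
time.sleep(5)

# Press End repeatedly to scroll down and load more comments
for i in range(50):
    actions = ActionChains(browser)
    actions.send_keys(Keys.END)
    actions.perform()
    time.sleep(2)

# Collect all comment elements
# (find_elements_by_xpath was removed in Selenium 4; use find_elements with By.XPATH)
comment_boxes = browser.find_elements(By.XPATH, '//div[@class="WB_text"]')

# Keep the text of at most the first 10,000 comments
comments = []
for box in comment_boxes:
    comments.append(box.text)
    if len(comments) >= 10000:
        break

# Close the browser
browser.quit()

# Print the comment text
for comment in comments:
    print(comment)
```

The code opens the Weibo news page, sends the End key to scroll down and load more comments, and then extracts the text of at most the first 10,000 comments. Because the loop scrolls a fixed 50 times, the script returns however many comments those scrolls loaded, which may be fewer than 10,000. Note that because of Weibo's anti-scraping measures, you may need to add extra delays or use proxy IPs to improve the success rate.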
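The advice about adding delays can be sketched as a small helper that replaces the fixed 50-scroll loop. This is only an illustration under stated assumptions, not part of the original answer: the name `load_until`, its parameters, and the two callables `scroll` and `count_loaded` are hypothetical hooks that you would wire to the browser yourself. It keeps scrolling until the target number of comments is loaded or several rounds in a row bring nothing new, pausing for a randomized interval each time.

```python
import random
import time


def load_until(scroll, count_loaded, target=10000, max_idle_rounds=3,
               delay_range=(1.5, 3.5), sleep=time.sleep):
    """Scroll until `target` items are loaded or no new items appear.

    scroll        -- hypothetical callable that triggers one scroll
    count_loaded  -- hypothetical callable returning the number of
                     comments currently present on the page
    Returns the number of items loaded when the loop stops.
    """
    idle_rounds = 0
    loaded = count_loaded()
    while loaded < target and idle_rounds < max_idle_rounds:
        scroll()
        # Randomized pause so scrolls are not sent at a fixed rhythm.
        sleep(random.uniform(*delay_range))
        current = count_loaded()
        if current > loaded:
            idle_rounds = 0   # progress was made, reset the idle counter
        else:
            idle_rounds += 1  # nothing new loaded in this round
        loaded = current
    return loaded
```

With Selenium, `scroll` might be `lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")` and `count_loaded` might be `lambda: len(browser.find_elements(By.XPATH, '//div[@class="WB_text"]'))`. Passing the hooks in as callables keeps the stopping logic independent of the browser, so it can be checked without opening a page.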
