
Scraping Weibo with Selenium

Date: 2023-12-01 17:43:21

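The following steps show how to scrape Weibo with Selenium:

1. Install Selenium and ChromeDriver

```shell
pip install selenium
```

ChromeDriver download page: http://chromedriver.chromium.org/downloads

2. Import Selenium and the time module

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
```

3. Set the ChromeDriver path and browser options

```python
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')  # headless mode: no browser window is opened
chrome_options.add_argument('--disable-gpu')  # disable GPU acceleration
chrome_options.add_argument('--no-sandbox')  # disable the Chrome sandbox
chrome_options.add_argument('--disable-dev-shm-usage')  # do not use /dev/shm for shared memory
# Selenium 4 takes the driver path through Service and the options through `options`
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'), options=chrome_options)
```

4. Open the Weibo login page and log in

```python
driver.get('https://weibo.com/login.php')
time.sleep(10)  # wait for the page to finish loading
# Adjust the locators below if the page markup has changed
driver.find_element(By.NAME, 'username').send_keys('your_username')  # enter the username
driver.find_element(By.NAME, 'password').send_keys('your_password')  # enter the password
driver.find_element(By.CLASS_NAME, 'W_btn_a').click()  # click the login button
time.sleep(10)  # wait for the page to finish loading
```

5. Search for a keyword and collect post text and comment counts

```python
driver.get('https://s.weibo.com/weibo?q=your_keyword')  # search for the keyword
time.sleep(10)  # wait for the page to finish loading
# Post text
weibo_list = driver.find_elements(By.XPATH, '//div[@class="content"]/p[@class="txt"]')
# Comment counts
comment_list = driver.find_elements(By.XPATH, '//div[@class="content"]/div[@class="card-act"]/ul/li[2]/a')
# zip stops at the shorter list, so a length mismatch cannot raise IndexError
for weibo, comment in zip(weibo_list, comment_list):
    print('Post:', weibo.text)
    print('Comments:', comment.text)
driver.quit()  # close the browser and end the ChromeDriver process
```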
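The search URL in step 5 places the keyword directly in the query string. Keywords that contain Chinese characters or spaces are safer when percent-encoded first. The following is a minimal sketch using only the Python standard library. `build_search_url` is a hypothetical helper name (it is not part of Selenium or of any Weibo API), and the sketch assumes that `q` is the only query parameter needed, as in the URL in step 5.

```python
from urllib.parse import urlencode

# Search endpoint as used in step 5.
SEARCH_BASE = 'https://s.weibo.com/weibo'


def build_search_url(keyword: str) -> str:
    """Return the search URL with the keyword percent-encoded (hypothetical helper)."""
    # urlencode applies UTF-8 percent-encoding and turns spaces into '+'.
    query = urlencode({'q': keyword})
    return SEARCH_BASE + '?' + query
```

The result can be passed straight to the browser, for example `driver.get(build_search_url('your keyword'))`.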
