Selenium Web Scraper Example
Date: 2023-05-25 07:03:07
The following example uses Selenium to drive a browser and scrape course information from Tencent Classroom (腾讯课堂):
```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
# Launch the browser
driver = webdriver.Chrome()
# Open Tencent Classroom
driver.get("https://ke.qq.com/")
# Wait for the search box to load
try:
    element_present = EC.presence_of_element_located((By.ID, 'js-header-search-input'))
    WebDriverWait(driver, 10).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
# Run the search
# (find_element_by_id was removed in Selenium 4; use find_element with By.ID)
search_box = driver.find_element(By.ID, 'js-header-search-input')
search_box.send_keys('Python')
search_box.send_keys(Keys.RETURN)
# Wait for the search results page to load
try:
    element_present = EC.presence_of_element_located((By.XPATH, '//div[@class="skill-cards-wrap"]'))
    WebDriverWait(driver, 10).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
# Get the HTML of the search results page
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# Parse the HTML and print each course's details
course_list = soup.find_all('div', {'class': 'item-box'})
for course in course_list:
    title = course.find('a', {'class': 'item-tt-link'}).get_text()
    teacher = course.find('div', {'class': 'item-line item-line--middle'}).find_all('a')[1].get_text()
    price = course.find('div', {'class': 'line-cell item-line--price'}).find('span', {'class': 'line-cell__main'})
    # A course with no price element is treated as free
    if price:
        price = price.get_text()
    else:
        price = 'Free'
    print('Title: {}, Instructor: {}, Price: {}'.format(title, teacher, price))
# Close the browser
driver.quit()
```
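The code above opens the Tencent Classroom site, searches for Python courses, parses the resulting HTML to extract each course's title, instructor, and price, and prints them to the console.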
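The two `WebDriverWait(...).until(...)` calls in the script are explicit waits: they check a condition repeatedly until it returns a truthy value or the timeout runs out. The sketch below shows that polling idea in plain Python. It is a simplified illustration and not Selenium's actual implementation. The names `wait_until`, `WaitTimeout`, and `element_ready` are hypothetical and do not belong to any library.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage: a stand-in condition that succeeds on the third check,
# playing the role of "the element is now present in the page".
attempts = {"count": 0}


def element_ready():
    attempts["count"] += 1
    return "search-box" if attempts["count"] >= 3 else None


found = wait_until(element_ready, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

Because the helper returns as soon as the condition holds, a fast page costs almost no waiting time, whereas a fixed `time.sleep` always pays the full delay.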