Python Web Crawlers: The Basics

Date: 2023-12-22 16:29:46

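A Python web crawler is a program written in Python that automatically retrieves data from the internet. It can mimic what a person does in a browser: visiting pages, extracting data, and saving the results. The basics, with example code, are as follows:

1. Use the requests library to send an HTTP request and fetch a page's content:

```python
import requests

url = 'https://www.example.com'
response = requests.get(url, timeout=10)  # send a GET request; give up after 10 seconds
response.raise_for_status()               # raise an exception on 4xx/5xx responses
html = response.text                      # response body decoded as text
print(html)
```

2. Use the BeautifulSoup library to parse the page content:

```python
from bs4 import BeautifulSoup

# html is the page source fetched in step 1
soup = BeautifulSoup(html, 'html.parser')
title = soup.title.text  # text of the <title> tag (soup.title is None if the page has none)
print(title)
```

3. Use regular expressions to extract specific information:

```python
import re

# Capture the href value and the link text of simple <a href="..."> tags
pattern = r'<a href="(.*?)">(.*?)</a>'
links = re.findall(pattern, html)  # list of (href, text) tuples
for href, text in links:
    print(href, text)
```

4. Use XPath to extract specific information:

```python
from lxml import etree

tree = etree.HTML(html)              # parse the HTML string into an element tree
titles = tree.xpath('//h2/text()')   # text of every <h2> element
for title in titles:
    print(title)
```

5. Use the Selenium library to simulate browser behavior:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)  # url as defined in step 1
# find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH.
# This assumes the page actually contains an input field named "username".
element = driver.find_element(By.XPATH, '//input[@name="username"]')
element.send_keys('admin')
driver.quit()    # close the browser when done
```

6. Use the Scrapy framework for more advanced crawler development:

```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # Extract the text of the first <h1> with a CSS selector
        title = response.css('h1::text').get()
        yield {'title': title}
```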
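The regular-expression approach from step 3 can be tried offline, without sending any request. The following is a minimal sketch rather than a definitive implementation: `SAMPLE_HTML`, `LINK_PATTERN`, and `extract_links` are hypothetical names introduced only for illustration, the sample page is made up, and `urljoin` from the standard library is added to turn relative links into absolute ones against an assumed base URL.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample page, used here in place of a live HTTP response.
SAMPLE_HTML = """
<html><head><title>Demo</title></head>
<body>
  <h2>First</h2>
  <a href="/about">About</a>
  <a href="https://www.example.com/contact">Contact</a>
</body></html>
"""

# Same pattern as in step 3: capture the href value and the link text.
LINK_PATTERN = re.compile(r'<a href="(.*?)">(.*?)</a>')


def extract_links(html, base_url):
    """Return (absolute_url, text) pairs for every simple <a href="..."> tag."""
    return [
        (urljoin(base_url, href), text)  # resolve relative hrefs against base_url
        for href, text in LINK_PATTERN.findall(html)
    ]


links = extract_links(SAMPLE_HTML, 'https://www.example.com')
for href, text in links:
    print(href, text)
```

Note that this pattern only matches tags in which `href` is the first and only attribute. For real pages, an HTML parser such as BeautifulSoup or lxml (steps 2 and 4) is usually the more robust choice.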