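Python Web Scraping Basics
Date: 2023-12-22 16:29:46 Views: 35
A Python web scraper (crawler) is a program written in Python that automatically retrieves data from the internet. It mimics what a person does in a browser: visiting pages, extracting data, and saving it. The basics are outlined below with example code:
1. Send an HTTP request with the requests library to fetch a page: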
```python
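import requests

url = 'https://www.example.com'
# A timeout keeps the script from hanging on an unresponsive server.
response = requests.get(url, timeout=10)
# Raise an exception for 4xx/5xx responses instead of parsing an error page.
response.raise_for_status()
html = response.text
print(html)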
```
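Requests to real sites often carry query parameters and a custom User-Agent header. The sketch below builds such a request without sending it, so the final URL and headers can be inspected offline. The `/search` path, the parameter names, and the `my-crawler/0.1` agent string are assumptions made up for illustration.

```python
import requests

# Hypothetical endpoint, parameters, and User-Agent, chosen only for illustration.
base_url = 'https://www.example.com/search'
params = {'q': 'python', 'page': 2}
headers = {'User-Agent': 'my-crawler/0.1'}

# Build the request and prepare it; nothing is sent over the network here.
prepared = requests.Request('GET', base_url, params=params, headers=headers).prepare()

print(prepared.method)
print(prepared.url)
print(prepared.headers['User-Agent'])
```

In everyday use the same `params` and `headers` arguments can be passed straight to `requests.get(...)`, which prepares and sends the request in one call.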
2. Parse the page with the BeautifulSoup library:
```python
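from bs4 import BeautifulSoup

# `html` is the page source fetched in step 1.
soup = BeautifulSoup(html, 'html.parser')
# soup.title is None when the page has no <title> tag.
title = soup.title.text if soup.title else None
print(title)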
```
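The same parser can also collect every link on a page. This is a minimal sketch that assumes an inline HTML string (`sample_html` is invented for illustration) instead of a live page, so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used instead of a live page so the sketch runs offline.
sample_html = """
<html>
  <head><title>Example Domain</title></head>
  <body>
    <a href="/docs">Docs</a>
    <a href="https://www.example.com/about">About</a>
    <a>No href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# href=True skips <a> tags that have no href attribute.
links = [(a['href'], a.get_text(strip=True)) for a in soup.find_all('a', href=True)]

for href, text in links:
    print(href, text)
```

Filtering with `href=True` avoids a `KeyError` on anchors that carry no `href`, which are common in real pages.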
3. Extract specific information with regular expressions:
```python
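import re

# `html` is the page source fetched in step 1.
# This pattern only matches <a> tags whose sole attribute is href.
pattern = r'<a href="(.*?)">(.*?)</a>'
links = re.findall(pattern, html)
for link in links:
    print(link[0], link[1])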
```
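With two capture groups, `re.findall` returns a list of `(href, text)` tuples. The sketch below compares the strict pattern from this step with a more tolerant variant that allows other attributes around `href`. Both `sample_html` and the `tolerant` pattern are assumptions added for illustration, and regular expressions remain fragile on real-world HTML.

```python
import re

# Hypothetical sample markup for illustration.
sample_html = (
    '<a href="/home">Home</a>'
    '<a class="nav" href="/blog" target="_blank">Blog</a>'
)

# Strict: href must be the only attribute of the tag.
strict = r'<a href="(.*?)">(.*?)</a>'
# Tolerant: other attributes may appear before and after href.
tolerant = r'<a\b[^>]*?\bhref="([^"]*)"[^>]*>(.*?)</a>'

strict_links = re.findall(strict, sample_html)
tolerant_links = re.findall(tolerant, sample_html, flags=re.DOTALL)

print(strict_links)
print(tolerant_links)
```

The strict pattern misses the second link because `class="nav"` sits between `<a` and `href`.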
4. Extract specific information with XPath:
```python
from lxml import etree

# `html` is the page source fetched in step 1.
tree = etree.HTML(html)
# Select the text of every <h2> element.
titles = tree.xpath('//h2/text()')
for title in titles:
    print(title)
```
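XPath can select attribute values as well as text nodes. A small sketch, assuming an inline HTML string (`sample_html` is invented for illustration) so the expressions can be tried offline:

```python
from lxml import etree

# Hypothetical sample markup for illustration.
sample_html = """
<html><body>
  <h2>First heading</h2>
  <h2>Second heading</h2>
  <a href="/a">A</a>
  <a href="/b">B</a>
</body></html>
"""

tree = etree.HTML(sample_html)

# text() selects text nodes; @href selects the attribute value.
headings = tree.xpath('//h2/text()')
hrefs = tree.xpath('//a/@href')

print(headings)
print(hrefs)
```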
5. Simulate browser behavior with the Selenium library:
```python
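from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)  # `url` from step 1; the target page is assumed to contain a login form
# find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH.
element = driver.find_element(By.XPATH, '//input[@name="username"]')
element.send_keys('admin')
driver.quit()  # Close the browser when done.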
```
6. Build more advanced crawlers with the Scrapy framework:
```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # Extract the text of the first <h1> on the page.
        title = response.css('h1::text').get()
        yield {'title': title}
```
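Saving the extracted data is the last part of a typical scraper. A minimal sketch using the standard-library `csv` module follows. The `rows` records and the `write_csv` helper are hypothetical, and an in-memory buffer stands in for a file so the result is easy to inspect.

```python
import csv
import io

# Hypothetical scraped records, invented for illustration.
rows = [
    {'title': 'Example Domain', 'url': 'https://www.example.com'},
    {'title': 'About', 'url': 'https://www.example.com/about'},
]


def write_csv(records, fileobj):
    """Write a list of dicts to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=['title', 'url'])
    writer.writeheader()
    writer.writerows(records)


# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open('items.csv', 'w', newline='', encoding='utf-8')`. For Scrapy spiders, running `scrapy runspider` with the `-o` option (for example `-o items.json`) exports the yielded items without extra code.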