Simple general-purpose Python web scraper code
Date: 2023-05-04 11:04:55
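Python is a powerful language that is also well suited to scraping web pages. Below is a simple, easy-to-follow set of Python snippets covering the most common scraping tasks. The quoted URLs, tag names, class names, and ids are placeholders to replace with real values: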
```
import requests
from bs4 import BeautifulSoup

url = 'URL to scrape'  # placeholder: replace with the target page
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Extract the href of every <a> tag
for link in soup.find_all('a'):
    print(link.get('href'))

# Extract all tags of one kind, filtered by class
for tag in soup.find_all('tag name', class_='class name'):
    print(tag.text.strip())

# Extract the main content
main_content = soup.find('div', id='id of the main content div')
if main_content is not None:  # find() returns None when nothing matches
    print(main_content.text)

# Request with query parameters
params = {'key1': 'value1', 'key2': 'value2'}
r = requests.get(url, params=params)

# Request with custom headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
r = requests.get(url, headers=headers)

# POST request
data = {'key1': 'value1', 'key2': 'value2'}
r = requests.post(url, data=data)

# Save an image or other file
image_url = 'URL of the image or file to save'
r = requests.get(image_url)
with open('image.jpg', 'wb') as f:
    f.write(r.content)

# Headers for sites that apply anti-scraping checks
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Referer': 'Referer value to send',
    'Cookie': 'Cookie value to send'
}
r = requests.get(url, headers=headers)
```
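The link loop in the code above prints each `href` exactly as it appears in the page, so relative paths, in-page anchors, and `mailto:` links come through unresolved, and `get('href')` yields `None` for `<a>` tags without that attribute. Below is a minimal sketch of one way to clean up such a list with the standard library's `urllib.parse`. The function name `absolutize_links` and the example URLs are hypothetical and not part of the original code.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def absolutize_links(base_url, hrefs):
    """Resolve raw href values against base_url and drop unusable ones."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # None (attribute missing) or empty string
            continue
        href = href.strip()
        if not href or href.startswith('#'):  # in-page anchor only
            continue
        absolute = urljoin(base_url, href)          # resolve relative paths
        absolute, _fragment = urldefrag(absolute)   # drop any "#section" part
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue  # skip mailto:, javascript:, tel: and similar
        if absolute not in seen:  # keep first occurrence, preserve order
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical href values, as the link loop might collect them
raw = ['/about', 'news/today.html', '#top', None,
       'mailto:someone@example.com', 'https://example.org/x', '/about']
links = absolutize_links('https://example.com/blog/index.html', raw)
print(links)
```

With the scraper above, the inputs would presumably be `r.url` as the base and `[a.get('href') for a in soup.find_all('a')]` as the list of raw values.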
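That concludes this introduction to simple general-purpose Python scraper code. Although short, it covers many of the features scrapers commonly need; for more detailed usage, consult the Requests and Beautiful Soup documentation. When running a scraper, take care to follow scraping etiquette.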
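One common reading of the etiquette mentioned above is to honor a site's `robots.txt` before fetching a page. The sketch below shows how that check might look with the standard library's `urllib.robotparser`. The rules are inlined so the example runs offline, and the robots.txt content, the `example-bot` user agent, and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, inlined so the sketch runs offline
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = 'example-bot'  # hypothetical crawler name
rules = load_rules(SAMPLE_ROBOTS_TXT)

# Ask whether this user agent may fetch each URL
allowed_public = rules.can_fetch(USER_AGENT, 'https://example.com/articles/1')
allowed_private = rules.can_fetch(USER_AGENT, 'https://example.com/private/data')
# Seconds the site asks crawlers to wait between requests, or None if unset
delay = rules.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, delay)
```

Against a live site, the parser would instead be pointed at the real file with `set_url()` and `read()`, and a scraper could pause with `time.sleep(delay)` between requests when a delay is given.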