Build a crawler for https://www.csdn.net with Python's requests library

Date: 2024-01-15 10:05:11 Views: 98

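A Python-based crawler for fetching web pages

A fairly standard, simple crawler that fetches a web page's source based on user input and saves it locally. It retrieves the page source with the requests library, decodes it, then opens a file and writes the decoded document, under a file name supplied by the user, to the same directory as the crawler.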
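The fetch, decode, and save steps described above could be sketched roughly as follows. This is only a minimal illustration, not the original crawler: the function names `fetch_page_bytes`, `save_page_source`, and `main` are hypothetical, UTF-8 is an assumed default encoding, and a relative file name lands next to the script only if the script is run from its own directory.

```python
from pathlib import Path


def fetch_page_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw, undecoded bytes of a page."""
    # Imported here so save_page_source can be used without requests installed.
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return response.content


def save_page_source(raw: bytes, filename: str, encoding: str = "utf-8") -> Path:
    """Decode the raw bytes and write the text to `filename`."""
    # errors="replace" keeps going if a few bytes do not match the encoding.
    text = raw.decode(encoding, errors="replace")
    target = Path(filename)
    target.write_text(text, encoding="utf-8")
    return target


def main() -> None:
    """Ask the user for a URL and a file name, then save the page source."""
    url = input("URL to fetch: ").strip()
    filename = input("File name to save as: ").strip()
    saved = save_page_source(fetch_page_bytes(url), filename)
    print(f"Saved page source to {saved.resolve()}")
```

Calling `main()` from the script's directory prompts for both values and writes the file there. Keeping the download and the save in separate functions means the decoding step can be checked without touching the network.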

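OK, let's write a simple Python program that uses the requests library to scrape the titles on the CSDN home page.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.csdn.net/'
# Fetch the home page; the timeout keeps the script from hanging indefinitely
res = requests.get(url, timeout=10)
# Parse the returned HTML with Python's built-in parser
soup = BeautifulSoup(res.text, 'html.parser')
# Collect every <div class="title"> element and print its text
titles = soup.find_all('div', class_='title')
for title in titles:
    print(title.text.strip())
```

This code sends a request to the CSDN home page with requests, parses the page content with BeautifulSoup, then finds every div tag whose class is "title" and prints its text. If no element on the page matches, `find_all` returns an empty list and nothing is printed. Note that CSDN uses anti-scraping measures, so the request headers need to carry some information that mimics a browser request. Replace `res = requests.get(url, timeout=10)` in the code above with the following:

```python
# Send a browser-like User-Agent so the request resembles one from Chrome
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
res = requests.get(url, headers=headers, timeout=10)
```

With the User-Agent set, the request resembles ordinary browser traffic, and you can scrape information from the CSDN site.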
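A browser-like User-Agent does not guarantee that the server accepts the request, so it can help to check the HTTP status before handing the page to a parser. The sketch below is one way to do that. The names `fetch_html` and `BROWSER_HEADERS` and the injectable `get` parameter are hypothetical, introduced only for illustration, and treating anything other than HTTP 200 as a failure is an assumption.

```python
from typing import Callable, Optional

# Assumed browser-like header; any current browser User-Agent string would do.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
    )
}


def fetch_html(
    url: str,
    get: Optional[Callable] = None,
    timeout: float = 10.0,
) -> Optional[str]:
    """Return the page HTML, or None if the server does not answer with 200."""
    if get is None:
        # Default to requests.get; a stand-in can be passed for offline checks.
        import requests  # third-party: pip install requests

        get = requests.get

    response = get(url, headers=BROWSER_HEADERS, timeout=timeout)
    if response.status_code != 200:
        print(f"Request to {url} failed with HTTP {response.status_code}")
        return None
    return response.text
```

A caller would then parse only when a page actually came back, for example `html = fetch_html('https://www.csdn.net/')` followed by `BeautifulSoup(html, 'html.parser')` when `html` is not `None`.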
