    try:
        res = requests.get(url=URL, headers=headers)
        res.encoding = res.apparent_encoding
        img_info = re.findall('<a href="(.*?)" title=".*?target="_blank"><img src="(.*?)" alt=".*?', res.text)
        saved_count = 0  # number of images saved so far
        for link, title in img_info:
            link = "http://www.netbian.com" + link
            res1 = requests.get(url=link, headers=headers)
            res1.encoding = res1.apparent_encoding
            img_url = re.findall('target="_blank"><img src="(.*?)" alt=".*?', res1.text)

Time: 2024-04-19 22:26:12 Views: 283

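This code is a simple crawler: it uses the requests library to send requests, parses the page content, and extracts image links. The steps are:

1. Send a GET request with requests to fetch the content of the target page.
2. Set the response encoding so that the page text is decoded correctly.
3. Use a regular expression to extract the image link information and store it in the list `img_info`.
4. Iterate over `img_info` and request the detail page of each image.
5. Use the link extracted from the detail page to request the image's binary data.
6. Save the image locally, or process it in some other way.

Note that the code shown only covers the basic extraction of image links; it does not include complete code for saving images to disk, so you will need to write the saving logic yourself. Also, when crawling images, comply with the applicable laws and regulations and do not infringe on the rights of others.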
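Steps 3 to 6 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `extract_detail_links`, `save_image`, and `download_image` are hypothetical, the regular expression is the one from the question, and files are simply named by a running index. Only `download_image` touches the network.

```python
import os
import re

# Pattern taken from the question: group 1 is the detail-page path,
# group 2 is the thumbnail src.
LIST_PATTERN = re.compile(
    r'<a href="(.*?)" title=".*?target="_blank"><img src="(.*?)" alt=".*?'
)


def extract_detail_links(html, base="http://www.netbian.com"):
    """Return absolute detail-page URLs found in a listing page."""
    return [base + href for href, _thumb in LIST_PATTERN.findall(html)]


def save_image(content, folder, index):
    """Write raw image bytes to <folder>/<index>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)
    file_path = os.path.join(folder, "{}.jpg".format(index))
    with open(file_path, "wb") as fh:
        fh.write(content)
    return file_path


def download_image(img_url, folder, index, headers=None):
    """Fetch one image over HTTP and save it (needs network access)."""
    import requests  # imported here so the helpers above also work offline

    res = requests.get(img_url, headers=headers, timeout=10)
    res.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    return save_image(res.content, folder, index)
```

Keeping the parsing and the file writing separate from the HTTP call makes it possible to check both against a saved HTML page before running the crawler against the live site.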

    # -*- coding: utf-8 -*-
    import requests
    import re
    import os

    url = 'https://pic.netbian.com/'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.67'
    }
    requests = requests.get(url=url, headers=headers)
    requests = encoding = requests.apparent_encoding
    parr = re.compile('src="(/u.*?)".alt="(.*?)"')
    image = parr.findall(requests.text)
    path = '测试'  # output folder name ("test")
    if not os.path.isdir(path):
        os.mkdir(path)
    for i in image:
        link = i[0]  # image link
        name = i[1]  # image name
        with open(path + "/{}.jpg".format(name), "wb") as img:
            res = requests.get("https://pic.netbian.com/" + link)
            img.write(res.content)
            img.close()
        print(name+".jpg 获取成功....")  # "downloaded successfully"

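Based on the code you provided, the problem is in the following lines:

```python
requests = requests.get(url=url, headers=headers)
requests = encoding = requests.apparent_encoding
image = parr.findall(requests.text)
```

The name `requests` is reused. The first line rebinds it from the library module to the response object, and the second line is a chained assignment that rebinds both `requests` and `encoding` to the encoding string. By the third line `requests` is a `str`, so `requests.text` fails. The fix is to use a different variable name for the response:

```python
response = requests.get(url=url, headers=headers)
response.encoding = response.apparent_encoding
text_content = response.text
image = parr.findall(text_content)
```

Here the response is stored in a variable named `response`, its encoding is set from `apparent_encoding`, and the `.text` attribute supplies the page text passed to `parr.findall()`. Because `requests` still refers to the library, the `requests.get(...)` call inside the download loop works as well. Back up the original code before changing it. After the change you should no longer see `AttributeError: 'str' object has no attribute 'text'`.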
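To see the failure in isolation without any network access, here is a small sketch. `FakeResponse` is a hypothetical stand-in that only mimics the two attributes used here; it is not part of the requests library.

```python
class FakeResponse:
    """Offline stand-in exposing the two attributes the script uses."""
    text = "<html>page body</html>"
    apparent_encoding = "utf-8"
    encoding = None


def reproduce_bug():
    """Repeat the chained assignment and return the resulting error text."""
    resp = FakeResponse()
    # Chained assignment: BOTH names now refer to the string "utf-8".
    resp = encoding = resp.apparent_encoding
    try:
        return resp.text
    except AttributeError as exc:
        return str(exc)


def fixed_version():
    """Set the attribute on the response instead of rebinding the name."""
    response = FakeResponse()
    response.encoding = response.apparent_encoding
    return response.text


print(reproduce_bug())
print(fixed_version())
```

The difference is a single character: `resp = encoding = ...` rebinds names, while `response.encoding = ...` sets an attribute on the object and leaves the name alone.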

    import requests
    from bs4 import BeautifulSoup

    def get_html(url):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            response.encoding = response.apparent_encoding
            return response.text
        except:
            return ""

    def parse_html(html):
        soup = BeautifulSoup(html, 'html.parser')
        book_list = soup.find_all('li', class_='subject-item')
        for book in book_list:
            title = book.find('div', class_='info').find('a')['title']
            author_info = book.find('div', class_='pub').get_text().split('/')
            author = author_info[0].strip()
            publisher = author_info[-3].strip()
            print('书名：', title)       # title
            print('作者：', author)      # author
            print('出版社：', publisher)  # publisher
            print('--------------------------')

    if __name__ == '__main__':
        for i in range(0,1000,20):
            url = 'https://book.douban.com/tag/%E5%8E%86%E5%8F%B2?start='+str(i)+'&type=T'
            html = get_html(url)
            parse_html(html)

Convert the output into an Excel spreadsheet.

Sure. You can use the pandas library to write the results to an Excel spreadsheet. Here is the modified code:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def get_html(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException:
        # Return an empty page on a failed request so the crawl continues.
        return ""


def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    book_list = soup.find_all('li', class_='subject-item')
    data = []
    for book in book_list:
        title = book.find('div', class_='info').find('a')['title']
        author_info = book.find('div', class_='pub').get_text().split('/')
        author = author_info[0].strip()
        publisher = author_info[-3].strip()
        data.append([title, author, publisher])
    return data


if __name__ == '__main__':
    data = []
    for i in range(0, 1000, 20):
        url = 'https://book.douban.com/tag/%E5%8E%86%E5%8F%B2?start=' + str(i) + '&type=T'
        html = get_html(url)
        data += parse_html(html)
    # Convert the data to a DataFrame and write it to an Excel file.
    # Column names: title, author, publisher.
    df = pd.DataFrame(data, columns=['书名', '作者', '出版社'])
    df.to_excel('history_books.xlsx', index=False)
```

In this code, `parse_html` collects the information for each book on a page into a list and returns it, and the main block concatenates the lists from all pages into one large list. pandas then converts that list into a DataFrame and writes it to an Excel file named `history_books.xlsx`, without the row index. Writing `.xlsx` files with `to_excel` requires the `openpyxl` package to be installed.
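One fragile spot in a scraper like this is `author_info[-3]`, which raises `IndexError` whenever the publication line has fewer than three `/`-separated fields. A more defensive split might look like the sketch below. The helper name `split_pub_info` is hypothetical, and the layout "author / (translator /) publisher / date / price" is an assumption inferred from the indexing in the code above, not something the page guarantees.

```python
def split_pub_info(pub_text):
    """Split a publication line into (author, publisher).

    Assumes the layout 'author / ... / publisher / date / price'.
    The publisher is returned as '' when there are fewer than four
    fields, so it can never be the same field as the author and the
    function never raises IndexError.
    """
    fields = [part.strip() for part in pub_text.split('/')]
    author = fields[0]  # str.split with a separator always yields >= 1 item
    publisher = fields[-3] if len(fields) >= 4 else ''
    return author, publisher
```

In `parse_html`, the two indexing lines could then be replaced by `author, publisher = split_pub_info(book.find('div', class_='pub').get_text())`, leaving the rest of the Excel export unchanged.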
