import requests
from bs4 import BeautifulSoup
import threading
import time

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/90.0.4430.212 Safari/537.36'
}


def download(url):
    start_time = time.time()  # record the start time
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, features='lxml')
    src = soup.find_all('img')
    imagesrc = soup.find_all('img', width="100")
    for s in imagesrc:
        with open("{}.jpg".format(s.get('alt')), 'wb') as file:
            image = requests.get(s.get('src')).content
            file.write(image)
        print("Downloading " + s.get('alt') + '.jpg')
    end_time = time.time()  # record the end time
    print("Thread {} ran for {} seconds".format(threading.current_thread().name, end_time - start_time))


threads = []
for x in range(10):
    url = "https://movie.douban.com/top250?start={}&filter=".format(x * 25)
    thread = threading.Thread(target=download, args=(url,), name="Thread-{}".format(x+1))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Change this to run in a single thread.
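
One way to approach the single-threaded request is to drop the `threading.Thread` objects and call the download function once per page URL in a plain loop. The sketch below is a minimal illustration under that assumption, not a definitive answer: `page_urls` and `run_sequentially` are hypothetical helper names introduced here, and the download function is passed in as a parameter so the loop itself stays independent of `requests` and `BeautifulSoup`.

```python
import time


def page_urls(pages=10, page_size=25):
    """Build the listing URLs used in the question (hypothetical helper)."""
    return [
        "https://movie.douban.com/top250?start={}&filter=".format(x * page_size)
        for x in range(pages)
    ]


def run_sequentially(urls, download):
    """Call download(url) for each URL in order, on the calling thread.

    `download` is whatever function does the work, for example the one
    defined in the question. Returns the total elapsed time in seconds.
    """
    start = time.perf_counter()
    for url in urls:
        download(url)  # each call finishes before the next one starts
    return time.perf_counter() - start
```

With the `download` function from the question, `run_sequentially(page_urls(), download)` would fetch the ten pages one after another. The `threads` list, `thread.start()` and `thread.join()` are then no longer needed; `threading.current_thread().name` inside `download` still works and reports the main thread.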

import requests
from bs4 import BeautifulSoup
import threading

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/90.0.4430.212 Safari/537.36'}


# define the download function
def download(url):
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, features='lxml')
    src = soup.find_all('img')
    imagesrc = soup.find_all('img', width="100")
    for s in imagesrc:
        with open("{}.jpg".format(s.get('alt')), 'wb') as file:
            image = requests.get(s.get('src')).content
            file.write(image)
        print("Downloading " + s.get('alt') + '.jpg')


# start 10 download threads
threads = []
for x in range(10):
    url = "https://movie.douban.com/top250?start={}&filter=".format(x * 25)
    thread = threading.Thread(target=download, args=(url,))
    threads.append(thread)
    thread.start()

# wait for all threads to finish
for thread in threads:
    thread.join()

Add code that displays the run time.

    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/90.0.4430.212 Safari/537.36'
}


# define the download function
def download(url):
    start_time = time.time() # ...

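The timing request can be sketched in a way that does not depend on the scraping code at all. The example below is an illustration under stated assumptions: `timed` and `run_threads` are hypothetical names introduced here, and `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring intervals. Each wrapped call prints how long it ran in its own thread, and `run_threads` returns the total wall-clock time for the whole batch.

```python
import functools
import threading
import time


def timed(func):
    """Wrap func so every call prints its run time and the thread name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # runs even if func raises, so failed calls are timed too
            elapsed = time.perf_counter() - start
            print("Thread {} ran for {:.3f} seconds".format(
                threading.current_thread().name, elapsed))
    return wrapper


def run_threads(target, arg_list):
    """Start one thread per argument, join them all, return total seconds."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=target, args=(arg,), name="Thread-{}".format(i + 1))
        for i, arg in enumerate(arg_list)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start
```

Applied to the question's code, `run_threads(timed(download), urls)` would report a per-thread time for each page plus one overall figure, where `urls` is the list of ten page URLs.
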
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50'}
url = 'http://www.biquge5200.cc'
url1 = 'http://www.biquge5200.cc/0_111/'
resoponse = requests.get(url1, headers=headers, timeout=50)
html = BeautifulSoup(resoponse.text, "html.parser")
href = html.find('div', {'id': 'list'})
href_list = []
# collect the absolute URL of every chapter listed on the index page
for i in href.find_all("dd"):
    a = i.find('a')
    href = a['href']
    urls = url + href
    href_list.append(urls)
# download each chapter and save it as a text file
for h in href_list:
    try:
        responses = requests.get(h, headers=headers, timeout=50)
        htmls = BeautifulSoup(responses.text, 'html.parser')
        title = htmls.find('div', {'class': 'bookname'}).h1.text
        content = htmls.find('div', {'id': 'content'}).text
        # raw f-string so the backslashes in the Windows path are not treated as escapes
        with open(rf'D:\python案例\高武：神话最强传说\{title}.txt', 'w', encoding='utf-8') as f:
            f.write(content)
        print(title, 'download complete')
    except Exception:
        continue

Add multiple threads.

    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50'
}
url = 'http://www.biquge5200.cc'
url1 = '...
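
Since the answer above is cut off, here is one possible way to add threads to a chapter downloader of this shape. This is a sketch under assumptions rather than the original answer: it uses `concurrent.futures.ThreadPoolExecutor` from the standard library instead of hand-managed `threading.Thread` objects, and `download_all`, `fetch_chapter` and `max_workers=8` are names and values chosen here for illustration. The per-chapter work (request, parse, write file) would live in the function passed as `fetch_chapter`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch_chapter, max_workers=8):
    """Run fetch_chapter(url) for every URL on a pool of worker threads.

    Returns (results, failures): results maps url -> return value,
    failures maps url -> the exception raised for that URL.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit every URL up front; the pool limits how many run at once
        futures = {pool.submit(fetch_chapter, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # record the failure instead of silently skipping it
                failures[url] = exc
    return results, failures
```

In the question's script, the body of the `for h in href_list:` loop could be moved into a function taking one chapter URL, then called as `download_all(href_list, that_function)`. Collecting failures in a dictionary, rather than using `continue` inside a bare `except`, makes it possible to see which chapters were missed and retry them.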
