Python 2.7 urllib2 Module: A Detailed Tutorial

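This is a detailed tutorial on using `urllib2` in Python 2.7. The `urllib2` module is a core tool for fetching network resources, and it offers a simple interface for requesting URLs over a variety of protocols. This tutorial walks through how to use `urllib2` to fetch those resources.

2. Fetching URLs

`urllib2` works mainly through the `urlopen()` function, which opens a URL for reading. You can pass the URL string directly, for example:

```python
import urllib2

response = urllib2.urlopen('http://example.com')
html = response.read()  # the body of the page
```

`urlopen()` returns a file-like response object; call `read()` on it to get the page content.

2.1 Data

To send data to the server, for example in a POST request, use the `Request` class. Supplying a `data` argument turns the request into a POST. Note that `urlencode()` lives in `urllib`, not `urllib2`, so both modules must be imported:

```python
import urllib
import urllib2

data = urllib.urlencode({'key': 'value'})  # encode the form fields
request = urllib2.Request('http://example.com', data)  # data makes this a POST
response = urllib2.urlopen(request)
```

2.2 Headers

To set HTTP headers, pass a `headers` dictionary when creating the `Request` object:

```python
import urllib2

headers = {'User-Agent': 'Mozilla/5.0'}
request = urllib2.Request('http://example.com', headers=headers)
```

3. Handling Exceptions

`urlopen()` reports failed requests by raising `URLError`. `HTTPError` is the subclass of `URLError` raised for HTTP-specific errors.

3.1 URLError

`URLError` is raised when there is a network problem, such as a connection that times out or a host that cannot be reached. The exception has a `reason` attribute describing the failure.

```python
import urllib2

try:
    response = urllib2.urlopen('http://nonexistent.example.com')
except urllib2.URLError as e:
    print('Error: %s' % (e.reason,))
```

3.2 HTTPError

`HTTPError` is raised when the server returns a status code that the default handlers cannot deal with, typically a 4xx or 5xx response. Redirects (3xx) are followed automatically, and success codes (2xx) return a normal response. The exception carries the status code sent by the server in its `code` attribute.

```python
import urllib2

try:
    response = urllib2.urlopen('http://example.com', 'forbidden_data')
except urllib2.HTTPError as e:
    print('HTTP Error: %d' % e.code)
```

3.3 Wrapping it Up

To handle errors more gracefully, you can wrap `urlopen()` in a function. Because `HTTPError` is a subclass of `URLError`, its `except` clause must come first; otherwise the `URLError` clause catches every error and the `HTTPError` branch is never reached.

```python
import urllib2

def open_url(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.HTTPError as e:  # must come first: subclass of URLError
        print('HTTP Error: %s' % e)
    except urllib2.URLError as e:
        print('URL Error: %s' % e)
    return None  # reached only when the request failed
```

4. info() and geturl()

The response object's `info()` method returns the HTTP response headers, and `geturl()` returns the URL that was actually retrieved, which can differ from the requested URL when a redirect was followed.

5. Openers and Handlers

Openers are instances of `OpenerDirector`. An opener manages a chain of handlers, each of which knows how to deal with one URL scheme or one aspect of HTTP. `build_opener()` creates an opener from the handlers you pass in, and `install_opener()` makes it the default used by `urlopen()`. For example, you can use `ProxyHandler` for proxies and `HTTPBasicAuthHandler` for basic authentication:

```python
import urllib2

proxy_handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:8080'})
auth_handler = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy_handler, auth_handler)
urllib2.install_opener(opener)

# Every subsequent urlopen() call goes through the installed opener
response = urllib2.urlopen('http://protected.example.com')
```

6. Basic Authentication

For sites that require a user name and password, use the `HTTPBasicAuthHandler` handler together with a password manager:

```python
import urllib2

# With HTTPPasswordMgrWithDefaultRealm, a realm of None matches any realm
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://protected.example.com', 'username', 'password')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
response = opener.open('http://protected.example.com')
```

7. Proxies

Use `ProxyHandler` to specify an HTTP proxy:

```python
import urllib2

proxy_handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:8080'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
```

8. Sockets and Layers

Internally, `urllib2` relies on Python's `socket` module for network communication. You can adjust socket behavior yourself, for example by setting a default timeout that applies to all new sockets. From Python 2.6 onward, `urlopen()` also accepts a per-call `timeout` argument.

```python
import socket
import urllib2

socket.setdefaulttimeout(5)  # global timeout of 5 seconds for new sockets
response = urllib2.urlopen('http://slow.example.com')
```

This brief tutorial covers the basic use of `urllib2`: fetching URLs, sending data and headers, handling exceptions, authentication, and proxies. Real-world use may call for more involved customization, depending on your needs. With these concepts in hand, you can start writing more complex web crawlers or API clients.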
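
Section 4 describes `info()` and `geturl()` without an example, so here is a minimal sketch of both calls. To run without network access it opens a temporary local file through a `file://` URL instead of an HTTP URL; over HTTP the same two calls return the server's response headers and the post-redirect URL. The temporary file, the variable names, and the Python 3 import fallback are assumptions made for illustration and are not part of the original tutorial.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2.7, as used in this tutorial
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback so the sketch still runs

# Write a small temporary file to stand in for a remote resource.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello urllib2\n')
os.close(fd)

# Assumption: a POSIX-style absolute path, so 'file://' + path is a valid URL.
url = 'file://' + path

response = urllib2.urlopen(url)
body = response.read()
headers = response.info()        # mapping-like object holding the response headers
final_url = response.geturl()    # URL actually retrieved (after any redirects, for HTTP)
response.close()
os.remove(path)

print('Final URL: %s' % final_url)
print('Content-Length: %s' % headers['Content-Length'])
print('Body: %r' % body)
```

Header lookups on the object returned by `info()` are case-insensitive, so `headers['content-length']` would work equally well.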
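
The ordering rule behind the error-handling wrapper in section 3.3 follows from `HTTPError` being a subclass of `URLError`. The sketch below illustrates that relationship offline by constructing the two exceptions by hand rather than triggering them over the network. The helper name `describe_error`, the sample URL, and the Python 3 import fallback are hypothetical and chosen only for this illustration.

```python
import io

try:
    from urllib2 import HTTPError, URLError        # Python 2.7
except ImportError:
    from urllib.error import HTTPError, URLError   # assumption: Python 3 fallback


def describe_error(exc):
    """Return a short label for an exception of the kind urlopen() raises."""
    # Test HTTPError first: every HTTPError is also a URLError.
    if isinstance(exc, HTTPError):
        return 'HTTP error %d' % exc.code
    if isinstance(exc, URLError):
        return 'URL error: %s' % (exc.reason,)
    return 'unexpected: %r' % (exc,)


# Hand-built exceptions standing in for real failures (illustrative values).
http_exc = HTTPError('http://example.com/missing', 404, 'Not Found', {}, io.BytesIO(b''))
url_exc = URLError('connection refused')

print(describe_error(http_exc))
print(describe_error(url_exc))
print(issubclass(HTTPError, URLError))
```

If the two `isinstance` checks were swapped, the 404 would be reported as a generic URL error, which is the same mistake as listing `except URLError` before `except HTTPError`.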

HOWTO Fetch Internet Resources Using urllib2

Release 2.7.13

Guido van Rossum and the Python development team

March 30, 2017

Python Software Foundation

Email: docs@python.org

Contents

1 Introduction
2 Fetching URLs
    2.1 Data
    2.2 Headers
3 Handling Exceptions
    3.1 URLError
    3.2 HTTPError
        Error Codes
    3.3 Wrapping it Up
        Number 1
        Number 2
4 info and geturl
5 Openers and Handlers
6 Basic Authentication
7 Proxies
8 Sockets and Layers
9 Footnotes
Index

Author: Michael Foord

Note: There is a French translation of an earlier revision of this HOWTO, available at urllib2 - Le Manuel manquant.
