
Scraping Website Resources with Python

Time: 2023-09-24 13:13:50


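To scrape website resources with Python, you can use the requests library together with the BeautifulSoup library.

First, use requests to send an HTTP request and fetch the page's HTML content. For example:

```python
import requests

url = 'http://example.com'
response = requests.get(url, timeout=10)  # timeout so the call cannot hang indefinitely
response.raise_for_status()               # raise an exception on 4xx/5xx responses
html_content = response.content           # raw bytes of the response body
```

Next, use BeautifulSoup to parse the HTML and extract the information you need. For example:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')
title = soup.title.string   # text of the <title> tag (soup.title is None if the page has no <title>)
links = soup.find_all('a')  # list of all <a> elements
```

Here, `title` holds the page title and `links` holds every link element on the page.

If you need to fetch an image or another binary file, use the `content` attribute of the response to get the raw bytes. For example:

```python
url = 'http://example.com/image.jpg'
response = requests.get(url, timeout=10)
image_content = response.content  # binary content of the image
```

Then write `image_content` to a local file.

Note that scraping must comply with applicable laws and regulations and with the target site's own rules; do not run unauthorized crawlers. A crawler should also be written with code quality and efficiency in mind, and it should not place an excessive load on the target site.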
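The last two points, writing the downloaded bytes to a local file and keeping the load on the target site low, could be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names `filename_from_url`, `save_bytes`, and `download_all`, the `downloads` directory, and the one-second delay are all assumptions introduced here, not part of the original answer.

```python
import time
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of the URL, or `default` if there is none."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default


def save_bytes(content: bytes, dest: Path) -> Path:
    """Write binary content to `dest`, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)  # binary write, so images are not corrupted
    return dest


def download_all(urls, out_dir="downloads", delay_seconds=1.0):
    """Fetch each URL in turn and save it, pausing between requests.

    The delay value is an arbitrary, hypothetical choice; adjust it to
    whatever the target site's rules allow.
    """
    # Third-party library; imported here so the helpers above
    # depend only on the standard library.
    import requests

    saved = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # simple rate limiting between requests
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        dest = Path(out_dir) / filename_from_url(url)
        saved.append(save_bytes(response.content, dest))
    return saved
```

For instance, `download_all(['http://example.com/image.jpg'])` would try to save the response body as `downloads/image.jpg`. The fixed pause between requests is only one simple way to limit load; it is worth checking the site's robots.txt and terms of use before running anything like this.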
