Use Python to crawl the movie posters from Douban's Top 250 list and save them into a folder named films_pic.
Posted: 2024-04-04 21:29:36 · Views: 12
Sure. This is a fairly simple crawling task that can be done with the `requests` and `beautifulsoup4` libraries. The steps are:
1. Import the required libraries
```python
import os
import requests
from bs4 import BeautifulSoup
```
2. Define the URL of the page to crawl
```python
url = 'https://movie.douban.com/top250'
```
3. Send the request and parse the response with BeautifulSoup
```python
# Douban rejects requests that use the default requests User-Agent,
# so send a browser-like one explicitly
headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')
```
4. Locate the poster image tags and collect the image URLs
```python
img_tags = soup.select('.pic img')
img_links = [tag['src'] for tag in img_tags]
```
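Saving files as `1.jpg`, `2.jpg`, ... loses the movie names. On the Top 250 page, each poster `<img>` tag also carries the movie title in its `alt` attribute, so you could collect (title, URL) pairs instead. A minimal sketch (the `extract_pics` helper and the sample HTML below are my own, mimicking the page's markup):

```python
from bs4 import BeautifulSoup

def extract_pics(html):
    """Return (title, url) pairs for every poster image inside a .pic cell."""
    soup = BeautifulSoup(html, 'html.parser')
    return [(tag.get('alt', ''), tag['src']) for tag in soup.select('.pic img')]

# Tiny hypothetical snippet shaped like the Top 250 list markup
sample = '''
<div class="pic">
  <a href="https://movie.douban.com/subject/1292052/">
    <img alt="肖申克的救赎" src="https://img.example.com/p480747492.jpg">
  </a>
</div>
'''
```

The titles returned this way can then be used as filenames in step 6.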
5. Create a folder to store the images
```python
if not os.path.exists('films_pic'):
    os.mkdir('films_pic')
```
6. Loop over the image URLs, downloading and saving each image
```python
for i, link in enumerate(img_links):
    img_resp = requests.get(link)
    with open(f'films_pic/{i+1}.jpg', 'wb') as f:
        f.write(img_resp.content)
```
The complete code is as follows:
```python
import os
import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Douban rejects requests that use the default requests User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')

img_tags = soup.select('.pic img')
img_links = [tag['src'] for tag in img_tags]

if not os.path.exists('films_pic'):
    os.mkdir('films_pic')

for i, link in enumerate(img_links):
    img_resp = requests.get(link, headers=headers)
    with open(f'films_pic/{i+1}.jpg', 'wb') as f:
        f.write(img_resp.content)
```
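One caveat: the code above only fetches the first page, which lists 25 movies. The full Top 250 is spread over 10 pages, paginated through the `start` query parameter in steps of 25. A hedged sketch of a version that walks all pages (the `page_urls` and `download_all` helper names are my own):

```python
import os
import requests
from bs4 import BeautifulSoup

# Douban rejects requests that use the default requests User-Agent
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def page_urls():
    """URLs of all 10 pages of the Top 250 list, 25 entries per page."""
    return [f'https://movie.douban.com/top250?start={s}' for s in range(0, 250, 25)]

def download_all(folder='films_pic'):
    """Download every poster on every page into `folder`, numbered 1..250."""
    os.makedirs(folder, exist_ok=True)
    n = 0
    for url in page_urls():
        resp = requests.get(url, headers=HEADERS, timeout=10)
        soup = BeautifulSoup(resp.text, 'html.parser')
        for tag in soup.select('.pic img'):
            n += 1
            img = requests.get(tag['src'], headers=HEADERS, timeout=10)
            with open(os.path.join(folder, f'{n}.jpg'), 'wb') as f:
                f.write(img.content)

# download_all()  # uncomment to run; makes ~260 HTTP requests
```

Consider adding a short `time.sleep()` between requests to avoid hammering the site.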
Note: please respect Douban's copyright when downloading these images, and do not use them for commercial purposes.