Help me write complete Python code to scrape the posters of the Douban Top 100 movies
Posted: 2024-11-03 11:16:06
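In Python, we can fetch the page with the requests library, parse the HTML and extract the data we need with BeautifulSoup, and then stream each image to a local file with requests. Here is a simple example for scraping the posters of the Douban Top 100 movies: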
```python
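import os
import re

import requests
from bs4 import BeautifulSoup

# The Top 250 list shows 25 movies per page, so the first 4 pages cover the Top 100
BASE_URL = "https://movie.douban.com/top250"
# Douban typically rejects requests that lack a browser-like User-Agent
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
SAVE_DIR = "images"


def download_image(title, img_url):
    """Stream one poster into SAVE_DIR, named after the movie title."""
    # Create the target folder if it does not exist yet
    os.makedirs(SAVE_DIR, exist_ok=True)
    # Replace characters that are not allowed in file names
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
    save_path = os.path.join(SAVE_DIR, f"{safe_title}.jpg")
    try:
        response = requests.get(img_url, headers=HEADERS, stream=True, timeout=10)
        response.raise_for_status()  # raise on HTTP error status codes
        with open(save_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)
        print(f"Poster saved: {save_path}")
    except (requests.RequestException, OSError) as e:
        print(f"Download failed for {title}: {e}")


def main():
    # start=0, 25, 50, 75 selects the four pages that hold the Top 100
    for start in range(0, 100, 25):
        response = requests.get(BASE_URL, params={"start": start}, headers=HEADERS, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Assumed page layout: each movie is a div.item whose poster <img>
        # carries the movie title in its alt attribute
        for item in soup.find_all("div", class_="item"):
            img = item.find("img", alt=True)
            if img is None or not img.get("src"):
                continue
            download_image(img["alt"].strip(), img["src"])


if __name__ == "__main__":
    main()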
```
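Two details are easy to get wrong when scraping this list: pagination and file names. The URL above points at the Top 250 list, which is split into pages, so covering the Top 100 means requesting several pages rather than one. Movie titles can also contain characters that are not valid in file names. The sketch below shows both pieces as standalone helpers that can be checked without any network access. The names `page_urls` and `safe_filename` are hypothetical, and the page size of 25 and the `start` query parameter are assumptions about how the site paginates.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25  # assumption: the list shows 25 movies per page


def page_urls(total, base_url=BASE_URL, page_size=PAGE_SIZE):
    """Return the page URLs needed to cover the first `total` movies."""
    # assumption: the site selects a page through a `start` offset parameter
    return [
        f"{base_url}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]


def safe_filename(title, ext=".jpg"):
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # Fall back to a placeholder if nothing usable is left
    return (cleaned or "untitled") + ext


if __name__ == "__main__":
    for page_url in page_urls(100):
        print(page_url)
    print(safe_filename("Face/Off"))
```

Keeping these two steps separate from the download logic makes them easy to test on their own, and changing `total` is enough to fetch more or fewer movies.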