Using DrissionPage, yt_dlp, BeautifulSoup, requests, and similar tools, scrape all videos from the Bilibili weekly popular list at https://www.bilibili.com/v/popular/weekly/?num=1 and save them locally.
Date: 2024-09-29 10:07:31
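To scrape the videos on Bilibili's weekly popular list and save them locally with the Python libraries DrissionPage, yt_dlp, BeautifulSoup, and requests, first install the required libraries: DrissionPage loads JavaScript-rendered content in a browser, yt_dlp downloads the videos, BeautifulSoup parses the HTML, and requests sends HTTP requests. The steps are outlined below: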
1. **Install the required libraries**:
```bash
pip install DrissionPage requests beautifulsoup4 yt-dlp
```
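Before moving on, it can help to confirm that the packages are importable and that ffmpeg is on the PATH, since yt_dlp relies on ffmpeg to merge separately downloaded video and audio streams. The following is a minimal sketch; the helper name `missing_requirements` and the list `REQUIRED_MODULES` are hypothetical and only serve this illustration.

```python
import importlib.util
import shutil

# Import names differ from some pip package names (beautifulsoup4 -> bs4, yt-dlp -> yt_dlp).
REQUIRED_MODULES = ["DrissionPage", "requests", "bs4", "yt_dlp"]


def missing_requirements(modules=REQUIRED_MODULES, tools=("ffmpeg",)):
    """Return the names of modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [tool for tool in tools if shutil.which(tool) is None]
    return missing
```

Calling `missing_requirements()` with no arguments returns an empty list when everything is in place, otherwise the names that still need to be installed.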
2. **Import the necessary modules**:
```python
import os
import requests
from bs4 import BeautifulSoup
from DrissionPage import ChromiumPage  # drives a real browser so JavaScript-rendered content loads
from yt_dlp import YoutubeDL
```
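3. **Set the yt_dlp download options**. yt_dlp has no "private mode" option, so copyright concerns cannot be handled through these settings:
```python
ydl_options = {
    # Prefer separate MP4 video and M4A audio streams (merging them requires ffmpeg),
    # otherwise fall back to the best single-file MP4.
    'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
    'paths': {'home': 'videos'},   # local directory where the videos are saved
    'outtmpl': '%(id)s.%(ext)s',   # name each file after its video ID
    'restrictfilenames': True,     # keep filenames ASCII-only and shell-safe
    'noplaylist': True,            # download the single video, not a containing playlist
    'nocheckcertificate': True,    # skips TLS certificate verification; remove if not needed
    'ignoreerrors': True,          # skip videos that fail instead of aborting the run
}
```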
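yt_dlp also accepts a `progress_hooks` list in the same options dictionary; each hook is called with a status dictionary while a download runs. The sketch below records files as they finish. The names `finished_files`, `on_progress`, and `progress_options` are illustrative and not part of the original options.

```python
finished_files = []  # filled in by the hook as downloads complete


def on_progress(status):
    """yt_dlp progress hook: remember each file once its download finishes."""
    if status.get("status") == "finished":
        finished_files.append(status.get("filename"))


# Extra options to merge into the dictionary passed to YoutubeDL.
progress_options = {"progress_hooks": [on_progress]}
```

These can be combined with the main options, for example `YoutubeDL({**ydl_options, **progress_options})`. When video and audio are downloaded as separate streams, the hook fires once per stream, before the merge step.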
4. **Write functions to get the video list and URLs**:
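```python
def get_html(url):
    """Open the page in a browser via DrissionPage and return the rendered HTML."""
    page = ChromiumPage()
    try:
        page.get(url)
        # Wait up to 10 seconds for the JavaScript-rendered video links to appear.
        page.ele('css:a[href*="/video/"]', timeout=10)
        return page.html
    finally:
        page.quit()

def get_video_urls(html):
    """Collect the unique video URLs found in the rendered page."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # Assumption: each video card links to a /video/BV... page; adjust the selector if the markup differs.
    for link in soup.select('a[href*="/video/BV"]'):
        href = link['href'].split('?')[0]
        if href.startswith('//'):  # protocol-relative link
            href = 'https:' + href
        if href not in urls:
            urls.append(href)
    return urls

def download_videos(urls):
    with YoutubeDL(ydl_options) as ydl:
        for url in urls:
            ydl.download([url])

# Fetch the Bilibili weekly popular page
bilibili_url = "https://www.bilibili.com/v/popular/weekly/?num=1"
html = get_html(bilibili_url)
# Parse the page and download the videos
urls = get_video_urls(html)
download_videos(urls)
```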
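As an alternative to parsing rendered HTML, the weekly list can probably also be read as JSON with requests. The sketch below assumes the endpoint `https://api.bilibili.com/x/web-interface/popular/series/one` and a response shape of `data.list[].bvid`. Both are assumptions about an undocumented web API that may change, and the helper names `extract_video_urls` and `fetch_weekly_urls` are hypothetical.

```python
# Assumption: endpoint queried by the weekly page itself; not an officially documented API.
API_URL = "https://api.bilibili.com/x/web-interface/popular/series/one"


def extract_video_urls(payload):
    """Build video page URLs from the JSON payload (assumed shape: data.list[].bvid)."""
    items = (payload.get("data") or {}).get("list") or []
    return [
        f"https://www.bilibili.com/video/{item['bvid']}"
        for item in items
        if item.get("bvid")
    ]


def fetch_weekly_urls(number=1):
    """Request one weekly issue and return its video URLs."""
    import requests  # imported lazily so extract_video_urls works without the dependency

    response = requests.get(
        API_URL,
        params={"number": number},
        headers={"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com/"},
        timeout=10,
    )
    response.raise_for_status()
    return extract_video_urls(response.json())
```

If the endpoint behaves as assumed, the list returned by `fetch_weekly_urls(1)` can be passed straight to yt_dlp's `YoutubeDL.download`, which avoids launching a browser.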