Scrape the title, artist, release date, genre, review count, and comments of Douban Music Top 250 and save them to a CSV file
Date: 2024-03-20 13:41:06
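Sure. The script below scrapes the title, artist, release date, genre, review count, and comments for each entry in Douban Music Top 250, then saves the data to a CSV file.
Here is the code: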
```python
import csv
import random
import time

import requests
from lxml import etree

BASE_URL = 'https://music.douban.com/top250'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}


def fetch_tree(url):
    """Download a page and parse it into an lxml element tree."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # stop on HTTP errors (e.g. 403 or 418) instead of parsing an error page
    time.sleep(random.uniform(1, 3))  # random pause so requests are not fired back to back
    return etree.HTML(response.text)


def first(results, default=''):
    """Return the first XPath result with surrounding whitespace removed, or `default` if empty."""
    return results[0].strip() if results else default


def get_music_info(music_url):
    tree = fetch_tree(music_url)
    name = first(tree.xpath('//h1/span/text()'))  # title
    artist = first(tree.xpath('//span[@class="pl"]/a/text()'))  # artist
    # Release date and genre are the text nodes that follow their labels in the info block
    pub_date = first(tree.xpath(
        '//div[@id="info"]//span[@class="pl"][contains(text(),"时间")]/following-sibling::text()'))
    music_type = first(tree.xpath(
        '//div[@id="info"]//span[@class="pl"][contains(text(),"流派")]/following-sibling::text()'))
    comment_num = first(tree.xpath('//div[@id="comments-section"]/h2/span/a/text()'))  # review count
    comments = [c.strip() for c in tree.xpath('//div[@class="comment"]/p/span/text()')]
    # Join the comments into one cell; writing the list itself would store its Python repr
    return name, artist, pub_date, music_type, comment_num, ' | '.join(comments)


def save_to_csv(music_info_list):
    # utf-8-sig writes a BOM so that Excel displays the Chinese text correctly
    with open('douban_music_top250.csv', 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Title', 'Artist', 'Release date', 'Genre', 'Review count', 'Comments'])
        writer.writerows(music_info_list)


def main():
    music_info_list = []
    for start in range(0, 250, 25):  # 10 list pages with 25 entries each
        page_tree = fetch_tree(f'{BASE_URL}?start={start}')
        # Links to each entry's detail page
        detail_urls = page_tree.xpath('//div[@class="pl2"]/a/@href')
        for detail_url in detail_urls:
            music_info_list.append(get_music_info(detail_url))
    save_to_csv(music_info_list)


if __name__ == '__main__':
    main()
```
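This scrapes the title, artist, release date, genre, review count, and comments for every entry in Douban Music Top 250 and saves them to a CSV file. Douban's anti-scraping measures are fairly strict. To avoid getting blocked, add reasonable delays between requests and use randomized User-Agent strings.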
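The delay and User-Agent advice above can be sketched as a small helper. This is a minimal sketch and not part of the original script. `polite_get` and `USER_AGENTS` are hypothetical names, and the UA strings are illustrative examples only. The default 1 to 3 second delay range is an assumption, not a documented Douban limit. The helper accepts any session-like object that has a `get` method, such as `requests.Session()`.

```python
import random
import time

# Hypothetical pool of desktop browser User-Agent strings; replace with your own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.6 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def polite_get(session, url, min_delay=1.0, max_delay=3.0, timeout=10):
    """Sleep a random interval, then GET `url` with a randomly chosen User-Agent.

    Hypothetical helper (not part of requests). `session` is any object with a
    requests-style `get(url, headers=..., timeout=...)` method.
    """
    time.sleep(random.uniform(min_delay, max_delay))  # random pause before each request
    headers = {"User-Agent": random.choice(USER_AGENTS)}  # pick a different UA per request
    response = session.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on HTTP errors instead of returning an error page
    return response
```

For example, `polite_get(requests.Session(), 'https://music.douban.com/top250?start=0')` could stand in for a bare `requests.get` call. Reusing one `Session` for all pages also lets requests reuse pooled connections.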