Python crawler for Douban Music
Date: 2023-08-28 15:19:10
Below is an example Douban Music crawler implemented in Python:
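```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Douban may reject requests that lack a browser-like User-Agent.
# This header value is an illustrative assumption; adjust it as needed.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

# Crawl all 10 pages of the Douban Music Top 250 (25 entries per page)
urls = ['https://music.douban.com/top250?start={}'.format(i * 25) for i in range(10)]

# Fetch each page and parse it
music_infos = []
for url in urls:
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(response.text, 'html.parser')
    # The selectors below depend on Douban's markup; verify them against the live page.
    # `tbody` is left out because html.parser does not insert it when the raw HTML lacks it.
    items = soup.select('div#content div.article table tr')
    for item in items:
        name = item.select('td a')[0].text.strip()
        artist = item.select('td:nth-child(2)')[0].text.strip()
        rating = float(item.select('td span.rating_num')[0].text.strip())
        # Drop the trailing 3-character suffix (assumed to be "人评价") before converting
        comment_count = int(item.select('td div.star span:nth-child(4)')[0].text.strip()[:-3])
        music_infos.append((name, artist, rating, comment_count))

# Clean and deduplicate the data (note: set() does not preserve chart order)
cleaned_music_infos = list(set([(name.strip(), artist.strip(), rating, comment_count) for name, artist, rating, comment_count in music_infos if name.strip() and artist.strip()]))

# Save the data to a CSV file
df = pd.DataFrame(cleaned_music_infos, columns=['name', 'artist', 'rating', 'comment_count'])
df.to_csv('douban_music_top250.csv', index=False)
```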
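This example crawls the 10 pages of the Douban Music Top 250 (25 entries per page), extracts each entry's name, artist, rating, and comment count, and saves the results to a CSV file. You can modify and extend it to fit your needs.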
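The cleaning step is the easiest part to test without touching the network. The following is a minimal sketch, not part of the original answer. The helper names `parse_comment_count` and `clean_records` are hypothetical, and the sample string is an assumption about how the comment-count text looks on the page. It extracts the count with a regular expression instead of slicing off a fixed-length suffix, and it removes duplicates while keeping the first-seen order.

```python
import re
from typing import Iterable, List, Optional, Tuple

# (name, artist, rating, comment_count), the same tuple layout as the crawler
Record = Tuple[str, str, float, int]


def parse_comment_count(text: str) -> Optional[int]:
    """Return the first run of digits in `text` as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def clean_records(records: Iterable[Record]) -> List[Record]:
    """Strip whitespace, drop rows with an empty name or artist,
    and remove duplicates while keeping first-seen order."""
    stripped = (
        (name.strip(), artist.strip(), rating, count)
        for name, artist, rating, count in records
    )
    kept = (row for row in stripped if row[0] and row[1])
    # dict keys keep insertion order, so this deduplicates without reordering
    return list(dict.fromkeys(kept))


if __name__ == "__main__":
    # Hypothetical sample text; the real page text may be formatted differently
    print(parse_comment_count("( 108572人评价 )"))
    print(clean_records([(" A ", "x", 9.0, 1), ("A", "x", 9.0, 1), ("", "y", 8.0, 2)]))
```

Because `parse_comment_count` returns `None` when no digits are found, the caller can skip or log a row rather than crash on an unexpected page layout. Note that it reads only the first digit run, so a count written with thousands separators would need extra handling.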