Write Python code to scrape the Douban Movie Top 30 data
Date: 2023-10-07 18:06:45 Views: 42
Sure, here is Python code that scrapes the Douban Movie Top 30:
```python
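import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Douban typically rejects requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
movies = []
for start in (0, 25):  # 25 movies per page, so the Top 30 spans two pages
    response = requests.get(url, headers=headers, params={'start': start}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    movies.extend(soup.select('.info'))
for movie in movies[:30]:
    title = movie.select_one('.title').text.strip()
    rating = movie.select_one('.rating_num').text.strip()
    quote_tag = movie.select_one('.quote')  # some entries have no quote
    quote = quote_tag.text.strip() if quote_tag else ''
    print(f'Title: {title}\nRating: {rating}\nQuote: {quote}\n')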
```
Running the code prints the title, rating, and featured quote of each movie in the Douban Top 30.
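The list appears to be paginated at 25 movies per page through a `start` query parameter, so covering the Top 30 takes more than one request. The following is a minimal sketch of working out which page URLs are needed for any Top N. The helper name `page_urls` and the constant `PAGE_SIZE` are hypothetical, and the page size of 25 is an assumption based on the `start=0, 25, 50, ...` pattern.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # assumed page size, matching the start=0, 25, 50, ... pattern


def page_urls(top_n, page_size=PAGE_SIZE):
    """Return the page URLs needed to cover the first `top_n` movies."""
    if top_n <= 0:
        return []
    # Ceiling division: 30 movies at 25 per page needs 2 pages
    page_count = -(-top_n // page_size)
    return [f'{BASE_URL}?{urlencode({"start": i * page_size})}'
            for i in range(page_count)]


if __name__ == '__main__':
    for page_url in page_urls(30):
        print(page_url)
```

Each URL can be passed to `requests.get`, and the combined results truncated to the first `top_n` entries.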
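Related questions
Help me write Python code that scrapes the Douban Movie Top 250 and visualizes it
Sure, I can help you write a simple Python program that scrapes the Douban Movie Top 250 and visualizes the data. First, install the required Python libraries: requests, beautifulsoup4, and matplotlib. You can install them from the command line with: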
```
pip install requests beautifulsoup4 matplotlib
```
Next, use the following code to scrape the Douban Movie Top 250 and visualize the data:
```python
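import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

# Scrape the Douban Movie Top 250 (10 pages of 25 movies each)
url = 'https://movie.douban.com/top250'
# Douban typically rejects requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
movies = []
for i in range(10):
    start = i * 25
    r = requests.get(url, headers=headers, params={'start': start}, timeout=10)
    soup = BeautifulSoup(r.text, 'html.parser')
    for movie in soup.find_all('div', class_='info'):
        title = movie.find('span', class_='title').text
        rating = movie.find('span', class_='rating_num').text
        movies.append((title, float(rating)))

# Visualize the 10 highest-rated movies
movies.sort(key=lambda x: x[1], reverse=True)
top_10_movies = movies[:10]
titles = [movie[0] for movie in top_10_movies]
ratings = [movie[1] for movie in top_10_movies]
# The titles are Chinese, so a CJK-capable font is needed to render them.
# 'SimHei' is an assumption; substitute a font installed on your system.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.bar(titles, ratings)
plt.xticks(rotation=45, ha='right')
plt.xlabel('Movie title')
plt.ylabel('Rating')
plt.title('Douban Movie Top 10 by rating')
plt.tight_layout()
plt.show()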
```
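This program scrapes all 250 movies in the Douban Movie Top 250, sorts them by rating, and plots the titles and ratings of the 10 highest-rated movies as a bar chart. You can adapt the code as needed, for example by changing how many movies are plotted or by changing the chart style.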
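As one way to make the number of plotted movies adjustable, the selection step can be pulled into a small function. This is a sketch only: the helper name `top_rated` is hypothetical, and the sample data is made up for illustration rather than taken from Douban.

```python
import heapq


def top_rated(movies, n=10):
    """Return the n highest-rated (title, rating) tuples, best first."""
    return heapq.nlargest(n, movies, key=lambda movie: movie[1])


# Made-up sample data for illustration only, not real Douban ratings
sample = [('Movie A', 8.7), ('Movie B', 9.4), ('Movie C', 9.1), ('Movie D', 8.9)]
print(top_rated(sample, n=2))
```

`heapq.nlargest` avoids sorting the full list in place, which leaves the original scrape order available for other charts.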
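Write Python code that scrapes Douban movie information, performs a simple data analysis, and visualizes the results
Below is example code that uses Python to scrape Douban movie information, run a simple analysis, and visualize the results: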
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt


# Scrape the movie information
def get_movie_info():
    url = 'https://movie.douban.com/top250'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    movie_info = []
    for i in range(0, 250, 25):
        params = {
            'start': str(i),
            'filter': ''
        }
        response = requests.get(url, headers=headers, params=params)
        soup = BeautifulSoup(response.text, 'html.parser')
        movies = soup.find_all('div', class_='item')
        for movie in movies:
            name = movie.find('span', class_='title').text
            rating_num = movie.find('span', class_='rating_num').text
            # The last <span> reads like '12345人评价'; drop the 3-character suffix
            comment_num = movie.find('div', class_='star').find_all('span')[-1].text[:-3]
            movie_info.append((name, float(rating_num), int(comment_num)))
    return movie_info


# Analyze and plot the data
def data_analysis(movie_info):
    df = pd.DataFrame(movie_info, columns=['name', 'rating_num', 'comment_num'])
    # Histogram of the rating distribution
    df['rating_num'].plot(kind='hist', bins=20)
    plt.title('Rating Distribution')
    plt.xlabel('Rating')
    plt.ylabel('Frequency')
    plt.show()
    # Scatter plot of rating against comment count
    df.plot(kind='scatter', x='rating_num', y='comment_num')
    plt.title('Rating vs. Comment')
    plt.xlabel('Rating')
    plt.ylabel('Comment')
    plt.show()


if __name__ == '__main__':
    movie_info = get_movie_info()
    data_analysis(movie_info)
```
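The code uses the requests and BeautifulSoup libraries to scrape each movie's name, rating, and comment count from the Douban Movie Top 250 and stores them in a list. It then uses pandas and matplotlib for analysis and visualization, showing a histogram of the rating distribution and a scatter plot of rating against comment count.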
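The `[:-3]` slice used to read the comment count assumes the text always ends in exactly three non-digit characters (presumably `人评价`). If the page wording or whitespace differs, `int()` will raise an error. A slightly more defensive sketch, using a hypothetical helper `parse_comment_count`, extracts the digits with a regular expression instead:

```python
import re


def parse_comment_count(text):
    """Extract the first run of digits from text such as '12345人评价'."""
    match = re.search(r'\d+', text)
    if match is None:
        raise ValueError(f'no digits found in {text!r}')
    return int(match.group())


# Illustrative input only; the number is made up
print(parse_comment_count('12345人评价'))
```

Inside the scraping loop, this could replace the slice by passing the text of the last `<span>` to the helper.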