Help me write Python code that scrapes movie data and analyzes duplicate and missing values
Time: 2023-06-08 19:07:13 Views: 115
Sure. The following Python code scrapes movie data and then analyzes it for duplicate and missing values:
```
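import pandas as pd
import requests
from bs4 import BeautifulSoup


# Scrape the movie data
def get_movie_data():
    url = 'https://www.imdb.com/chart/top'
    # Send a browser-like User-Agent; some sites reject the library default
    headers = {'User-Agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(r.text, 'html.parser')
    # These selectors assume a table-based chart layout; adjust them if the page differs
    movies = soup.select('td.titleColumn')
    crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
    ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name="ir"]')]
    data = []
    # zip stops at the shortest list, so unequal lengths cannot raise IndexError
    for cell, movie_crew, movie_rating in zip(movies, crew, ratings):
        # Collapse newlines and repeated whitespace, then strip the periods
        movie_string = cell.get_text().strip().replace("\n", "")
        movie_title = ' '.join(movie_string.split()).replace(".", "")
        data.append({'Title': movie_title, 'Crew': movie_crew, 'Rating': movie_rating})
    return data


# Build the DataFrame (explicit columns keep it well-formed even if nothing was scraped)
data = pd.DataFrame(get_movie_data(), columns=['Title', 'Crew', 'Rating'])

# Duplicate-value analysis: rows identical to an earlier row
duplicates = data[data.duplicated()]
print('Number of duplicate rows:', len(duplicates))

# Missing-value analysis: count of nulls in each column
missing = data.isnull().sum()
print('Missing values per column:')
print(missing)

# Show the first rows of the DataFrame
print(data.head())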
```
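The analysis step can be tried without any network access. The following is a minimal sketch that assumes the same three columns (`Title`, `Crew`, `Rating`); the sample rows and the helper names `summarize_quality` and `clean` are made up for illustration and are not part of the answer above.

```python
import pandas as pd


def summarize_quality(df):
    """Return (number of duplicate rows, missing-value count per column)."""
    duplicate_count = int(df.duplicated().sum())
    missing_per_column = df.isnull().sum()
    return duplicate_count, missing_per_column


def clean(df):
    """Drop exact duplicate rows, then drop rows that have no rating."""
    deduplicated = df.drop_duplicates()
    return deduplicated.dropna(subset=['Rating']).reset_index(drop=True)


# Hypothetical sample rows standing in for scraped data
sample = pd.DataFrame([
    {'Title': 'Movie A', 'Crew': 'Director A', 'Rating': '9.2'},
    {'Title': 'Movie A', 'Crew': 'Director A', 'Rating': '9.2'},  # exact duplicate
    {'Title': 'Movie B', 'Crew': None, 'Rating': '8.8'},          # missing crew
    {'Title': 'Movie C', 'Crew': 'Director C', 'Rating': None},   # missing rating
])

duplicate_count, missing_per_column = summarize_quality(sample)
cleaned = clean(sample)

print('Number of duplicate rows:', duplicate_count)
print(missing_per_column)
print(cleaned)
```

Whether to drop or fill the rows with missing values depends on the later analysis; dropping rows without a rating is only one possible choice.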
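Note: this code is for reference only. Adjust the selectors and the cleaning steps to match the actual structure of the site and the characteristics of its data.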
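One way to adapt the selectors is to test them against a saved HTML fragment before requesting the live page. The sketch below assumes a table layout matching the selectors used in the answer; the `SAMPLE_HTML` fragment and the `parse_movies` helper are hypothetical, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped to match the selectors used in the answer
SAMPLE_HTML = """
<table>
  <tr>
    <td class="posterColumn"><span name="ir" data-value="9.2"></span></td>
    <td class="titleColumn">1. <a href="#" title="Director A (dir.), Actor A">Movie A</a> <span>(1994)</span></td>
  </tr>
  <tr>
    <td class="posterColumn"><span name="ir" data-value="9.1"></span></td>
    <td class="titleColumn">2. <a href="#" title="Director B (dir.), Actor B">Movie B</a> <span>(1972)</span></td>
  </tr>
</table>
"""


def parse_movies(html):
    """Extract one record per table row so the fields of a movie stay together."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for row in soup.select('tr'):
        link = row.select_one('td.titleColumn a')
        rating = row.select_one('td.posterColumn span[name="ir"]')
        if link is None:
            continue  # skip rows that are not movie entries
        rows.append({
            'Title': link.get_text(strip=True),
            'Crew': link.get('title'),
            # A missing rating cell becomes None, which pandas counts as missing
            'Rating': rating.get('data-value') if rating is not None else None,
        })
    return rows


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

Parsing row by row, instead of building three separate lists, keeps a title from being paired with another movie's rating when one cell is absent.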