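Read the file "ramen-rating.csv" and preprocess the data; count the number of ramen brands in each country and plot a bar chart of the top 10 countries; find the most popular ramen brand and packaging style in each country; compute the average stars of each brand in each country; and produce a few cross-tabulations, with rows and columns of your own choosing.
Posted: 2024-05-09 14:20:10 Views: 60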
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
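# Read the file
ramen = pd.read_csv("ramen-rating.csv")
# Data preprocessing
# Fill missing values in the "Top Ten" column with "Unknown"
# (plain assignment avoids the chained inplace fillna, which newer pandas versions warn about)
ramen["Top Ten"] = ramen["Top Ten"].fillna("Unknown")
# Coerce "Stars" to numeric in case the column holds non-numeric entries; those become NaN
ramen["Stars"] = pd.to_numeric(ramen["Stars"], errors="coerce")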
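# Count the number of distinct ramen brands in each country
# (value_counts() on "Country" would count rows, i.e. reviews, not brands)
country_count = ramen.groupby("Country")["Brand"].nunique().sort_values(ascending=False)
# Plot a bar chart of the top 10 countries
top10 = country_count.head(10)
plt.bar(top10.index, top10.values)
plt.xticks(rotation=45)
plt.title("Top 10 Countries with Most Ramen Brands")
plt.xlabel("Country")
plt.ylabel("Number of Brands")
plt.tight_layout()
plt.show()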
# Find the most popular ramen brand and packaging style in each country
# "Most popular" is taken here to mean the highest-rated entry per country (first one on ties)
rated = ramen.dropna(subset=["Stars"])
# idxmax returns the row label of the first maximum in each group
top_rows = rated.loc[rated.groupby("Country")["Stars"].idxmax()].set_index("Country")
# Most popular brand
popular_brand = top_rows[["Brand", "Stars"]]
print("Most Popular Brands in Each Country:\n", popular_brand)
# Most popular packaging style
popular_style = top_rows[["Style", "Stars"]]
print("Most Popular Styles in Each Country:\n", popular_style)
# Compute the average Stars of each brand in each country
brand_avg_stars = ramen.groupby(["Country", "Brand"])["Stars"].mean()
print("Average Stars for Each Brand in Each Country:\n", brand_avg_stars)
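# Cross-tabulations
# 1. Country by packaging style
ct_style = pd.crosstab(ramen["Country"], ramen["Style"])
print("Cross-tabulation of Country and Style:\n", ct_style)
# 2. Country by rating band
# Six bin edges define five intervals, so the five labels must name those intervals;
# include_lowest=True keeps 0-star ratings in the first band
stars_band = pd.cut(ramen["Stars"], bins=[0, 3, 3.5, 4, 4.5, 5],
                    labels=["0-3", "3-3.5", "3.5-4", "4-4.5", "4.5-5"], include_lowest=True)
ct_rating = pd.crosstab(ramen["Country"], stars_band)
print("Cross-tabulation of Country and Rating:\n", ct_rating)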
```
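The question does not define "most popular", and the answer above reads it as the highest-rated entry in each country. Another plausible reading is "appears most often". The sketch below shows that frequency-based variant on a small made-up table; the sample rows, the brand names, and the helper `most_frequent` are illustrative assumptions, not part of the dataset or the original answer.

```python
import pandas as pd

# Made-up sample rows standing in for ramen-rating.csv (not real data).
sample = pd.DataFrame({
    "Country": ["Japan", "Japan", "Japan", "USA", "USA", "USA"],
    "Brand":   ["Brand A", "Brand A", "Brand B", "Brand B", "Brand B", "Brand A"],
    "Style":   ["Cup", "Pack", "Pack", "Cup", "Cup", "Pack"],
    "Stars":   [4.0, 3.5, 5.0, 3.0, 3.5, 4.5],
})

def most_frequent(df, column):
    """Hypothetical helper: most frequent value of `column` within each country."""
    # Number of rows for every (Country, value) pair.
    counts = df.groupby(["Country", column]).size().reset_index(name="Count")
    # Put the largest count first within each country, then keep that first row.
    counts = counts.sort_values(["Country", "Count"], ascending=[True, False])
    return counts.drop_duplicates("Country").set_index("Country")

top_brand = most_frequent(sample, "Brand")
top_style = most_frequent(sample, "Style")
print(top_brand)
print(top_style)
```

To apply this to the real data, pass `ramen` instead of `sample`. Which definition is appropriate depends on what the assignment means by popularity, so it may be worth reporting both.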