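How can the 1.5 IQR rule be used to check whether the Age and Fare fields of the shiyan2 dataset contain outliers, and to output three datasets: one with the records containing Age outliers removed, one with the records containing Fare outliers removed, and one with all outlier records removed?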


Posted: 2024-03-01 11:50:08 | Views: 145

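You can check the Age and Fare fields for outliers and produce the three filtered datasets with the following steps:

1. Load the dataset

```python
import pandas as pd

df = pd.read_csv('shiyan2.csv')
```

2. Find the outliers

```python
def IQR_Outliers(Ser):
    """Ser: one column of the DataFrame to check for outliers."""
    q1 = Ser.quantile(0.25)
    q3 = Ser.quantile(0.75)
    iqr = q3 - q1
    Low = q1 - 1.5 * iqr   # lower fence
    Up = q3 + 1.5 * iqr    # upper fence
    index = (Ser < Low) | (Ser > Up)
    Outliers = Ser.loc[index]  # keeps the original row labels
    return Outliers

age_outliers = IQR_Outliers(df['Age'])
fare_outliers = IQR_Outliers(df['Fare'])
```

Missing values compare as `False` against both fences, so rows with a missing Age or Fare are not flagged as outliers.

3. Check whether each field has outliers

```python
# An empty Series means the column has no outliers under the 1.5 IQR rule
print('Age outliers:', len(age_outliers))
print('Fare outliers:', len(fare_outliers))
```

The task removes records (rows), not the Age and Fare columns themselves, so `df` must stay unmodified here. Dropping the two columns in place would make the following steps fail with a `KeyError`.

4. Output the records with all outliers removed

```python
df_no_outliers = df[(~df['Age'].isin(age_outliers)) & (~df['Fare'].isin(fare_outliers))]
```

`isin()` marks the rows whose value appears in `age_outliers` or `fare_outliers`, and the `~` operator inverts that mask, so only rows whose Age and Fare are both non-outliers are kept. The result is assigned to `df_no_outliers`, the dataset with every outlier record removed.

5. Output the records with only the Age outliers, or only the Fare outliers, removed

```python
df_age_no_outliers = df.drop(age_outliers.index)
df_fare_no_outliers = df.drop(fare_outliers.index)
```

The `index` attribute returns the row labels of the outliers in each Series, and `drop()` deletes those rows. The results, `df_age_no_outliers` and `df_fare_no_outliers`, are the datasets with the Age outlier records and the Fare outlier records removed, respectively.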
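To make the 1.5 IQR rule easier to verify by hand, here is a minimal sketch on a tiny made-up table. The helper names `iqr_bounds` and `outlier_mask` and the toy values are assumptions for illustration only; they are not part of shiyan2.csv. It uses boolean masks instead of `drop()` and `isin()`, which is one possible alternative and gives the same three datasets.

```python
import pandas as pd

def iqr_bounds(ser):
    """Return the (lower, upper) fences of the 1.5 IQR rule for a Series."""
    q1 = ser.quantile(0.25)
    q3 = ser.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outlier_mask(ser):
    """Boolean mask that is True where a value lies outside the fences."""
    low, up = iqr_bounds(ser)
    return (ser < low) | (ser > up)

# Toy data, invented for this example (hypothetical, not from shiyan2.csv)
toy = pd.DataFrame({
    'Age':  [22, 25, 27, 30, 31, 33, 35, 95],
    'Fare': [7.0, 8.0, 9.0, 10.0, 300.0, 11.0, 12.0, 13.0],
})

age_mask = outlier_mask(toy['Age'])    # flags Age = 95
fare_mask = outlier_mask(toy['Fare'])  # flags Fare = 300.0

toy_age_clean = toy[~age_mask]                # Age outliers removed
toy_fare_clean = toy[~fare_mask]              # Fare outliers removed
toy_all_clean = toy[~(age_mask | fare_mask)]  # rows with any outlier removed

print(len(toy_age_clean), len(toy_fare_clean), len(toy_all_clean))  # 7 7 6
```

For Age the quartiles are 26.5 and 33.5, so the fences are 16 and 44 and only 95 is flagged. For Fare the quartiles are 8.75 and 12.25, so the fences are 3.5 and 17.5 and only 300.0 is flagged. Because the masks are aligned with the rows of `toy`, this variant does not depend on the index labels being unique.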
