There are duplicate values, missing values, and unreasonable values in the data. Please handle them as required:

Duplicate values:
- name: if there are two names, keep the first one.

Missing values:
- name: drop the line.
- gender: fill in Unknown.
- age: fill in 0.
- community: fill in Unknown.

Unreasonable values:
- age: if the age is less than 0, fill in 0.

Please output the average age of each gender in each community.

Input

The input is the path name of the file.

    filename = input()

Output

Print the output dataframe as in the sample.

Samples

sample.csv

    name,gender,age,community
    name2,F,31,F
    name5,M,12,F
    name6,F,88,E
    name3,M,18,G
    name9,F,85,D
    name5,M,87,G
    ,M,26,C
    name2,F,60,F
    name2,F,89,F
    name0,F,15,E

Input

    sample.csv

Output

                      avg_age
    community gender
    D         F          85.0
    E         F          51.5
    F         F          31.0
              M          12.0
    G         M          18.0

Use pandas in Python.
Time: 2023-12-24 10:04:14 Views: 62
Sure! Here's the code to handle the data as required:
```python
import pandas as pd
# Read the input file
filename = input()
df = pd.read_csv(filename)
# Drop duplicate values
df.drop_duplicates(subset='name', keep='first', inplace=True)
# Handle missing values
df.dropna(subset=['name'], inplace=True)
df.fillna(value={'gender': 'Unknown', 'age': 0, 'community': 'Unknown'}, inplace=True)
# Handle unreasonable values
df.loc[df['age'] < 0, 'age'] = 0
# Calculate average age by community and gender; named aggregation gives
# an avg_age column on a (community, gender) index, as in the sample output
avg_age = df.groupby(['community', 'gender']).agg(avg_age=('age', 'mean'))
# Print the output dataframe
print(avg_age)
```
This code reads the input file, handles duplicate values, missing values, and unreasonable values as required, calculates the average age by gender and community, and outputs the result as a dataframe. Let me know if you have any questions!
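As a quick check, the same cleaning steps can be run against the sample data from the question without touching the file system. The following is only a sketch: the helper name `average_age_by_group` and the in-memory `SAMPLE_CSV` string are assumptions made for illustration, and `clip(lower=0)` is used here as an alternative way to reset negative ages to 0. It uses named aggregation so that the result has a single `avg_age` column indexed by community and gender, which is the layout shown in the expected output.

```python
import io

import pandas as pd

# Sample rows copied from the question (including the row with a missing name).
SAMPLE_CSV = """name,gender,age,community
name2,F,31,F
name5,M,12,F
name6,F,88,E
name3,M,18,G
name9,F,85,D
name5,M,87,G
,M,26,C
name2,F,60,F
name2,F,89,F
name0,F,15,E
"""


def average_age_by_group(source):
    """Hypothetical helper: clean the data, return mean age per (community, gender)."""
    df = pd.read_csv(source)
    # Missing name: drop the row. Duplicate name: keep the first occurrence.
    df = df.dropna(subset=["name"]).drop_duplicates(subset="name", keep="first")
    # Missing gender/community become "Unknown"; missing age becomes 0.
    df = df.fillna({"gender": "Unknown", "age": 0, "community": "Unknown"})
    # Negative ages are treated as unreasonable and reset to 0.
    df["age"] = df["age"].clip(lower=0)
    # Named aggregation yields one avg_age column on a (community, gender) index.
    return df.groupby(["community", "gender"]).agg(avg_age=("age", "mean"))


result = average_age_by_group(io.StringIO(SAMPLE_CSV))
print(result)
```

Because `pd.read_csv` accepts both a path and a file-like object, the same helper could also be called with the `filename` read from `input()`.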