Write Python code for the following task: City staff need to compile statistics on the city's population data, which records each person's name, gender, age, and community. The data contains duplicate values, missing values, and unreasonable values. Handle them as follows:
- Duplicate values: name: if a name appears more than once, keep the first occurrence.
- Missing values: name: drop the row. gender: fill in "Unknown". age: fill in 0. community: fill in "Unknown".
- Unreasonable values: age: if the age is less than 0, set it to 0.

Finally, output the average age of each gender in each community.
Posted: 2024-01-22 14:18:58
Here is the Python code to handle the given requirements:
```python
import pandas as pd

# read the input data file (columns: name, gender, age, community)
df = pd.read_csv('population_info.csv')

# missing values: drop rows that have no name
df = df.dropna(subset=['name'])

# duplicate values: if a name appears more than once, keep the first row
df = df.drop_duplicates(subset=['name'], keep='first')

# missing values: gender and community -> 'Unknown', age -> 0
df = df.fillna({'gender': 'Unknown', 'age': 0, 'community': 'Unknown'})

# unreasonable values: replace negative ages with 0
df.loc[df['age'] < 0, 'age'] = 0

# calculate the average age of each gender in each community
avg_age = df.groupby(['community', 'gender'])['age'].mean()

# print the result
print(avg_age)
```
Note: This code assumes that the input data is a CSV file named "population_info.csv" with the columns name, gender, age, and community. Replace the file name with the name of your actual file.
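To check the cleaning rules without a CSV file, the following sketch applies the same steps to a small in-memory DataFrame. The sample rows and the helper name `clean_population` are hypothetical, made up only for illustration. It drops rows with a missing name first, then removes duplicate names, fills the remaining gaps, and uses `clip(lower=0)` for the negative-age rule.

```python
import numpy as np
import pandas as pd


def clean_population(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the task's cleaning rules and return a new DataFrame."""
    df = df.dropna(subset=['name'])                          # missing name: drop the row
    df = df.drop_duplicates(subset=['name'], keep='first')   # duplicate name: keep the first row
    df = df.fillna({'gender': 'Unknown', 'age': 0, 'community': 'Unknown'})
    return df.assign(age=df['age'].clip(lower=0))            # negative age: set to 0


# Hypothetical sample data covering each rule once
data = pd.DataFrame({
    'name':      ['Alice', 'Bob', 'Alice', None, 'Carol', 'Dave', 'Eve'],
    'gender':    ['F', 'M', 'F', 'M', None, 'M', 'F'],
    'age':       [30, -5, 40, 50, 20, np.nan, 10],
    'community': ['A', 'A', 'B', 'A', 'B', None, 'A'],
})

cleaned = clean_population(data)
avg_age = cleaned.groupby(['community', 'gender'])['age'].mean()
print(avg_age)
```

In this sample, the second "Alice" row and the row without a name are removed, Bob's age of -5 becomes 0, Carol's gender becomes "Unknown", and Dave gets age 0 and community "Unknown", so community A averages (30 + 10) / 2 = 20 for gender F.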