pandas groupby
Time: 2023-08-29 13:07:17 Views: 88
Pandas groupby is a method of DataFrame and Series objects that groups rows by one or more keys and computes a result for each group. It follows the split-apply-combine pattern: the data is split into groups by the chosen keys, a function is applied to each group, and the per-group results are combined into a single object.
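To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then with a single groupby call. The DataFrame and its column names (`team`, `points`) are invented for illustration only.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "points": [1, 2, 3, 4]})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
manual = {}
for key, group in df.groupby("team"):
    # Apply: compute one value per group
    manual[key] = group["points"].sum()

# Combine: collect the per-group values into a single Series
manual_result = pd.Series(manual)

# The usual one-liner performs the same three steps internally
result = df.groupby("team")["points"].sum()
```

Both `manual_result` and `result` hold one total per team; in practice the one-liner is preferred because it avoids the Python-level loop.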
The signature of DataFrame.groupby, as documented for pandas 1.x, is as follows:
```
df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, dropna=True)
```
Where:
- by: A column label, a list of labels, a mapping, a function, or an array with one entry per row. It determines how the rows are assigned to groups.
- axis: The axis to split along: 0 (the default) groups rows, 1 groups columns. This parameter is deprecated since pandas 2.1.
- level: If the index is a MultiIndex, the level or levels (by name or position) to group by.
- as_index: Whether the group keys become the index of the aggregated result (True by default). With as_index=False, the keys stay as regular columns, similar to SQL-style grouped output.
- sort: Whether to sort the result by the group keys (True by default). With sort=False, groups appear in order of first appearance, which can also be faster.
- group_keys: When calling apply, whether to add the group keys to the index of the result to identify each piece (True by default).
- squeeze: Whether to reduce the dimensionality of the returned object where possible (False by default). It was deprecated in pandas 1.1 and removed in pandas 2.0, so passing it to a current version raises an error.
- observed: Applies only when a grouping key is categorical. If True, only category values that occur in the data are shown; if False, all categories are shown, including empty ones (False by default in pandas 1.x and 2.x).
- dropna: Whether to drop rows whose group key is missing (True by default). With dropna=False, missing keys form a group of their own.
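The effect of as_index, sort, and dropna is easiest to see side by side. The following is a small sketch on made-up data (the `city` and `sales` columns are assumptions for illustration, not part of the original example):

```python
import numpy as np
import pandas as pd

# Hypothetical data; the last row has a missing group key
df = pd.DataFrame({
    "city": ["Oslo", "Bergen", "Oslo", np.nan],
    "sales": [10, 20, 30, 40],
})

# Defaults: keys become the index, keys are sorted, the NaN-key row is dropped
default = df.groupby("city")["sales"].sum()

# as_index=False: 'city' stays a regular column and the result is a DataFrame
flat = df.groupby("city", as_index=False)["sales"].sum()

# sort=False: groups appear in order of first appearance (Oslo, then Bergen)
unsorted_result = df.groupby("city", sort=False)["sales"].sum()

# dropna=False: the missing key forms its own group
with_na = df.groupby("city", dropna=False)["sales"].sum()
```

Here `default` has two groups, while `with_na` has three because the row with the missing city is kept.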
Here's an example of how to use the groupby function:
```
import pandas as pd

# Creating a DataFrame
data = {'Name': ['John', 'Sam', 'John', 'Marry', 'Sam', 'Marry'],
        'Subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Science'],
        'Score': [80, 90, 75, 85, 95, 80]}
df = pd.DataFrame(data)

# Grouping the DataFrame by the 'Name' column and calculating the mean score for each group
grouped_df = df.groupby('Name')['Score'].mean()
print(grouped_df)
```
Output:
```
Name
John     77.5
Marry    82.5
Sam      92.5
Name: Score, dtype: float64
```
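In this example, we grouped the DataFrame by the 'Name' column and calculated the mean score for each group with the mean method. Because a single column ('Score') was selected before aggregating, the result is a Series, not a DataFrame: its index holds the names and its values hold each person's mean score.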
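The same data can also be grouped by more than one column or summarized with several statistics at once. The sketch below shows one possible way to do this with agg and a list of keys; the output column names (`mean`, `best`, `n`) are chosen here for illustration.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    "Name": ["John", "Sam", "John", "Marry", "Sam", "Marry"],
    "Subject": ["Math", "Science", "Math", "Science", "Math", "Science"],
    "Score": [80, 90, 75, 85, 95, 80],
})

# Several statistics per name via named aggregation (output_name="function")
summary = df.groupby("Name")["Score"].agg(mean="mean", best="max", n="count")

# Grouping by two columns yields a result indexed by (Name, Subject) pairs
by_both = df.groupby(["Name", "Subject"])["Score"].mean()
```

`summary` is a DataFrame with one row per name and one column per statistic, while `by_both` is a Series with a two-level index.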