Imagine that we want to determine whether unemployment was high (> 6.5), medium (4.5 < x <= 6.5), or low (<= 4.5) for each state and each month.

- Write a Python function that takes a single number as an input and outputs a single string noting if that number is high, medium, or low.
- Pass your function to applymap (quiz: why applymap and not agg or apply?) and save the result in a new DataFrame called unemp_bins.
- (Challenging) This exercise has multiple parts:
  - Use another transform on unemp_bins to count how many times each state had each of the three classifications.
    - Hint 1: Will this value counting function be a Series or scalar transform?
    - Hint 2: Try googling "pandas count unique value" or something similar to find the right transform.
  - Construct a horizontal bar chart of the number of occurrences of each level with one bar per state and classification (21 total bars).
- (Challenging) Repeat the previous step, but count how many states had each classification in each month. Which month had the most states with high unemployment? What about medium and low?
Time: 2024-02-14 12:08:03 Views: 93
Sure, I can help you with that. Here is the Python function that takes a single number as an input and outputs a single string noting if that number is high, medium, or low:
```python
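def classify_unemployment(x):
    # Thresholds follow the exercise: > 6.5 is high, <= 4.5 is low.
    if x > 6.5:
        return 'high'
    elif x <= 4.5:
        return 'low'
    else:
        # Everything in between (4.5 < x <= 6.5) is medium.
        return 'medium'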
```
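As a quick sanity check, the function can be called on values at and just past each threshold. The sample rates below are made up for illustration; only the thresholds come from the exercise.

```python
def classify_unemployment(x):
    # Same thresholds as in the exercise statement.
    if x > 6.5:
        return 'high'
    elif x <= 4.5:
        return 'low'
    else:
        return 'medium'


# Illustrative values on either side of both boundaries.
for rate in (4.5, 4.6, 6.5, 6.6):
    print(rate, classify_unemployment(rate))
# 4.5 low
# 4.6 medium
# 6.5 medium
# 6.6 high
```

Both boundary values fall into the lower bin, which matches the `<=` in the exercise's definitions.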
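We use `applymap()` rather than `agg()` or `apply()` because the function must run element-wise, on each individual number in the DataFrame. `agg()` reduces each column to a single value, and `apply()` passes a whole column (or row) to the function at a time, so `classify_unemployment` would receive a Series instead of a scalar. In pandas 2.1 and later, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`.
Here's how you can use this function with `applymap()` to create a new DataFrame called `unemp_bins` (here `df` is assumed to hold the unemployment rates, with one row per month and one column per state):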
```python
unemp_bins = df.applymap(classify_unemployment)
```
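To make the difference between the three methods concrete, here is a minimal sketch on a tiny made-up DataFrame. The state names, dates, and rates are hypothetical, and the layout (months as rows, states as columns) is an assumption about how `df` is organized.

```python
import pandas as pd

# Hypothetical unemployment rates: rows are months, columns are states.
df = pd.DataFrame(
    {"Arizona": [4.1, 6.9], "California": [5.0, 7.2]},
    index=pd.to_datetime(["2000-01-01", "2009-01-01"]),
)


def classify_unemployment(x):
    if x > 6.5:
        return "high"
    elif x <= 4.5:
        return "low"
    return "medium"


# agg reduces each column to a single scalar.
col_means = df.agg("mean")

# apply hands the function a whole column (a Series) at a time.
col_ranges = df.apply(lambda col: col.max() - col.min())

# Element-wise: the function sees one number at a time.
# DataFrame.map replaces applymap from pandas 2.1 onward, so fall back
# to applymap only on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
unemp_bins = elementwise(classify_unemployment)

print(unemp_bins)
```

Only the element-wise call returns a DataFrame with the same shape as `df`, which is what the exercise asks for.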
To count how many times each state had each of the three classifications, we can call `value_counts()` on each column of `unemp_bins`. This is a Series transform: `value_counts()` takes a whole column and returns a Series of counts, so we run it through `apply()`, which calls it once per column:
```python
state_counts = unemp_bins.apply(lambda x: x.value_counts())
```
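One detail worth seeing on a small example: if a state never falls into one of the classifications, that entry comes back as NaN rather than 0. The sketch below uses made-up classifications for two states and fills the gaps; the `reindex` step is an optional addition that fixes the row order.

```python
import pandas as pd

# Hypothetical classifications: rows are months, columns are states.
unemp_bins = pd.DataFrame(
    {
        "Arizona": ["low", "medium", "high"],
        "California": ["medium", "high", "high"],
    },
    index=pd.to_datetime(["2000-01-01", "2005-01-01", "2009-01-01"]),
)

state_counts = (
    unemp_bins.apply(pd.Series.value_counts)  # one value_counts per column
    .reindex(["low", "medium", "high"])       # fixed, readable row order
    .fillna(0)                                # missing category means zero months
    .astype(int)
)

print(state_counts)
```

Here California is never "low", so without `fillna(0)` that cell would be NaN and the column would be stored as floats.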
To construct a horizontal bar chart of the number of occurrences of each level with one bar per state and classification, we can use the `plot()` method on the `state_counts` DataFrame. We set `kind='barh'` to create a horizontal bar chart:
```python
state_counts.plot(kind='barh', title='Unemployment Classification by State')
```
To count how many states had each classification in each month, we can run the same transform along the other axis. Assuming `unemp_bins` has one row per month and one column per state, passing `axis=1` to `apply()` calls `value_counts()` on each row, which gives one row per month and one column per classification. A classification that does not occur in a given month comes back as NaN, so we fill those entries with 0:
```python
month_counts = unemp_bins.apply(lambda x: x.value_counts(), axis=1).fillna(0)
```
To find out which month had the most states with high, medium, and low unemployment, we can use `idxmax()` on each column of `month_counts`. With one row per month and one column per classification, this returns the index label (the month) of the largest count in that column:
```python
high_month = month_counts['high'].idxmax()
medium_month = month_counts['medium'].idxmax()
low_month = month_counts['low'].idxmax()
```
This gives the month with the most states in each classification. If several months tie, `idxmax()` returns the first of them.
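The per-month counting can be sketched end to end on a small made-up table. The states, dates, and classifications below are hypothetical, and the layout (months as rows, states as columns) is assumed rather than taken from the original data.

```python
import pandas as pd

# Hypothetical classifications: rows are months, columns are states.
unemp_bins = pd.DataFrame(
    {
        "Arizona": ["low", "medium", "high"],
        "California": ["medium", "medium", "high"],
        "Texas": ["low", "low", "high"],
    },
    index=pd.to_datetime(["2000-01-01", "2005-01-01", "2009-01-01"]),
)

month_counts = (
    unemp_bins.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
    .reindex(columns=["low", "medium", "high"])       # guarantee all three columns
    .fillna(0)
    .astype(int)
)

# idxmax on the whole DataFrame returns, for each column,
# the index label (month) holding that column's maximum.
best_months = month_counts.idxmax()

print(month_counts)
print(best_months)
```

Calling `idxmax()` on the DataFrame handles all three classifications at once, and the `reindex(columns=...)` step avoids a `KeyError` if some classification never appears in the data.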