df_country = df['音乐类型'].str.split(' ',expand = True) ct1 = df_country.apply(pd.value_counts).fillna('0') ct1 = ct1.astype(np.int64) ct1['count'] = ct1.sum(axis = 1) ct1.sort_values('count', ascending = False, inplace = True) ct1['count'].plot.bar(figsize = (12,5), color = 'royalblue', legend = True, width = 0.6)
时间: 2024-01-08 14:01:54 浏览: 71
这段代码的作用是将数据框(df)中的"音乐类型"列按照空格进行拆分,并统计每种音乐类型的出现次数。首先,通过`str.split(' ',expand = True)`将"音乐类型"列拆分成多列。然后,使用`pd.value_counts`函数计算每种音乐类型出现的次数,并用`fillna('0')`将缺失值填充为0。接下来,将结果转换为整数类型(`astype(np.int64)`),并计算每种音乐类型的总次数并添加到结果中(`ct1['count'] = ct1.sum(axis = 1)`)。最后,按照总次数降序排列(`ct1.sort_values('count', ascending = False, inplace = True)`),并使用柱状图展示每种音乐类型的出现次数(`ct1['count'].plot.bar(figsize = (12,5), color = 'royalblue', legend = True, width = 0.6)`)。该图形的尺寸为12x5,颜色为royalblue,柱状图的宽度为0.6。
相关问题
import numpy as np from datetime import datetime, timedelta import pandas as pd test = pd.read_excel("C:\\Users\\1data\\Desktop\\DBM成本收入核算\\test.xlsx") start_date = pd.to_datetime(test['合同开始日期']) end_date = pd.to_datetime(test['合同截止日期']) test['合同周期月数'] = round((end_date - start_date) / np.timedelta64(1, 'M')) start_date_col = '合同开始日期' end_date_col = '合同截止日期' new_col = '日期' for index, row in test.iterrows(): start_date = pd.to_datetime(row[start_date_col]) end_date = pd.to_datetime(row[end_date_col]) date_list = [] if start_date.day <= 15: while start_date <= end_date: date_list.append(start_date) start_date = start_date + timedelta(days=30) else: start_date = start_date + timedelta(days=30) while start_date <= end_date + timedelta(days=30): date_list.append(start_date) start_date = start_date + timedelta(days=30) test.loc[index, new_col] = ','.join([str(date.date()) for date in date_list]) df_tmp=test['日期'].str.split(',',expand=True) df_tmp=df_tmp.stack() df_tmp = df_tmp.reset_index(level=1,drop=True) df_tmp.name='日期' df_new = test.drop(['日期'], axis=1).join(df_tmp).reset_index().drop(columns='index') print(df_new) df_new.to_excel('income_test.xlsx',index=False) 将这段代码导出的excel中合同开始日期字段不显示时分秒且新增日期字段的取年月的字段合同周期年月
要将导出的 Excel 中的合同开始日期字段只显示年月日,可以使用 pandas 的 to_datetime 方法中的 strftime 函数将日期格式化为字符串再导出到 Excel 中。修改代码如下:
```
import numpy as np
from datetime import datetime, timedelta
import pandas as pd
test = pd.read_excel("C:\\Users\\1data\\Desktop\\DBM成本收入核算\\test.xlsx")
start_date = pd.to_datetime(test['合同开始日期'])
end_date = pd.to_datetime(test['合同截止日期'])
test['合同周期月数'] = round((end_date - start_date) / np.timedelta64(1, 'M'))
start_date_col = '合同开始日期'
end_date_col = '合同截止日期'
new_col = '日期'
for index, row in test.iterrows():
start_date = pd.to_datetime(row[start_date_col])
end_date = pd.to_datetime(row[end_date_col])
date_list = []
if start_date.day <= 15:
while start_date <= end_date:
date_list.append(start_date)
start_date = start_date + timedelta(days=30)
else:
start_date = start_date + timedelta(days=30)
while start_date <= end_date + timedelta(days=30):
date_list.append(start_date)
start_date = start_date + timedelta(days=30)
# 格式化日期字符串为年月日
date_str_list = [date.strftime('%Y-%m-%d') for date in date_list]
test.loc[index, new_col] = ','.join(date_str_list)
# 将日期字段拆分为多列
df_tmp = test['日期'].str.split(',', expand=True)
df_tmp = df_tmp.stack()
df_tmp = df_tmp.reset_index(level=1, drop=True)
df_tmp.name = '日期'
# 将拆分后的日期列与原始 DataFrame 合并
df_new = pd.concat([test.drop(['日期'], axis=1), df_tmp], axis=1)
df_new = df_new.reset_index().drop(columns='index')
# 计算合同周期年月
df_new['合同周期年月'] = df_new['合同周期月数'].apply(lambda x: f"{int(x//12)}年{int(x%12)}月")
# 导出到 Excel
df_new.to_excel('income_test.xlsx', index=False)
```
这样导出的 Excel 文件中的合同开始日期字段只显示年月日,而新增日期字段的取年月的字段合同周期年月也被计算出来了。
请详细解释一下这段代码,每一句给上相应的详细注解:sub['t'] = 0 submission = [] for f in test: df = pd.read_csv(f) df.set_index('Time', drop=True, inplace=True) df['Id'] = f.split('/')[-1].split('.')[0] # df = df.fillna(0).reset_index(drop=True) df['Time_frac']=(df.index/df.index.max()).values#currently the index of data is actually "Time" df = pd.merge(df, tasks[['Id','t_kmeans']], how='left', on='Id').fillna(-1) # df = pd.merge(df, subjects[['Id','s_kmeans']], how='left', on='Id').fillna(-1) df = pd.merge(df, metadata_complex[['Id','Subject']+['Visit','Test','Medication','s_kmeans']], how='left', on='Id').fillna(-1) df_feats = fc.calculate(df, return_df=True, include_final_window=True, approve_sparsity=True, window_idx="begin") df = df.merge(df_feats, how="left", left_index=True, right_index=True) df.fillna(method="ffill", inplace=True) # res = pd.DataFrame(np.round(reg.predict(df[cols]).clip(0.0,1.0),3), columns=pcols) res_vals=[] for i_fold in range(N_FOLDS): res_val=np.round(regs[i_fold].predict(df[cols]).clip(0.0,1.0),3) res_vals.append(np.expand_dims(res_val,axis=2)) res_vals=np.mean(np.concatenate(res_vals,axis=2),axis=2) res = pd.DataFrame(res_vals, columns=pcols) df = pd.concat([df,res], axis=1) df['Id'] = df['Id'].astype(str) + '_' + df.index.astype(str) submission.append(df[scols]) submission = pd.concat(submission) submission = pd.merge(sub[['Id']], submission, how='left', on='Id').fillna(0.0) submission[scols].to_csv('submission.csv', index=False)
这段代码的作用是生成一个提交文件(submission.csv),其中包含了对一组测试数据进行预测的结果。下面是每一句代码的详细注解:
```
sub['t'] = 0
```
在这一行代码中,创建了一个名为sub的pandas DataFrame对象,并且给其增加了一个名为t的列,初始值为0。
```
submission = []
```
这一行代码创建一个空列表submission。
```
for f in test:
df = pd.read_csv(f)
df.set_index('Time', drop=True, inplace=True)
df['Id'] = f.split('/')[-1].split('.')[0]
df = df.fillna(0).reset_index(drop=True)
df['Time_frac']=(df.index/df.index.max()).values
df = pd.merge(df, tasks[['Id','t_kmeans']], how='left', on='Id').fillna(-1)
df = pd.merge(df, subjects[['Id','s_kmeans']], how='left', on='Id').fillna(-1)
df = pd.merge(df, metadata_complex[['Id','Subject']+['Visit','Test','Medication','s_kmeans']], how='left', on='Id').fillna(-1)
```
这一段代码是对测试数据进行预处理,包括读取CSV文件、设置索引、填充缺失值、添加新的列、合并多个数据框等。
```
df_feats = fc.calculate(df, return_df=True, include_final_window=True, approve_sparsity=True, window_idx="begin")
df = df.merge(df_feats, how="left", left_index=True, right_index=True)
df.fillna(method="ffill", inplace=True)
```
这里是调用了fc.calculate函数,对测试数据进行特征提取,并将结果合并到原始数据框df中。
```
res_vals=[]
for i_fold in range(N_FOLDS):
res_val=np.round(regs[i_fold].predict(df[cols]).clip(0.0,1.0),3)
res_vals.append(np.expand_dims(res_val,axis=2))
res_vals=np.mean(np.concatenate(res_vals,axis=2),axis=2)
res = pd.DataFrame(res_vals, columns=pcols)
```
这一段代码是对测试数据进行预测,包括调用预训练的模型在测试数据上进行预测,并将结果保存在一个名为res的pandas DataFrame对象中。
```
df = pd.concat([df,res], axis=1)
df['Id'] = df['Id'].astype(str) + '_' + df.index.astype(str)
submission.append(df[scols])
```
这里将预测结果与测试数据框合并,并且将Id列和索引列组合成一个新的Id列,最后将预测结果保存到submission列表中。
```
submission = pd.concat(submission)
submission = pd.merge(sub[['Id']], submission, how='left', on='Id').fillna(0.0)
submission[scols].to_csv('submission.csv', index=False)
```
这里将所有测试数据的预测结果合并成一个pandas DataFrame对象,再与之前创建的sub对象合并,最后将最终的预测结果保存到submission.csv文件中。
阅读全文