date_history = pd.DataFrame(data_integrity.iloc[:, 0]) dataset_history = pd.DataFrame(data_integrity.iloc[:, 1]) # 数据划分完,数据格式转换 dataset_history = [each[0] for each in np.array(dataset_history.iloc[:, [0]]).tolist()] # 历史数据 data_history_conversion = [{i: dataset_history[i * 96:96 * (i + 1)]} for i in range(int(len(dataset_history) / 96))] # 历史数据转化 data_pre = [each for each in np.array(dataset_pre).tolist()] # 预测所需要的今日数据 data_pre_conversion = np.array([float(item) for item in data_pre[0:96]]) # 预测所需要的今日数据的格式转化 代码优化
时间: 2024-01-14 17:04:51 浏览: 74
可以将第一行和第二行合并,即:
```
date_history = pd.DataFrame(data_integrity.iloc[:, 0])
dataset_history = [each[0] for each in np.array(data_integrity.iloc[:, 1]).tolist()]
```
第三行可以使用列表推导式简化,即:
```
data_history_conversion = [{i: dataset_history[i * 96:96 * (i + 1)]} for i in range(len(dataset_history) // 96)]
```
第四行可以直接将`dataset_pre`转换为`numpy array`,即:
```
data_pre_conversion = np.array(dataset_pre[0:96], dtype=float)
```
这样就可以避免使用`for`循环了。
相关问题
time_new_list= data_integrity.iloc[-1, 0] minutes = int(time_new_list.minute) + int(time_new_list.hour) * 60 interval_length = ((minutes / 15) + 1) if interval_length == 96: dataset_pre = data_integrity.load.tail(int(interval_length)) else: dataset_pre = (data_integrity.load[:-int(interval_length)]).tail(96) date_history = pd.DataFrame(data_integrity.iloc[:, 0]) dataset_history = pd.DataFrame(data_integrity.iloc[:, 1]) # 数据划分完,数据格式转换 dataset_history = [each[0] for each in np.array(dataset_history.iloc[:, [0]]).tolist()] # 历史数据 data_history_conversion = [{i: dataset_history[i * 96:96 * (i + 1)]} for i in range(int(len(dataset_history) / 96))] # 历史数据转化 data_pre = [each for each in np.array(dataset_pre).tolist()] # 预测所需要的今日数据 data_pre_conversion = np.array([float(item) for item in data_pre[0:96]]) # 预测所需要的今日数据的格式转化 代码优化
以下是代码优化后的版本:
```
# 计算时间间隔
time_new_list = data_integrity.iloc[-1, 0]
minutes = time_new_list.minute + time_new_list.hour * 60
interval_length = (minutes // 15) + 1
# 获取预测数据集
if interval_length == 96:
dataset_pre = data_integrity.load.tail(interval_length)
else:
dataset_pre = data_integrity.load[:-interval_length].tail(96)
# 获取历史数据集
date_history = data_integrity.iloc[:, 0]
dataset_history = data_integrity.iloc[:, 1].iloc[:, 0].tolist()
# 将历史数据转换为字典格式
data_history_conversion = [
{i: dataset_history[i * 96 : 96 * (i + 1)]} for i in range(len(dataset_history) // 96)
]
# 获取预测所需要的今日数据
data_pre = dataset_pre.values.tolist()
data_pre_conversion = np.array(data_pre[0:96], dtype=float)
```
优化后的代码主要做了以下几个方面的改进:
1. 使用了整除符号 "//" 来计算时间间隔,避免了使用 int() 函数进行类型转换。
2. 使用了 pandas 库的 tail() 函数来获取最后几行数据,使代码更加简洁。
3. 将历史数据集的转换方式改为了使用列表推导式,避免了使用 for 循环。
4. 使用了 numpy 库的 array 函数来将预测数据转换为数组格式,避免了使用列表推导式和 for 循环。同时,指定了数组的数据类型为 float,避免了使用 float() 函数进行类型转换。
import pandas as pd import math as mt import numpy as np from sklearn.model_selection import train_test_split from Recommenders import SVDRecommender triplet_dataset_sub_song_merged = triplet_dataset_sub_song_mergedpd triplet_dataset_sub_song_merged_sum_df = triplet_dataset_sub_song_merged[['user','listen_count']].groupby('user').sum().reset_index() triplet_dataset_sub_song_merged_sum_df.rename(columns={'listen_count':'total_listen_count'},inplace=True) triplet_dataset_sub_song_merged = pd.merge(triplet_dataset_sub_song_merged,triplet_dataset_sub_song_merged_sum_df) triplet_dataset_sub_song_merged['fractional_play_count'] = triplet_dataset_sub_song_merged['listen_count']/triplet_dataset_sub_song_merged small_set = triplet_dataset_sub_song_merged user_codes = small_set.user.drop_duplicates().reset_index() song_codes = small_set.song.drop_duplicates().reset_index() user_codes.rename(columns={'index':'user_index'}, inplace=True) song_codes.rename(columns={'index':'song_index'}, inplace=True) song_codes['so_index_value'] = list(song_codes.index) user_codes['us_index_value'] = list(user_codes.index) small_set = pd.merge(small_set,song_codes,how='left') small_set = pd.merge(small_set,user_codes,how='left') mat_candidate = small_set[['us_index_value','so_index_value','fractional_play_count']] data_array = mat_candidate.fractional_play_count.values row_array = mat_candidate.us_index_value.values col_array = mat_candidate.so_index_value.values data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float) K=50 urm = data_sparse MAX_PID = urm.shape[1] MAX_UID = urm.shape[0] recommender = SVDRecommender(K) U, S, Vt = recommender.fit(urm) Compute recommendations for test users uTest = [1,6,7,8,23] uTest_recommended_items = recommender.recommend(uTest, urm, 10) Output recommended songs in a dataframe recommendations = pd.DataFrame(columns=['user','song', 'score','rank']) for user in uTest: rank = 1 for song_index in uTest_recommended_items[user, 0:10]: song = small_set.loc[small_set['so_index_value'] == song_index].iloc[0] # Get song details recommendations = recommendations.append({'user': user, 'song': song['title'], 'score': song['fractional_play_count'], 'rank': rank}, ignore_index=True) rank += 1 display(recommendations)这段代码报错了,为什么?给出修改后的 代码
代码中的错误是使用了未定义的模块和类。需要先安装相应的模块并导入相应的类。
以下是修改后的代码:
```python
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.model_selection import train_test_split
from Recommenders.SVDRecommender import SVDRecommender # 导入SVDRecommender类
triplet_dataset_sub_song_merged = pd.read_csv('triplet_dataset_sub_song_merged.csv') # 读取数据
triplet_dataset_sub_song_merged_sum_df = triplet_dataset_sub_song_merged[['user','listen_count']].groupby('user').sum().reset_index()
triplet_dataset_sub_song_merged_sum_df.rename(columns={'listen_count':'total_listen_count'},inplace=True)
triplet_dataset_sub_song_merged = pd.merge(triplet_dataset_sub_song_merged,triplet_dataset_sub_song_merged_sum_df)
triplet_dataset_sub_song_merged['fractional_play_count'] = triplet_dataset_sub_song_merged['listen_count']/triplet_dataset_sub_song_merged['total_listen_count']
small_set = triplet_dataset_sub_song_merged[['user','song','fractional_play_count']]
user_codes = small_set.user.drop_duplicates().reset_index()
song_codes = small_set.song.drop_duplicates().reset_index()
user_codes.rename(columns={'index':'user_index'}, inplace=True)
song_codes.rename(columns={'index':'song_index'}, inplace=True)
song_codes['so_index_value'] = list(song_codes.index)
user_codes['us_index_value'] = list(user_codes.index)
small_set = pd.merge(small_set,song_codes,how='left')
small_set = pd.merge(small_set,user_codes,how='left')
mat_candidate = small_set[['us_index_value','so_index_value','fractional_play_count']]
data_array = mat_candidate.fractional_play_count.values
row_array = mat_candidate.us_index_value.values
col_array = mat_candidate.so_index_value.values
data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float)
K=50
urm = data_sparse
MAX_PID = urm.shape[1]
MAX_UID = urm.shape[0]
recommender = SVDRecommender(K) # 创建SVD推荐器
U, S, Vt = recommender.fit(urm) # 训练推荐器
uTest = [1,6,7,8,23]
uTest_recommended_items = recommender.recommend(uTest, urm, 10)
# 输出推荐结果
recommendations = pd.DataFrame(columns=['user','song', 'score','rank'])
for user in uTest:
rank = 1
for song_index in uTest_recommended_items[user, 0:10]:
song = small_set.loc[small_set['so_index_value'] == song_index].iloc[0]
recommendations = recommendations.append({'user': user, 'song': song['song'], 'score': song['fractional_play_count'], 'rank': rank}, ignore_index=True)
rank += 1
display(recommendations)
```
阅读全文