col2 = df_merged.pop('Dis') df_merged.insert(len(df_merged.columns), 'Dis', col2)
时间: 2024-01-16 09:02:04 浏览: 25
这是将 Pandas DataFrame 中名为 `Dis` 的列移动到最后一列的常用技巧。该代码会将 DataFrame 中名为 `Dis` 的列删除,并将其返回给变量 `col2`。然后,使用 `insert()` 方法将 `col2` 插入到 DataFrame 的最后一列,这样 `Dis` 列就被移动到了最后一列。
具体而言,`pop()` 方法用于删除并返回指定的列,`insert()` 方法则用于在 DataFrame 的指定位置插入一列。在这里,我们将 `Dis` 列插入到了 DataFrame 的最后一列,因此使用了 `len(df_merged.columns)` 来指定插入位置。
这个技巧的作用是,有时候我们会希望将某一列移动到 DataFrame 的最后一列,以方便观察和处理数据。而使用这种方法可以避免手动调整列的顺序。
相关问题
import pandas as pd import math as mt import numpy as np from sklearn.model_selection import train_test_split from Recommenders import SVDRecommender triplet_dataset_sub_song_merged = triplet_dataset_sub_song_mergedpd triplet_dataset_sub_song_merged_sum_df = triplet_dataset_sub_song_merged[['user','listen_count']].groupby('user').sum().reset_index() triplet_dataset_sub_song_merged_sum_df.rename(columns={'listen_count':'total_listen_count'},inplace=True) triplet_dataset_sub_song_merged = pd.merge(triplet_dataset_sub_song_merged,triplet_dataset_sub_song_merged_sum_df) triplet_dataset_sub_song_merged['fractional_play_count'] = triplet_dataset_sub_song_merged['listen_count']/triplet_dataset_sub_song_merged small_set = triplet_dataset_sub_song_merged user_codes = small_set.user.drop_duplicates().reset_index() song_codes = small_set.song.drop_duplicates().reset_index() user_codes.rename(columns={'index':'user_index'}, inplace=True) song_codes.rename(columns={'index':'song_index'}, inplace=True) song_codes['so_index_value'] = list(song_codes.index) user_codes['us_index_value'] = list(user_codes.index) small_set = pd.merge(small_set,song_codes,how='left') small_set = pd.merge(small_set,user_codes,how='left') mat_candidate = small_set[['us_index_value','so_index_value','fractional_play_count']] data_array = mat_candidate.fractional_play_count.values row_array = mat_candidate.us_index_value.values col_array = mat_candidate.so_index_value.values data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float) K=50 urm = data_sparse MAX_PID = urm.shape[1] MAX_UID = urm.shape[0] recommender = SVDRecommender(K) U, S, Vt = recommender.fit(urm) Compute recommendations for test users uTest = [1,6,7,8,23] uTest_recommended_items = recommender.recommend(uTest, urm, 10) Output recommended songs in a dataframe recommendations = pd.DataFrame(columns=['user','song', 'score','rank']) for user in uTest: rank = 1 for song_index in uTest_recommended_items[user, 0:10]: song = small_set.loc[small_set['so_index_value'] == song_index].iloc[0] # Get song details recommendations = recommendations.append({'user': user, 'song': song['title'], 'score': song['fractional_play_count'], 'rank': rank}, ignore_index=True) rank += 1 display(recommendations)这段代码报错了,为什么?给出修改后的 代码
代码中的错误是使用了未定义的模块和类。需要先安装相应的模块并导入相应的类。
以下是修改后的代码:
```python
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.model_selection import train_test_split
from Recommenders.SVDRecommender import SVDRecommender # 导入SVDRecommender类
triplet_dataset_sub_song_merged = pd.read_csv('triplet_dataset_sub_song_merged.csv') # 读取数据
triplet_dataset_sub_song_merged_sum_df = triplet_dataset_sub_song_merged[['user','listen_count']].groupby('user').sum().reset_index()
triplet_dataset_sub_song_merged_sum_df.rename(columns={'listen_count':'total_listen_count'},inplace=True)
triplet_dataset_sub_song_merged = pd.merge(triplet_dataset_sub_song_merged,triplet_dataset_sub_song_merged_sum_df)
triplet_dataset_sub_song_merged['fractional_play_count'] = triplet_dataset_sub_song_merged['listen_count']/triplet_dataset_sub_song_merged['total_listen_count']
small_set = triplet_dataset_sub_song_merged[['user','song','fractional_play_count']]
user_codes = small_set.user.drop_duplicates().reset_index()
song_codes = small_set.song.drop_duplicates().reset_index()
user_codes.rename(columns={'index':'user_index'}, inplace=True)
song_codes.rename(columns={'index':'song_index'}, inplace=True)
song_codes['so_index_value'] = list(song_codes.index)
user_codes['us_index_value'] = list(user_codes.index)
small_set = pd.merge(small_set,song_codes,how='left')
small_set = pd.merge(small_set,user_codes,how='left')
mat_candidate = small_set[['us_index_value','so_index_value','fractional_play_count']]
data_array = mat_candidate.fractional_play_count.values
row_array = mat_candidate.us_index_value.values
col_array = mat_candidate.so_index_value.values
data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float)
K=50
urm = data_sparse
MAX_PID = urm.shape[1]
MAX_UID = urm.shape[0]
recommender = SVDRecommender(K) # 创建SVD推荐器
U, S, Vt = recommender.fit(urm) # 训练推荐器
uTest = [1,6,7,8,23]
uTest_recommended_items = recommender.recommend(uTest, urm, 10)
# 输出推荐结果
recommendations = pd.DataFrame(columns=['user','song', 'score','rank'])
for user in uTest:
rank = 1
for song_index in uTest_recommended_items[user, 0:10]:
song = small_set.loc[small_set['so_index_value'] == song_index].iloc[0]
recommendations = recommendations.append({'user': user, 'song': song['song'], 'score': song['fractional_play_count'], 'rank': rank}, ignore_index=True)
rank += 1
display(recommendations)
```
new_df_cols = new_df.columns old_df_cols = older_df.columns total = set(new_df_cols + old_df_cols) new_df = new_df.select(*self.fill_empty_colums(new_df_cols, total)).withColumn("row_priority",F.lit(0)) older_df = older_df.select(*self.fill_empty_colums(old_df_cols, total)).withColumn("row_priority",F.lit(1)) key_column = [F.col(column_name) for column_name in key_columns] merge_spec = Window.partitionBy(key_column).orderBy("row_priority") ranked_df=new_df.unionByName(older_df).withColumn("rank", F.rank().over(merge_spec)) return self.update_audit_created_column(ranked_df,key_column).where(F.col("rank") == 1).drop("rank", "row_priority")
这段代码用 PySpark 实现了对两个 DataFrame 进行合并和去重的操作,并添加了一些元数据信息。下面是使用 SQL 实现的代码:
```sql
-- 选取需要合并的列名
WITH new_cols AS (
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'new_df'
),
old_cols AS (
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'older_df'
),
cols AS (
SELECT DISTINCT COLUMN_NAME
FROM (
SELECT COLUMN_NAME FROM new_cols
UNION ALL
SELECT COLUMN_NAME FROM old_cols
)
),
-- 对 new_df 填充空缺的列,并添加 "row_priority" 列
new_df_filled AS (
SELECT COALESCE(col1, '') AS col1, COALESCE(col2, '') AS col2, ..., COALESCE(colN, '') AS colN, 0 AS row_priority
FROM new_df
),
new_df_selected AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY key_column ORDER BY row_priority) AS rank
FROM (
-- 选取 new_df 中的列,包括填充空缺的列和 "row_priority" 列
SELECT col1, col2, ..., colN, row_priority
FROM new_df_filled
-- 生成 key_column 列,用于分组
CROSS JOIN (SELECT col1 AS key_column FROM new_df_filled) key_columns
)
),
-- 对 older_df 填充空缺的列,并添加 "row_priority" 列
old_df_filled AS (
SELECT COALESCE(col1, '') AS col1, COALESCE(col2, '') AS col2, ..., COALESCE(colN, '') AS colN, 1 AS row_priority
FROM older_df
),
old_df_selected AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY key_column ORDER BY row_priority) AS rank
FROM (
-- 选取 older_df 中的列,包括填充空缺的列和 "row_priority" 列
SELECT col1, col2, ..., colN, row_priority
FROM old_df_filled
-- 生成 key_column 列,用于分组
CROSS JOIN (SELECT col1 AS key_column FROM old_df_filled) key_columns
)
),
-- 合并两个 DataFrame,并去重
merged_df AS (
SELECT * FROM new_df_selected
UNION ALL
SELECT * FROM old_df_selected
),
-- 选取合并后的第一行
final_df AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY key_column ORDER BY rank) AS row_num
FROM merged_df
)
SELECT col1, col2, ..., colN
FROM final_df
WHERE row_num = 1
```
这段 SQL 代码的实现原理与 PySpark 代码相同,主要分为以下几个步骤:
1. 获取需要合并的列名。
2. 对 new_df 和 older_df 分别进行填充空缺列、添加 "row_priority" 列和选取列的操作,生成 new_df_selected 和 old_df_selected 两个数据集。
3. 将 two_df_selected 进行合并,并添加 rank 列,用于去重。
4. 选取合并后的第一行,得到最终的去重结果。
相关推荐
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)