PySpark join operation: code example

Date: 2023-10-08 17:04:03 Views: 75

Sure, here is an example of a PySpark join operation:

```
from pyspark.sql import SparkSession

# Get (or create) the SparkSession that the rest of the example uses
spark = SparkSession.builder.appName("join-example").getOrCreate()

# Create two dataframes
df1 = spark.createDataFrame([(1, 'a'), (2, 'b'), (3, 'c')], ['id', 'value1'])
df2 = spark.createDataFrame([(2, 'x'), (3, 'y'), (4, 'z')], ['id', 'value2'])

# Perform join operation on id column
joined_df = df1.join(df2, on=['id'], how='inner')
joined_df.show()
```

In this example, two dataframes are created: `df1` with columns `id` and `value1`, and `df2` with columns `id` and `value2`. The `join` operation is performed on the `id` column, and the `how='inner'` parameter sets the join type to an inner join. The resulting `joined_df` dataframe has the columns `id`, `value1`, and `value2`, and keeps only the rows whose `id` appears in both dataframes: here, the rows with `id` 2 and 3. Because the join key is passed by name through `on=['id']`, the result contains a single `id` column rather than one from each side.
