PySpark join operation code example
Time: 2023-10-08 17:04:03
Here is an example of a PySpark join operation:
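```python
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; the original snippet assumed `spark` already existed
spark = SparkSession.builder.appName("join-example").getOrCreate()

# Create two DataFrames that share an "id" column
df1 = spark.createDataFrame([(1, 'a'), (2, 'b'), (3, 'c')], ['id', 'value1'])
df2 = spark.createDataFrame([(2, 'x'), (3, 'y'), (4, 'z')], ['id', 'value2'])

# Inner join on the id column: keep only ids present in both DataFrames
joined_df = df1.join(df2, on=['id'], how='inner')
joined_df.show()
```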
In this example, two DataFrames are created: `df1` with columns `id` and `value1`, and `df2` with columns `id` and `value2`. The `join` is performed on the `id` column, and the `how='inner'` parameter selects an inner join. Because `on` is given as a list of column names, `id` appears only once in the output, so `joined_df` has the columns `id`, `value1`, and `value2`. It contains only the rows whose `id` exists in both DataFrames (here, `id` 2 and 3).
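To check the expected result without a Spark cluster, the sketch below models what the `how` parameter does on the same data in plain Python. It assumes each `id` appears at most once per side, as in the example. `join_rows` is a hypothetical helper written for illustration. It is not part of PySpark and says nothing about how Spark actually executes joins. Besides `'inner'`, it models `'left'` and `'outer'`, two of the values `DataFrame.join` accepts for `how`.

```python
# Plain-Python model of key-based join semantics (NOT Spark's implementation).
# join_rows is a hypothetical helper written for illustration only.
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]


def join_rows(left: List[Tuple[int, str]],
              right: List[Tuple[int, str]],
              how: str = "inner") -> List[Row]:
    """Join (id, value) pairs on id, mimicking df1.join(df2, on=['id'], how=how).

    Assumes ids are unique within each input, as in the example DataFrames.
    """
    if how not in ("inner", "left", "outer"):
        raise ValueError(f"unsupported join type in this sketch: {how!r}")
    right_by_id: Dict[int, str] = dict(right)  # lookup table: id -> value2
    left_by_id: Dict[int, str] = dict(left)    # lookup table: id -> value1
    result: List[Row] = []
    for key, value1 in left:                   # scan the left side in order
        if key in right_by_id:
            result.append((key, value1, right_by_id[key]))
        elif how in ("left", "outer"):
            result.append((key, value1, None))  # unmatched left row: value2 is null
    if how == "outer":
        for key, value2 in right:
            if key not in left_by_id:
                result.append((key, None, value2))  # unmatched right row: value1 is null
    return result


df1_rows = [(1, 'a'), (2, 'b'), (3, 'c')]
df2_rows = [(2, 'x'), (3, 'y'), (4, 'z')]

print(join_rows(df1_rows, df2_rows, "inner"))  # [(2, 'b', 'x'), (3, 'c', 'y')]
print(join_rows(df1_rows, df2_rows, "left"))   # [(1, 'a', None), (2, 'b', 'x'), (3, 'c', 'y')]
print(join_rows(df1_rows, df2_rows, "outer"))  # [(1, 'a', None), (2, 'b', 'x'), (3, 'c', 'y'), (4, None, 'z')]
```

With `how='inner'` the model keeps only ids 2 and 3, matching the description above. Switching to `'left'` or `'outer'` keeps unmatched rows and fills the missing side with `None`, which Spark displays as `null`. Spark does not guarantee row order in `show()` output, so compare rows as sets when checking a real run.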