Parsing JSON arrays in PySpark
In PySpark, you can use the `from_json` function to parse JSON arrays stored as strings. Here's an example:
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

spark = SparkSession.builder.getOrCreate()

# Sample dataset with a column containing JSON arrays as strings
data = [("1", '[{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]'),
        ("2", '[{"id": 3, "name": "Bob"}, {"id": 4, "name": "Alice"}]')]

# Define the schema of a single element of the JSON array
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType())
])

# Build the DataFrame, then apply from_json to parse each JSON string
parsed_data = spark.createDataFrame(data, ["key", "json"])
parsed_data = parsed_data.withColumn("parsed_json", from_json("json", ArrayType(schema)))

# Show the parsed data
parsed_data.show(truncate=False)
```
This outputs a DataFrame where each JSON string is parsed into an `ArrayType` column of structs:
```
+---+------------------------------------------------------+----------------------+
|key|json                                                  |parsed_json           |
+---+------------------------------------------------------+----------------------+
|1  |[{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]|[[1, John], [2, Jane]]|
|2  |[{"id": 3, "name": "Bob"}, {"id": 4, "name": "Alice"}]|[[3, Bob], [4, Alice]]|
+---+------------------------------------------------------+----------------------+
```
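Since each row of `parsed_json` now holds an array of structs, a common follow-up is to flatten it into one row per element. Below is a minimal sketch using `explode`, continuing from the `parsed_data` DataFrame above (the alias `item` is just an illustrative name):

```
from pyspark.sql.functions import explode, col

# Turn each element of parsed_json into its own row,
# then pull the struct fields out into top-level columns
flattened = (parsed_data
             .select("key", explode("parsed_json").alias("item"))
             .select("key",
                     col("item.id").alias("id"),
                     col("item.name").alias("name")))
flattened.show()
```

Note that `from_json` also accepts the schema as a DDL-formatted string, e.g. `from_json("json", "array<struct<id:string,name:string>>")`, which saves building the `StructType` by hand.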