Parsing JSON arrays in PySpark
Time: 2023-09-09 10:13:10 Views: 95
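In PySpark, you can parse a JSON array with the `from_json` function from the `pyspark.sql.functions` module by passing it an `ArrayType` schema. Note that `from_json` expects a string column, so the whole array has to be stored as a single JSON string. Here's an example code snippet: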
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json
from pyspark.sql.types import ArrayType, StringType, StructType, StructField

# Reuse the active session if there is one, otherwise start a new one
spark = SparkSession.builder.getOrCreate()

# Define the schema of each object inside the JSON array
schema = StructType([
    StructField("name", StringType()),
    StructField("age", StringType())
])

# Create a DataFrame whose single column holds the JSON array as ONE string
# (from_json needs a string column, not a Python list of JSON strings)
json_array = (
    '[{"name": "Alice", "age": "25"},'
    ' {"name": "Bob", "age": "30"},'
    ' {"name": "Charlie", "age": "35"}]'
)
df = spark.createDataFrame([(json_array,)], ["json_array"])

# Parse the JSON string into an array of structs using the schema
parsed_df = df.select(from_json(df.json_array, ArrayType(schema)).alias("parsed_array"))

# Extract the fields of the first element, aliased to short column names
parsed_df.selectExpr("parsed_array[0].name AS name", "parsed_array[0].age AS age").show()
```
This code defines a schema for a JSON object with `name` and `age` fields, creates a DataFrame holding an example JSON array, parses it into an array of structs with `from_json`, and then selects the `name` and `age` fields of the first element (index 0). The output of this code should be:
```
+-----+---+
| name|age|
+-----+---+
|Alice| 25|
+-----+---+
```
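When the records start out as Python objects rather than a hand-written string, they need to be serialized into one JSON array string first, because `from_json` reads a string column. Below is a minimal sketch using only the standard `json` module; the `to_json_array` helper and the `records` sample data are hypothetical names introduced for illustration, not part of PySpark.

```python
import json


def to_json_array(records):
    """Serialize a list of dicts into one JSON array string (hypothetical helper)."""
    return json.dumps(records)


# Hypothetical sample records mirroring the example data
records = [
    {"name": "Alice", "age": "25"},
    {"name": "Bob", "age": "30"},
    {"name": "Charlie", "age": "35"},
]

# One string containing the whole array, suitable for a single DataFrame cell
json_array = to_json_array(records)
print(json_array)

# Round-trip check: the string decodes back to the same list of objects
assert json.loads(json_array) == records
```

The resulting string can then be used as the single cell value in `spark.createDataFrame([(json_array,)], ["json_array"])` before calling `from_json` with an `ArrayType` schema.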