How to use regexp_replace() from pyspark.sql.functions

Time: 2024-03-15 19:45:47 Views: 83

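The `regexp_replace()` function in pyspark.sql.functions replaces every substring of a string column that matches a regular expression with a replacement string. It is used as follows:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

# Reuse the active session if there is one, otherwise create it
spark = SparkSession.builder.getOrCreate()

# Sample data
df = spark.createDataFrame([(1, "Hello World"), (2, "SPARK SQL"), (3, "PySPARK")], ["id", "text"])

# Replace every uppercase letter in the text column with the literal "a" and store the result in a new column, new_text
df = df.withColumn("new_text", regexp_replace(df["text"], "[A-Z]", "a"))

# Print the transformed data
df.show()
```

Running the code above prints:

```
+---+-----------+-----------+
| id|       text|   new_text|
+---+-----------+-----------+
|  1|Hello World|aello aorld|
|  2|  SPARK SQL|  aaaaa aaa|
|  3|    PySPARK|    ayaaaaa|
+---+-----------+-----------+
```

As the output shows, `regexp_replace()` replaces each uppercase letter in the text column with the fixed string "a" and the result is stored in a new column, new_text. It does not convert uppercase to lowercase: the replacement is a fixed string, so case conversion is a job for `lower()` instead. The first argument is the string column to operate on, the second is the regular expression, and the third is the string that replaces each match.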
