Code for the correlation plot of each attribute against house price in Spark-based Boston housing price prediction
Posted: 2024-03-03 07:47:32
Below is the code for the correlation plot of each attribute against the house price in Spark-based Boston housing price prediction:
```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Matrix
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
// Read the data
val data = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("boston_housing.csv")

// Column names: the 13 attributes plus the house price.
// Assumption: the price column is named "medv", as in the usual Boston housing CSV;
// without it the matrix would not contain any correlation with the price.
val cols = Array("crim", "zn", "indus", "chas", "nox", "rm", "age", "dis", "rad", "tax", "ptratio", "b", "lstat", "medv")

// Assemble the columns into a single vector column
val assembler = new VectorAssembler()
  .setInputCols(cols)
  .setOutputCol("features")
val dataWithFeatures = assembler.transform(data)

// Compute the (Pearson) correlation matrix
val Row(coeff1: Matrix) = Correlation.corr(dataWithFeatures, "features").head

// Print the correlation matrix, one matrix row per line
println("Correlation Matrix:")
for (i <- 0 until coeff1.numRows) {
  println((0 until coeff1.numCols).map(j => f"${coeff1(i, j)}%1.2f").mkString("\t"))
}

// Export the matrix as CSV (header line, then one matrix row per line) for plotting
val header = cols.mkString(",")
val rows = coeff1.rowIter.map(_.toArray.mkString(","))
val corrMatrixString = (Iterator(header) ++ rows).mkString("\n")
println("\nCorrelation Matrix as CSV:")
println(corrMatrixString)
```
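This code produces two outputs: the correlation matrix, and the same matrix as CSV data. The code does not draw the plot itself; you can feed the CSV data to a plotting tool to draw the correlation plot. Note that the code assumes the data file is named "boston_housing.csv" and that the attribute columns are named "crim", "zn", "indus", "chas", "nox", "rm", "age", "dis", "rad", "tax", "ptratio", "b", "lstat". If your file name or column names differ, adjust the code accordingly.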
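If you only need a quick look before plotting, the CSV text can also be read back and ranked by how strongly each attribute correlates with the price. The following is a minimal sketch in plain Scala, not part of the original answer. It assumes the CSV has one header line followed by a square matrix, and that the price column is called "medv". The object name `CorrelationRanking`, its helper functions, and the sample numbers are all hypothetical and for illustration only.

```scala
object CorrelationRanking {
  // Parse CSV text whose first line is the header and whose remaining
  // lines are the rows of a square correlation matrix.
  def parse(csv: String): (Array[String], Array[Array[Double]]) = {
    val lines = csv.trim.split("\n").map(_.trim).filter(_.nonEmpty)
    val header = lines.head.split(",").map(_.trim)
    val matrix = lines.tail.map(_.split(",").map(_.trim.toDouble))
    require(
      matrix.length == header.length && matrix.forall(_.length == header.length),
      "expected a square matrix matching the header")
    (header, matrix)
  }

  // Correlation of every other column with `target`,
  // sorted by absolute value, strongest first.
  def rankAgainst(csv: String, target: String): Seq[(String, Double)] = {
    val (header, matrix) = parse(csv)
    val t = header.indexOf(target)
    require(t >= 0, s"column '$target' not found in header")
    header.indices
      .filter(_ != t)
      .map(i => header(i) -> matrix(i)(t))
      .sortBy { case (_, r) => -math.abs(r) }
  }

  def main(args: Array[String]): Unit = {
    // Made-up values for illustration; not computed from the real data set.
    val sample =
      """rm,lstat,medv
        |1.0,-0.6,0.7
        |-0.6,1.0,-0.75
        |0.7,-0.75,1.0""".stripMargin

    // Print each attribute with its coefficient and a simple text bar.
    rankAgainst(sample, "medv").foreach { case (name, r) =>
      val bar = "#" * math.round(math.abs(r) * 20).toInt
      println(f"$name%-8s $r%6.2f $bar")
    }
  }
}
```

To use it with the Spark code above, pass the printed CSV text (or the `corrMatrixString` value) to `CorrelationRanking.rankAgainst` together with the name of your price column. Sorting by absolute value keeps strong negative correlations near the top, next to strong positive ones.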