Computing a confusion matrix and ROC curve for a support vector machine model in Scala
Date: 2023-09-27 19:06:07
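To evaluate a support vector machine model in Scala with a confusion matrix and ROC metrics, you can use the Spark MLlib library. The following simple example trains a `LinearSVC`, reports the area under the ROC curve, and builds a confusion matrix from the predictions: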
```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{VectorAssembler, StringIndexer}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("SVMExample").getOrCreate()
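// Load data; inferSchema reads the feature columns as numbers, which VectorAssembler requires
val data = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("data.csv")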
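// Convert label column to numeric indices. By default StringIndexer gives 0.0 to the most
// frequent label, so inspect labelIndexer.labels to see which class ends up as 1.0.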
val labelIndexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel").fit(data)
val indexed = labelIndexer.transform(data)
// Assemble feature columns into a vector
val assembler = new VectorAssembler().setInputCols(Array("feature1", "feature2")).setOutputCol("features")
val assembled = assembler.transform(indexed)
// Split data into training and test sets
val Array(training, test) = assembled.randomSplit(Array(0.7, 0.3), seed = 12345)
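// Train SVM model on the indexed label (LinearSVC has no elasticNetParam setter, so that call is dropped)
val svm = new LinearSVC().setLabelCol("indexedLabel").setFeaturesCol("features").setMaxIter(10).setRegParam(0.1)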
val model = svm.fit(training)
// Make predictions on test data
val predictions = model.transform(test)
// Compute evaluation metrics
val evaluator = new BinaryClassificationEvaluator().setLabelCol("indexedLabel").setRawPredictionCol("rawPrediction").setMetricName("areaUnderROC")
val areaUnderROC = evaluator.evaluate(predictions)
val tp = predictions.filter("prediction = 1.0 AND indexedLabel = 1.0").count()
val fp = predictions.filter("prediction = 1.0 AND indexedLabel = 0.0").count()
val tn = predictions.filter("prediction = 0.0 AND indexedLabel = 0.0").count()
val fn = predictions.filter("prediction = 0.0 AND indexedLabel = 1.0").count()
// Rows: predicted positive, predicted negative; columns: actual positive, actual negative
val confusionMatrix = Seq(
  (tp, fp),
  (fn, tn)
)
// Output results
println(s"Area under ROC: $areaUnderROC")
println(s"Confusion matrix:\n${confusionMatrix.mkString("\n")}")
```
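The four counts in the confusion matrix are enough to derive the usual summary metrics. Below is a minimal sketch in plain Scala; `ConfusionCounts` is a hypothetical helper (not a Spark class), and the sample numbers are made up for illustration. It can be pasted into the Scala REPL or `spark-shell`.

```scala
// Hypothetical helper, not part of Spark: summary metrics from confusion-matrix counts.
final case class ConfusionCounts(tp: Long, fp: Long, tn: Long, fn: Long) {
  // Returns 0.0 instead of NaN when a denominator is empty (for example, no positive predictions).
  private def ratio(num: Long, den: Long): Double =
    if (den == 0L) 0.0 else num.toDouble / den

  def accuracy: Double = ratio(tp + tn, tp + fp + tn + fn)
  def precision: Double = ratio(tp, tp + fp)
  def recall: Double = ratio(tp, tp + fn) // also called the true positive rate
  def falsePositiveRate: Double = ratio(fp, fp + tn)
  def f1: Double = {
    val p = precision
    val r = recall
    if (p + r == 0.0) 0.0 else 2 * p * r / (p + r)
  }
}

// Made-up counts, for illustration only.
val counts = ConfusionCounts(tp = 40L, fp = 10L, tn = 45L, fn = 5L)
println(f"accuracy=${counts.accuracy}%.3f precision=${counts.precision}%.3f recall=${counts.recall}%.3f f1=${counts.f1}%.3f")
```

With the counts from the Spark example, this would be called as `ConfusionCounts(tp, fp, tn, fn)`. The (false positive rate, recall) pair is also the single ROC point that corresponds to the model's default decision threshold.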
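Note that the Spark example assumes `data.csv` has a header row, a binary `label` column, and feature columns named `feature1` and `feature2`, which it converts with StringIndexer and VectorAssembler before training and prediction. Adjust the file path, column names, and hyperparameters to match your own data and model.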
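`BinaryClassificationEvaluator` returns only the area under the ROC curve, not the curve itself. One way to get the individual points is `BinaryClassificationMetrics` from the RDD-based `spark.mllib` package. The sketch below assumes a predictions DataFrame with a `rawPrediction` vector column whose element at index 1 is the score for the positive class (which is how `LinearSVC` fills it) and a numeric label column. `rocCurve` and its parameters are hypothetical names introduced here.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: returns the ROC curve as (false positive rate, true positive rate) points.
def rocCurve(predictions: DataFrame,
             labelCol: String = "indexedLabel",
             numBins: Int = 100): Array[(Double, Double)] = {
  val scoreAndLabels = predictions
    .select(col("rawPrediction"), col(labelCol).cast("double"))
    .rdd
    .map { row =>
      val raw = row.getAs[Vector](0) // raw scores, one entry per class
      (raw(1), row.getDouble(1))     // (positive-class score, true label)
    }
  // numBins down-samples the curve so that collecting it to the driver stays cheap.
  new BinaryClassificationMetrics(scoreAndLabels, numBins).roc().collect()
}
```

With the `predictions` DataFrame from the example, `rocCurve(predictions).foreach { case (fpr, tpr) => println(s"$fpr\t$tpr") }` prints one point per line, ready to be saved and plotted with any charting tool.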