Code for building and visualizing a decision tree model for Spark-based Boston housing price prediction

Time: 2024-03-03 22:48:33 Views: 116

Boston housing price decision tree Python code

Sure. Here is code that builds and visualizes a decision tree model for Spark-based Boston housing price prediction:

```python
import numpy as np
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer, OneHotEncoder
from pyspark.ml.regression import DecisionTreeRegressor, DecisionTreeRegressionModel
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

spark = SparkSession.builder.appName("BostonDecisionTree").getOrCreate()

# Load the data
data = (spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("boston.csv"))

# Preprocessing: "medv" is the target; every other column is a feature
categoricalCols = []  # this dataset has no string-typed columns
numericCols = [c for c in data.columns if c != "medv" and c not in categoricalCols]

stages = []
for categoricalCol in categoricalCols:
    stringIndexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "Index")
    encoder = OneHotEncoder(inputCols=[stringIndexer.getOutputCol()],
                            outputCols=[categoricalCol + "classVec"])
    stages += [stringIndexer, encoder]

assemblerInputs = [c + "classVec" for c in categoricalCols] + numericCols
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]

# Split into training and test sets
(trainingData, testData) = data.randomSplit([0.7, 0.3], seed=100)

# Build the decision tree model; the continuous target "medv" is used
# directly as the label (a regression label must not be string-indexed)
dt = DecisionTreeRegressor(labelCol="medv", featuresCol="features")
stages += [dt]

# Build the Pipeline: preprocessing stages followed by the model
pipeline = Pipeline(stages=stages)

# Parameter grid
paramGrid = (ParamGridBuilder()
             .addGrid(dt.maxDepth, [2, 4, 6, 8])
             .addGrid(dt.minInstancesPerNode, [1, 2, 3])
             .build())

# Cross-validation over the whole pipeline
evaluator = RegressionEvaluator(labelCol="medv", predictionCol="prediction", metricName="rmse")
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid,
                    evaluator=evaluator, numFolds=5)

# Train the model
cvModel = cv.fit(trainingData)

# Predict on the test set
predictions = cvModel.transform(testData)

# Evaluate the model
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) on test data = %g" % rmse)

# Inspect the decision tree: the last stage of the best pipeline is the fitted tree
treeModel = cvModel.bestModel.stages[-1]
treeModel.write().overwrite().save("dt_model")
sameModel = DecisionTreeRegressionModel.load("dt_model")
print(sameModel.toDebugString)

# Scatter plot of predicted vs. actual values
rows = predictions.select("prediction", "medv").collect()
plt.scatter(np.array([r["prediction"] for r in rows]),
            np.array([r["medv"] for r in rows]))
plt.xlabel('Predictions')
plt.ylabel('Actual')
plt.show()
```

The code above first loads the data and preprocesses it, then splits it into training and test sets. It builds a decision tree regressor, combines the preprocessing stages and the model in a Pipeline, and tunes that pipeline over a parameter grid with 5-fold cross-validation. The best model then predicts on the test set and is scored with RegressionEvaluator using RMSE. Finally, the fitted tree is saved, reloaded, and printed as text via toDebugString, and the predicted and actual values are drawn as a scatter plot.
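The text printed by `toDebugString` refers to each feature by its position in the assembled vector (for example `feature 5`), which is hard to read. The following is a minimal sketch of one way to substitute column names into that text. The helper name `name_features` and the sample string are hypothetical and not part of Spark; the thresholds and predictions in the sample are made up for illustration.

```python
import re


def name_features(debug_string, feature_names):
    """Replace each 'feature <i>' token with the i-th name in feature_names."""
    def substitute(match):
        index = int(match.group(1))
        # Leave the token unchanged when no name is known for this index
        if index < len(feature_names):
            return feature_names[index]
        return match.group(0)

    return re.sub(r"feature (\d+)", substitute, debug_string)


# Illustrative input only: made-up thresholds and predictions
sample = """If (feature 0 <= 6.5)
 Predict: 20.0
Else (feature 0 > 6.5)
 If (feature 1 <= 10.0)
  Predict: 35.0
 Else (feature 1 > 10.0)
  Predict: 25.0"""

print(name_features(sample, ["rm", "lstat"]))
```

Assuming `treeModel` is the fitted decision tree and `assemblerInputs` is the list passed to `VectorAssembler`, the same helper could be applied as `print(name_features(treeModel.toDebugString, assemblerInputs))`.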

