boston=load_boston() X=boston.data y=boston.target feature_names=boston.feature_names print(X.shape) print(feature_names) df=pd.DataFrame(X,columns=feature_names) df['PRICE']=y print(df.head(10))
Time: 2024-04-03 16:32:05
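This code loads the Boston housing dataset and displays it as a DataFrame. load_boston() loads the dataset; X holds the features, y holds the target values (house prices), and feature_names holds the feature names. The code prints the shape of X and the feature names, then builds a DataFrame from X with pd.DataFrame(), using feature_names as the column names. It then adds a column named 'PRICE' filled with y, and head(10) displays the first 10 rows. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the code runs only on older versions. The snippet also omits the imports it needs: `import pandas as pd` and `from sklearn.datasets import load_boston`.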
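Since load_boston() is unavailable in recent scikit-learn releases, the same DataFrame pattern can be tried on any feature matrix. The following is a minimal sketch, assuming a small synthetic array in place of the Boston data; the column names and values are made up for illustration and are not the Boston columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for boston.data / boston.target: 6 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.normal(loc=20.0, scale=5.0, size=6)
feature_names = ["F1", "F2", "F3"]  # made-up names for illustration

print(X.shape)        # tuple of (n_samples, n_features)
print(feature_names)

# Build the frame from the features only, then append the target as a new column.
df = pd.DataFrame(X, columns=feature_names)
df["PRICE"] = y
print(df.head(10))    # shows at most the first 10 rows
```

Building the frame from X first and assigning the target afterwards keeps the feature columns and the target column clearly separated, which makes it easy to split them again later with `df.drop(columns="PRICE")`.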
Related questions
Help me optimize this code: import numpy as np from sklearn.datasets import load_boston from sklearn.model_selection import train_test_split from linear_model_03.closed_form_sol.LineaRegression_CFSol import LinearRegressionClosedFormSol boston = load_boston() # load the data X, y = boston.data, boston.target # sample data and target values X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, shuffle=True) lr_cfs = LinearRegressionClosedFormSol(fit_intercept=True, normalized=True) # fit the bias term and standardize by default lr_cfs.fit(X_train, y_train) # train the model theta = lr_cfs.get_params() feature_names = boston.feature_names # feature names for i, fn in enumerate(feature_names): print(fn, ":", theta[0][i]) print("bias:", theta[1]) y_test_pred = lr_cfs.predict(x_test=X_test) mse, r2, r2c = lr_cfs.cal_mse_r2(y_test, y_test_pred) print("MSE: %.5f, R2: %.5f, adjusted R2: %.5f" % (mse, r2, r2c)) # lr_cfs.plt_predict(y_test, y_test_pred, is_sort=False) lr_cfs.plt_predict(y_test, y_test_pred, is_sort=True)
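Sure, let me help you tidy up the code. First, keep the imports at the top of the file so the code is better organized. Naming conventions also matter: class names should be capitalized (CamelCase), while function and variable names should use lowercase letters and underscores. Here is the optimized code: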
```python
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.model_selection import train_test_split
# Module path as written in this answer; the question spells it LineaRegression_CFSol
from linear_model_03.closed_form_sol.LinearRegression_CFSol import LinearRegressionClosedFormSol

# Load the data: feature matrix and target values
boston = load_boston()
X, y = boston.data, boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, shuffle=True)

# Fit the closed-form model with a bias term and standardization
lr_cfs = LinearRegressionClosedFormSol(fit_intercept=True, normalized=True)
lr_cfs.fit(X_train, y_train)

# Print one coefficient per feature, then the bias
theta = lr_cfs.get_params()
feature_names = boston.feature_names
for i, fn in enumerate(feature_names):
    print(fn, ":", theta[0][i])
print("bias:", theta[1])

# Evaluate on the test set and plot the sorted predictions
y_test_pred = lr_cfs.predict(x_test=X_test)
mse, r2, r2c = lr_cfs.cal_mse_r2(y_test, y_test_pred)
print("MSE: %.5f, R2: %.5f, adjusted R2: %.5f" % (mse, r2, r2c))
lr_cfs.plt_predict(y_test, y_test_pred, is_sort=True)
```
This makes the code clearer and easier to read, and it follows PEP 8 more closely.
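LinearRegressionClosedFormSol is the asker's own class, and its internals are not shown here. As a rough illustration of what a closed-form linear regression and its metrics might compute, here is a minimal NumPy sketch. The function names `fit_closed_form` and `mse_r2` are hypothetical, the data is synthetic, and the standardization step implied by `normalized=True` is left out.

```python
import numpy as np


def fit_closed_form(X, y):
    """Return (weights, bias) that minimize the squared error."""
    # Append a column of ones so the bias is learned together with the weights.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # lstsq solves the least-squares problem; with full column rank this
    # matches the normal-equation solution (X^T X)^{-1} X^T y.
    theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return theta[:-1], theta[-1]


def mse_r2(y_true, y_pred, n_features):
    """Return (MSE, R^2, adjusted R^2) for the given predictions."""
    n = y_true.shape[0]
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return mse, r2, r2_adj


# Synthetic data with known coefficients, so the fit can be checked by eye.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w, b = fit_closed_form(X, y)
mse, r2, r2_adj = mse_r2(y, X @ w + b, n_features=X.shape[1])
print("weights:", w, "bias:", b)
print("MSE: %.5f, R2: %.5f, adjusted R2: %.5f" % (mse, r2, r2_adj))
```

Using `np.linalg.lstsq` rather than inverting X^T X explicitly is a design choice for numerical stability; the asker's class may well do it differently.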
How can I replace the dataset in the following Python code with my own .csv file? My .csv file contains four columns of data for four variables. #code-4-3.py #Simple Linear Regression from sklearn.datasets import load_boston from sklearn.linear_model import LinearRegression import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split dataset = load_boston() x_data = dataset.data # load all feature variables y_data = dataset.target # load the target values (house prices) name_data = dataset.feature_names # load the feature names x_train,x_test,y_train,y_test = train_test_split(x_data, y_data,test_size= 0.25,random_state= 1001) x_data_train = x_train[:, 5].reshape(-1, 1) # take the first 400 samples as the training set y_data_train = y_train.reshape(-1, 1) x_data_test = x_test[:, 5].reshape(-1, 1) # take the remaining samples as the training set y_data_test = y_test.reshape(-1, 1) simple_model = LinearRegression() # create a linear regression estimator instance simple_model.fit(x_data_train,y_data_train) # fit the model on the training data y_data_test_p = simple_model.predict(x_data_test) # predict on the test set with the trained model plt.subplot(1, 1, 1) plt.scatter(x_data_test,y_data_test,s = 20, color="r") plt.scatter(x_data_test,y_data_test_p,s = 20, color="b") plt.xlabel('Room Number') plt.ylabel('Price') plt.title(name_data[5]) plt.show() r_squared = simple_model.score(x_data_test, y_data_test) print('R2') print(r_squared)
You can read your .csv file with the read_csv() function from pandas, convert it to NumPy arrays, and use those in place of x_data and y_data. Here is an example:
```python
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Read the CSV file and convert it to NumPy arrays.
# Assumes the last column is the target and the other columns are features.
data = pd.read_csv('your_file.csv')
x_data = data.iloc[:, :-1].to_numpy()
y_data = data.iloc[:, -1].to_numpy()

x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.25, random_state=1001)

# With four columns there are only three features (indices 0 to 2),
# so the original index 5 would raise an IndexError.
feature_index = 0
x_data_train = x_train[:, feature_index].reshape(-1, 1)
y_data_train = y_train.reshape(-1, 1)
x_data_test = x_test[:, feature_index].reshape(-1, 1)
y_data_test = y_test.reshape(-1, 1)

# Fit on the training split and predict on the test split
simple_model = LinearRegression()
simple_model.fit(x_data_train, y_data_train)
y_data_test_p = simple_model.predict(x_data_test)

# Red: actual test values; blue: predicted values
plt.subplot(1, 1, 1)
plt.scatter(x_data_test, y_data_test, s=20, color="r")
plt.scatter(x_data_test, y_data_test_p, s=20, color="b")
plt.xlabel('your_feature_name')
plt.ylabel('Target')
plt.title('your_feature_name')
plt.show()

# R^2 on the test split
r_squared = simple_model.score(x_data_test, y_data_test)
print('R2')
print(r_squared)
```
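Note that you need to replace "your_file.csv" and "your_feature_name" in the code with your own .csv file name and feature name. Also make sure the column index used to select the feature exists in your data: with four columns and the last one used as the target, the valid feature indices are 0 to 2.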
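Selecting columns by name instead of by position can make this kind of swap less error-prone. Below is a minimal sketch under assumed inputs: the CSV content, the column names `a`, `b`, `c`, `target`, and the helper `load_xy` are all hypothetical, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV with four columns; the names and values are made up.
csv_text = """a,b,c,target
1.0,10.0,0.5,3.1
2.0,9.0,0.4,5.0
3.0,8.5,0.9,7.2
4.0,7.0,0.1,8.9
"""


def load_xy(source, feature, target):
    """Read a CSV and return one feature and the target as (n, 1) arrays."""
    data = pd.read_csv(source)
    # Fail early with a clear message if a requested column is absent.
    missing = [c for c in (feature, target) if c not in data.columns]
    if missing:
        raise KeyError(f"columns not found in CSV: {missing}")
    x = data[feature].to_numpy().reshape(-1, 1)
    y = data[target].to_numpy().reshape(-1, 1)
    return x, y


# io.StringIO stands in for a path such as 'your_file.csv'.
x_data, y_data = load_xy(io.StringIO(csv_text), feature="a", target="target")
print(x_data.shape, y_data.shape)
```

The returned arrays already have the (n, 1) shape that the reshape(-1, 1) calls produce, so they could be passed to train_test_split and LinearRegression directly.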