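Using TensorFlow 2.0 in Python, split the Boston housing dataset into training, validation, and test sets (6:2:2), then load the data with TensorFlow 2.0, build a linear regression model, cross-validate it, save the model, and predict on new data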
Date: 2023-06-10 18:06:19
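First, we need to load the Boston housing dataset. The `load_boston` function in `sklearn.datasets` does this. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this snippet needs an older scikit-learn release: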
```python
from sklearn.datasets import load_boston
import numpy as np
boston = load_boston()
X = boston.data
y = boston.target
```
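Next, split the data into training, validation, and test sets. We can do this with two calls to `train_test_split`: the first holds out 40% of the rows, and the second cuts that 40% in half: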
```python
from sklearn.model_selection import train_test_split
# Split the data into training, validation, and test sets (6:2:2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)
```
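To see where the 6:2:2 ratio comes from, here is a minimal hand-rolled sketch of the same idea using a shuffled index array. It is not how `train_test_split` is implemented; `split_indices` is a hypothetical helper, and the exact counts returned by scikit-learn may differ by a row because of rounding.

```python
import numpy as np

def split_indices(n_samples, seed=42):
    """Shuffle row indices and cut them into 60% / 20% / 20% parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train_end = int(0.6 * n_samples)   # end of the training part
    n_valid_end = int(0.8 * n_samples)   # end of the validation part
    return (order[:n_train_end],
            order[n_train_end:n_valid_end],
            order[n_valid_end:])

# 506 is the number of rows in the Boston housing dataset
train_idx, valid_idx, test_idx = split_indices(506)
print(len(train_idx), len(valid_idx), len(test_idx))  # 303 101 102
```

The three index arrays are disjoint and together cover every row, which is the property the two-step `train_test_split` call also provides.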
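Then we standardize the features, which helps gradient descent converge during training. The scaler is fitted on the training set only and then applied unchanged to the validation and test sets: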
```python
from sklearn.preprocessing import StandardScaler
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
```
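The standardization step can be written out by hand as z = (x - mean) / std, with the mean and standard deviation taken from the training set only. The sketch below uses synthetic arrays (the `_demo` names are made up for illustration) rather than the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real feature matrices (13 features, as in the Boston data)
X_train_demo = rng.normal(loc=5.0, scale=3.0, size=(300, 13))
X_valid_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 13))

# Statistics come from the training set only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population standard deviation (ddof=0)

X_train_scaled = (X_train_demo - mean) / std
X_valid_scaled = (X_valid_demo - mean) / std  # reuse the training statistics

print(np.allclose(X_train_scaled.mean(axis=0), 0.0))
print(np.allclose(X_train_scaled.std(axis=0), 1.0))
```

Reusing the training statistics on the validation set keeps information about held-out rows from leaking into preprocessing, so the scaled validation features will generally not have exactly zero mean.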
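Next, we can build the linear regression model in TensorFlow 2.0. A single `Dense` layer with one unit and no activation computes a weighted sum of the 13 features plus a bias: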
```python
import tensorflow as tf
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[13])
])
```
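Because the model is just y = Xw + b, its result can be sanity-checked against a closed-form least-squares fit. The following is a sketch on synthetic data with made-up weights, not part of the TensorFlow pipeline; it only illustrates what the `Dense(1)` layer is trying to learn:

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 13))      # synthetic features
true_w = np.arange(1.0, 14.0)            # made-up weights 1..13
true_b = 2.5                             # made-up bias
y_demo = X_demo @ true_w + true_b        # noiseless targets

# Append a column of ones so the bias is estimated together with the weights
X_aug = np.hstack([X_demo, np.ones((len(X_demo), 1))])
coef, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)
w_hat, b_hat = coef[:-1], coef[-1]

print(np.allclose(w_hat, true_w), np.isclose(b_hat, true_b))
```

On real data, a model trained with SGD for a limited number of epochs will only approach this least-squares solution, so the comparison is a rough check rather than an exact match.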
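Then we can use k-fold cross-validation on the training set to evaluate the model. A fresh model is built for each fold, and the final validation loss of every fold is averaged: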
```python
from sklearn.model_selection import KFold

# 5-fold cross-validation on the training set
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)
val_losses = []
for i, (train_index, val_index) in enumerate(kf.split(X_train, y_train)):
    print("Fold", i)
    X_fold_train, y_fold_train = X_train[train_index], y_train[train_index]
    X_fold_val, y_fold_val = X_train[val_index], y_train[val_index]
    # Build a fresh model for every fold so no weights carry over
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(1, input_shape=[13])
    ])
    # `learning_rate` replaces the deprecated `lr` argument
    model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
    history = model.fit(X_fold_train, y_fold_train, epochs=50,
                        validation_data=(X_fold_val, y_fold_val))
    val_losses.append(history.history["val_loss"][-1])
print("Mean validation MSE:", sum(val_losses) / k)
```
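The mechanics of k-fold cross-validation can also be shown without TensorFlow. In this sketch, `kfold_indices`, `fit_linear`, and `predict_linear` are hypothetical helpers, the data is synthetic, and a least-squares fit stands in for training the Keras model:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=42):
    """Yield (train_idx, val_idx) pairs for k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def fit_linear(X, y):
    """Least-squares fit of weights plus bias (stand-in for model.fit)."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted weights (all but last entry) and bias (last entry)."""
    return X @ coef[:-1] + coef[-1]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 13))
y_demo = X_demo @ rng.normal(size=13) + rng.normal(scale=0.1, size=300)

fold_mse = []
for train_idx, val_idx in kfold_indices(len(X_demo), k=5):
    coef = fit_linear(X_demo[train_idx], y_demo[train_idx])
    residual = y_demo[val_idx] - predict_linear(coef, X_demo[val_idx])
    fold_mse.append(float(np.mean(residual ** 2)))

print("per-fold MSE:", fold_mse)
print("mean MSE:", np.mean(fold_mse))
```

Each row is used for validation exactly once, so the mean of the per-fold errors is a less noisy estimate than a single train/validation split.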
Next, we can use the `ModelCheckpoint` callback to save the best model, that is, the one with the lowest validation loss across epochs:
```python
# Write a checkpoint only when the validation loss improves
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[13])
])
# `learning_rate` replaces the deprecated `lr` argument
model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb])
# Reload the best checkpoint from disk
model = tf.keras.models.load_model("best_model.h5")
```
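The idea behind `save_best_only=True` is simple bookkeeping: a checkpoint is written only when the monitored value beats the best one seen so far. The sketch below imitates that logic in plain Python; `best_epoch` is a hypothetical helper and the loss values are made up:

```python
def best_epoch(val_losses):
    """Return (best_index, best_loss, saved_at) for a list of validation losses.

    saved_at lists the epochs at which a checkpoint would be written,
    i.e. every epoch whose loss is strictly lower than all earlier ones.
    """
    best_idx, best_loss = None, float("inf")
    saved_at = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # strictly better than anything before
            best_idx, best_loss = epoch, loss
            saved_at.append(epoch)
    return best_idx, best_loss, saved_at

# Made-up validation losses for five epochs
idx, loss, saved_at = best_epoch([30.0, 24.5, 25.1, 22.8, 23.0])
print(idx, loss, saved_at)  # 3 22.8 [0, 1, 3]
```

The file on disk therefore always holds the weights from the best epoch, which is why the model is reloaded from `best_model.h5` after training instead of using the weights from the last epoch.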
Finally, we can use the trained model to predict on new data, here the held-out test set:
```python
y_pred = model.predict(X_test)
```
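To judge the predictions, we can compare them with the true targets. The sketch below computes a few common regression metrics with NumPy; `regression_metrics` is a hypothetical helper and the numbers are made up. Note that Keras returns predictions with shape `(n, 1)`, so they are flattened before subtracting, otherwise broadcasting against a `(n,)` target would produce an `(n, n)` matrix.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for one-dimensional targets."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # (n, 1) -> (n,)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(err))),
    }

# Made-up values; in the walkthrough these would be y_test and model.predict(X_test)
y_true_demo = np.array([24.0, 21.6, 34.7])
y_pred_demo = np.array([[25.0], [20.6], [32.7]])  # column vector, like Keras output
metrics = regression_metrics(y_true_demo, y_pred_demo)
print(metrics)
```

Any genuinely new samples would also need to go through the same fitted `scaler.transform` before being passed to `model.predict`.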
The complete code is as follows:
```python
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler
import tensorflow as tf

# Load the dataset
boston = load_boston()
X = boston.data
y = boston.target

# Split the data into training, validation, and test sets (6:2:2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)

# Standardize the features (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

# 5-fold cross-validation on the training set
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)
val_losses = []
for i, (train_index, val_index) in enumerate(kf.split(X_train, y_train)):
    print("Fold", i)
    X_fold_train, y_fold_train = X_train[train_index], y_train[train_index]
    X_fold_val, y_fold_val = X_train[val_index], y_train[val_index]
    # Build a fresh model for every fold so no weights carry over
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(1, input_shape=[13])
    ])
    # `learning_rate` replaces the deprecated `lr` argument
    model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
    history = model.fit(X_fold_train, y_fold_train, epochs=50,
                        validation_data=(X_fold_val, y_fold_val))
    val_losses.append(history.history["val_loss"][-1])
print("Mean validation MSE:", sum(val_losses) / k)

# Save the best model (lowest validation loss)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[13])
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb])
model = tf.keras.models.load_model("best_model.h5")

# Predict with the reloaded model
y_pred = model.predict(X_test)
```