iris_target = data.target # get the labels for the data iris_features = pd.DataFrame(data=data.data, columns=data.feature_names) Why doesn't y need to be converted to a DataFrame?
Time: 2024-05-20 10:15:16 Views: 99
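In machine learning, the input data is conventionally called X and the output data y. The target labels usually form a single column rather than several, so a one-dimensional array or list is enough to hold them. That is why the example stores the iris labels as a one-dimensional array instead of a DataFrame. The feature data, on the other hand, has several columns, and a DataFrame lets each column carry its feature name. Converting y is optional rather than wrong: a one-dimensional array is simply all that is needed.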
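The point above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and pandas are installed; the variable names mirror the question, and `LogisticRegression` is used here only as an example estimator that is not part of the original snippet.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()

# Features: several named columns, so a DataFrame is convenient.
iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)

# Labels: a single column, so the 1-D NumPy array is used as-is.
iris_target = data.target

print(iris_features.shape)  # (150, 4)
print(iris_target.shape)    # (150,)

# Example estimator (an assumption for illustration): y is passed as a 1-D array.
clf = LogisticRegression(max_iter=1000)
clf.fit(iris_features, iris_target)

# Wrapping y in a pandas Series is optional, e.g. to give the column a name.
iris_target_series = pd.Series(iris_target, name="target")
```

If a labeled single column is wanted, a `pd.Series` is usually a closer fit than a one-column DataFrame, since it stays one-dimensional.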
Related questions
from sklearn import datasets from pandas import DataFrame import pandas as pd x_data=datasets.load_iris().data # import the features y_data=datasets.load_iris().target # import the labels print("x_data from datasets: \n",x_data) print("y_data from datasets: \n",y_data) x_data=DataFrame(x_data,columns=['花萼长度','花萼宽度','花瓣长度','花瓣宽度']) pd.set_option('display.unicode.east_asian_width',True) # align the column names print("x_data add index: \n",x_data) x_data['类别']=y_data # add a new column labeled '类别' (class) print("x_data add a column: \n",x_data)
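This code uses the iris dataset from scikit-learn as an example and shows how to load the data and build a DataFrame that holds both the features and the labels.
First, `datasets.load_iris().data` and `datasets.load_iris().target` load the features and the labels of the iris dataset.
Next, `DataFrame()` converts the feature data into a DataFrame and sets its column names to `['花萼长度','花萼宽度','花瓣长度','花瓣宽度']` (sepal length, sepal width, petal length, petal width).
Then `pd.set_option('display.unicode.east_asian_width',True)` tells pandas to account for the display width of East Asian characters, so that the Chinese column names line up with the values when printed.
Finally, `x_data['类别']=y_data` adds a new column labeled '类别' (class) to the DataFrame, with y_data as its values. The code prints the raw feature array, the raw label array, the feature DataFrame with column names, and the DataFrame with the label column added.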
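The steps above can be sketched as a runnable script. This is a minimal version, assuming scikit-learn and pandas are installed; it calls `load_iris()` once and reuses the result (a small change from the question's code, which calls it twice), and it keeps the Chinese column names from the question.

```python
import pandas as pd
from sklearn import datasets

# Step 1: load the dataset once and reuse it for features and labels.
iris = datasets.load_iris()
x_data = iris.data      # features, shape (150, 4)
y_data = iris.target    # labels, shape (150,)

# Step 2: wrap the features in a DataFrame with the column names from the question
# (sepal length, sepal width, petal length, petal width).
x_data = pd.DataFrame(x_data, columns=['花萼长度', '花萼宽度', '花瓣长度', '花瓣宽度'])

# Step 3: account for East Asian wide characters when aligning printed columns.
pd.set_option('display.unicode.east_asian_width', True)

# Step 4: append the labels as a new column named '类别' (class).
x_data['类别'] = y_data

print(x_data.head())
```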
The code above reports an error: E:/python3.9-workspace/python-learn/softmax regression.py:47: MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later. plt.scatter(X_test[y_pred == i, 0], X_test[y_pred == i, 1], color=colors[i], marker=markers[i], E:/python3.9-workspace/python-learn/softmax regression.py:50: MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later. plt.show()
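This message is a deprecation warning rather than an error. Starting with Matplotlib 3.6, a FigureCanvas class is expected to define a `required_interactive_framework` attribute. The warning appears when the plotting backend in use supplies a canvas class that was written for older versions and lacks that attribute. It does not affect how the code runs. Updating Matplotlib, or the tool that supplies the plotting backend (for example an IDE's built-in backend), to a version whose canvas defines the attribute should make the warning go away.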
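To see which backend and canvas class are involved in a given environment, something like the following may help. It is a minimal diagnostic sketch, assuming Matplotlib is installed; the variable names are illustrative, and the printed values depend on the environment it runs in.

```python
import matplotlib
import matplotlib.pyplot as plt

# Name of the backend Matplotlib has selected in this environment.
backend_name = matplotlib.get_backend()

# Create a figure and look at the class of its canvas.
fig = plt.figure()
canvas_cls = type(fig.canvas)

print("backend:", backend_name)
print("canvas class:", f"{canvas_cls.__module__}.{canvas_cls.__name__}")

# The deprecation warning concerns canvas classes that lack this attribute.
print("defines required_interactive_framework:",
      hasattr(canvas_cls, "required_interactive_framework"))

plt.close(fig)
```

If the canvas class comes from outside the `matplotlib` package, the warning most likely originates there. Calling `matplotlib.use("TkAgg")` before importing `pyplot` is one way to switch to a built-in backend, assuming Tk is available on the machine.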
If you want to silence the warning in the meantime, add the following to your code:
```python
import warnings
warnings.filterwarnings("ignore", message="Support for FigureCanvases without a required_interactive_framework attribute was deprecated")
```
This suppresses the warning. The full code is as follows:
```python
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Suppress the Matplotlib deprecation warning (the message is matched from its start).
warnings.filterwarnings("ignore", message="Support for FigureCanvases without a required_interactive_framework attribute was deprecated")

# Load the data and split it into training and test sets.
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names)
y = pd.DataFrame(data=iris.target, columns=['target'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features; fit the scaler on the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


class SoftmaxRegression:
    def __init__(self, n_classes, n_features):
        self.n_classes = n_classes
        self.n_features = n_features
        self.weights = np.zeros((n_features, n_classes))

    def softmax(self, z):
        # Subtract the row-wise maximum so np.exp cannot overflow.
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def predict(self, X):
        logits = np.dot(X, self.weights)
        return np.argmax(self.softmax(logits), axis=1)

    def fit(self, X, y, learning_rate=0.1, epochs=1000):
        # y is expected to be a one-hot matrix of shape (n_samples, n_classes).
        for i in range(epochs):
            logits = np.dot(X, self.weights)
            y_pred = self.softmax(logits)
            # A small epsilon avoids log(0).
            loss = -np.mean(np.sum(y * np.log(y_pred + 1e-12), axis=1))
            # Average over samples so the gradient matches the mean loss.
            gradient = np.dot(X.T, (y_pred - y)) / X.shape[0]
            self.weights -= learning_rate * gradient
            if i % 100 == 0:
                print(f'Epoch {i}, Loss: {loss:.4f}')


model = SoftmaxRegression(n_classes=3, n_features=4)
# One-hot encode the labels as floats for the loss and gradient computations.
y_train_onehot = pd.get_dummies(y_train['target']).to_numpy(dtype=float)
model.fit(X_train, y_train_onehot, learning_rate=0.1, epochs=1000)
y_pred = model.predict(X_test)

# Plot the test samples on the first two (standardized) features, grouped by predicted class.
colors = ['red', 'green', 'blue']
markers = ['o', 's', 'x']
for i in range(3):
    plt.scatter(X_test[y_pred == i, 0], X_test[y_pred == i, 1], color=colors[i], marker=markers[i],
                label=iris.target_names[i])
plt.legend()
plt.show()
```
With this filter in place, the warning is no longer displayed.
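How the `message` filter behaves can be checked without Matplotlib. The sketch below uses only the standard `warnings` module; `emit` is a hypothetical stand-in for the Matplotlib call that raises the warning, and a plain `DeprecationWarning` is assumed in place of `MatplotlibDeprecationWarning`.

```python
import warnings

MESSAGE = ("Support for FigureCanvases without a required_interactive_framework "
           "attribute was deprecated in Matplotlib 3.6 and will be removed two "
           "minor releases later.")


def emit():
    # Hypothetical stand-in for the Matplotlib call that raises the warning.
    warnings.warn(MESSAGE, DeprecationWarning)


# Without the filter: the warning is recorded.
with warnings.catch_warnings(record=True) as caught_before:
    warnings.simplefilter("always")
    emit()

# With the filter: `message` is a regular expression matched against the
# start of the warning text, so a prefix of the message is sufficient.
with warnings.catch_warnings(record=True) as caught_after:
    warnings.simplefilter("always")
    warnings.filterwarnings(
        "ignore",
        message="Support for FigureCanvases without a required_interactive_framework",
    )
    emit()

print(len(caught_before), len(caught_after))  # 1 0
```

Because the filter matches on the message text, other deprecation warnings remain visible, which is usually preferable to ignoring a whole warning category.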