import pandas as pd import numpy as np import time data = pd.read_excel(r"C:/Users/86178/Desktop/test/test/TF-IDF/SSG hole span版.xlsx") data.head() """ Data standardization: Z-score standardization """ from sklearn.preprocessing import StandardScaler scaler = StandardScaler() data['Hole'] = scaler.fit_transform(data[['Hole']]) data['Span (m)'] = scaler.fit_transform(data[['Span (m)']]) data['Bridge length (m)'] = scaler.fit_transform(data[['Bridge length (m)']]) # List of column names to dummy-encode columns_to_encode = ["Plane linear","Pier type","Foundation type","Bearing type"] # Dummy-encode the specified columns data["Plane linear"] = pd.get_dummies(data, columns=data["Plane linear"]) data["Pier type"] = pd.get_dummies(data, columns=data["Pier type"]) data["Foundation type"] = pd.get_dummies(data, columns=data["Foundation type"]) data["Bearing type"] = pd.get_dummies(data, columns=data["Bearing type"]) # Separate the features and the label of the original dataset X_whole = data[["Bridge length (m)","Pier type","Foundation type","Hole","Span (m)", "Bearing type","Plane linear"]] y_whole = data[['Vulnerability grade']]
Time: 2023-08-20 09:54:40 Views: 112
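Looking at the code you provided, one likely problem is how the arguments are passed when creating the dummy variables. The `columns` parameter of `pd.get_dummies()` expects a list of column names, but you are passing the column's data (a Series). In addition, `pd.get_dummies(data, ...)` returns a whole DataFrame, which cannot be assigned back to a single column. Either issue is likely to raise an error.
This is the part of the code to change: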
```python
data["Plane linear"] = pd.get_dummies(data, columns=data["Plane linear"])
data["Pier type"] = pd.get_dummies(data, columns=data["Pier type"])
data["Foundation type"] = pd.get_dummies(data, columns=data["Foundation type"])
data["Bearing type"] = pd.get_dummies(data, columns=data["Bearing type"])
```
Change it to:
```python
data = pd.get_dummies(data, columns=["Plane linear", "Pier type", "Foundation type", "Bearing type"])
```
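This dummy-encodes the specified columns correctly and stores the result back in the `data` DataFrame. Note that the encoding does not update the original columns in place: each encoded column is replaced by one indicator column per category, named `<column>_<value>`. Because `"Plane linear"`, `"Pier type"`, `"Foundation type"` and `"Bearing type"` no longer exist in `data` afterwards, the later `X_whole` selection must list the new indicator columns instead.
Note that this is only one likely problem, and fixing it may not resolve everything. If you run into any other errors, please share the error message so I can help further.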
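As a rough illustration of what a single `pd.get_dummies()` call returns, here is a minimal sketch. The small DataFrame and its category values (`"column"`, `"wall"`, `"rubber"`, `"pot"`) are made up for the example and only two of the four categorical columns are used; the real spreadsheet will differ.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the category values are invented.
data = pd.DataFrame({
    "Span (m)": [20.0, 30.0, 40.0],
    "Pier type": ["column", "wall", "column"],
    "Bearing type": ["rubber", "rubber", "pot"],
    "Vulnerability grade": [1, 2, 3],
})

columns_to_encode = ["Pier type", "Bearing type"]

# Pass the list of column NAMES; the result is a new DataFrame.
encoded = pd.get_dummies(data, columns=columns_to_encode)

# The original categorical columns are gone; each is replaced by
# indicator columns named "<column>_<value>".
prefixes = tuple(f"{col}_" for col in columns_to_encode)
dummy_columns = [c for c in encoded.columns if c.startswith(prefixes)]

# Build the feature matrix from the numeric column plus the indicator columns.
X_whole = encoded[["Span (m)"] + dummy_columns]
y_whole = encoded[["Vulnerability grade"]]

print(list(X_whole.columns))
```

Collecting the indicator columns by prefix avoids hard-coding category names, which depend on the values actually present in the data.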
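Related questions
## Load the data from sklearn.metrics import r2_score import statsmodels.api as sm import numpy as np import pandas as pd import matplotlib.pyplot as plt data = pd.read_table('C:/Users/lb/Desktop/test/k-means_data.txt',sep='\t',engine="python",encoding = 'gbk') data.columns.values data.head() Change this to read a csv file
You can read a CSV file with `pd.read_csv()`. Replace `pd.read_table()` with `pd.read_csv()` and set the `delimiter` parameter to a comma (this is already the default, so it can also be omitted). Keep in mind that this only changes the separator pandas expects; the file itself must actually be comma-separated. Here is the modified code: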
```python
from sklearn.metrics import r2_score
import statsmodels.api as sm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
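# delimiter is an alias for sep; ',' is the default for read_csv.
# The path below is still the .txt file from the question; point it at the actual CSV file.
data = pd.read_csv('C:/Users/lb/Desktop/test/k-means_data.txt', delimiter=',', encoding='gbk')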
data.columns.values
data.head()
```
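To see why the separator has to match the file contents, here is a small sketch that parses in-memory text instead of a file. The sample column names `x` and `y` and their values are invented for the illustration.

```python
import io

import pandas as pd

# Invented sample contents: the same table, once tab-separated and once comma-separated.
tab_text = "x\ty\n1.0\t2.0\n3.0\t4.0\n"
comma_text = "x,y\n1.0,2.0\n3.0,4.0\n"

# Tab-separated text needs sep="\t" (this is what pd.read_table uses by default).
from_tabs = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Comma-separated text works with the read_csv default separator.
from_commas = pd.read_csv(io.StringIO(comma_text))

# Reading tab-separated text with the default comma separator does not fail,
# but everything ends up in a single column.
wrong_sep = pd.read_csv(io.StringIO(tab_text))

print(from_tabs.shape, from_commas.shape, wrong_sep.shape)
```

If the result has only one column, the separator most likely does not match the file.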
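import numpy as np import pandas as pd path = "F:/Desktop/Xiameimei/" data = pd.read_excel(path + 'shuju.xlsx') data.head() Is there anything wrong with this code?
Yes. The statements themselves are fine, but they are run together on one line; each statement needs to be on its own line. It should be written like this: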
```python
import numpy as np
import pandas as pd
path = "F:/Desktop/Xiameimei/"
data = pd.read_excel(path + 'shuju.xlsx')
data.head()
```
This code imports the numpy and pandas libraries, reads the file "shuju.xlsx", and displays the first 5 rows of the data.
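The path handling and the behavior of `head()` can be checked without the Excel file. The sketch below uses `os.path.join` as one alternative to string concatenation and a made-up DataFrame in place of the real data; both are assumptions for illustration only.

```python
import os

import pandas as pd

path = "F:/Desktop/Xiameimei/"

# os.path.join is an alternative to path + 'shuju.xlsx'; it also works
# when the directory string has no trailing slash.
file_path = os.path.join(path, "shuju.xlsx")

# Made-up stand-in for the spreadsheet contents (8 rows).
data = pd.DataFrame({"value": range(8)})

# head() returns the first 5 rows by default; pass n for a different count.
preview = data.head()
print(file_path)
print(preview)
```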