Download the Bank Marketing dataset from the UCI repository in PyCharm and implement binary classification with logistic regression
Posted: 2024-09-16 21:05:43
To download the Bank Marketing dataset from the UCI Machine Learning Repository and fit a binary logistic regression classifier in PyCharm, follow these steps:
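1. **Install the required libraries**:
First, make sure the following libraries are installed:
```shell
pip install pandas scikit-learn
```
2. **Download the data**:
The dataset can be downloaded directly from the UCI archive and read with pandas. The ZIP archive contains `bank-additional-full.csv`, which uses semicolons as the field separator:
```python
import io
import urllib.request
import zipfile
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip" # dataset link
# Download the ZIP into memory, then read the semicolon-separated CSV inside it
with zipfile.ZipFile(io.BytesIO(urllib.request.urlopen(url).read())) as zf:
    name = next(n for n in zf.namelist() if n.endswith("/bank-additional-full.csv"))
    data = pd.read_csv(zf.open(name), sep=";")
```
`data` is now a pandas DataFrame holding the contents of the CSV file.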
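Because the download is a ZIP archive rather than a plain CSV, it can help to see the reading step in isolation. The sketch below builds a tiny archive in memory and reads a CSV member from it with pandas; the member name `toy-folder/toy.csv` and its rows are made up for illustration, and the semicolon separator is an assumption to check against the file you actually download.

```python
import io
import zipfile

import pandas as pd

# Build a tiny ZIP archive in memory containing one semicolon-separated CSV.
# The member name and the rows are toy values, not the real dataset.
csv_text = 'age;job;y\n30;"admin.";"no"\n45;"technician";"yes"\n'
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("toy-folder/toy.csv", csv_text)

# Re-open the archive and read the CSV member directly, without extracting to disk.
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    member = next(n for n in zf.namelist() if n.endswith(".csv"))
    with zf.open(member) as fh:
        toy = pd.read_csv(fh, sep=";")

print(toy.shape)
print(list(toy.columns))
```

If `sep` were left at its default (a comma), each row would be parsed as a single column, which is an easy mistake to spot by checking `toy.shape` right after loading.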
3. **Preprocess the data**:
- Split the features (X) from the target variable (y):
```python
X = data.drop('y', axis=1)
y = data['y']
```
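- Convert the categorical features to numeric values, because logistic regression requires numeric input:
```python
import pandas as pd
# One-hot encode the string columns; numeric columns are left unchanged
X = pd.get_dummies(X, drop_first=True)
```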
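To see what converting categorical columns to numbers looks like in practice, here is a minimal sketch of one-hot encoding with `pandas.get_dummies`, which is one common approach. The frame below is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

# Toy frame with one numeric and one categorical column (made-up values).
toy_X = pd.DataFrame({
    "age": [30, 45, 52],
    "marital": ["single", "married", "single"],
})

# get_dummies leaves numeric columns alone and expands each string column
# into one indicator column per category.
encoded = pd.get_dummies(toy_X)
print(encoded.columns.tolist())
```

Passing `drop_first=True` would drop one indicator per original column, which avoids perfectly redundant columns in a linear model.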
4. **Split the dataset**:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
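When one class is much rarer than the other, a plain random split can leave the test set with a different class ratio than the training set. As a hedged sketch on synthetic labels (the 10% positive rate is made up for illustration), passing `stratify` to `train_test_split` keeps the ratio the same in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 10% positive labels (made up for illustration).
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array(["yes"] * 10 + ["no"] * 90)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification, both splits keep the same share of "yes" labels.
print((y_tr == "yes").mean(), (y_te == "yes").mean())
```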
5. **Train the logistic regression model**:
```python
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(solver='liblinear') # 'liblinear' is a good choice for small datasets
lr.fit(X_train, y_train)
```
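For intuition about what the fitted model computes, the following sketch checks on synthetic data that the predicted probability of the positive class is the sigmoid of the linear score `w·x + b`. The data and variable names here are illustrative assumptions, not part of the steps above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic binary problem (random features, made-up labelling rule).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy[:, 0] + 0.5 * X_toy[:, 1] > 0).astype(int)

clf = LogisticRegression(solver="liblinear").fit(X_toy, y_toy)

# Manual probability: sigmoid of the linear score w.x + b.
scores = X_toy @ clf.coef_.ravel() + clf.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-scores))

# Compare with the library's probability for class 1.
print(np.allclose(manual, clf.predict_proba(X_toy)[:, 1]))
```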
6. **Evaluate the model**:
```python
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
y_pred = lr.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
print("Classification Report:")
print(classification_report(y_test, y_pred))
```
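The numbers in the classification report all derive from the confusion matrix. As a small worked sketch with made-up labels, the four cells can be unpacked and turned into accuracy, precision, and recall by hand, then compared with scikit-learn's own functions:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Made-up true labels and predictions for illustration.
y_true = ["no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes"]
y_hat = ["no", "no", "no", "no", "no", "yes", "yes", "yes", "no", "no"]

# With labels=["no", "yes"], rows are true classes and columns are predictions,
# so ravel() yields (tn, fp, fn, tp) with "yes" as the positive class.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=["no", "yes"]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

On an imbalanced target, accuracy alone can look good even when recall for the rare class is poor, which is why the report prints per-class precision and recall as well.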
7. **Save and reuse the model**:
To save the model for later use, you can do the following:
```python
import pickle
with open('bank_marketing_lr_model.pkl', 'wb') as file:
    pickle.dump(lr, file)
```
Then load the model:
```python
with open('bank_marketing_lr_model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)
loaded_y_pred = loaded_model.predict(X_test)
```
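A quick way to confirm that saving and loading preserved the model is to compare predictions before and after the round trip. The sketch below does this on a toy model, serializing to bytes in memory instead of a file; the data is made up for illustration.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy model trained on four made-up points.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])
model = LogisticRegression(solver="liblinear").fit(X_toy, y_toy)

# Serialize to bytes and back (the same mechanism as dump/load with a file).
restored = pickle.loads(pickle.dumps(model))

# The restored model should give identical predictions.
same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print(same)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it where possible.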