Why does lab4_result.txt, as written by the following code, show garbled characters? Please modify the code to fix it.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Read the training and test data
train_data = pd.read_csv('data_train.csv')
test_data = pd.read_csv('data_test.csv')
# Extract feature vectors
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_data['title'] + ' ' + train_data['keywords'])
X_test = vectorizer.transform(test_data['title'] + ' ' + test_data['keywords'])
# Train the model
model = MultinomialNB()
model.fit(X_train, train_data['label'])
# Predict
y_pred = model.predict(X_test)
# Write the classification results
with open('lab4_result.txt', 'w') as f:
    for i, y in enumerate(y_pred):
        f.write('学号***姓名***{}\n'.format(y))
Time: 2023-06-25 11:03:00
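The most likely cause is an encoding mismatch. When open() is called without an encoding argument, Python uses the platform's default encoding, which on a Chinese-language Windows system is typically GBK, so the Chinese text in each output line is written in one encoding and may then be displayed by a program that assumes another. Specifying the encoding explicitly when opening the file, for example UTF-8, avoids this. The code can be modified as follows: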
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Read the training and test data, stating the encoding explicitly
train_data = pd.read_csv('data_train.csv', encoding='utf-8')
test_data = pd.read_csv('data_test.csv', encoding='utf-8')
# Extract TF-IDF feature vectors from the title and keywords columns
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_data['title'] + ' ' + train_data['keywords'])
X_test = vectorizer.transform(test_data['title'] + ' ' + test_data['keywords'])
# Train the model
model = MultinomialNB()
model.fit(X_train, train_data['label'])
# Predict labels for the test set
y_pred = model.predict(X_test)
# Write the classification results as UTF-8 so the Chinese text is stored consistently
with open('lab4_result.txt', 'w', encoding='utf-8') as f:
    for y in y_pred:
        f.write('学号***姓名***{}\n'.format(y))
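To see why the encoding matters, here is a minimal sketch of the mismatch itself. It is independent of the classifier above; the sample string and the choice of GBK as the "wrong" codec are assumptions for illustration only.

```python
# A sample line like the ones written to lab4_result.txt (the label 1 is made up).
text = '学号***姓名***1\n'

# These are the bytes that open(..., 'w', encoding='utf-8') would store.
utf8_bytes = text.encode('utf-8')

# A viewer that assumes GBK decodes the same bytes into different characters.
misread = utf8_bytes.decode('gbk', errors='replace')

# A viewer that assumes UTF-8 recovers the original text.
reread = utf8_bytes.decode('utf-8')

print(repr(misread))
print(repr(reread))
```

The file is only readable when the program that displays it uses the same encoding it was written with. Some Windows programs detect UTF-8 more reliably when the file starts with a byte order mark, which Python writes if encoding='utf-8-sig' is used instead.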
If the problem persists, try converting the files themselves to UTF-8.
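One way to do the conversion is sketched below. The helper name convert_to_utf8 and the default source encoding of GBK are assumptions, not part of the original code; replace src_encoding with whatever encoding the file was actually saved in.

```python
import os
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding='gbk'):
    """Re-encode a text file as UTF-8.

    src_encoding is an assumption about how the source file was saved;
    a wrong guess raises UnicodeDecodeError or yields wrong characters.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding='utf-8')


# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'data_gbk.csv')
    dst_path = os.path.join(tmp, 'data_utf8.csv')

    # Create a GBK-encoded sample file (contents are made up for the demo).
    Path(src_path).write_text('title,keywords\n标题,关键词\n', encoding='gbk')

    convert_to_utf8(src_path, dst_path)

    # The converted file can now be read as UTF-8.
    print(Path(dst_path).read_text(encoding='utf-8'))
```

Alternatively, if only the input CSV files are in another encoding, passing that encoding to pd.read_csv (for example encoding='gbk') reads them without converting anything on disk.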