Write naive Bayes code based on TfidfVectorizer(); my file is an Excel file
Posted: 2024-05-10 11:14:26
First, load the Excel file into Python. The pandas library can read Excel files directly:
```python
import pandas as pd
df = pd.read_excel('your_file.xlsx')
```
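Rows with empty cells or non-string values in the text column commonly cause errors in the later vectorization step, so it can help to clean the data right after loading. The following is a minimal sketch under assumptions: the small in-memory DataFrame is a hypothetical stand-in for the result of `pd.read_excel('your_file.xlsx')`, and the column names `text_column` and `label_column` are the same placeholders used throughout this answer.

```python
import pandas as pd

# Hypothetical stand-in for: df = pd.read_excel('your_file.xlsx')
df = pd.DataFrame({
    'text_column': ['good product', None, 'bad service', 123],
    'label_column': ['pos', 'neg', None, 'neg'],
})

# Drop rows where either the text or the label is missing,
# then force every remaining text cell to be a string.
df = (
    df.dropna(subset=['text_column', 'label_column'])
      .assign(text_column=lambda d: d['text_column'].astype(str))
      .reset_index(drop=True)
)

print(df)
```

Note that reading `.xlsx` files with pandas typically requires the `openpyxl` package to be installed.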
Next, the text needs to be tokenized and vectorized. TfidfVectorizer() from the sklearn library handles both in one step:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df['text_column'])
y = df['label_column']
```
This assumes the Excel file has a column named `text_column` that stores the text data and another column named `label_column` that stores the labels.
Then train a naive Bayes classifier and use it to make predictions:
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
nb = MultinomialNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
```
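Once the model is trained, classifying text it has not seen before requires the same fitted vectorizer: call `transform()` rather than `fit_transform()`, so that the new text is mapped onto the vocabulary learned during training. A minimal sketch, assuming a tiny hand-written training set (`train_texts`, `train_labels`, and `new_texts` are hypothetical examples, not data from the Excel file):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data
train_texts = [
    'great film loved it',
    'wonderful great acting',
    'terrible film hated it',
    'awful terrible plot',
]
train_labels = ['pos', 'pos', 'neg', 'neg']

# Fit the vectorizer and the classifier on the training text
tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)
nb = MultinomialNB()
nb.fit(X_train, train_labels)

# New, unseen text: reuse the fitted vectorizer with transform() only
new_texts = ['great wonderful', 'awful terrible']
X_new = tfidf.transform(new_texts)
predictions = nb.predict(X_new)

print(list(predictions))
```

Calling `fit_transform()` on the new text instead would build a different vocabulary, and the column layout would no longer match what the classifier was trained on.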
Finally, the evaluation functions in sklearn can be used to measure the model's performance, for example accuracy, precision, recall, and F1 score.
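Beyond the individual scores, a confusion matrix and a per-class report often make it easier to see which classes the model confuses. The sketch below uses hand-written label lists as hypothetical stand-ins for `y_test` and `y_pred`; the class names `ham` and `spam` are assumptions for illustration.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels
y_test = ['spam', 'ham', 'spam', 'ham', 'spam']
y_pred = ['spam', 'ham', 'ham', 'ham', 'spam']
labels = ['ham', 'spam']

# Rows are true classes, columns are predicted classes, in the order of `labels`
cm = confusion_matrix(y_test, y_pred, labels=labels)
acc = accuracy_score(y_test, y_pred)

print(cm)
print('Accuracy:', acc)
# Per-class precision, recall, and F1; zero_division=0 avoids warnings
# when a class is never predicted
print(classification_report(y_test, y_pred, labels=labels, zero_division=0))
```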
The complete code is as follows:
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Read the Excel file
df = pd.read_excel('your_file.xlsx')
# Tokenize and vectorize the text
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df['text_column'])
y = df['label_column']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the naive Bayes model
nb = MultinomialNB()
nb.fit(X_train, y_train)
# Predict on the test set
y_pred = nb.predict(X_test)
# Evaluate model performance
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred, average='macro'))
print('Recall:', recall_score(y_test, y_pred, average='macro'))
print('F1 score:', f1_score(y_test, y_pred, average='macro'))
```
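One caveat with the code above is that the vectorizer is fitted on all rows before the split, so the IDF weights are computed with knowledge of the test documents. A possible variant, sketched here under assumptions, splits the raw text first and wraps the vectorizer and classifier in a scikit-learn pipeline so that only the training portion is used for fitting. The `texts` and `labels` lists are hypothetical toy data standing in for `df['text_column']` and `df['label_column']`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the two Excel columns
pos_texts = ['great film', 'loved it', 'wonderful acting',
             'excellent plot', 'really enjoyable']
neg_texts = ['terrible film', 'hated it', 'awful acting',
             'boring plot', 'really disappointing']
texts = pos_texts * 2 + neg_texts * 2
labels = ['pos'] * 10 + ['neg'] * 10

# Split the raw text first, keeping the class balance in both parts
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training text only,
# then applies the same transformation at prediction time
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts_train, y_train)

y_pred = model.predict(texts_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy:', acc)
```

A further convenience of this design is that `model.predict()` accepts raw strings directly, so there is no separate `transform()` call to remember when classifying new text.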