Show me this with Jupyter Notebook code
Time: 2024-10-25 21:01:22
In Jupyter Notebook you can write Python code that walks through spam email classification or movie review sentiment analysis. Here is a simplified example of each:
**Spam email classification (using Scikit-Learn)**
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# 1. Load the data
data = pd.read_csv('Spam_Email_Classification.csv')
# 2. Select features and target (assumes a 'text' column and a 'label' column)
X = data['text']
y = data['label']
# 3. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 4. Feature engineering: bag-of-words token counts, fitted on the training set only
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
# 5. Train the model (multinomial Naive Bayes)
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
# 6. Predict and evaluate
y_pred = clf.predict(X_test_vec)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
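If `Spam_Email_Classification.csv` is not at hand, the same CountVectorizer + MultinomialNB steps can be tried on a small in-memory corpus. The sketch below bundles the two steps in a scikit-learn `Pipeline` (via `make_pipeline`), so `fit` and `predict` take raw strings and handle vectorization themselves. The four messages and the `spam`/`ham` labels are invented purely for illustration and are not from the original dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus (not the original dataset): two spam and two ham messages
texts = [
    "win a free prize now",
    "free money click now",
    "meeting at noon tomorrow",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Same two steps as the notebook: token counts -> multinomial Naive Bayes
spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(texts, labels)

# New messages are vectorized with the vocabulary learned during fit
print(spam_clf.predict(["free prize now", "lunch meeting tomorrow"]))  # ['spam' 'ham']
```

Keeping the vectorizer inside the pipeline also makes it harder to accidentally call `fit_transform` on test data, and the same object can later be passed to tools such as `cross_val_score`.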
**Movie review sentiment analysis (using TextBlob and Scikit-Learn)**
```python
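from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import classification_report
# 1. Load the data. The filename reads "review data: 1 very poor, 2 poor, 3 okay,
#    4 recommended, 5 highly recommended"; each line is assumed to be "<label> <review text>".
reviews, labels = [], []
with open('影评数据1很差2较差3还行4推荐5力荐.txt', 'r', encoding='utf-8') as f:
    for line in f:
        parts = line.split(maxsplit=1)
        if len(parts) < 2:  # skip blank or label-only lines (e.g. a trailing newline)
            continue
        labels.append(int(parts[0]))
        reviews.append(parts[1].strip())  # keep only the text so the label is not a feature
# 2. Split the dataset
train_reviews, test_reviews, train_labels, test_labels = train_test_split(
    reviews, labels, test_size=0.2, random_state=42)
# 3. Tokenize and vectorize
# TextBlob needs NLTK data (python -m textblob.download_corpora). It splits on whitespace
# and punctuation and does not segment Chinese, so Chinese text usually needs a segmenter.
def preprocess(text):
    return ' '.join(TextBlob(text).words)
X_train = [preprocess(review) for review in train_reviews]
X_test = [preprocess(review) for review in test_reviews]
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# 4. Compare models (Logistic Regression vs. SGD Classifier)
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'SGD Classifier': SGDClassifier(loss='hinge', max_iter=1000, random_state=42)
}
for model_name, model in models.items():
    model.fit(X_train_tfidf, train_labels)
    y_pred = model.predict(X_test_tfidf)
    print(f"{model_name}:\n{classification_report(test_labels, y_pred)}")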
```
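The layout of the review file is not shown, so the sketch below assumes each non-empty line is `<label> <review text>` with the 1 to 5 score first, as the filename suggests. It isolates that parsing step in a hypothetical helper, `parse_reviews`, and then swaps in character n-grams (`analyzer="char"`) as one alternative to word tokens, since neither TextBlob nor scikit-learn's default `token_pattern` segments Chinese words. The sample lines are invented for illustration.

```python
import io
from sklearn.feature_extraction.text import TfidfVectorizer

def parse_reviews(lines):
    """Split lines of the assumed form '<label> <text>' into (texts, labels).

    Blank or label-only lines are skipped, so a trailing newline in the file
    does not cause an IndexError.
    """
    texts, labels = [], []
    for line in lines:
        parts = line.split(maxsplit=1)  # first whitespace-separated field is the label
        if len(parts) < 2:
            continue
        labels.append(int(parts[0]))
        texts.append(parts[1].strip())
    return texts, labels

# Hypothetical sample in the assumed format, standing in for the real .txt file
sample = io.StringIO("5 这部电影非常好看\n1 剧情很差\n\n3 还行吧\n")
texts, labels = parse_reviews(sample)
print(labels)  # [5, 1, 3]

# Character unigrams and bigrams: no Chinese word segmenter is required
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)
print(X.shape)  # (number of reviews, number of distinct character n-grams)
```

With the real file, `with open('影评数据1很差2较差3还行4推荐5力荐.txt', encoding='utf-8') as f: texts, labels = parse_reviews(f)` would be the equivalent call, assuming the file is UTF-8 encoded. A dedicated segmenter such as jieba is another common choice when word-level features are preferred.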