Python Sentiment Classification
Posted: 2023-06-28 08:12:45
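Sentiment classification is an important task in natural language processing (NLP). Its goal is to assign text to sentiment categories such as positive, negative, and neutral. Below is a simple Python example of a sentiment classification workflow: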
1. Install the required libraries
```python
# The leading "!" runs a shell command from a Jupyter/IPython notebook cell;
# in a regular terminal, run the same command without it.
!pip install pandas numpy scikit-learn nltk textblob
```
2. Load the dataset
```python
import pandas as pd
# The later steps expect a 'text' column (raw text) and a 'label' column (sentiment class)
df = pd.read_csv('your_dataset.csv', encoding='utf-8')
```
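The later steps read `df['text']` and `df['label']`, so it can help to check for those columns right after loading. The following is a minimal sketch under stated assumptions: `SAMPLE_CSV` is an invented in-memory stand-in for `your_dataset.csv`, and `validate_dataset` is a hypothetical helper, not part of pandas.

```python
import io
import pandas as pd

# Invented sample standing in for 'your_dataset.csv' (assumption for this sketch).
SAMPLE_CSV = """text,label
I love this phone,positive
Terrible battery life,negative
,neutral
It works as described,neutral
"""

def validate_dataset(df, required=("text", "label")):
    """Hypothetical helper: check required columns and drop incomplete rows."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Rows with an empty text or label cannot be used for training.
    return df.dropna(subset=list(required)).reset_index(drop=True)

df = validate_dataset(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(len(df), "rows after cleaning")
print(df["label"].value_counts())
```

Printing the label counts also shows whether the classes are heavily imbalanced, which affects how the accuracy reported later should be read.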
3. Data preprocessing
```python
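import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob

# Download the NLTK resources used below (only needed once per environment)
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')  # also needed by WordNetLemmatizer in some NLTK versions

# A set makes the stop word lookup below faster than a list
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Remove special characters and digits (keep letters only)
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    # Lowercase
    text = text.lower()
    # Tokenize on whitespace
    words = text.split()
    # Remove stop words
    words = [word for word in words if word not in stop_words]
    # Lemmatize each remaining word
    words = [lemmatizer.lemmatize(word) for word in words]
    # Sentiment polarity from TextBlob, a float in [-1.0, 1.0]
    sentiment = TextBlob(text).sentiment.polarity
    return ' '.join(words), sentiment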
```
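To see what the cleaning steps do to a single sentence without downloading any NLTK data, here is a simplified sketch using only the standard library. It assumes a tiny hand-written stop word list (`TOY_STOP_WORDS`) and skips lemmatization and polarity scoring; `simple_clean` is a hypothetical name used for illustration.

```python
import re

# Tiny hand-written stop word list, used only for this illustration.
TOY_STOP_WORDS = {"the", "is", "a", "and", "it", "this", "was"}

def simple_clean(text, stop_words=TOY_STOP_WORDS):
    """Keep letters only, lowercase, split on whitespace, drop stop words."""
    text = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in stop_words]

tokens = simple_clean("This movie was GREAT, and it's 100% worth watching!")
print(tokens)  # ['movie', 'great', 's', 'worth', 'watching']
```

Note the stray `s` token: replacing the apostrophe in "it's" with a space splits the word in two. A fuller stop word list or a proper tokenizer would be needed to handle contractions cleanly.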
4. Feature extraction
```python
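from sklearn.feature_extraction.text import CountVectorizer
# Apply preprocess_text from step 3 and keep only the cleaned string (first item of the returned tuple)
df['clean_text'] = df['text'].astype(str).apply(lambda t: preprocess_text(t)[0])
# binary=True records whether a word occurs in a document, not how often
vectorizer = CountVectorizer(analyzer='word', binary=True)
X = vectorizer.fit_transform(df['clean_text'])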
```
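As an illustration of what `CountVectorizer(binary=True)` produces, the sketch below fits it on three invented toy documents. The documents are assumptions made up for this example; only the vectorizer settings come from the step above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy documents (assumption for this sketch).
docs = ["good good movie", "bad movie", "good plot bad acting"]

vectorizer = CountVectorizer(analyzer="word", binary=True)
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each token to its column index; sort tokens by that index.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)
print(X.toarray())
```

Each row is a document and each column a vocabulary word. Because `binary=True`, the repeated "good" in the first document is recorded as 1 rather than 2.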
5. Model training and evaluation
```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
y = df['label']
# Hold out 20% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a multinomial Naive Bayes classifier on the word-presence features
clf = MultinomialNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Overall accuracy, then per-class precision, recall, and F1
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
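One possible variation is to wrap the vectorizer and classifier in a scikit-learn pipeline, so the vocabulary is learned from the training split only and new raw text can be classified in one call. This is a sketch rather than the method used above: the toy `texts` and `labels` are invented, and `stratify` is added here to keep both classes in each split.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data invented for this sketch; real data would come from df['text'] and df['label'].
texts = [
    "great movie loved it", "wonderful acting great plot",
    "loved the wonderful story", "great fun loved every minute",
    "terrible movie hated it", "awful acting boring plot",
    "hated the boring story", "terrible waste awful ending",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Split the raw text first, so the vectorizer never sees the test documents.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer and MultinomialNB on the training split only.
model = make_pipeline(CountVectorizer(binary=True), MultinomialNB())
model.fit(X_train, y_train)

preds = model.predict(["great wonderful story", "boring awful movie"])
print(list(preds))
```

With a real dataset, `model.predict(X_test)` can be passed to `accuracy_score` and `classification_report` exactly as in the step above.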
This is a simple sentiment classification example; you can adapt and tune it to fit your own requirements and dataset.