CNN spam classification code
Time: 2023-07-26 15:03:55
Here is a simple example of CNN-based spam classification code, for reference only:
```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer  # legacy Keras 2 API (tf.keras in TF <= 2.15); removed in Keras 3
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Sequential
# Load the data: column v1 holds the label, v2 holds the message text
data = pd.read_csv('spam.csv', encoding='latin-1')
texts = data['v2'].values
labels = data['v1'].values
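# Tokenize the texts and convert them to integer index sequences (only the most frequent words, index < 10000, are kept)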
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
# Pad or truncate every sequence to exactly maxlen tokens
maxlen = 100
x = pad_sequences(sequences, maxlen=maxlen)
# Encode the labels: 'spam' -> 1, anything else -> 0
y = np.zeros(shape=(len(labels), 1))
y[labels == 'spam'] = 1
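# Shuffle the samples, then split them 80/20 into training and test sets
indices = np.arange(len(texts))
rng = np.random.default_rng(seed=42)  # fixed seed so the split is reproducible
rng.shuffle(indices)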
x = x[indices]
y = y[indices]
train_size = int(len(x) * 0.8)
x_train, x_test = x[:train_size], x[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
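# Build the CNN model
model = Sequential()
model.add(tf.keras.Input(shape=(maxlen,)))  # declare the input length so summary() can show output shapes
model.add(Embedding(input_dim=10000, output_dim=32))  # input_dim must match the Tokenizer's num_words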
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(units=1, activation='sigmoid'))
model.summary()
# Compile and train; 20% of the training set is held out for validation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.2)
# Evaluate on the test set; evaluate() returns [loss, accuracy]
results = model.evaluate(x_test, y_test)
print('Test loss:', results[0])
print('Test accuracy:', results[1])
```
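By default `pad_sequences` zero-pads short sequences on the left and truncates long ones from the front (`padding='pre'`, `truncating='pre'`). As a result, the end of each message is the part that survives. The pure-NumPy function below makes the output of `x = pad_sequences(sequences, maxlen=maxlen)` concrete. It is a hypothetical re-implementation (`pad_pre` is not a Keras function) that mimics only those default settings:

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Mimic pad_sequences defaults: zero-pad on the left, truncate from the left."""
    out = np.zeros((len(sequences), maxlen), dtype="int32")
    for i, seq in enumerate(sequences):
        kept = seq[-maxlen:]             # truncating='pre': keep the last maxlen tokens
        if len(kept) > 0:
            out[i, -len(kept):] = kept   # padding='pre': zeros fill the front
    return out

# Two toy index sequences: one shorter and one longer than maxlen
print(pad_pre([[5, 8], [1, 2, 3, 4, 6]], maxlen=4))
# [[0 0 5 8]
#  [2 3 4 6]]
```

If the start of a message matters more than its end, Keras lets you pass `truncating='post'` to `pad_sequences` instead.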
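The split in the script shuffles all indices together. Because of that, the share of spam in the test set can drift from its share in the full data. A stratified split avoids this by shuffling and cutting each class separately, so both subsets keep roughly the same spam ratio. Below is a minimal sketch in plain NumPy. The helper name `stratified_split` and the seed are illustrative assumptions and are not part of the original code:

```python
import numpy as np

def stratified_split(y, train_frac=0.8, seed=42):
    """Return (train_idx, test_idx), splitting each class train_frac / (1 - train_frac)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)   # positions of this class
        rng.shuffle(idx)
        cut = int(len(idx) * train_frac)
        train_idx.append(idx[:cut])
        test_idx.append(idx[cut:])
    # Mix the classes again so batches are not ordered by label
    train_idx = rng.permutation(np.concatenate(train_idx))
    test_idx = rng.permutation(np.concatenate(test_idx))
    return train_idx, test_idx

# Toy labels shaped like y in the script: 10 spam (1) and 40 non-spam (0)
y_demo = np.array([1] * 10 + [0] * 40).reshape(-1, 1)
train_idx, test_idx = stratified_split(y_demo)
print(len(train_idx), len(test_idx), y_demo[test_idx].sum())
# 40 10 2
```

In the script, these indices would replace the shuffle-and-slice lines, e.g. `x_train, x_test = x[train_idx], x[test_idx]`, and likewise for `y`.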
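The parameter counts that `model.summary()` prints for this model follow directly from the layer sizes. Each layer contributes as follows:

- The embedding table has `input_dim × output_dim` weights.
- `Conv1D` has `kernel_size × input_channels × filters` weights plus one bias per filter.
- Global max pooling has no weights.
- The final `Dense` unit has one weight per pooled feature plus a bias.

A small arithmetic check (the function name is illustrative):

```python
def cnn_param_counts(vocab_size=10000, embed_dim=32, filters=32, kernel_size=3):
    """Trainable parameters per layer of Embedding -> Conv1D -> GlobalMaxPooling1D -> Dense(1)."""
    embedding = vocab_size * embed_dim                    # one embed_dim vector per word index
    conv1d = kernel_size * embed_dim * filters + filters  # kernels + one bias per filter
    pooling = 0                                           # max pooling has no weights
    dense = filters * 1 + 1                               # one weight per pooled feature + bias
    return {"embedding": embedding, "conv1d": conv1d, "pooling": pooling,
            "dense": dense, "total": embedding + conv1d + pooling + dense}

print(cnn_param_counts())
# {'embedding': 320000, 'conv1d': 3104, 'pooling': 0, 'dense': 33, 'total': 323137}
```

`maxlen` does not appear in the count. The convolution weights are shared across positions, and global max pooling removes the sequence dimension. Almost all parameters therefore sit in the embedding table.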
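To see what the four layers compute for one padded message, here is a forward-pass sketch in plain NumPy. Random weights stand in for trained ones, and the sizes and the `forward` helper are illustrative assumptions (Keras itself is not used). The sketch follows the layer settings in the script:

1. Look up the embeddings.
2. Apply a `padding='same'` convolution with ReLU. Keras kernels have shape `(kernel_size, in_channels, filters)` and are applied without flipping.
3. Take the maximum over time.
4. Pass the result through a single sigmoid unit.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, filters, kernel_size, maxlen = 50, 8, 4, 3, 10

# Random stand-ins for trained weights, in Keras's layouts
E = rng.normal(size=(vocab_size, embed_dim))            # Embedding table
W = rng.normal(size=(kernel_size, embed_dim, filters))  # Conv1D kernel
b = np.zeros(filters)                                   # Conv1D bias
w_out = rng.normal(size=(filters,))                     # Dense weights
b_out = 0.0                                             # Dense bias

def forward(tokens):
    """Embedding -> Conv1D(same, relu) -> GlobalMaxPooling1D -> Dense(sigmoid) for one sequence."""
    h = E[tokens]                                       # (maxlen, embed_dim) lookup
    pad = kernel_size // 2                              # symmetric 'same' padding (odd kernel_size)
    h_pad = np.pad(h, ((pad, pad), (0, 0)))             # zero rows so output length == input length
    conv = np.stack([np.einsum("kc,kcf->f", h_pad[t:t + kernel_size], W)
                     for t in range(len(tokens))]) + b  # (maxlen, filters)
    conv = np.maximum(conv, 0.0)                        # ReLU
    pooled = conv.max(axis=0)                           # max over time: (filters,)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out + b_out)))  # spam probability

tokens = np.array([0, 0, 0, 0, 7, 12, 3, 41, 9, 2])     # a pre-padded index sequence
print(forward(tokens))
```

Global max pooling keeps only the strongest response of each filter, wherever it occurs in the message. That is why a filter tuned to a spam-like trigram can fire regardless of position.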
Note that this is only a simple example. A real application needs more thorough data preprocessing, hyperparameter tuning, and so on.
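One example of more detailed preprocessing is to normalize messages before they reach the `Tokenizer`. For instance, URLs, e-mail addresses and numbers can be mapped to placeholder words so the vocabulary is not filled with one-off strings. The regular expressions and the placeholder names below (`clean_text`, `urltoken`, `emailtoken`, `numtoken`) are illustrative assumptions, not a prescribed recipe. The placeholders avoid underscores because the `Tokenizer`'s default `filters` treat `_` as a separator:

```python
import re

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
NUM_RE = re.compile(r"\d+")

def clean_text(text):
    """Lowercase a message and replace URLs, e-mail addresses and numbers with placeholder words."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)     # run before NUM_RE so digits inside URLs stay in the URL
    text = EMAIL_RE.sub(" emailtoken ", text)
    text = NUM_RE.sub(" numtoken ", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

print(clean_text("WIN a prize! Call 12345 or visit www.example.com now"))
# win a prize! call numtoken or visit urltoken now
```

In the script, this would run before `fit_on_texts`, e.g. `texts = [clean_text(t) for t in data['v2'].values]`. Whether it actually improves accuracy on a given dataset has to be checked on the validation set.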