
CountVectorizer example

Time: 2023-08-10 17:42:55 Views: 67

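Below is a simple example of text feature extraction with CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Create the CountVectorizer object
vectorizer = CountVectorizer()

# Training corpus
corpus = [
    'This is the first document',
    'This is the second second document',
    'And the third one',
    'Is this the first document'
]

# Learn the vocabulary and build the document-term count matrix
X = vectorizer.fit_transform(corpus)

# Print the feature matrix (fit_transform returns a sparse matrix)
print(X.toarray())

# Print the feature vocabulary
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
print(vectorizer.get_feature_names_out().tolist())
```

The output is:

```
[[0 1 1 1 0 0 1 0 1]
 [0 1 0 1 0 2 1 0 1]
 [1 0 0 0 1 0 1 1 0]
 [0 1 1 1 0 0 1 0 1]]
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
```

As shown, CountVectorizer converts the four texts into a feature matrix of shape (4, 9). Each row represents one text, each column represents one feature word, and each entry is the number of times that word occurs in that text.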
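To make the counting step concrete, here is a minimal pure-Python sketch of how such a matrix can be built. It is not scikit-learn's implementation: it assumes only the default behavior (lowercasing, tokens of two or more word characters, columns in sorted vocabulary order), and the names `TOKEN_PATTERN` and `count_matrix` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumption: mirrors CountVectorizer's default token pattern,
# which keeps tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_matrix(corpus):
    """Return (vocabulary, rows) for a list of documents.

    Hypothetical helper: a dense, list-based stand-in for fit_transform.
    """
    # Lowercase each document and split it into tokens
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in corpus]

    # The vocabulary is the sorted set of all tokens; its order fixes the columns
    vocabulary = sorted({tok for doc in tokenized for tok in doc})

    # One row per document, one count per vocabulary term
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


corpus = [
    'This is the first document',
    'This is the second second document',
    'And the third one',
    'Is this the first document',
]

vocabulary, rows = count_matrix(corpus)
print(vocabulary)
for row in rows:
    print(row)
```

Running it prints the sorted vocabulary followed by one count row per document. Unlike this sketch, CountVectorizer stores the result as a sparse matrix, which matters once the vocabulary grows large.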
