How to call CountVectorizer on multiple files
Date: 2023-05-21 10:05:03
You can use the following code example to call CountVectorizer on multiple files:
```
from sklearn.feature_extraction.text import CountVectorizer
import glob

# Create a sorted list of file paths so the row order is reproducible
file_list = sorted(glob.glob('path/to/files/*.txt'))

# Initialize CountVectorizer
vectorizer = CountVectorizer()

# Read each file and collect its text into the corpus
corpus = []
for file_path in file_list:
    with open(file_path, 'r', encoding='utf-8') as f:
        corpus.append(f.read())

# Fit the vectorizer and build the document-term matrix in one step
# (row i corresponds to file_list[i])
dtm = vectorizer.fit_transform(corpus)

# Get feature names (get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is available from 1.0)
feature_names = vectorizer.get_feature_names_out()
```
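This example shows how to use CountVectorizer to build a document-term matrix from multiple text files. Note that scikit-learn must be installed first; it can usually be installed with the following command: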
```
pip install -U scikit-learn
```