Extracting keywords from English txt text with TF-IDF in Python

Time: 2023-07-07 16:21:42

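Sure, here is the relevant Python code. First, install the nltk and scikit-learn libraries (the package is published on PyPI as scikit-learn; the old sklearn package name is deprecated) with the following commands:

```
pip install nltk
pip install scikit-learn
```

Next, the following code extracts keywords with TF-IDF:

```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

# Download the tokenizer model and stop word list (only needed once;
# newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('punkt')
nltk.download('stopwords')

# Read the text file
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Tokenize with nltk
tokens = nltk.word_tokenize(text)

# Define the stop word list (as a set for fast lookups)
stopwords = set(nltk.corpus.stopwords.words('english'))

# Remove stop words
filtered_tokens = [token for token in tokens if token.lower() not in stopwords]

# Define the TF-IDF vectorizer
tfidf = TfidfVectorizer()

# Vectorize the text: fit_transform expects a list of documents,
# so the filtered tokens are joined back into one document
tfidf_matrix = tfidf.fit_transform([' '.join(filtered_tokens)])

# Get the keywords and their TF-IDF values, highest score first
feature_names = tfidf.get_feature_names_out()
scores = tfidf_matrix.toarray()[0]
for term, score in sorted(zip(feature_names, scores), key=lambda p: p[1], reverse=True):
    print(term, ' - ', score)
```

Here 'text.txt' is the name of the text file to extract keywords from; change it to match your own file. When run, the code prints each keyword with its TF-IDF value, highest first. Note that `get_feature_names()` was removed in recent scikit-learn releases, so `get_feature_names_out()` is used instead. Also, because only a single document is vectorized, every term receives the same IDF, so the ranking effectively reflects normalized term frequency; IDF only becomes informative when the vectorizer is fitted on several documents.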
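To illustrate how a TF-IDF value is computed when several documents are available, here is a minimal pure-Python sketch. It assumes the smoothed IDF variant ln((1 + n) / (1 + df)) + 1 and skips the length normalization that a library would usually add, so its numbers are not meant to match any library's output exactly. The function name `tfidf_keywords`, the `top_k` parameter and the three sample documents are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k (term, score) pairs for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        # Score = term count * smoothed IDF (assumed variant, see lead-in)
        scores = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically
        ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
        results.append(ranked[:top_k])
    return results


# Hypothetical documents, already tokenized and stop-word filtered
docs = [
    ["python", "text", "keywords", "python"],
    ["text", "mining", "text", "corpus"],
    ["python", "corpus", "statistics"],
]

for i, top in enumerate(tfidf_keywords(docs)):
    print(i, [(term, round(score, 3)) for term, score in top])
```

In the third sample document every term occurs once, yet "statistics" ranks first because it appears in no other document. That is the effect IDF contributes once more than one document is involved.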
