Python code to compute the cosine distance between spectra in two txt files
Posted: 2023-03-26 20:02:54
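You can use the `scipy.spatial.distance.cosine` function from the SciPy library to compute the cosine distance between the spectra in two text files. The code below reads each file as text, builds a token-frequency vector for each one, and compares the two vectors: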
```python
import string

from scipy.spatial import distance

# Tokens to ignore when building the vectors; adjust as needed.
stopwords = set()


def get_cosine_distance(file1, file2):
    with open(file1) as f1, open(file2) as f2:
        text1 = f1.read()
        text2 = f2.read()
    # Strip punctuation, lowercase, and split on whitespace.
    translator = str.maketrans('', '', string.punctuation)
    text1 = text1.translate(translator)
    text2 = text2.translate(translator)
    vector1 = text1.lower().split()
    vector2 = text2.lower().split()
    # Remove stop words and other irrelevant tokens.
    vector1 = [word for word in vector1 if word not in stopwords]
    vector2 = [word for word in vector2 if word not in stopwords]
    # Build frequency vectors over the shared vocabulary (counts start at 0).
    word_set = sorted(set(vector1).union(vector2))
    vector1_dict = dict.fromkeys(word_set, 0)
    vector2_dict = dict.fromkeys(word_set, 0)
    for word in vector1:
        vector1_dict[word] += 1
    for word in vector2:
        vector2_dict[word] += 1
    # Compute the cosine distance.
    vector1_list = [vector1_dict[word] for word in word_set]
    vector2_list = [vector2_dict[word] for word in word_set]
    # distance.cosine already returns 1 - cosine similarity,
    # so subtracting it from 1 again would give the similarity instead.
    cosine_distance = distance.cosine(vector1_list, vector2_list)
    return cosine_distance
```
Here, `file1` and `file2` are the paths to the two text files, and `stopwords` is a collection of stop words that you can adjust to suit your data.
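Note that the function above compares the files as bags of tokens, and stripping punctuation also removes decimal points and minus signs from numbers. If the files hold numeric spectra, it may be more appropriate to compare the intensity values directly. The following is a minimal sketch of that approach, not a definitive implementation. It assumes that each file is whitespace-delimited with the intensity in the last column, and that both spectra are sampled on the same grid. The helper names `load_intensities` and `spectral_cosine_distance` are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy.spatial import distance


def load_intensities(path):
    """Read intensity values from a whitespace-delimited text file.

    Assumed layout (not specified in the original): one row per sample
    point, with the intensity in the last column. A single-column file
    of intensities also works.
    """
    data = np.loadtxt(path, ndmin=2)
    return data[:, -1]


def spectral_cosine_distance(file1, file2):
    """Cosine distance (1 - cosine similarity) between two spectra."""
    a = load_intensities(file1)
    b = load_intensities(file2)
    # The vectors are compared point by point, so they must align.
    if a.shape != b.shape:
        raise ValueError("spectra must have the same number of points")
    return distance.cosine(a, b)
```

Calling `spectral_cosine_distance("a.txt", "b.txt")` (placeholder paths) returns a value near 0 for spectra with the same shape and 1 for orthogonal spectra. If the two files use different wavelength grids, they would need to be resampled onto a common grid first, for example with `numpy.interp`.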