Cosine Similarity in Java
Date: 2023-07-02 13:22:49
The cosine similarity of two text vectors can be computed with the following Java code:
```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CosineSimilarity {

    public static void main(String[] args) {
        // Each document is an array of already-tokenized terms.
        String[] doc1 = {"hello", "world", "java"};
        String[] doc2 = {"hello", "world", "python"};

        Map<String, Integer> tf1 = getTermFrequency(doc1);
        Map<String, Integer> tf2 = getTermFrequency(doc2);

        double cosineSimilarity = getCosineSimilarity(tf1, tf2);
        System.out.println("Cosine Similarity: " + cosineSimilarity);
    }

    // Counts how many times each term occurs in the document.
    public static Map<String, Integer> getTermFrequency(String[] doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.put(term, tf.getOrDefault(term, 0) + 1);
        }
        return tf;
    }

    // Computes dot(tf1, tf2) / (|tf1| * |tf2|) over the union of both vocabularies.
    public static double getCosineSimilarity(Map<String, Integer> tf1, Map<String, Integer> tf2) {
        Set<String> terms = new HashSet<>(tf1.keySet());
        terms.addAll(tf2.keySet());

        double dotProduct = 0;
        double magnitude1 = 0; // sum of squared counts for tf1
        double magnitude2 = 0; // sum of squared counts for tf2
        for (String term : terms) {
            int count1 = tf1.getOrDefault(term, 0);
            int count2 = tf2.getOrDefault(term, 0);
            // Cast to double before multiplying so large counts cannot overflow int.
            dotProduct += (double) count1 * count2;
            magnitude1 += (double) count1 * count1;
            magnitude2 += (double) count2 * count2;
        }

        double magnitude = Math.sqrt(magnitude1) * Math.sqrt(magnitude2);
        // If either document is empty, the similarity is defined as 0 here.
        return magnitude == 0 ? 0 : dotProduct / magnitude;
    }
}
```
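In this example, each text is represented as an array of strings, one element per word. `getTermFrequency()` takes such an array and returns a map from each word to the number of times it occurs. `getCosineSimilarity()` treats the two maps as term-frequency vectors over their combined vocabulary and returns the dot product divided by the product of the two vector lengths, or 0 if either vector is empty. Finally, `main()` calls both functions and prints the resulting cosine similarity.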
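The example above starts from pre-tokenized arrays. As a rough sketch of how the same calculation might be applied to raw strings, the class below first splits the text into lowercase tokens and then computes the similarity. The class name `CosineSketch`, its helper methods, and the tokenization rule (split on anything that is not a letter or digit) are assumptions made for illustration and are not part of the original code. For the sample inputs, the two documents share two of their three terms and each vector has length √3, so the expected similarity is 2/3, or about 0.667.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CosineSketch {

    // Hypothetical tokenizer: lowercases the text and splits on runs of
    // characters that are neither letters nor digits.
    public static Map<String, Integer> termFrequency(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    public static double norm(Map<String, Integer> tf) {
        double sumOfSquares = 0;
        for (int count : tf.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity; only terms present in both maps contribute to the dot product.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += (double) entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        // An empty document has no direction, so the similarity is reported as 0.
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        double similarity = cosine(
                termFrequency("Hello, world! Java"),
                termFrequency("hello world python"));
        // Prints: Cosine Similarity: 0.667
        System.out.println(String.format(Locale.ROOT, "Cosine Similarity: %.3f", similarity));
    }
}
```

Iterating over only one map's entries is enough for the dot product, because any term missing from the other map contributes zero. Building the union of both key sets, as the original code does, gives the same result.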