Cosine similarity matching for Chinese strings in Java
Date: 2023-08-01 10:14:31
The following example code computes the cosine similarity of two Chinese strings in Java:
```
import java.util.HashMap;
import java.util.Map;

public class ChineseCosineSimilarity {

    // Counts how often each character (Unicode code point) occurs in the text.
    // Iterating over code points keeps characters outside the BMP in one piece.
    private static Map<String, Integer> characterFrequencies(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        text.codePoints().forEach(cp ->
                frequencies.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        return frequencies;
    }

    public static double cosineSimilarity(String text1, String text2) {
        Map<String, Integer> wordFrequencyMap1 = characterFrequencies(text1);
        Map<String, Integer> wordFrequencyMap2 = characterFrequencies(text2);

        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : wordFrequencyMap1.entrySet()) {
            int count1 = entry.getValue();
            // Characters missing from the second text contribute 0 to the dot product.
            dotProduct += count1 * wordFrequencyMap2.getOrDefault(entry.getKey(), 0);
            normA += count1 * count1;
        }
        for (int count2 : wordFrequencyMap2.values()) {
            normB += count2 * count2;
        }
        // An empty string yields a zero vector; return 0 rather than dividing by zero.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        // normA and normB hold squared lengths, so one square root of their product suffices.
        return dotProduct / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        String text1 = "你好,世界";
        String text2 = "世界,你好";
        double cosineSimilarity = cosineSimilarity(text1, text2);
        System.out.println("Cosine similarity: " + cosineSimilarity);
    }
}
```
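The `cosineSimilarity` method splits each string into individual characters and counts how often each one occurs. The two frequency maps are then treated as vectors, and the similarity is their dot product divided by the product of their lengths. The `main` method defines two Chinese strings and prints their cosine similarity. Chinese text has no spaces between words, and a single character is not necessarily a word, so without a word segmenter this example simply treats every character (punctuation included) as one token. Because only character counts matter, order is ignored: the two sample strings contain exactly the same characters, so their similarity is 1 (up to floating-point rounding).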
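To make the arithmetic concrete, here is a minimal sketch of the same character-level approach applied to two strings that overlap only partially. The class name `CosineSketch`, the helper names `counts` and `similarity`, and the sample strings are assumptions made for illustration; they are not part of the original example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
public class CosineSketch {

    // Builds a character-frequency vector keyed by Unicode code point.
    static Map<Integer, Integer> counts(String text) {
        Map<Integer, Integer> result = new HashMap<>();
        text.codePoints().forEach(cp -> result.merge(cp, 1, Integer::sum));
        return result;
    }

    // Cosine similarity of the two character-frequency vectors.
    static double similarity(String a, String b) {
        Map<Integer, Integer> va = counts(a);
        Map<Integer, Integer> vb = counts(b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Integer, Integer> e : va.entrySet()) {
            // A character absent from b contributes 0 to the dot product.
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int c : vb.values()) {
            normB += c * c;
        }
        // Assumption: an empty input is defined to have similarity 0.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Shared characters: 你 and 好, each occurring once in both strings.
        // dot = 1*1 + 1*1 = 2; squared lengths are 4 and 4; 2 / sqrt(16) = 0.5
        System.out.println(similarity("你好世界", "你好中国")); // prints 0.5
    }
}
```

Two of the four characters are shared, so the score lands halfway between 0 (no characters in common) and 1 (identical character counts). Since single characters carry no order information, a word segmenter or character n-grams could be swapped in for `counts` if word order or word boundaries matter for the application.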