Write Node.js code to compute the similarity between two texts of different lengths
Posted: 2024-02-29 16:51:45
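Computing text similarity
The similarity between two texts can be computed with cosine similarity over their word-count vectors. Because the measure divides by the magnitude of each vector, the two texts do not need to be the same length. Here is an example written for Node.js: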
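```javascript
function calculateSimilarity(text1, text2) {
  // Convert each text into an array of lowercase words
  // (\w only matches ASCII letters, digits, and underscores)
  const words1 = text1.toLowerCase().split(/[^\w]+/).filter(Boolean);
  const words2 = text2.toLowerCase().split(/[^\w]+/).filter(Boolean);
  // Collect every distinct word so both count vectors share the same keys
  const allWords = new Set([...words1, ...words2]);
  // Count how often each word occurs in each text
  const wordCounts1 = countWords(words1, allWords);
  const wordCounts2 = countWords(words2, allWords);
  // Compute the cosine similarity
  const dotProduct = calculateDotProduct(wordCounts1, wordCounts2);
  const sumOfSquares1 = calculateSumOfSquares(wordCounts1);
  const sumOfSquares2 = calculateSumOfSquares(wordCounts2);
  // One square root of the product equals magnitude1 * magnitude2,
  // but avoids rounding each magnitude separately
  const denominator = Math.sqrt(sumOfSquares1 * sumOfSquares2);
  // A text with no words has zero magnitude; return 0 instead of NaN
  return denominator === 0 ? 0 : dotProduct / denominator;
}

// Count how many times each word occurs in a text
function countWords(words, allWords) {
  // No prototype, so a word such as "__proto__" is stored like any other key
  const wordCounts = Object.create(null);
  for (const word of allWords) {
    wordCounts[word] = 0;
  }
  for (const word of words) {
    wordCounts[word]++;
  }
  return wordCounts;
}

// Compute the dot product of the two word-count vectors
function calculateDotProduct(wordCounts1, wordCounts2) {
  let dotProduct = 0;
  for (const word in wordCounts1) {
    dotProduct += wordCounts1[word] * wordCounts2[word];
  }
  return dotProduct;
}

// Compute the squared magnitude (sum of squared counts) of a word-count vector
function calculateSumOfSquares(wordCounts) {
  let sumOfSquares = 0;
  for (const word in wordCounts) {
    sumOfSquares += wordCounts[word] ** 2;
  }
  return sumOfSquares;
}
```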
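For reference, the quantity the code computes can be written out as follows. Here $a_i$ and $b_i$ are the counts of the $i$-th distinct word in the first and second text; the symbols are introduced only for this note and do not appear in the code.

```latex
\[
\operatorname{similarity}(A, B)
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
\]
```

When both texts contain the same six words once each, the dot product is 6 and each magnitude is $\sqrt{6}$, so the similarity is $6 / (\sqrt{6} \cdot \sqrt{6}) = 1$.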
Usage example:
```javascript
const text1 = "Hello world! This is a test.";
const text2 = "This is a test. Hello world!";
const similarity = calculateSimilarity(text1, text2);
console.log(`The similarity between the texts is ${similarity}`);
```
Output:
```
The similarity between the texts is 1
```
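Since the question is about texts of different lengths, it may help to see why cosine similarity suits that case: it compares the direction of the word-count vectors, not their size. The following is a compact, self-contained sketch of the same idea. The helper names `wordVector` and `cosineSimilarity` are made up for this illustration and are not part of the code above. It assumes the same ASCII `\w`-based word split.

```javascript
// Illustrative sketch: helper names are hypothetical, not from the answer above.

// Build a Map of word -> occurrence count, using the same ASCII word split.
function wordVector(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().split(/[^\w]+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two word-count Maps; 0 if either text has no words.
function cosineSimilarity(a, b) {
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) || 0); // words missing from b contribute 0
    sumSqA += count * count;
  }
  for (const count of b.values()) {
    sumSqB += count * count;
  }
  const denominator = Math.sqrt(sumSqA * sumSqB);
  return denominator === 0 ? 0 : dot / denominator;
}

const shortText = "the cat sat";
const longText = "the cat sat the cat sat the cat sat"; // same words, three times longer
const otherText = "dogs bark loudly";

console.log(cosineSimilarity(wordVector(shortText), wordVector(longText)));  // prints 1
console.log(cosineSimilarity(wordVector(shortText), wordVector(otherText))); // prints 0
```

The longer text repeats the shorter one three times, so its count vector points in the same direction and the score stays at 1. Texts with no words in common score 0, whatever their lengths.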
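Note that this example assumes the input is a plain-text string. If your input contains HTML markup or other formatting, preprocess the text first so that the words are extracted correctly. Also note that `\w` only matches ASCII letters, digits, and underscores, so text in other scripts (Chinese, for example) needs a different tokenization step.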
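As a rough illustration of that preprocessing step, here is a minimal sketch that strips HTML tags with regular expressions before the text is tokenized. The function name `stripHtml` is hypothetical, and the approach is deliberately naive: it handles only two entities and will not cope with malformed markup, so a real HTML parser is the safer choice for real-world input.

```javascript
// Naive preprocessing sketch; `stripHtml` is a hypothetical helper name.
function stripHtml(html) {
  return html
    // Drop <script> and <style> blocks together with their contents
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Replace every remaining tag with a space so adjacent words stay separated
    .replace(/<[^>]+>/g, " ")
    // Decode only two common entities (an assumption; real input may need more)
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    // Collapse runs of whitespace and trim the ends
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripHtml("<p>Hello <b>world</b>!</p>")); // prints "Hello world !"
```

The cleaned string can then be passed to the similarity function in place of the raw markup. The stray space before the punctuation does not matter, because the tokenizer discards non-word characters anyway.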