Implement the TF-IDF algorithm in PHP
Date: 2023-09-28 10:12:02 Views: 298
How can the TF-IDF algorithm be used to extract keywords from text?
Sure. Here is a simple example of the TF-IDF algorithm implemented in PHP:
```php
<?php
// Sample corpus: each string is one document.
$documents = array(
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining and the weather is sweet"
);

// Tokenize: lowercase each document and split it on whitespace.
$words = array();
foreach ($documents as $document) {
    $words[] = preg_split('/\s+/', strtolower(trim($document)), -1, PREG_SPLIT_NO_EMPTY);
}

// Document frequency: the number of documents that contain each word.
$docFreq = array();
foreach ($words as $wordList) {
    foreach (array_unique($wordList) as $word) {
        if (!isset($docFreq[$word])) {
            $docFreq[$word] = 0;
        }
        $docFreq[$word]++;
    }
}

// Compute the TF-IDF value of every word in every document.
$tfidf = array();
$totalDocuments = count($documents);
foreach ($words as $documentIndex => $wordList) {
    // Term counts within this document only, so TF is per document.
    $termCounts = array_count_values($wordList);
    $tfidf[$documentIndex] = array();
    foreach ($termCounts as $word => $count) {
        // TF: occurrences of the word in this document / document length.
        $tf = $count / count($wordList);
        // IDF: natural log of (total documents / documents containing the word).
        $idf = log($totalDocuments / $docFreq[$word]);
        $tfidf[$documentIndex][$word] = $tf * $idf;
    }
}

// Print the results (HTML line breaks, intended for browser output).
foreach ($tfidf as $documentIndex => $values) {
    echo "Document " . ($documentIndex + 1) . ":<br>";
    foreach ($values as $word => $tfidfValue) {
        echo $word . " - " . $tfidfValue . "<br>";
    }
    echo "<br>";
}
?>
```
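For reference, the sample follows the pattern of one common TF-IDF formulation; several variants exist, for example with smoothing or a different log base. The symbols are notation introduced here, not taken from the original answer: $N$ is the number of documents, $f_{t,d}$ is the number of times term $t$ occurs in document $d$, $|d|$ is the number of tokens in $d$, and $\mathrm{df}(t)$ is the number of documents that contain $t$.

$$
\mathrm{tf}(t,d) = \frac{f_{t,d}}{|d|}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
$$

As a worked case on the sample corpus: "sun" occurs once among the 4 tokens of document 1 and appears in 2 of the 3 documents, so $\mathrm{tf} = 1/4$, $\mathrm{idf} = \ln(3/2) \approx 0.405$, and $\mathrm{tfidf} \approx 0.101$. Words that occur in every document, such as "the" and "is", get $\mathrm{idf} = \ln(1) = 0$ and therefore score 0.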
Note that this is only a simple example; real-world use has to account for more factors, such as stop-word removal and stemming.
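As a rough illustration of the stop-word point, the tokenization step could filter out common words before any counting happens. This is only a minimal sketch: the `$stopWords` list and the helper name `tokenizeWithoutStopWords()` are made up for this example and are not part of the original answer, and a real stop-word list would be much longer. Stemming is not covered here, since PHP has no built-in stemmer.

```php
<?php
// Hypothetical, deliberately tiny stop-word list for illustration only.
$stopWords = array('the', 'is', 'and', 'a', 'an', 'of');

/**
 * Lowercase a document, split it on whitespace, and drop stop words.
 * Illustrative helper (assumed name), not part of the original sample.
 */
function tokenizeWithoutStopWords(string $document, array $stopWords): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($document)), -1, PREG_SPLIT_NO_EMPTY);
    // array_diff() preserves the original keys, so reindex with array_values().
    return array_values(array_diff($tokens, $stopWords));
}

$tokens = tokenizeWithoutStopWords("The sun is shining and the weather is sweet", $stopWords);
echo implode(' ', $tokens), "\n"; // prints "sun shining weather sweet"
```

The filtered token lists could then be fed into the same counting and scoring steps as before. Filtering changes each document's length, so the TF values change as well.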