C语言tire树词频统计
时间: 2023-07-01 18:15:54 浏览: 181
Trie树(又称字典树、前缀树)是一种数据结构,常用于字符串的存储和查找。在统计词频时,我们可以使用Trie树来进行高效的统计。
具体的实现步骤如下:
1. 定义Trie树节点结构体,并初始化根节点。
```
#define ALPHABET_SIZE 26
struct TrieNode {
struct TrieNode* children[ALPHABET_SIZE];
bool isEndOfWord;
int frequency;
};
struct TrieNode* createNode() {
struct TrieNode* node = (struct TrieNode*)malloc(sizeof(struct TrieNode));
for (int i = 0; i < ALPHABET_SIZE; i++) {
node->children[i] = NULL;
}
node->isEndOfWord = false;
node->frequency = 0;
return node;
}
struct TrieNode* root = createNode();
```
2. 插入单词。
```
void insertWord(char* word) {
struct TrieNode* node = root;
for (int i = 0; word[i] != '\0'; i++) {
int index = word[i] - 'a';
if (node->children[index] == NULL) {
node->children[index] = createNode();
}
node = node->children[index];
}
node->isEndOfWord = true;
node->frequency++;
}
```
3. 统计单词出现次数。
```
void countWords(struct TrieNode* node, char* buffer, int depth) {
if (node->isEndOfWord) {
buffer[depth] = '\0';
printf("%s : %d\n", buffer, node->frequency);
}
for (int i = 0; i < ALPHABET_SIZE; i++) {
if (node->children[i] != NULL) {
buffer[depth] = i + 'a';
countWords(node->children[i], buffer, depth + 1);
}
}
}
void printWords() {
char buffer[MAX_WORD_LENGTH];
countWords(root, buffer, 0);
}
```
4. 读取文本文件,并进行词频统计。
```
#define MAX_LINE_LENGTH 1024
void countWordsFromFile(char* fileName) {
FILE* fp = fopen(fileName, "r");
if (fp == NULL) {
printf("Failed to open file: %s\n", fileName);
return;
}
char line[MAX_LINE_LENGTH];
while (fgets(line, MAX_LINE_LENGTH, fp) != NULL) {
char* word = strtok(line, " \n\t\r");
while (word != NULL) {
insertWord(word);
word = strtok(NULL, " \n\t\r");
}
}
fclose(fp);
}
```
使用以上代码即可对文本文件中的单词进行词频统计。
阅读全文