Writing a Java Program to Count Word Frequencies in 《白鹿原》 (White Deer Plain)
Date: 2024-09-23 13:02:01
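A Java program that counts word frequencies in the book 《白鹿原》 typically involves the following steps:
1. **Prepare the data**: First, you need a text file containing the contents of 《白鹿原》. Read the file into a string variable, specifying the encoding explicitly so that the Chinese text is decoded correctly.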
```java
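// Requires java.nio.file.Files, java.nio.file.Paths, and java.nio.charset.StandardCharsets.
String bookContent = new String(Files.readAllBytes(Paths.get("白鹿原.txt")), StandardCharsets.UTF_8);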
```
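2. **Tokenize**: Next, split the text into words with a regular expression or an existing library (for example, an open-source Java port of Jieba). Chinese text has no spaces between words, so the whitespace split below only produces real words if the file has already been segmented; on raw text it yields whole lines or sentences as tokens.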
```java
List<String> words = Arrays.asList(bookContent.split("\\s+"));
```
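If no segmentation library is available, one crude stand-in is to count overlapping pairs of adjacent Chinese characters (character bigrams) instead of true words. The sketch below is only an approximation under that assumption: the class name `HanBigramTokenizer` and the method `bigrams` are hypothetical, and many of the pairs it emits will not be real words.

```java
import java.util.ArrayList;
import java.util.List;

final class HanBigramTokenizer {
    // Returns every pair of adjacent Han characters in the text.
    // Punctuation, digits, Latin letters, and whitespace end the current run,
    // so no bigram spans across them.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int previous = -1; // code point of the preceding Han character, or -1 if none
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                if (previous != -1) {
                    tokens.add(new String(Character.toChars(previous))
                            + new String(Character.toChars(cp)));
                }
                previous = cp;
            } else {
                previous = -1; // a non-Han character breaks the run
            }
        }
        return tokens;
    }
}
```

The resulting list can be passed to the counting step in place of the whitespace-split list. A dictionary-based segmenter should give more meaningful tokens when one is available.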
3. **Build a hash table**: Use a HashMap to store each word and its number of occurrences; the key is the word and the value is the count.
```java
Map<String, Integer> wordCount = new HashMap<>();
for (String word : words) {
    wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
```
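The same counting can also be expressed with the Stream API. The following is a minimal sketch, not a replacement for the loop above; the class name `StreamWordCounter` is hypothetical, and the counts come back as `Long` rather than `Integer` because that is what `Collectors.counting()` produces.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class StreamWordCounter {
    // Groups identical tokens together and counts each group.
    // Empty tokens (a common by-product of splitting) are dropped first.
    static Map<String, Long> count(List<String> words) {
        return words.stream()
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

The explicit loop is arguably easier to follow; the stream version mainly helps when filtering or other transformations need to be chained before counting.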
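4. **Count and print the results**: Iterate over the hash table and print each word with its frequency. A HashMap has no defined iteration order, so the output is not sorted.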
```java
wordCount.forEach((word, count) -> System.out.println(String.format("%s: %d", word, count)));
```
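For a whole novel, the full unsorted listing is long, and usually only the most frequent words are of interest. The sketch below sorts the entries by descending count and keeps the first `n`; the class name `TopWords` and the tie-breaking rule (alphabetical by word) are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopWords {
    // Returns the n most frequent entries, highest count first.
    // Entries with equal counts are ordered by word so the result is deterministic.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> wordCount, int n) {
        return wordCount.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

A call such as `TopWords.top(wordCount, 20)` returns a list that can be printed entry by entry with `getKey()` and `getValue()`.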
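5. **Handle errors**: Remember to handle likely failures, such as the file failing to load or the whitespace split producing empty tokens.
Complete example: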
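As one way to make the file-reading failures concrete, the sketch below distinguishes a missing file, a file that is not valid UTF-8, and other I/O errors. The class name `SafeBookReader`, the choice of UTF-8, and returning an empty `Optional` on failure are all assumptions, not requirements of the task.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class SafeBookReader {
    // Reads the whole file as UTF-8, or returns Optional.empty() after
    // printing a short explanation of what went wrong.
    static Optional<String> read(String fileName) {
        Path path = Paths.get(fileName);
        try {
            byte[] bytes = Files.readAllBytes(path);
            // A fresh decoder reports malformed input instead of silently replacing it.
            String text = StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return Optional.of(text);
        } catch (NoSuchFileException e) {
            System.err.println("File not found: " + path);
        } catch (CharacterCodingException e) {
            System.err.println("File is not valid UTF-8: " + path);
        } catch (IOException e) {
            System.err.println("Could not read " + path + ": " + e.getMessage());
        }
        return Optional.empty();
    }
}
```

A Chinese text file saved in a legacy encoding such as GBK would typically land in the second branch, which is a hint to convert the file or choose a different charset.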
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequencyCounter {
    public static void main(String[] args) {
        try {
            String bookFile = "白鹿原.txt";
            List<String> words = getWordsFromFile(bookFile);
            Map<String, Integer> wordCount = countWords(words);
            printWordFrequency(wordCount);
        } catch (IOException e) {
            // Reading fails if the file is missing or unreadable.
            e.printStackTrace();
        }
    }

    private static List<String> getWordsFromFile(String fileName) throws IOException {
        // Read the file as UTF-8 and split each line on whitespace. This only
        // yields real words if the text was segmented beforehand.
        List<String> words = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    if (!token.isEmpty()) { // skip empty tokens from blank or indented lines
                        words.add(token);
                    }
                }
            }
        }
        return words;
    }

    private static Map<String, Integer> countWords(List<String> words) {
        // Map each word to the number of times it occurs.
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
        }
        return wordCount;
    }

    private static void printWordFrequency(Map<String, Integer> wordCount) {
        // HashMap iteration order is unspecified, so the output is unsorted.
        wordCount.forEach((word, count) -> System.out.printf("%s: %d%n", word, count));
    }
}
```