Counting and Sorting Chinese Character Occurrences in a File with Java
Date: 2024-03-27 19:36:41
The following Java program counts how often each Chinese character occurs in a text file and prints the characters sorted by occurrence count:
```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class ChineseCharCount {
    public static void main(String[] args) {
        String filename = "test.txt"; // file to analyze
        Map<Character, Integer> charMap = new HashMap<>(); // character -> occurrence count
        // Read the file as UTF-8; try-with-resources closes the reader automatically
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (isChineseChar(c)) {
                        charMap.merge(c, 1, Integer::sum); // start at 1 or add 1 to the existing count
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        List<Map.Entry<Character, Integer>> charList = new ArrayList<>(charMap.entrySet()); // copy the map entries into a list
        charList.sort((e1, e2) -> e2.getValue().compareTo(e1.getValue())); // sort by count, highest first
        for (Map.Entry<Character, Integer> entry : charList) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    // Returns true if c belongs to one of the Unicode blocks used by Chinese text.
    // Note: the last three blocks contain punctuation and full-width forms, not ideographs.
    private static boolean isChineseChar(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }
}
```
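The program reads the specified text file line by line and examines every character. Each character that passes the Chinese-character check is added to a map that tracks its occurrence count. Finally, the map entries are copied into a list, sorted by count in descending order, and printed. Characters with equal counts appear in no particular order, because `HashMap` iteration order is unspecified.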
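If a reproducible order is wanted for characters that share the same count, one option is to add a tie-breaker to the comparator. The following is a minimal sketch, not part of the original program: the class name `SortedCounts` and the method `sortByCount` are hypothetical, and the sample string is written with Unicode escapes (好学好学习天天) only so that the source compiles regardless of the compiler's file encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortedCounts {
    // Hypothetical helper: orders entries by count (descending),
    // then by character value (ascending) when counts are equal.
    static List<Map.Entry<Character, Integer>> sortByCount(Map<Character, Integer> counts) {
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Character.compare(a.getKey(), b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Sample text given as Unicode escapes: U+597D U+5B66 U+597D U+5B66 U+4E60 U+5929 U+5929
        String sample = "\u597d\u5b66\u597d\u5b66\u4e60\u5929\u5929";
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : sample.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        for (Map.Entry<Character, Integer> e : sortByCount(counts)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The tie-breaker here compares raw `char` values, which is simple and deterministic but does not correspond to any dictionary or pinyin ordering.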
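The `isChineseChar` method decides whether a character counts as Chinese by checking whether its `UnicodeBlock` is one of the blocks used by Chinese text: three CJK ideograph blocks, plus general punctuation, CJK symbols and punctuation, and half-width and full-width forms. As a result, punctuation marks in those blocks are counted as well. Because the program works on individual `char` values, ideographs outside the Basic Multilingual Plane (stored as surrogate pairs) are not matched.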
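For cases where only ideographs should be counted, or where the text may contain characters outside the Basic Multilingual Plane, a different check can be swapped in. The sketch below is an alternative technique, not the original author's method: it iterates over code points instead of `char` values and tests `Character.UnicodeScript.HAN` rather than Unicode blocks. The class name `HanCodePointCount` and the methods `isHan` and `countHan` are hypothetical, and the map is keyed by `String` because a single `Character` cannot hold a supplementary code point.

```java
import java.util.HashMap;
import java.util.Map;

class HanCodePointCount {
    // True if the code point belongs to the Han script (ideographs, not punctuation).
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Counts Han code points in the text; each key is the character as a String.
    static Map<String, Integer> countHan(String text) {
        Map<String, Integer> counts = new HashMap<>();
        text.codePoints()
            .filter(HanCodePointCount::isHan)
            .forEach(cp -> counts.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Sample given as Unicode escapes: U+4E2D, U+6587, U+FF0C (full-width comma),
        // U+4E2D again, and U+20000 (a supplementary ideograph, one surrogate pair).
        String sample = "\u4e2d\u6587\uff0c\u4e2d\ud840\udc00";
        System.out.println(countHan(sample));
    }
}
```

With this check the full-width comma in the sample is skipped, while the supplementary ideograph is counted as a single character instead of two unmatched surrogates.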