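Count word frequencies
Requirements:
1. Implement in Java/Kotlin; the code should be clear and readable.
2. Count how often each word occurs, and how often each adjacent word occurs next to it.
3. Sort by frequency from high to low, then alphabetically from a to z.
4. Print the top 3 words, and for each of them its top-ranked left and right neighbors, as follows:
- the 9, left: as 1, right: company's 3
- to 8, left: needed 2, right: be 4
- be 5, left: to 4, right: carefully 1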
Time: 2024-02-19 11:57:48 Views: 58
Understood. Below is an implementation in Java.
```java
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

public class WordFrequency {
    // A word is a run of letters, optionally with internal apostrophes (e.g. "company's")
    private static final Pattern WORD = Pattern.compile("[a-z]+(?:'[a-z]+)*");

    // Frequency from high to low; ties are broken alphabetically (a to z)
    private static final Comparator<Map.Entry<String, Integer>> BY_FREQUENCY_THEN_ALPHABET =
            (a, b) -> {
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            };

    public static void main(String[] args) {
        String text = "This is a test text. It contains some words and some of these words may appear more than once. Some words may also be next to each other. For example, 'may appear' is a pair of adjacent words.";
        Map<String, Integer> wordCount = countWords(text);
        Map<String, Integer> adjacentWordCount = countAdjacentWords(text);

        List<Map.Entry<String, Integer>> wordList = new ArrayList<>(wordCount.entrySet());
        wordList.sort(BY_FREQUENCY_THEN_ALPHABET);

        // Print the top 3 words, each with its most frequent left and right neighbor
        System.out.println("Top 3 words:");
        for (int i = 0; i < 3 && i < wordList.size(); i++) {
            Map.Entry<String, Integer> entry = wordList.get(i);
            System.out.printf("%s %d, left: %s, right: %s%n", entry.getKey(), entry.getValue(),
                    topNeighbor(adjacentWordCount, entry.getKey(), true),
                    topNeighbor(adjacentWordCount, entry.getKey(), false));
        }
    }

    // Lowercase the text and extract its words in order
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text.toLowerCase());
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static Map<String, Integer> countWords(String text) {
        return tokenize(text).stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    // Count each ordered pair of adjacent words, keyed as "left right"
    public static Map<String, Integer> countAdjacentWords(String text) {
        List<String> words = tokenize(text);
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            map.merge(words.get(i) + " " + words.get(i + 1), 1, Integer::sum);
        }
        return map;
    }

    // Most frequent neighbor of a word, formatted as "neighbor count", or "-" if it has none
    static String topNeighbor(Map<String, Integer> pairs, String word, boolean left) {
        return pairs.entrySet().stream()
                .filter(e -> e.getKey().split(" ")[left ? 1 : 0].equals(word))
                .min(BY_FREQUENCY_THEN_ALPHABET)
                .map(e -> e.getKey().split(" ")[left ? 0 : 1] + " " + e.getValue())
                .orElse("-");
    }
}
```
The code above counts word frequencies and adjacent-word frequencies, then sorts and prints the results as required. Use it as a reference.
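As a variation, the left and right neighbors can be tallied directly per word instead of being recovered from "left right" pair keys. The following is only a minimal sketch under stated assumptions: the class name `NeighborFrequencySketch`, the helpers `report` and `top`, and the sample sentence are hypothetical and not part of the original answer, and the tokenizer is a simplification that treats any run of letters and apostrophes as a word.

```java
import java.util.*;

// Hypothetical sketch: names and sample input are illustrative assumptions.
class NeighborFrequencySketch {
    // Higher count first; equal counts fall back to alphabetical order
    static final Comparator<Map.Entry<String, Integer>> ORDER =
            Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                    .thenComparing(Map.Entry.comparingByKey());

    // Returns one formatted line per top-ranked word, at most `limit` lines
    static List<String> report(String text, int limit) {
        // Simplified tokenizer (assumption): letters and apostrophes form words
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z']+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        Map<String, Integer> counts = new HashMap<>();
        // word -> (neighbor -> count), one map per side
        Map<String, Map<String, Integer>> left = new HashMap<>();
        Map<String, Map<String, Integer>> right = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            counts.merge(w, 1, Integer::sum);
            if (i > 0) {
                left.computeIfAbsent(w, k -> new HashMap<>())
                        .merge(words.get(i - 1), 1, Integer::sum);
            }
            if (i + 1 < words.size()) {
                right.computeIfAbsent(w, k -> new HashMap<>())
                        .merge(words.get(i + 1), 1, Integer::sum);
            }
        }

        List<String> lines = new ArrayList<>();
        counts.entrySet().stream().sorted(ORDER).limit(limit).forEach(e ->
                lines.add(e.getKey() + " " + e.getValue()
                        + ", left: " + top(left.get(e.getKey()))
                        + ", right: " + top(right.get(e.getKey()))));
        return lines;
    }

    // Best-ranked neighbor as "neighbor count", or "-" when there is none
    static String top(Map<String, Integer> neighbors) {
        if (neighbors == null) {
            return "-";
        }
        return neighbors.entrySet().stream().min(ORDER)
                .map(e -> e.getKey() + " " + e.getValue())
                .orElse("-");
    }

    public static void main(String[] args) {
        report("to be or not to be", 3).forEach(System.out::println);
    }
}
```

For the sample sentence this prints `be 2, left: to 2, right: or 1`, then `to 2, left: not 1, right: be 2`, then `not 1, left: or 1, right: to 1`. "be" ranks ahead of "to" because both occur twice and the tie is broken alphabetically.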