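Find the English lyrics of 'Take me hand' and save them to lyric1.txt. Read the lyrics from this file line by line and count how often each word occurs. Sort the words by frequency in ascending order and write the result to sort1.txt. The sorted file should store each entry as "word:count". Also try to remove the high-frequency words.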
Posted: 2024-10-05 18:03:16
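First, make sure that `lyric1.txt` exists and contains the correct lyrics, or that you can download or otherwise obtain the lyrics of 'Take me hand' and write them to this file. The steps below assume you already have the lyrics and know how to work with files:
1. Save the lyrics to `lyric1.txt`: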
```python
# Assume we already have the lyrics
lyrics = "Take me by the hand, and we'll go somewhere far away..."
with open('lyric1.txt', 'w', encoding='utf-8') as f:
    f.write(lyrics)
```
2. Count word frequencies and write them to `sort1.txt`:
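```python
import string
from collections import Counter

word_count = Counter()
# Read the lyrics line by line and count each word
with open('lyric1.txt', 'r', encoding='utf-8') as f_in:
    for line in f_in:
        for token in line.split():
            # Normalize so that "Take" and "take," count as the same word
            word = token.strip(string.punctuation).lower()
            if word:
                word_count[word] += 1

# Sort by frequency in ascending order
sorted_word_count = sorted(word_count.items(), key=lambda x: x[1])

with open('sort1.txt', 'w', encoding='utf-8') as f_out:
    for word, count in sorted_word_count:
        # Written as "word: count", the format that step 3 parses
        f_out.write(f"{word}: {count}\n")
```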
3. Remove high-frequency words (assuming a high-frequency threshold of 10):
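```python
high_freq_threshold = 10
with open('sort1.txt', 'r', encoding='utf-8') as f_in, \
        open('filtered_sort1.txt', 'w', encoding='utf-8') as f_out:
    for line in f_in:
        if not line.strip():
            continue  # skip blank lines
        # Split on the last colon so words containing ':' still parse
        word, count = line.rsplit(':', 1)
        # Keep only words that occur fewer times than the threshold
        if int(count) < high_freq_threshold:
            f_out.write(line)
```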
Note that the code above keeps only words that occur fewer than 10 times. To use a different threshold, change `high_freq_threshold`.
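A fixed threshold of 10 may remove nothing for a short song, so one possible alternative is to drop the N most frequent words instead. The following is a minimal in-memory sketch of that idea, not part of the original answer. The helper names `count_words` and `drop_most_common`, the `top_n` parameter, and the sample lines are all hypothetical and chosen only for illustration.

```python
import string
from collections import Counter


def count_words(lines):
    """Count lowercase words, ignoring punctuation at the word edges."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(string.punctuation).lower()
            if word:
                counts[word] += 1
    return counts


def drop_most_common(counts, top_n):
    """Return (word, count) pairs in ascending order of count,
    leaving out the top_n most frequent words."""
    removed = {word for word, _ in counts.most_common(top_n)}
    kept = [(w, c) for w, c in counts.items() if w not in removed]
    # Sort by count first, then alphabetically for a stable result
    return sorted(kept, key=lambda item: (item[1], item[0]))


# Hypothetical sample lines, not the real lyrics
sample = ["take my hand, take my heart", "take it all"]
counts = count_words(sample)
result = drop_most_common(counts, top_n=1)

for word, count in result:
    print(f"{word}:{count}")
```

To apply this to the real file, the lines read from `lyric1.txt` could be passed to `count_words` in place of `sample`. When several words tie at the cutoff, `Counter.most_common` returns them in the order they were first seen, so which of the tied words gets dropped depends on the lyrics' word order.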