Read the attached Who Moved My Cheese.txt and split its contents into an English file and a Chinese file, each named after the most frequent word
To read the contents of "Who Moved My Cheese.txt" and split them into an English part and a Chinese part, we can use Python's standard-library `re` and `collections` modules for the text processing, plus the `jieba` library for Chinese word segmentation. Assuming the text is pure English or English mixed with some Chinese, we first preprocess the text and then compute the word frequencies.
The steps are as follows:
1. **Import the required libraries**:
```python
from collections import Counter  # word-frequency counting
import re                        # regular expressions for extracting words
import jieba                     # Chinese word segmentation
```
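`re` and `collections` are part of the standard library; `jieba` is a third-party package that can usually be installed with `pip install jieba`.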
2. **Read the text**:
```python
with open("Who Moved My Cheese.txt", "r", encoding="utf-8") as file:
    text = file.read()
```
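If the attachment turns out not to be UTF-8 encoded (Chinese text files are sometimes saved as GBK), a small fallback helper like the sketch below can be used instead; the GBK fallback is an assumption about how the file was saved, not something stated in the question:
```python
def read_text(path):
    # Try UTF-8 first, then GBK (a common encoding for Chinese text files);
    # both codec names are assumptions about how the attachment was saved
    for enc in ("utf-8", "gbk"):
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} as UTF-8 or GBK")
```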
3. **Extract the English words**:
```python
# Detect whether the text contains any Chinese characters
has_chinese = re.search(r'[\u4e00-\u9fa5]', text) is not None
# English words are runs of ASCII letters
english_words = re.findall(r'[A-Za-z]+', text)
```
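As a quick sanity check, the extraction can be tried on a small mixed-language sample (the sentence below is made up for illustration):
```python
import re

sample = "Hem和Haw问: Who moved my cheese? The cheese is gone."
print(re.findall(r'[A-Za-z]+', sample))
# ['Hem', 'Haw', 'Who', 'moved', 'my', 'cheese', 'The', 'cheese', 'is', 'gone']
```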
4. **Count the English word frequencies**:
```python
# Lowercase while counting so that "Cheese" and "cheese" are the same word
english_word_counts = Counter(w.lower() for w in english_words)
most_common_english = english_word_counts.most_common(1)  # most frequent English word and its count
```
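`most_common(1)` returns a list containing a single `(word, count)` tuple, so with the sample words from the previous step:
```python
from collections import Counter

words = ['Hem', 'Haw', 'Who', 'moved', 'my',
         'cheese', 'The', 'cheese', 'is', 'gone']
print(Counter(w.lower() for w in words).most_common(1))
# [('cheese', 2)]
```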
5. **Segment and count the Chinese part**:
```python
most_common_chinese = []
if has_chinese:
    # Keep only the Chinese characters, then segment them with jieba
    chinese_text = "".join(re.findall(r'[\u4e00-\u9fa5]+', text))
    words = jieba.lcut(chinese_text)
    chinese_word_counts = Counter(words)
    most_common_chinese = chinese_word_counts.most_common(1)  # most frequent Chinese word and its count
```
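For reference, `jieba.lcut` returns the segmented words as a plain list; on a short illustrative phrase:
```python
import jieba

print(jieba.lcut("谁动了我的奶酪"))
# e.g. ['谁', '动', '了', '我', '的', '奶酪'] (exact output depends on jieba's dictionary)
```
Note that function words such as 的 and 了 often dominate the counts; a stop-word list can be applied before counting if that is not wanted.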
6. **Save to files**:
```python
# English part: file named after the most frequent English word
if most_common_english:
    eng_word = most_common_english[0][0]
    with open(f"{eng_word}.txt", "w", encoding="utf-8") as file:
        file.write(" ".join(english_words))
# Chinese part (if any): file named after the most frequent Chinese word
if most_common_chinese:
    chi_word = most_common_chinese[0][0]
    with open(f"{chi_word}.txt", "w", encoding="utf-8") as file:
        file.write(chinese_text)
```
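Putting the steps together, a run on the attachment produces two files named after the most frequent words, for example cheese.txt for the English part and 奶酪.txt for the Chinese part; the actual names depend on which words dominate the attachment.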