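Using Python, build a word list and a dictionary of occurrence counts from a given passage of Black Myth text. 3. Output the 10 most frequent words, excluding function words such as articles, pronouns, and prepositions. 4. You may define several functions that each implement one independent piece of functionality, and chain them together in the main module to carry out the overall task.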

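To complete this task, first install the `nltk` library (Natural Language Toolkit), which provides tools for processing natural-language text, including tokenization and stop-word removal. If it is not installed yet, run `pip install nltk`. The steps are as follows:

1. Install and import the required libraries:

```python
import string
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import sent_tokenize, word_tokenize

# Download the tokenizer models, the stop-word list, and the WordNet data
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
```

Depending on the installed NLTK version, the tokenizers may additionally need `nltk.download('punkt_tab')`.

2. Define the helper functions:

- `remove_punctuation`: strips punctuation
- `remove_stopwords`: removes stop words
- `lemmatize_words`: lemmatizes words
- `build_word_list`: builds the word list
- `count_word_frequency`: counts word frequencies

```python
def remove_punctuation(text):
    # Delete every ASCII punctuation character from a string
    return text.translate(str.maketrans('', '', string.punctuation))

def remove_stopwords(words):
    # Drop English stop words (articles, pronouns, prepositions, ...)
    stop_words = set(stopwords.words("english"))
    return [word for word in words if word.lower() not in stop_words]

def lemmatize_words(words):
    # Reduce each word to its WordNet lemma (treated as a noun by default)
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word) for word in words]

def build_word_list(text):
    # Split into sentences first, then clean and tokenize each sentence
    words = []
    for sentence in sent_tokenize(text):
        tokens = word_tokenize(remove_punctuation(sentence).lower())
        words.extend(lemmatize_words(remove_stopwords(tokens)))
    return words

def count_word_frequency(word_list):
    # Return the 10 most common words as (word, count) pairs
    return Counter(word_list).most_common(10)
```

3. Call these functions from the main program:

```python
if __name__ == "__main__":
    # The Black Myth text to analyze
    myth_text = "Insert your Black Myth text here"
    # Build the word list with stop words removed
    cleaned_words = build_word_list(myth_text)
    # Count frequencies and print the result
    top_10_words = count_word_frequency(cleaned_words)
    print("The 10 most frequent words (stop words removed):")
    for word, freq in top_10_words:
        print(f"{word}: {freq}")
```

This example assumes the Black Myth text is stored in the `myth_text` variable; replace the placeholder with your actual text. Because the stop-word list is the English one, the script also assumes the text is in English. The script preprocesses the text (sentence splitting, punctuation removal, stop-word removal, lemmatization), then counts and prints the 10 most common words. Note that `build_word_list` receives the raw text: sentence splitting depends on punctuation, so punctuation is only removed inside the function, after each sentence has been isolated.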
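The question also asks for a dictionary of occurrence counts, which the answer above never returns explicitly. If installing NLTK is not an option, the same pipeline can be sketched with the standard library alone. This is a minimal sketch, not a replacement for NLTK: the names `STOP_WORDS`, `tokenize`, `build_word_counts`, and `top_words` are hypothetical, the stop-word set is a small hand-written sample rather than a complete list, and the sample sentence is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word set (articles, pronouns,
# prepositions, and a few conjunctions/auxiliaries); extend it for real use.
STOP_WORDS = {
    "a", "an", "the",
    "i", "you", "he", "she", "it", "we", "they", "his", "her", "its", "their",
    "in", "on", "at", "of", "to", "from", "with", "by", "for",
    "and", "or", "but", "is", "was", "are", "were",
}


def tokenize(text):
    # Lowercase the text and extract runs of ASCII letters,
    # keeping one internal apostrophe (e.g. "don't")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_word_counts(text):
    # Map each non-stop word to the number of times it occurs
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return dict(Counter(words))


def top_words(counts, n=10):
    # Sort by descending count, breaking ties alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Made-up sample sentence, used only to exercise the functions
    sample = "The monkey took the staff, and the monkey left the mountain."
    counts = build_word_counts(sample)
    for word, freq in top_words(counts, 3):
        print(f"{word}: {freq}")
```

Each function does one job, and the main block chains them together, which matches the structure the question asks for. Unlike the NLTK version, this sketch does no lemmatization, so inflected forms such as "monkey" and "monkeys" are counted separately.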
