
Python natural language processing code with comments

Date: 2023-07-23 11:28:42

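Below is a simple Python natural language processing example with comments:

```python
# Import the required libraries
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download the required data and corpora
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in recent NLTK releases
nltk.download('stopwords')
nltk.download('wordnet')


# Define a function that preprocesses text
def preprocess_text(text):
    # Convert the text to lowercase
    text = text.lower()
    # Tokenize
    words = word_tokenize(text)
    # Remove stop words
    stop_words = set(stopwords.words('english'))
    words = [word for word in words if word not in stop_words]
    # Lemmatize (each word is treated as a noun by default)
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(word) for word in words]
    # Return the processed text
    return ' '.join(words)


# Test the function
text = "This is a sample sentence, showing off the stop words filtration and lemmatization."
preprocessed_text = preprocess_text(text)
print(preprocessed_text)
```

This code shows how to preprocess text with the NLTK library: converting the text to lowercase, tokenizing it, removing stop words, and lemmatizing the remaining words. These steps normalize the text and shrink its vocabulary before downstream tasks such as classification or search. Note that `word_tokenize` keeps punctuation as separate tokens, so commas and periods remain in the output, and `lemmatize` treats every word as a noun unless a part of speech is passed, so verb forms such as "showing" are left unchanged.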
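To see what each of the four steps does mechanically, without downloading any NLTK data, the pipeline above can be imitated with the standard library alone. This is only a rough sketch under stated assumptions: `toy_preprocess`, `TOY_STOP_WORDS`, and `TOY_LEMMAS` are hypothetical names, the stop list and lemma table are tiny hand-picked stand-ins rather than NLTK's resources, and the regex tokenizer is a simplification that is not equivalent to `word_tokenize`.

```python
import re

# Hypothetical, deliberately tiny stand-ins for NLTK's stop word list
# and WordNet lookups; real resources are far larger.
TOY_STOP_WORDS = {"this", "is", "a", "the", "and", "off"}
TOY_LEMMAS = {"words": "word", "sentences": "sentence"}


def toy_preprocess(text):
    """Imitate the four preprocessing steps using only the standard library."""
    # Step 1: convert to lowercase.
    text = text.lower()
    # Step 2: tokenize. This regex keeps alphabetic runs only,
    # so punctuation is dropped instead of kept as separate tokens.
    tokens = re.findall(r"[a-z]+", text)
    # Step 3: remove stop words.
    tokens = [t for t in tokens if t not in TOY_STOP_WORDS]
    # Step 4: map inflected forms to a base form by dictionary lookup.
    tokens = [TOY_LEMMAS.get(t, t) for t in tokens]
    return " ".join(tokens)


print(toy_preprocess("This is a sample sentence, showing off the stop words."))
# prints: sample sentence showing stop word
```

One visible difference from the NLTK version is that this sketch discards punctuation during tokenization. If that behavior is wanted in the NLTK pipeline, a filter such as `[w for w in words if w.isalpha()]` after tokenizing would be one way to get it.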
