Python code for cleaning .txt files
Date: 2023-11-12 17:08:56
Below is a simple Python example that can be used to clean a text file:
```python
import string

# small illustrative stopword list; extend it for real use
STOPWORDS = {'the', 'and', 'a', 'an', 'in', 'of', 'to', 'that', 'it',
             'with', 'for', 'on', 'is', 'was', 'by', 'as'}


def clean_text(text):
    """Return text lowercased, with punctuation, digits, and stopwords removed."""
    # remove ASCII punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # convert to lowercase
    text = text.lower()
    # remove digits and any remaining non-letter characters
    text = ''.join(c for c in text if c.isalpha() or c.isspace())
    # split into words (this also collapses newlines and repeated spaces)
    words = text.split()
    # remove stopwords
    words = [w for w in words if w not in STOPWORDS]
    # join words back into text
    return ' '.join(words)


# example usage (UTF-8 is assumed here; adjust to match your file)
with open('input.txt', 'r', encoding='utf-8') as f:
    raw_text = f.read()
# use a different name so the clean_text function is not overwritten
cleaned = clean_text(raw_text)
with open('output.txt', 'w', encoding='utf-8') as out:
    out.write(cleaned)
```
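This code converts the text to lowercase, removes punctuation, digits, and other non-letter characters, drops a small set of common English stopwords, and joins the remaining words with single spaces. Because the text is split on whitespace, line breaks in the input are not preserved in the output.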
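The snippet above reads the whole file at once and writes the result as a single line. If the file is large or its line structure matters, a line-by-line variant may be a better fit. The following is a minimal sketch under that assumption, not part of the original answer: `clean_line`, `clean_file`, and the `encoding` parameter are hypothetical names introduced for illustration, and the stopword list simply mirrors the one used above.

```python
# Illustrative stopword list, copied from the snippet above.
STOPWORDS = {'the', 'and', 'a', 'an', 'in', 'of', 'to', 'that', 'it',
             'with', 'for', 'on', 'is', 'was', 'by', 'as'}


def clean_line(line):
    """Lowercase one line, keep only letters and whitespace, drop stopwords."""
    kept = ''.join(c for c in line.lower() if c.isalpha() or c.isspace())
    return ' '.join(w for w in kept.split() if w not in STOPWORDS)


def clean_file(src_path, dst_path, encoding='utf-8'):
    """Clean src_path one line at a time and write the result to dst_path.

    The encoding default is an assumption; pass the file's real encoding.
    """
    with open(src_path, 'r', encoding=encoding) as src, \
            open(dst_path, 'w', encoding=encoding) as dst:
        for line in src:
            cleaned = clean_line(line)
            if cleaned:  # skip lines that are empty after cleaning
                dst.write(cleaned + '\n')
```

Calling `clean_file('input.txt', 'output.txt')` would then produce one cleaned line per non-empty input line. Note that `str.isalpha()` is true for letters in any script, so non-English letters are kept, while the stopword list only covers English.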