Using pandas in Python to drop the first column of a CSV file, remove duplicate rows, and then clean the text in the "评论内容" (comment content) column

Posted: 2023-12-30 17:06:40 Views: 147

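The following code is one way to do this:

```python
import re

import pandas as pd

# Read the CSV file, then drop the first column by position.
# (The column count is only known after reading, so it cannot be
# passed to usecols inside the same read_csv call.)
df = pd.read_csv('filename.csv')
df = df.iloc[:, 1:]

# Remove duplicate rows
df = df.drop_duplicates()


# Text-cleaning function
def clean_text(text):
    # Missing values (NaN) are not strings; map them to an empty string
    if not isinstance(text, str):
        return ''
    # Remove punctuation and special characters (including underscores)
    text = re.sub(r'[^\w\s]|_', '', text)
    # Remove digits
    text = re.sub(r'\d+', '', text)
    # Collapse runs of whitespace and trim both ends
    text = re.sub(r'\s+', ' ', text)
    text = text.strip()
    return text.lower()


# Clean the "评论内容" column
df['评论内容'] = df['评论内容'].apply(clean_text)

# Save the processed data to a new CSV file
df.to_csv('new_filename.csv', index=False)
```

Here, `filename.csv` is the name of the original CSV file to process, and `new_filename.csv` is the name of the new file that receives the result. The `clean_text()` function preprocesses each comment: it removes punctuation, digits, and special characters, collapses repeated whitespace, and converts the text to lowercase. Because `\w` matches Unicode word characters in Python 3, Chinese characters are kept while Chinese punctuation is removed. Finally, the processed data is saved to the new CSV file.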
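To try the steps described above (drop the first column, remove duplicate rows, clean the "评论内容" column) without touching a real file, here is a minimal sketch that runs on a small in-memory sample. The helper name `clean_comments`, the sample rows, and the `id` and `用户` columns are hypothetical and only serve the illustration; the real file may need a different column name or an explicit `encoding` argument to `read_csv`.

```python
import io
import re

import pandas as pd


def clean_text(text):
    """Strip punctuation, digits, and extra whitespace; lowercase the rest."""
    if not isinstance(text, str):  # e.g. NaN for an empty cell
        return ''
    text = re.sub(r'[^\w\s]|_', '', text)  # punctuation and special characters
    text = re.sub(r'\d+', '', text)        # digits
    text = re.sub(r'\s+', ' ', text).strip()
    return text.lower()


def clean_comments(csv_source, column='评论内容'):
    """Hypothetical helper: drop first column, deduplicate, clean one column."""
    df = pd.read_csv(csv_source)
    df = df.iloc[:, 1:]  # drop the first column by position
    df = df.drop_duplicates().reset_index(drop=True)
    df[column] = df[column].apply(clean_text)
    return df


# Made-up sample data: the first two rows differ only in the first column
sample_csv = io.StringIO(
    "id,用户,评论内容\n"
    "0,A,Good!!! 123\n"
    "1,A,Good!!! 123\n"
    '2,B,"很好，推荐 2023"\n'
)

result = clean_comments(sample_csv)
print(result)
```

In this sample the first two rows only become duplicates once the first column (a running `id`) has been dropped, which is why the column is removed before `drop_duplicates()` is called. Note also that duplicates are detected on the raw text; if rows should count as duplicates after cleaning, `drop_duplicates()` could be called again after the cleaning step.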
