Python: find the most frequent word in one column of a CSV file, where each cell holds a variable-length list of strings
Posted: 2023-06-15 21:08:11
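You can count the most frequent word in a CSV column with Python's standard csv module and collections.Counter.
Suppose the column of interest is the second one (index 1). The steps are:
1. Import csv and Counter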
```python
import csv
from collections import Counter
```
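2. Open the CSV file and read its rows
```python
# newline='' is what the csv module recommends; utf-8 is an assumption about the file
with open('data.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    data = list(reader)
```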
3. Extract the second column
```python
column = [row[1] for row in data]
```
4. Split each cell's string into words and merge all words into one flat list
```python
words = [word for row in column for word in row.split()]
```
5. Count how often each word occurs
```python
word_count = Counter(words)
```
6. Pick the word with the highest count
```python
most_common_word = word_count.most_common(1)[0][0]
```
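As a minimal sketch of what steps 4 to 6 do, the snippet below runs them on a few made-up cell values (the `column` contents are hypothetical, not taken from any real file). It also guards against the case where no words are found, since `most_common(1)` then returns an empty list and indexing it would fail.

```python
from collections import Counter

# Hypothetical cell values standing in for the extracted column
column = ["red green", "green  blue", ""]

# split() with no argument splits on any run of whitespace and
# yields nothing for an empty cell
words = [word for cell in column for word in cell.split()]

word_count = Counter(words)

# most_common(1) returns a list of (word, count) pairs; it is empty
# when there are no words, so guard before indexing
top = word_count.most_common(1)
most_common_word = top[0][0] if top else None
print(words, most_common_word)
```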
Complete code:
```python
import csv
from collections import Counter

# Read all rows; utf-8 is an assumption about the file's encoding
with open('data.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    data = list(reader)

column = [row[1] for row in data]  # second column
words = [word for row in column for word in row.split()]
word_count = Counter(words)
most_common_word = word_count.most_common(1)[0][0]
print('The most common word is:', most_common_word)
```
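If the file has a header row, selecting the column by name may be more convenient than by index. The following is one possible sketch using `csv.DictReader`; the function name `most_common_words`, the column name `text`, and the sample rows are all assumptions made for illustration.

```python
import csv
import os
import tempfile
from collections import Counter


def most_common_words(path, column_name, top_n=1):
    """Return the top_n (word, count) pairs for one named column."""
    counts = Counter()
    with open(path, "r", newline="", encoding="utf-8") as file:
        for row in csv.DictReader(file):  # first row is used as the header
            # a missing or empty cell contributes no words
            counts.update((row.get(column_name) or "").split())
    return counts.most_common(top_n)


# Write a small hypothetical CSV to a temporary file for demonstration
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("id,text\n1,red green\n2,green blue\n3,green\n")
    tmp_path = tmp.name

result = most_common_words(tmp_path, "text")
os.remove(tmp_path)
print(result)
```

Returning the `(word, count)` pairs rather than only the word makes it easy to ask for the top few entries by raising `top_n`.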
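Note: csv.reader returns every cell as a string, so a cell that holds a variable-length list arrives as text such as `['a', 'b']`, not as a list. Calling join() on it does not help, because iterating over a string yields single characters. If the cells were written as Python list literals (an assumption about how the file was produced) and the column has no header row, parse each cell with ast.literal_eval() and flatten the results instead of using split():
```python
import ast
words = [word for cell in column for word in ast.literal_eval(cell)]
```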
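For the list-valued case in the question, a fuller sketch might look like the following. It assumes each cell stores a Python list literal such as `['python', 'csv']`; the helper name `count_list_column`, the `tags` column, and the in-memory sample are hypothetical and only stand in for a real data.csv.

```python
import ast
import csv
import io
from collections import Counter


def count_list_column(file_obj, col_index=1, has_header=True):
    """Count items across a column whose cells are Python list literals."""
    reader = csv.reader(file_obj)
    if has_header:
        next(reader, None)  # skip the header row if present
    counts = Counter()
    for row in reader:
        if len(row) <= col_index or not row[col_index].strip():
            continue  # skip short rows and empty cells
        items = ast.literal_eval(row[col_index])  # "['a', 'b']" -> ['a', 'b']
        counts.update(str(item) for item in items)
    return counts


# Hypothetical in-memory sample standing in for data.csv
sample = io.StringIO('''id,tags
1,"['python', 'csv']"
2,"['python']"
3,"['csv', 'python', 'pandas']"
''')
counts = count_list_column(sample)
print(counts.most_common(1))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file. If the cells use another convention, for example JSON arrays or a delimiter-separated string, the parsing line would need to change accordingly.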