Why am I getting this error?
C:\Users\i、\PycharmProjects\pythonProject2\venv\Scripts\python.exe C:\Users\i、\PycharmProjects\pythonProject2\002.py
Traceback (most recent call last):
  File "C:\Users\i、\PycharmProjects\pythonProject2\002.py", line 32, in <module>
    df_count.index = df.index
  File "C:\Users\i、\PycharmProjects\pythonProject2\venv\lib\site-packages\pandas\core\generic.py", line 6002, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas\_libs\properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "C:\Users\i、\PycharmProjects\pythonProject2\venv\lib\site-packages\pandas\core\generic.py", line 730, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "C:\Users\i、\PycharmProjects\pythonProject2\venv\lib\site-packages\pandas\core\internals\managers.py", line 225, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "C:\Users\i、\PycharmProjects\pythonProject2\venv\lib\site-packages\pandas\core\internals\base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 20 elements, new values have 499 elements
Process finished with exit code 1
Date: 2023-07-03 18:32:26 Views: 179
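The error is raised by `df_count.index = df.index`: the frame on the left has 20 rows, while `df.index` has 499 entries, one per row of the Excel sheet. Each row of text contains a different set of words, so the per-row word counts do not line up. When they are merged with words as the index, whether through `pd.DataFrame(dict_count_list)` or through `pd.concat(..., axis=1)`, the merged frame ends up with one row per distinct matched word (most likely the 20 in the message) rather than one row per text, and pandas refuses to assign an index of a different length. The fix is to build the result with one row per text and one column per word statistic, so that it has the same length as `df` from the start.
You can try changing the code as follows: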
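The mismatch can be reproduced on a small scale. The following is a minimal sketch with made-up data (the words, counts, and variable names are hypothetical), assuming the per-text frames are indexed by word: concatenating them with `axis=1` aligns them on the union of their indexes, so the number of rows equals the number of distinct words, not the number of texts.

```python
import pandas as pd

# Hypothetical per-text results, each indexed by the words found in that text
frame_a = pd.DataFrame({'count': [2, 1]}, index=['apple', 'pear'])
frame_b = pd.DataFrame({'count': [3, 1]}, index=['apple', 'plum'])

# axis=1 places the frames side by side and aligns them on the union of their indexes
combined = pd.concat([frame_a, frame_b], axis=1, sort=False)
n_rows = len(combined)  # one row per distinct word (apple, pear, plum)

# The source data in this sketch has only 2 texts, so its index has 2 entries
source_index = pd.RangeIndex(2)
try:
    combined.index = source_index
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_rows)
print(error_message)
```

Comparing `len(combined)` with `len(source_index)` before assigning the index is a quick way to confirm which side has the unexpected length.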
```python
import re
from collections import Counter

import pandas as pd

# Read the dictionary txt file, one term per line, skipping blank lines
with open('词典.txt', 'r', encoding='utf-8') as f:
    list_dict = [line.strip() for line in f if line.strip()]

# Escape the terms so regex metacharacters in them are matched literally
pattern = re.compile('|'.join(re.escape(word) for word in list_dict))

# Read the text from column F (the 'Answer' column)
df = pd.read_excel('实验体.xlsx')

# Build one record per row of text so the result lines up with df.index
records = []
for text in df['Answer']:
    # Treat empty cells as empty text instead of the string 'nan'
    text = '' if pd.isna(text) else str(text)
    # Count how often each dictionary term occurs in this text
    counts = Counter(pattern.findall(text))
    record = {}
    for word, n in counts.items():
        # Density is the count divided by the text length in characters
        record[f'{word}_count'] = n
        record[f'{word}_density'] = n / len(text)
    # Total density of all matched terms in this row
    record['total_density'] = sum(counts.values()) / len(text) if text else 0.0
    records.append(record)

# One row per text; terms that never occur in a row are filled with 0
df_count = pd.DataFrame(records, index=df.index).fillna(0)

# Write to Excel with each term's count and density in separate columns
df_count.to_excel('数实验体10.xlsx', sheet_name='Sheet1')
```
The revised code should then produce the expected output.
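To check the shape of the result without touching any Excel files, the counting step can be tried on in-memory data first. This is only a sketch under stated assumptions: `count_terms`, the sample texts, and the sample terms are hypothetical names introduced for illustration, and density is assumed to mean occurrences divided by the text length in characters. It also tries longer terms first in the regex, since with alternation a short term listed earlier would otherwise be matched in place of a longer term that starts with it.

```python
import re
from collections import Counter

import pandas as pd


def count_terms(texts, terms):
    """Return one row per text with <term>_count and <term>_density columns."""
    # Escape each term and try longer terms first so that a short term
    # does not shadow a longer one that starts with it
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(t) for t in ordered))
    rows = []
    for text in texts:
        counts = Counter(pattern.findall(text))
        row = {}
        for term in terms:
            n = counts.get(term, 0)
            row[f'{term}_count'] = n
            # Assumed definition: occurrences divided by text length in characters
            row[f'{term}_density'] = n / len(text) if text else 0.0
        rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical sample data, including an empty text
sample_texts = ['abcab', 'xyz', '']
result = count_terms(sample_texts, ['ab', 'abc'])
print(result)
```

Because every row gets a column for every term, the frame always has exactly `len(texts)` rows, and assigning `df.index` to it (or passing it as `index=`) cannot raise a length mismatch.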