def split_dot(words):
    dots = [',', '.', '!', '?']
    str_data = []
    for w in words:
        ## ***** Add your code below *****

        ## ***** Add your code above *****
    return(str_data)

def get_eng_dicts(datas):
    w_all_dict = {}
    for sample in datas:
        for token in sample.split(" "):
            ## ***** Add your code below *****

            ## ***** Add your code above *****
    sort_w_list = sorted(w_all_dict.items(), key=lambda d: d[1], reverse=True)
    w_keys = [x for x, _ in sort_w_list[:7000-2]]
    w_keys.insert(0, "<PAD>")
    w_keys.insert(0, "<UNK>")
    w_dict = { x: i for i, x in enumerate(w_keys) }
    i_dict = { i: x for i, x in enumerate(w_keys) }
    return w_dict, i_dict
Posted: 2024-01-17 07:12:49
The missing code in the `split_dot` function (the body of the `for w in words:` loop) should be:
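```
# If the word ends with a punctuation mark, split it into word + mark.
if w[-1] in dots:
    # A token that is only a punctuation mark is skipped entirely.
    if len(w) == 1:
        continue
    str_data.append(w[:-1])
    str_data.append(w[-1])
else:
    str_data.append(w)
```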
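For reference, here is a sketch of the complete `split_dot` function with the snippet above filled in. The `if not w` guard is an addition that is not part of the original answer: it is assumed here that `words` may contain empty strings (for example, from splitting on consecutive spaces), in which case `w[-1]` would raise an `IndexError`.

```python
def split_dot(words):
    dots = [',', '.', '!', '?']
    str_data = []
    for w in words:
        # Added guard (assumption): skip empty strings so w[-1] cannot fail.
        if not w:
            continue
        if w[-1] in dots:
            # A token that is only a punctuation mark is skipped.
            if len(w) == 1:
                continue
            str_data.append(w[:-1])  # the word without its trailing mark
            str_data.append(w[-1])   # the trailing mark as its own token
        else:
            str_data.append(w)
    return str_data


# Example usage with a made-up sentence.
tokens = split_dot("Hello, world! How are you?".split(" "))
print(tokens)
```

Note that only a single trailing mark is split off; punctuation at the start or in the middle of a word is left untouched.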
The missing code in the `get_eng_dicts` function (the body of the `for token in sample.split(" "):` loop) should be:
```
if token in w_all_dict:
    w_all_dict[token] += 1
else:
    w_all_dict[token] = 1
```
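Put together, the completed `get_eng_dicts` function might look like the sketch below. The function body follows the question's code; the sample sentences in `datas` are made up purely for illustration.

```python
def get_eng_dicts(datas):
    # Count how often each space-separated token occurs.
    w_all_dict = {}
    for sample in datas:
        for token in sample.split(" "):
            if token in w_all_dict:
                w_all_dict[token] += 1
            else:
                w_all_dict[token] = 1

    # Keep the most frequent tokens, leaving room for two special tokens.
    sort_w_list = sorted(w_all_dict.items(), key=lambda d: d[1], reverse=True)
    w_keys = [x for x, _ in sort_w_list[:7000 - 2]]
    w_keys.insert(0, "<PAD>")
    w_keys.insert(0, "<UNK>")  # inserted last, so "<UNK>" ends up at index 0

    w_dict = {x: i for i, x in enumerate(w_keys)}  # token -> index
    i_dict = {i: x for i, x in enumerate(w_keys)}  # index -> token
    return w_dict, i_dict


# Example usage with hypothetical data.
datas = ["a b a", "b a c"]
w_dict, i_dict = get_eng_dicts(datas)
print(w_dict)
```

Because `<UNK>` is inserted after `<PAD>`, it takes index 0 and `<PAD>` takes index 1; the remaining tokens are numbered from 2 in descending order of frequency.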