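Translate the following Chinese text into American English: For the three categories obtained by grouping words by difficulty, we study the corresponding word attributes. We computed the mean of the three attributes in each category; as the figure () shows clearly, in easy mode the words have a high usage frequency, few repeated letters, and a small distance between repeated letters; in moderately hard mode the words have a low usage frequency, more repeated letters, and a larger distance between repeated letters; in hard mode the words have a low usage frequency, the most repeated letters, and a larger distance between repeated letters. We split EERIE into numerical attributes, with dis, re, and fre equal to 2, 3, and 6.649 respectively, fed them into the catboost classification model, and found that the word belongs to the third category. IV. (Accuracy statement) The classifier obtained with the catboost algorithm reaches an accuracy of 0.77 on the test set. In addition, in the second question we predicted 7 ratios for the word EERIE and ran a correlation test between the distribution of these 7 ratios and the center points of the three classes obtained by kmeans clustering; the correlation coefficient between EERIE's ratio distribution and the third class was the largest, which further verifies that our catboost-based classification of EERIE is accurate.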
Time: 2023-03-12 09:42:05 Views: 189
We analyzed the words in each of the three difficulty categories and studied the corresponding word attributes. We computed the mean of the three attributes within each category, as shown in the figure (...). In easy mode, the words have a high usage frequency, few repeated letters, and a small distance between repeated letters. In moderately hard mode, the words have a low usage frequency, more repeated letters, and a larger distance between repeated letters. In hard mode, the words have a low usage frequency, the most repeated letters, and a larger distance between repeated letters.

We split the word EERIE into numerical attributes: dis, re, and fre are 2, 3, and 6.649, respectively. Feeding these values into the CatBoost classification model places the word in the third category.

IV. (Accuracy statement) The classifier trained with the CatBoost algorithm reaches an accuracy of 0.77 on the test set. In addition, in the second question we predicted seven ratios for the word EERIE. We then ran a correlation test between the distribution of these seven ratios and the center points of the three classes obtained by k-means clustering. The correlation coefficient between EERIE's ratio distribution and the third class was the largest, which further supports our CatBoost-based classification of EERIE.
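The text reports dis = 2 and re = 3 for EERIE but does not state how the two attributes are computed. The following is a minimal sketch under assumed definitions, not the authors' method: `re` is taken to be the total number of occurrences of letters that appear more than once, and `dis` the mean gap between consecutive occurrences of such letters. These definitions reproduce the reported values for EERIE, but other definitions could as well. The function name `repeat_features` is hypothetical, and fre (6.649) is not computed because it depends on an external word-frequency source that the text does not identify.

```python
from collections import defaultdict


def repeat_features(word):
    """Return (dis, re) for a word under the ASSUMED definitions above.

    dis: mean gap between consecutive occurrences of repeated letters.
    re:  total number of occurrences of letters that appear more than once.
    """
    # Record every position at which each letter occurs.
    positions = defaultdict(list)
    for index, letter in enumerate(word.upper()):
        positions[letter].append(index)

    gaps = []
    repeated_count = 0
    for indices in positions.values():
        if len(indices) < 2:
            continue  # this letter is not repeated
        repeated_count += len(indices)
        # Gaps between neighboring occurrences of the same letter.
        gaps.extend(b - a for a, b in zip(indices, indices[1:]))

    if not gaps:
        return 0.0, 0  # no repeated letters at all
    return sum(gaps) / len(gaps), repeated_count


if __name__ == "__main__":
    dis, re_count = repeat_features("EERIE")
    print(dis, re_count)  # prints: 2.0 3
```

For EERIE, the letter E occurs at positions 0, 1, and 4, so the gaps are 1 and 3, giving dis = 2.0 and re = 3, matching the values quoted above.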