2023 MCM/ICM Award-Winning Problem C Paper: Wordle Game Data Analysis


PDF, 2.24 MB, updated 2024-06-16

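At the 2023 MCM/ICM (Mathematical Contest in Modeling), the Problem C paper "Breaking the Wordle" received an award. It studies the popularity of Wordle, a word-guessing game that spread widely on social media and attracted a large number of players, and how that popularity shows up in player behavior. The paper's central task is an in-depth analysis of the game data, in particular how time factors (such as seasonality and holidays) and word attributes (the number of repeated letters, vowels, and consonants, and how common and frequent the word is) affect the number and distribution of reported results.

To forecast the number of reported results, the authors use a time-series forecasting model based on Prophet that accounts for trend, seasonal variation, and holiday effects on specific dates. They first cleaned and standardized the 2022 data and extracted key features for model building. The model predicts that the number of reported results on March 1, 2023 will lie between 10,355 and 18,742. On the timing of reports, the paper finds that the number of reports usually peaks on weekdays, especially Wednesday, and is comparatively low on weekends.

The paper then examines how word attributes affect the difficulty-related proportions in the data. For example, words with more repeated letters may be harder to guess and so shift the number of tries players report. The ratio of vowels to consonants may also be an important factor in players' guessing strategy. Together, these analyses describe behavior patterns in Wordle and give game developers insight into player behavior that is relevant to game design, user experience, and data analysis. The paper shows an effective way to apply mathematical modeling to a social media phenomenon and can serve as a template for future data-driven studies of similar games.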
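The letter-level word attributes mentioned above are simple to compute. The sketch below uses a hypothetical helper, `word_features` (not taken from the paper), that counts repeated letters, vowels, and consonants. It assumes "repeated letters" means occurrences beyond the first and treats "y" as a consonant; commonness and frequency need an external word-frequency corpus and are left out.

```python
from collections import Counter

# Assumption: only a, e, i, o, u count as vowels; "y" is treated as a consonant.
VOWELS = frozenset("aeiou")


def word_features(word: str) -> dict[str, int]:
    """Count repeated letters, vowels and consonants in a word (hypothetical helper)."""
    w = word.strip().lower()
    counts = Counter(w)
    # Extra occurrences beyond the first: "eerie" has three e's, so e contributes 2.
    repeated = sum(n - 1 for n in counts.values() if n > 1)
    vowels = sum(1 for ch in w if ch in VOWELS)
    consonants = sum(1 for ch in w if ch.isalpha() and ch not in VOWELS)
    return {"repeated_letters": repeated, "vowels": vowels, "consonants": consonants}


# Example usage:
# word_features("eerie")  -> {'repeated_letters': 2, 'vowels': 4, 'consonants': 1}
```

Counting extra occurrences rather than distinct repeated letters means a triple letter weighs more than a double letter, which matches the intuition that such words are harder to guess; other definitions are equally defensible.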

Team # 2314151 Page 4 of 24

2.2 Notations

Table 1: Notations

Symbol | Definition
 | Timestamp
k | Growth rate
 | Change in the growth rate at a given timestamp
m | Offset
ϵ | Error term
N | Number of cycles in the seasonality model
 | Window of days before and after a holiday
 | Range of holiday effects
P | Significance level
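The parameters in Table 1 are those of a Prophet-style additive decomposition. Assuming the paper follows the standard Prophet formulation, they would combine roughly as sketched below. The table shows no symbols for the timestamp, the growth-rate change, or the holiday terms, so t, δ, D_i and κ_i are assumptions here, as are the changepoint times c_j, the Fourier coefficients α_n and β_n, and the seasonal period T (written T rather than P, which the table reserves for the significance level).

```latex
y(t) = g(t) + s(t) + h(t) + \epsilon_t

g(t) = \bigl(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\bigr)\,t
     + \bigl(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\bigr),
\qquad a_j(t) = \mathbf{1}[\,t \ge c_j\,], \quad \gamma_j = -c_j\,\delta_j

s(t) = \sum_{n=1}^{N} \Bigl( \alpha_n \cos\frac{2\pi n t}{T} + \beta_n \sin\frac{2\pi n t}{T} \Bigr)

h(t) = \sum_{i} \kappa_i\,\mathbf{1}[\,t \in D_i\,]
```

The offsets γ_j = −c_j δ_j keep the trend continuous: with a single changepoint, the slope after c_j becomes k + δ_j while g(c_j) still equals k c_j + m. For daily Wordle data, weekly seasonality would use T = 7.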

3 Data Processing

3.1 Data Cleaning

Problem C provides a report of Wordle results over the past year. However, we found a considerable amount of dirty data in this report.

Table 2: Dirty data

Contest number | Word | Number of reported results | Number in hard mode | 1 try (%) | 2 tries (%) | 3 tries (%) | 4 tries (%) | 5 tries (%) | 6 tries (%) | 7 or more tries (X) (%)
525 | clen | 26381 | 2424 | 1 | 17 | 36 | 31 | 12 | 3 | 0
314 | tash | 106652 | 7001 | 2 | 19 | 34 | 27 | 13 | 4 | 1
540 | naïve | 21947 | 2075 | 1 | 7 | 24 | 32 | 24 | 11 | 1
473 | marxh | 30935 | 2885 | 0 | 9 | 30 | 35 | 19 | 6 | 1
207 | favor | 137586 | 3073 | 1 | 4 | 15 | 26 | 29 | 21 | 4

In the data above, the words numbered 525 and 314 are invalid because they contain only four letters, whereas every Wordle answer has five, so we inferred that a letter had been omitted during data entry. To fix these entries, we compared each one against valid words using artificial intelligence algorithms and replaced it with the most similar five-letter word. The word numbered 540 contains a misspelled letter (ï in place of i) and should be "naive". Searching the word database showed that "marxh", numbered 473, does not exist; comparing its letter pattern with candidate words in the database, we concluded that the correct spelling is "marsh". The word numbered 207 has an extra space in the input, which also makes it an outlier; deleting the extra space yields the correct data.
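The repairs above can be approximated in code. The paper does not specify its "artificial intelligence algorithms", so this sketch swaps in plain string similarity from Python's standard `difflib` module. `repair_word`, `normalize`, and the small `VOCAB` list are hypothetical stand-ins for a real five-letter word database. The function trims stray spaces, strips diacritics, and otherwise returns the closest vocabulary word.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical stand-in for a real five-letter word database.
VOCAB = ["clean", "clear", "trash", "crash", "marsh", "naive", "favor", "study"]


def normalize(raw: str) -> str:
    """Trim whitespace, lower-case, and drop diacritics (e.g. 'naïve' -> 'naive')."""
    decomposed = unicodedata.normalize("NFKD", raw.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def repair_word(raw: str, vocabulary: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return a valid word for a dirty entry, or None if nothing is similar enough."""
    word = normalize(raw)
    if word in vocabulary:
        return word  # e.g. "favor " or "naïve" after normalization
    # Closest candidate by difflib's SequenceMatcher ratio
    # (a stand-in for the paper's unspecified similarity method).
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# repair_word("clen", VOCAB)  -> 'clean'
# repair_word("marxh", VOCAB) -> 'marsh'
```

String similarity can only choose among the candidates supplied. With a realistic vocabulary, a four-letter entry may have several equally close five-letter neighbours, so automatic repairs still deserve a manual check.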

3.2 Outlier Rejection and Standardization

We use the 68–95–99.7 rule (3σ criterion) to screen and reject outliers [2]. We found an anomaly in the Number of reported results for the word "study" on 2022/11/30, and we zeroed it to bring it
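The 3σ screening and standardization named in Section 3.2 can be sketched as follows. This is a minimal illustration using the population mean and standard deviation from Python's `statistics` module; the function names and the sample series are hypothetical, and the paper's own handling of a flagged value is not reproduced.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values: list[float]) -> list[int]:
    """Indices of values lying more than 3 population standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values, mu)  # population std dev, reusing the precomputed mean
    if sigma == 0:
        return []  # constant series: nothing can be an outlier
    return [i for i, x in enumerate(values) if abs(x - mu) > 3 * sigma]


def standardize(values: list[float]) -> list[float]:
    """Z-scores: subtract the mean, then divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(x - mu) / sigma for x in values]


# Hypothetical daily report counts: 20 ordinary days and one spike.
daily_reports = [100.0] * 20 + [10000.0]
flagged = three_sigma_outliers(daily_reports)  # -> [20]
```

With the population standard deviation, no point can lie more than √(n−1) standard deviations from the mean (Samuelson's inequality), so this rule cannot flag anything in a series shorter than 11 observations. A year of daily Wordle data is far above that limit.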
