Team # 2301192 Page 5
Table 2: Notations of Word Attributes Used in the Paper
Symbols Definition
Freq Word Frequency
SLF the Sum of Letter Frequencies
BU the Breadth of Usage of a Word
NDLW the Number of Different Letters in a Word
a-z the Number of Letters from a to z in a Word
3 Model 1: Integration of Interpretation and Prediction Model Based on Prophet and SIRS
3.1 Data Preprocessing and Exploratory Analysis
3.1.1 Data Collection and Pre-processing
In addressing Task 1, it is indispensable to analyze the word attributes related to the problem and to collect relevant data. Possible factors include word frequency, the breadth of usage across different fields, the number of different letters in a word, and the part of speech. In general Natural Language Processing (NLP), there are 36 commonly used parts of speech [2], of which we selected the 18 types relevant to this task, as shown in Table 1.
To process missing values, abnormal values and repeated observations in the original data set, we apply a series of data-processing methods: data cleaning, establishment of dummy variables for discrete variables, logarithmic transformation of the number of reports, and creation of new attributes. These four steps eliminate extraneous information and facilitate the identification and extraction of relevant information from the dataset.
Step 1: In the data-cleaning stage, we use Python to check for missing, outlier and duplicate values. By measuring the length of each word, we check for empty or unusually long values. We find no empty values but three outliers: "tash", "clen" and "rprobe". After searching and comparing online, we correct these words to "trash", "clean" and "probe". Furthermore, using the duplicated() method, we check for duplicate values and find none.
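The cleaning checks above can be sketched in pandas as follows; the column name and the sample words around the three outliers are assumptions for illustration, not the actual data set.

```python
import pandas as pd

# Hypothetical miniature of the word column (names and filler words assumed).
df = pd.DataFrame({"word": ["eerie", "tash", "clen", "rprobe", "slate"]})

# Missing-value check: count empty entries.
print(df["word"].isna().sum())

# Outlier check: Wordle answers have exactly five letters,
# so any other length flags a suspicious value.
outliers = df[df["word"].str.len() != 5]

# Correct the three outliers identified by searching online.
corrections = {"tash": "trash", "clen": "clean", "rprobe": "probe"}
df["word"] = df["word"].replace(corrections)

# Duplicate check with pandas' duplicated() method.
print(df.duplicated().sum())
```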
Step 2: To make the discrete part-of-speech variable easier for the model to process, we construct 17 dummy variables that convert the 18 categories into binary variables (encoding k categories with k-1 dummies avoids perfect multicollinearity).
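This encoding can be sketched with pandas; the tag names below are placeholders, and `drop_first=True` is what turns k categories into k-1 binary columns.

```python
import pandas as pd

# Hypothetical miniature with a part-of-speech column (tag names assumed).
df = pd.DataFrame({"pos": ["NN", "VB", "JJ", "NN"]})

# drop_first=True encodes k categories with k-1 dummy columns, which is
# why 18 part-of-speech types yield 17 binary variables.
dummies = pd.get_dummies(df["pos"], prefix="pos", drop_first=True)
df = pd.concat([df.drop(columns="pos"), dummies], axis=1)
print(df.columns.tolist())
```

Here the three tags JJ, NN, VB produce two dummy columns; the dropped category is represented implicitly by all dummies being zero.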
Step 3: We plan to use a time-series model to predict the number of reports on March 1, 2023. In such models, it is crucial to eliminate heteroscedasticity in the data. Taking the logarithm does not change the data's nature or correlations, but it compresses the scale of the variable; by shrinking the absolute values, it makes heteroscedasticity easier to eliminate. Therefore, we logarithmically transform the reported quantity.
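As a minimal sketch of this step, the transform below uses invented report counts (the actual series is not reproduced here); note that ordering and monotone relationships survive the transform while the scale is compressed.

```python
import numpy as np
import pandas as pd

# Hypothetical daily report counts spanning an order of magnitude
# (values and column name assumed for illustration).
reports = pd.Series([361908, 150000, 25000], name="num_reports")

# Natural-log transform: preserves order and correlation structure,
# shrinks absolute values, and so stabilises the variance.
log_reports = np.log(reports)
```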
Step 4: To comprehensively explore the influence of various word attributes on the reported number of Hard-Mode plays, we further extract word attributes and establish several new variables, elaborated in Section 3.4.
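The attributes of Table 2 that depend only on the word itself (SLF, NDLW and the a-z letter counts) can be extracted as sketched below; Freq and BU require external corpora and are omitted. The letter-frequency table is an assumed set of standard English values, since the paper does not specify its source.

```python
from collections import Counter
import string

# Approximate relative frequencies (%) of English letters a-z
# (assumed values; the paper's exact frequency table is not given).
FREQ = dict(zip(string.ascii_lowercase,
                [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77,
                 4.0, 2.4, 6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98,
                 2.4, 0.15, 2.0, 0.074]))

def word_attributes(word):
    """Extract the word-intrinsic attributes of Table 2 for one word."""
    counts = Counter(word)
    return {
        "SLF": sum(FREQ[c] for c in word),   # Sum of Letter Frequencies
        "NDLW": len(counts),                 # Number of Different Letters in a Word
        **{c: counts.get(c, 0)               # a-z: count of each letter
           for c in string.ascii_lowercase},
    }

attrs = word_attributes("eerie")
```

For "eerie", NDLW is 3 (letters e, r, i) and the repeated letter e contributes three times to SLF.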
3.1.2 Data Description and Exploratory Analysis
We visualize the data to explore its inherent patterns, which aids modeling. Figure