2023 MCM Award-Winning Problem C Paper: Data Analysis of the Wordle Game


PDF, 9.53 MB; updated 2024-06-16

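An award-winning paper from Problem C of the 2023 MCM, titled “Wordle: One Letter Makes a Difference”. It analyzes Wordle data collected since early 2022 and studies the relationships among user engagement, game difficulty, and player behavior.

The paper addresses Problem C of the 2023 Mathematical Contest in Modeling (MCM/ICM). The team analyzes Wordle data to reveal the patterns and trends behind the game. Wordle became a craze on social media and is popular for its simple rules and its challenge.

The paper first examines how the number of reported results changes over time. The team builds an ARIMA (autoregressive integrated moving average) model to predict the number of reported results on March 1, 2023. The forecast suggests that Wordle remains popular long after its release.

Next, the paper studies the factors that affect the percentage of scores reported in Hard Mode. A fitted multiple linear regression model shows that the number of repeated letters in a word and the word's frequency are positively correlated with this percentage. In addition, difficulty information that players obtain from the community in advance may influence which game mode they choose. This indicates that player strategy and community feedback have a significant effect on the game experience.

The paper also looks at the distribution of reported results. This may involve statistics such as how quickly players finish and how many failed guesses they make, and how these change over time. It may also analyze how puzzles of different difficulty affect player retention or engagement, and how Wordle's culture of social sharing contributes to its popularity. It may further discuss Wordle's spread dynamics: how fast new players join, fluctuations in daily participation, and correlations with specific events such as holidays and news stories. Through these analyses, the paper reveals statistical characteristics of the Wordle phenomenon. It also offers a methodology for understanding and predicting user behavior in similar social-media games.

Overall, the paper shows how data-driven statistical modeling and machine-learning techniques can explore and explain complex real-world phenomena, especially in entertainment and social media. This has theoretical and practical value for understanding user behavior, optimizing game design, and forecasting popularity trends.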
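The multiple linear regression mentioned above relates a word-level percentage to word attributes. The sketch below is a minimal pure-Python version built on several assumptions. It uses the two predictors named in the summary. "Repeated letters" is assumed to mean letters beyond each letter's first occurrence, and the frequency scale is arbitrary. The fit uses ordinary least squares via the normal equations. The helper names `repeated_letters`, `solve`, and `fit_ols` are hypothetical, and the data are synthetic, generated from a known linear rule so that the fit can be checked. They are not Wordle data or the paper's results.

```python
from typing import List, Sequence


def repeated_letters(word: str) -> int:
    """Assumed feature: letters beyond each letter's first occurrence ("eerie" -> 5 - 3 = 2)."""
    return len(word) - len(set(word))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Multiple linear regression with intercept via (X^T X) beta = X^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0, *row] for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# SYNTHETIC example: targets generated from y = 2 + 0.5 * repeats + 1.5 * freq.
words = ["crane", "eerie", "hello", "mummy", "slate", "geese"]
freq = [3.0, 1.0, 4.0, 0.5, 2.0, 1.5]  # made-up frequency scores
features = [[repeated_letters(w), f] for w, f in zip(words, freq)]
hard_mode_pct = [2 + 0.5 * r + 1.5 * f for r, f in features]
print(fit_ols(features, hard_mode_pct))  # approximately [2.0, 0.5, 1.5]
```

Normal equations are adequate for a handful of predictors. For real data, a QR-based solver or a statistics package would be numerically safer and would also report significance.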

Team # 2318982 Page 5 of 25

Table 1: Notations

Symbols          Description
                 The set of states reachable in one step from state i.
S                The state space of the Markov chain.
W                All the words a player may fill in.
                 The subjective probability that word x is the correct answer.
freq             The word frequency of word x.
                 The amount of information obtained by filling in word x as the opening guess.
(r), true        The correct word on day r.
                 The set of words the player has already guessed when in state i.
(r)(i, j)        The transition probability from state i to state j of the Markov chain on day r.
(r)              The number of steps needed to first reach state j from state i of the Markov chain on day r.
(r)              The set of absorbing states of the Markov chain on day r.
(r), absorbed    The number of steps before entering an absorbing state of the Markov chain on day r.
(r)              The proportion of all players using strategy k on day r.

where we define the main parameters; their specific values will be given later.
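Table 1 defines transition probabilities, absorbing states, and the number of steps before absorption. The standard absorbing-chain computation links them. The expected step counts t from transient states satisfy t = 1 + Q t, where Q holds the transition probabilities among transient states. The sketch below solves this by fixed-point iteration, which converges when every transient state can reach absorption. The function name, the dictionary layout, and the toy chain are all hypothetical illustrations, not the paper's states or estimated probabilities.

```python
from typing import Dict, Set

# Transition probabilities P(i, j) as nested dicts; every state must appear as a key.
Chain = Dict[str, Dict[str, float]]


def expected_steps_to_absorption(
    p: Chain, absorbing: Set[str], tol: float = 1e-12, max_iter: int = 100_000
) -> Dict[str, float]:
    """Expected number of steps before entering an absorbing state, from each transient state.

    Iterates t <- 1 + Q t, where Q restricts P to transient states. Absorbing states
    contribute 0 further steps, so their transitions are ignored.
    """
    transient = [s for s in p if s not in absorbing]
    t = {s: 0.0 for s in transient}
    if not transient:
        return t
    for _ in range(max_iter):
        new_t = {
            s: 1.0 + sum(prob * t[j] for j, prob in p[s].items() if j not in absorbing)
            for s in transient
        }
        if max(abs(new_t[s] - t[s]) for s in transient) < tol:
            return new_t
        t = new_t
    raise RuntimeError("no convergence: check that every transient state can reach absorption")


# Hypothetical toy chain (not the paper's estimates); "solved" is absorbing.
chain: Chain = {
    "start": {"close": 0.7, "solved": 0.3},
    "close": {"close": 0.5, "solved": 0.5},
    "solved": {"solved": 1.0},
}
print(expected_steps_to_absorption(chain, {"solved"}))  # approximately {'start': 2.4, 'close': 2.0}
```

Passing `{j}` as the absorbing set gives the expected first-passage step counts from each state i to state j. This matches the "steps to first reach state j from state i" entry in the table, provided j is reachable.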

3 Data Preprocessing

Since we are only allowed to use the dataset “Problem_C_Data_Wordle.csv” provided by COMAP, we need to pre-process it before solving the problem. An initial inspection of the dataset revealed some outliers and missing values.

• In the word column, some words do not have five letters, such as “rprobe”, “clen” and “tash”. As noted by COMAP, in line 18 (contest 545) the word is listed as “rprobe” but should be “probe”. By checking the daily solution words published by Wordle, we also found that “clen” should be “clean” and “tash” should be “trash”.

• Additionally, in line 34 (contest 529), the number of reported results is listed as “2569”, while the correct number should be “25569”.
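The corrections listed above can be applied programmatically. The sketch below is a minimal version that assumes each row has already been parsed into a dict. The keys `contest`, `word` and `reported` are hypothetical; the real CSV headers may differ. The function name `clean_records` is also illustrative. The fix tables hold exactly the corrections stated in the text. Any word that still lacks five letters after correction raises an error rather than passing silently.

```python
from typing import Any, Dict, List

# Hypothetical record layout; the real CSV column names may differ.
Record = Dict[str, Any]

WORD_FIX_BY_CONTEST = {545: "probe"}                   # listed as "rprobe" (noted by COMAP)
WORD_FIX_BY_TYPO = {"clen": "clean", "tash": "trash"}  # checked against published solutions
REPORTED_FIX_BY_CONTEST = {529: 25569}                 # listed as 2569


def clean_records(records: List[Record]) -> List[Record]:
    """Return corrected copies of the records; raise if a non-five-letter word remains."""
    cleaned = []
    for original in records:
        rec = dict(original)  # copy so the caller's data is not mutated
        word = str(rec["word"]).strip().lower()  # defensive whitespace/case normalization
        word = WORD_FIX_BY_CONTEST.get(rec["contest"], WORD_FIX_BY_TYPO.get(word, word))
        if len(word) != 5:
            raise ValueError(f"contest {rec['contest']}: unresolved word {word!r}")
        rec["word"] = word
        rec["reported"] = REPORTED_FIX_BY_CONTEST.get(rec["contest"], rec["reported"])
        cleaned.append(rec)
    return cleaned
```

Keying the corrections by contest number, where the text gives one, keeps each fix tied to a single row. The “clen” and “tash” fixes are keyed by the misspelling because the text does not give their contest numbers.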

4 Task 1: Number Prediction and Word Attributes

In this section, we predict the number of reported results on March 1, 2023 by building an ARIMA model and choosing its optimal parameters. We then summarize the word attributes and build a multiple linear regression to explore the effect of these attributes on the percentage of scores reported in Hard Mode.
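The ARIMA forecasting step can be illustrated with the simplest non-trivial case. The paper chooses optimal ARIMA parameters, whereas this minimal sketch fixes the order at (1,1,0) with a drift term purely for illustration. It estimates by conditional least squares rather than full maximum likelihood, so its forecasts would not match the paper's. The model differences the series once, d_t = y_t - y_{t-1}, and fits d_t = c + phi * d_{t-1} + e_t. Forecasting iterates that recursion and adds the differences back to the last level. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_arima_110(y: Sequence[float]) -> Tuple[float, float]:
    """Fit ARIMA(1,1,0) with drift by conditional least squares.

    Regresses the first difference d_t on d_{t-1}; returns (c, phi).
    """
    d = [b - a for a, b in zip(y, y[1:])]  # first differences (the "I" in ARIMA)
    x, z = d[:-1], d[1:]                   # lagged and current differences
    if len(z) < 2:
        raise ValueError("need at least four observations")
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("differenced series is constant; phi is not identifiable")
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi


def forecast_arima_110(y: Sequence[float], c: float, phi: float, steps: int) -> List[float]:
    """Point forecasts: iterate the fitted AR(1) on differences, then integrate back to levels."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff  # expected next difference (future shocks set to zero)
        level += diff          # undo the differencing
        out.append(level)
    return out
```

A call such as `forecast_arima_110(series, *fit_arima_110(series), steps=h)` returns h daily point forecasts. If the series ends on December 31, 2022, March 1, 2023 is h = 60. A production model would also select (p, d, q) by an information criterion and report prediction intervals.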

4.1 Number Prediction Based on ARIMA Model

Autoregressive integrated moving average (ARIMA) is a statistical analysis model that uses time-series data to forecast future trends. The basic idea of ARIMA is that the data sequence formed by the forecast target over time is regarded as a random sequence and a

