Understanding word2vec: A Walkthrough of Stanford CS224n Assignment 2

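"This is the assignment handout on word2vec from Stanford's CS224n course; it covers understanding and applying the word2vec algorithm."

In natural language processing, word2vec is a widely used model that learns distributed vector representations of words from the contexts in which they occur in a corpus. It was proposed by Tomas Mikolov et al. and has two main variants: the continuous bag-of-words model (CBOW) and the skip-gram model. The discussion here concentrates on skip-gram, which is the focus of the assignment.

The core idea of skip-gram is that "a word is known by the company it keeps". In other words, the model tries to predict the context words that surround a center word. In the handout's example the center word is "banking" and the context window size is 2, so "turning", "into", "crises", and "as" are the context words, called "outside words". The goal is to learn the conditional distribution \(P(O \mid C)\) accurately; specifically, \(P(O=o \mid C=c)\) is the probability that a particular outside word \(o\) appears when the center word is \(c\).

In word2vec this conditional distribution is computed with vector dot products and the softmax function. The vectors involved are the center-word vector \(\mathbf{v}_c\) and the outside-word vector \(\mathbf{u}_o\):

\[ P(O=o \mid C=c) = \frac{\exp(\mathbf{u}_o^\top \mathbf{v}_c)}{\sum_{w \in \text{Vocab}} \exp(\mathbf{u}_w^\top \mathbf{v}_c)} \]

Here Vocab is the vocabulary, the set of all possible words. The softmax guarantees that the probabilities are positive and sum to 1. The vectors \(\mathbf{u}\) and \(\mathbf{v}\) are learned from training data and capture semantic relationships between words.

Because the softmax denominator sums over the entire vocabulary, training usually relies on negative sampling or hierarchical softmax to reduce the cost. With negative sampling, each training step scores the true outside word against a small number of randomly drawn "negative" words instead of against every word in the vocabulary.

In practice word2vec captures implicit relationships between words, such as the analogy "king" - "man" ≈ "queen" - "woman". It is also widely used in recommender systems, text classification, sentiment analysis, and similar tasks. The assignment asks students to understand in depth how word2vec works, including its mathematical model and training procedure, and may touch on model optimization and parameter tuning. Completing it requires a solid grasp of neural networks, probability, and matrix operations.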

CS 224n Assignment #2: word2vec (43 Points)

Due on Tuesday Jan. 21, 2020 by 4:30pm (before class)

1 Written: Understanding word2vec (23 points)

Let’s have a quick refresher on the word2vec algorithm. The key insight behind word2vec is that ‘a word is known by the company it keeps’. Concretely, suppose we have a ‘center’ word \(c\) and a contextual window surrounding \(c\). We shall refer to words that lie in this contextual window as ‘outside words’. For example, in Figure 1 we see that the center word \(c\) is ‘banking’. Since the context window size is 2, the outside words are ‘turning’, ‘into’, ‘crises’, and ‘as’.
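The windowing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the assignment: the function name `outside_words` and the token list are assumptions made here, and the list is only a plausible fragment chosen so that it contains the words named in the text.

```python
def outside_words(tokens, center_index, window_size):
    """Return the words within `window_size` positions of the center word."""
    start = max(0, center_index - window_size)
    end = min(len(tokens), center_index + window_size + 1)
    # Keep every position in the window except the center itself.
    return [tokens[i] for i in range(start, end) if i != center_index]


# Illustrative fragment (an assumption, not quoted from Figure 1).
tokens = ["problems", "turning", "into", "banking", "crises", "as", "usual"]
center = tokens.index("banking")
print(outside_words(tokens, center, window_size=2))
# → ['turning', 'into', 'crises', 'as']
```

Near the start or end of a sentence the window is simply clipped, so a center word at position 0 has outside words only to its right.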

The goal of the skip-gram word2vec algorithm is to accurately learn the probability distribution \(P(O \mid C)\). Given a specific word \(o\) and a specific word \(c\), we want to calculate \(P(O = o \mid C = c)\), which is the probability that word \(o\) is an ‘outside’ word for \(c\), i.e., the probability that \(o\) falls within the contextual window of \(c\).

Figure 1: The word2vec skip-gram prediction model with window size 2

In word2vec, the conditional probability distribution is given by taking vector dot-products and applying the softmax function:

\[ P(O = o \mid C = c) = \frac{\exp(\mathbf{u}_o^\top \mathbf{v}_c)}{\sum_{w \in \text{Vocab}} \exp(\mathbf{u}_w^\top \mathbf{v}_c)} \tag{1} \]

Here, \(\mathbf{u}_o\) is the ‘outside’ vector representing outside word \(o\), and \(\mathbf{v}_c\) is the ‘center’ vector representing center word \(c\). To contain these parameters, we have two matrices, \(U\) and \(V\). The columns of \(U\) are all the ‘outside’ vectors \(\mathbf{u}_w\). The columns of \(V\) are all of the ‘center’ vectors \(\mathbf{v}_w\). Both \(U\) and \(V\) contain a vector for every \(w \in \text{Vocabulary}\).
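To make the softmax over dot products concrete, here is a minimal sketch using only the Python standard library. The function name `outside_word_probs`, the choice to store each matrix as a list of its columns, and the toy numbers are all assumptions made for illustration; the assignment's own starter code may be organized differently.

```python
import math


def outside_word_probs(U, V, c):
    """Softmax over the dot products u_w . v_c for every word w.

    U and V are stored as lists of columns (an assumption of this sketch):
    U[w] is the outside vector of word w, V[c] the center vector of word c.
    """
    v_c = V[c]
    scores = [sum(u_i * v_i for u_i, v_i in zip(u_w, v_c)) for u_w in U]
    # Subtracting the max score leaves the softmax unchanged but avoids overflow.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy parameters (made up): 4 words, 2-dimensional vectors.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
V = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0]]

probs = outside_word_probs(U, V, c=0)
print(probs)       # one probability per vocabulary word
print(sum(probs))  # sums to 1 up to floating-point rounding
```

With these numbers, word 2 has the largest dot product with the center vector and therefore receives the highest probability.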

Recall from lectures that, for a single pair of words \(c\) and \(o\), the loss is given by:

\[ J_{\text{naive-softmax}}(\mathbf{v}_c, o, U) = -\log P(O = o \mid C = c). \tag{2} \]
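As a worked step, substituting the softmax probability of equation (1) into this loss and using \(\log(a/b) = \log a - \log b\) gives an expanded form. Here \(\mathbf{u}_w\) and \(\mathbf{v}_c\) denote the outside vector of word \(w\) and the center vector of word \(c\):

\[ J_{\text{naive-softmax}}(\mathbf{v}_c, o, U) = -\mathbf{u}_o^\top \mathbf{v}_c + \log \sum_{w \in \text{Vocab}} \exp(\mathbf{u}_w^\top \mathbf{v}_c) \]

The first term rewards a large dot product between the true outside vector and the center vector. The second term requires a sum over the whole vocabulary, which is what makes the naive softmax loss expensive to evaluate for large vocabularies.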

Another way to view this loss is as the cross-entropy between the true distribution \(\mathbf{y}\) and the predicted distribution \(\hat{\mathbf{y}}\). Here, both \(\mathbf{y}\) and \(\hat{\mathbf{y}}\) are vectors with length equal to the number of words in the vocabulary. Furthermore, the \(k\)-th entry in these vectors indicates the conditional probability of the \(k\)-th word being an ‘outside word’ for the given \(c\). The true empirical distribution \(\mathbf{y}\) is a one-hot vector with a 1 for the true outside word \(o\), and 0 everywhere else. The predicted distribution \(\hat{\mathbf{y}}\) is the probability distribution \(P(O \mid C = c)\) given by our model in equation (1).
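The link between the two views of the loss can be checked numerically. The sketch below uses the standard cross-entropy definition, the negative sum over \(k\) of \(y_k \log \hat{y}_k\), with a one-hot true distribution; the predicted probabilities and the index of the true outside word are made-up values, and the function name `cross_entropy` is an assumption of this example.

```python
import math


def cross_entropy(y, y_hat):
    """Cross-entropy -sum_k y_k * log(y_hat_k); entries with y_k == 0 add nothing."""
    return -sum(y_k * math.log(p_k) for y_k, p_k in zip(y, y_hat) if y_k > 0)


# Made-up predicted distribution over a 4-word vocabulary.
y_hat = [0.1, 0.2, 0.6, 0.1]
o = 2  # index of the true outside word (assumed for the example)
y = [1.0 if k == o else 0.0 for k in range(len(y_hat))]  # one-hot vector

loss_cross_entropy = cross_entropy(y, y_hat)
loss_naive_softmax = -math.log(y_hat[o])
print(loss_cross_entropy, loss_naive_softmax)
```

Because the one-hot vector zeroes out every term except the one for the true outside word, both values equal the negative log of the probability assigned to that word.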

Assume that every word in our vocabulary is matched to an integer number \(k\). Bolded lowercase letters represent vectors. \(\mathbf{u}_k\) is both the \(k\)-th column of \(U\) and the ‘outside’ word vector for the word indexed by \(k\). \(\mathbf{v}_k\) is both the \(k\)-th column of \(V\) and the ‘center’ word vector for the word indexed by \(k\). In order to simplify notation we shall interchangeably use \(k\) to refer to the word and the index-of-the-word.

The Cross Entropy Loss between the true (discrete) probability distribution \(p\) and another distribution \(q\) is \(-\sum_i p_i \log(q_i)\).
