word2vec Parameter Learning Explained


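"word2vec has attracted a great deal of attention in natural language processing (NLP) in recent years. The word vectors it learns capture semantic information about words and have proved useful in a variety of NLP tasks. This document explains the parameter learning process of word2vec models in detail, covering the original continuous bag-of-words (CBOW) and skip-gram (SG) models as well as optimization techniques such as hierarchical softmax and negative sampling. It also gives intuitive interpretations and mathematical derivations of the gradient equations, so that readers who are not familiar with neural networks can understand how word2vec works. An appendix reviews the basics of neural networks, which helps in following the core concepts and algorithmic details."

word2vec has two main model architectures: CBOW and SG. CBOW predicts the target word from its context words, whereas SG does the reverse and predicts the context words from the target word. Both learn word vectors that carry semantic information.

Updating the parameters of CBOW means computing the gradient of the loss function and then adjusting the weights by gradient descent. The loss is the cross-entropy, and its gradient is computed by backpropagation. Over many iterations on a large text corpus the word vectors are adjusted step by step, so that similar words end up close together in the vector space.

Training SG costs more per position, because each target word has to predict every word in its context window. Negative sampling is one optimization strategy: instead of updating the output vectors of all words in the vocabulary, each training example updates only the actual output word and a small number of randomly sampled words that serve as negative samples, which reduces computation and speeds up training. Hierarchical softmax is another optimization technique. It replaces the full softmax layer with a binary tree, so the probability of a word is computed along the path from the root to that word's leaf rather than over the whole vocabulary, which greatly reduces the computational cost.

Intuitive interpretation and mathematical derivation are equally important for understanding word2vec. The intuitive reading of the gradient equations shows how the model adjusts word vectors in response to the data, while the derivation shows where the update equations come from.

word2vec is a neural-network-based method for word representation. Its success lies in capturing semantic relations between words, and its vectors have proved useful in tasks such as part-of-speech tagging, syntactic parsing, and sentiment analysis. Studying its parameter learning and optimization strategies makes it easier to understand and apply the tool in NLP research and applications.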
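The difference between the two architectures shows up most clearly in how training examples are formed. The following is a minimal sketch, not taken from the document: the helper `training_pairs`, its `window` parameter, and the example sentence are all hypothetical names chosen for illustration.

```python
def training_pairs(tokens, window=2):
    """Yield (context_words, target_word) for every position in `tokens`."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target


sentence = "the quick brown fox jumps".split()
pairs = list(training_pairs(sentence, window=1))

# CBOW: one example per position, predicting the target from its context.
cbow_examples = pairs

# Skip-gram: one example per (target, context word), predicting the
# context word from the target.
sg_examples = [(target, c) for context, target in pairs for c in context]
```

With a window of one word on each side, every interior position gives CBOW a single example with two context words, while skip-gram gets two separate examples from the same position.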

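This is equivalent to the tensor product of $\mathbf{x}$ and $\mathrm{EH}$, i.e.,

$$\frac{\partial E}{\partial \mathbf{W}} = \mathbf{x} \otimes \mathrm{EH} = \mathbf{x}\,\mathrm{EH}^{T} \tag{15}$$

from which we obtain a $V \times N$ matrix. Since only one component of $\mathbf{x}$ is non-zero, only one row of $\frac{\partial E}{\partial \mathbf{W}}$ is non-zero, and the value of that row is $\mathrm{EH}^{T}$, an $N$-dim vector. We obtain the update equation of $\mathbf{W}$ as

$$\mathbf{v}_{w_I}^{(\text{new})} = \mathbf{v}_{w_I}^{(\text{old})} - \eta\,\mathrm{EH}^{T} \tag{16}$$

where $\mathbf{v}_{w_I}$ is a row of $\mathbf{W}$, the "input vector" of the only context word, and is the only row of $\mathbf{W}$ whose derivative is non-zero. All the other rows of $\mathbf{W}$ will remain unchanged after this iteration, because their derivatives are zero.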
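Equations (15) and (16) can be checked numerically. The NumPy sketch below assumes the usual one-word-context forward pass, which is not part of this excerpt: hidden layer $\mathbf{h} = \mathbf{W}^{T}\mathbf{x}$, a softmax over the scores, and $\mathrm{EH}$ computed as the output matrix times the error vector. The sizes, seed, learning rate, and the name `W_out` for the output matrix are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3                      # toy vocabulary size and hidden-layer size
eta = 0.1                        # learning rate (assumed value)
W = rng.normal(scale=0.5, size=(V, N))       # input vectors, one per row
W_out = rng.normal(scale=0.5, size=(N, V))   # output vectors, one per column

k, target = 2, 4                 # indices of the context word and output word
x = np.zeros(V); x[k] = 1.0      # one-hot input
t = np.zeros(V); t[target] = 1.0 # one-hot target

h = W.T @ x                      # hidden layer: equals row k of W
u = W_out.T @ h                  # one score per vocabulary word
y = np.exp(u - u.max()); y /= y.sum()        # softmax
e = y - t                        # prediction errors
EH = W_out @ e                   # N-dim vector

grad_W = np.outer(x, EH)         # eq. (15): a V x N matrix
W_new = W.copy()
W_new[k] -= eta * EH             # eq. (16): only the context word's row moves
```

Because `x` is one-hot, `grad_W` has a single non-zero row, so updating row `k` alone gives the same result as `W - eta * grad_W` without forming the full matrix.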

Intuitively, since vector $\mathrm{EH}$ is the sum of output vectors of all words in vocabulary weighted by their prediction error $e_j = y_j - t_j$, we can understand (16) as adding a portion of every output vector in vocabulary to the input vector of the context word. If, in the output layer, the probability of a word $w_j$ being the output word is overestimated ($y_j > t_j$), then the input vector of the context word $w_I$ will tend to move farther away from the output vector of $w_j$; conversely, if the probability of $w_j$ being the output word is underestimated ($y_j < t_j$), then the input vector of $w_I$ will tend to move closer to the output vector of $w_j$; if the probability of $w_j$ is fairly accurately predicted, then it will have little effect on the movement of the input vector of $w_I$. The movement of the input vector of $w_I$ is determined by the prediction error of all vectors in the vocabulary; the larger the prediction error, the more significant the effect a word will exert on the movement of the input vector of the context word.
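This weighted-sum reading can be made concrete with a small NumPy sketch. It builds $\mathrm{EH}$ term by term from the errors $e_j$ and the output vectors, applies one update to the input vector, and compares the predicted probability of the actual output word before and after. The softmax forward pass, the sizes, the seed, and the names `v_in` and `W_out` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
V, N, eta = 5, 3, 0.1
v_in = rng.normal(scale=0.5, size=N)         # input vector of the context word
W_out = rng.normal(scale=0.5, size=(N, V))   # column j = output vector of word j
target = 3                                   # index of the actual output word


def softmax(u):
    z = np.exp(u - u.max())
    return z / z.sum()


t = np.zeros(V); t[target] = 1.0
y = softmax(W_out.T @ v_in)
e = y - t        # negative for the target word, positive for all others

# EH written out as the error-weighted sum of output vectors.
EH = sum(e[j] * W_out[:, j] for j in range(V))

# One step of the input-vector update: subtracting eta * EH moves v_in
# toward the target's output vector (e < 0) and away from the rest (e > 0).
v_new = v_in - eta * EH
y_new = softmax(W_out.T @ v_new)
```

In this setting the actual output word is always underestimated (`y[target] < 1`) and every other word is overestimated, so a small step raises `y_new[target]` above `y[target]`.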

As we iteratively update the model parameters by going through context-target word pairs generated from a training corpus, the effects on the vectors will accumulate. We can imagine that the output vector of a word w is "dragged" back-and-forth by the input vectors of w's co-occurring neighbors, as if there are physical strings between the vector of w and the vectors of its neighbors. Similarly, an input vector can also be considered as being dragged by many output vectors. This interpretation can remind us of gravity, or force-directed graph layout. The equilibrium length of each imaginary string is related to the strength of co-occurrence between the associated pair of words, as well as the learning rate. After many iterations, the relative positions of the input and output vectors will eventually stabilize.
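The accumulation of updates over many context-target pairs can be illustrated with a toy training loop. This is a sketch under several assumptions: the four word-index pairs stand in for a corpus, the sizes and learning rate are arbitrary, and the output-vector update (each output vector moves by `-eta * e[j] * h`) comes from the standard derivation rather than from this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N, eta, epochs = 4, 3, 0.2, 200
W = rng.normal(scale=0.1, size=(V, N))       # input vectors (rows)
W_out = rng.normal(scale=0.1, size=(N, V))   # output vectors (columns)
pairs = [(0, 1), (1, 2), (2, 3), (3, 0)]     # toy (context, target) indices


def step(k, target):
    """Apply one update for a single pair; return the loss before the update."""
    h = W[k].copy()
    u = W_out.T @ h
    y = np.exp(u - u.max()); y /= y.sum()
    e = y.copy(); e[target] -= 1.0           # e = y - t
    EH = W_out @ e                           # uses W_out before it changes
    W_out[:] -= eta * np.outer(h, e)         # pull/push every output vector
    W[k] -= eta * EH                         # move the context word's input vector
    return -np.log(y[target])


# Mean cross-entropy loss per pass over the toy pairs.
losses = [np.mean([step(k, t) for k, t in pairs]) for _ in range(epochs)]
```

Comparing `losses[0]` with `losses[-1]` shows the repeated small pulls adding up: the average loss over the pairs ends lower than it started.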

1.2 Multi-word context

Figure 2 shows the CBOW model with a multi-word context setting. When computing the hidden layer output, instead of directly copying the input vector of the input context word, the CBOW model takes the average of the vectors of the input context words, and
