Overall structure:
The objective of the Skip-gram model is to maximize:
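In the standard form from Mikolov et al. (2013), this is the average log probability of the context words over a corpus of T words with window size c:

$$
\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)
$$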
For Skip-gram, a larger context window produces more training samples and generally yields more accurate representations, but it also increases training time.
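As a rough illustration of why the window size matters (a minimal Python sketch with my own naming, not the original word2vec code), the (center, context) pairs for one sentence can be enumerated as follows:

```python
# Enumerate Skip-gram (center, context) training pairs for a token list.
# A larger window pairs each center word with more neighbors, so it
# produces more training samples per sentence.
def skipgram_pairs(tokens, window_size=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox", "jumps"], window_size=2))
```

With window_size=2 this five-word sentence yields 14 pairs; with window_size=4 it would yield 20, which is where the extra training cost comes from.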
Tricks:
1). Hierarchical Softmax
The main advantage is that instead of evaluating W output nodes in the neural network to obtain the probability distribution, it is needed to evaluate only about log2(W) nodes.
In short, a binary tree is constructed over the output vocabulary, which reduces the amount of computation.
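A minimal sketch of this idea (illustrative names and toy data, not the original implementation): each word is a leaf of the binary tree, every inner node has its own vector, and the probability of a word is a product of sigmoid decisions along the root-to-leaf path, so only about log2(W) nodes are evaluated per word:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(center_vec, path_nodes, path_codes, inner_vecs):
    """path_nodes: inner-node indices on the root-to-leaf path;
    path_codes: the 0/1 branch taken at each of those nodes."""
    prob = 1.0
    for node, code in zip(path_nodes, path_codes):
        s = sigmoid(center_vec @ inner_vecs[node])
        prob *= s if code == 0 else (1.0 - s)  # left vs. right branch
    return prob

# Toy example: a 4-word vocabulary needs 3 inner nodes and paths of
# length about log2(4) = 2, instead of a softmax over all 4 outputs.
rng = np.random.default_rng(0)
inner_vecs = rng.normal(size=(3, 10))
center_vec = rng.normal(size=10)
print(hs_probability(center_vec, path_nodes=[0, 1], path_codes=[0, 1], inner_vecs=inner_vecs))
```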
2). Negative Sampling
The idea: instead of updating all W output vectors for every training pair, the model only has to distinguish the true context word from k randomly sampled "negative" words, so each update reduces to k+1 binary logistic-regression terms.
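A minimal sketch of the per-pair objective (illustrative code, not the original C implementation): maximize log sigma(v_context . v_center) for the true pair and log sigma(-v_noise . v_center) for each of the k sampled negatives:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center_vec, context_vec, noise_vecs):
    pos = np.log(sigmoid(context_vec @ center_vec))                   # true context word
    neg = sum(np.log(sigmoid(-v @ center_vec)) for v in noise_vecs)   # k noise words
    return -(pos + neg)  # minimized during training

# Negatives are typically drawn with probability proportional to
# count(word) ** 0.75 (the 3/4 power of the unigram distribution).
rng = np.random.default_rng(0)
center, context = rng.normal(size=10), rng.normal(size=10)
noise = rng.normal(size=(5, 10))  # k = 5 sampled negative words
print(neg_sampling_loss(center, context, noise))
```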
3). Subsampling of Frequent Words
Each word w_i in the training set is discarded with probability P(w_i) = 1 - sqrt(t / f(w_i)), where f(w_i) is the word's frequency and t is a chosen threshold, typically around 10^-5.
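A minimal sketch of the subsampling step (the frequencies below are made up for illustration): each occurrence of a word is kept or dropped independently according to the formula above:

```python
import random

def keep_token(freq, t=1e-5):
    # Discard probability 1 - sqrt(t / f); clamp at 0 so rare words
    # (f < t) are always kept.
    discard_prob = max(0.0, 1.0 - (t / freq) ** 0.5)
    return random.random() >= discard_prob

freqs = {"the": 0.05, "learning": 2e-4, "word2vec": 1e-6}
tokens = ["the", "word2vec", "learning", "the", "the"]
print([w for w in tokens if keep_token(freqs[w])])
```

With these numbers, each occurrence of "the" is dropped about 98.6% of the time, while "word2vec" is always kept.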