Efficient Estimation of Word Representations in Vector Space
arXiv: 1301.3781
TLDR(中文)
Word2Vec 提出了两种高效的词向量(词嵌入)训练模型 CBOW 和 Skip-gram:通过在大规模文本上训练浅层神经网络,让语义相近的词在向量空间中距离相近。"king - man + woman ≈ queen"的类比关系让人们看到了词嵌入的威力,并深刻影响了后来许多语言模型的嵌入层设计。
TLDR (English)
Word2Vec introduced two efficient architectures, CBOW and Skip-gram, for learning word embeddings: shallow neural models trained on large text corpora so that semantically similar words lie close together in vector space. The famous "king - man + woman ≈ queen" analogy showed that these vectors capture linguistic regularities through simple arithmetic, and the approach strongly influenced the embedding layers of many later language models.
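The analogy can be illustrated with a minimal sketch. The 3-dimensional vectors below are hand-picked toy values, not trained Word2Vec embeddings, and the names `toy_vectors`, `cosine`, and `analogy` are hypothetical helpers introduced only for this example. The sketch computes vec(king) - vec(man) + vec(woman) and returns the remaining word with the highest cosine similarity to that result.

```python
import math

# Toy vectors chosen by hand for illustration; NOT trained embeddings.
# The dimensions can loosely be read as (royalty, maleness, femaleness).
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman", toy_vectors))  # prints "queen"
```

With real embeddings the vocabulary holds many thousands of words and the nearest neighbour is not guaranteed to be the expected answer; the toy data here is constructed so that it is.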