Word Embedding refers to techniques that map discrete words or tokens into continuous vector spaces, typically of 50–1000 dimensions, capturing semantic and syntactic relationships from contextual usage in large corpora (Wikipedia). Models like Word2Vec use shallow neural networks trained with objectives such as Continuous Bag-of-Words (CBOW) or Skip-Gram to learn embeddings in which semantically similar words lie close together in the vector space. More advanced contextual embeddings, such as ELMo and BERT, generate dynamic representations by conditioning on the entire sentence, improving performance on downstream tasks by supporting word sense disambiguation (Wikipedia). Word embeddings have become foundational inputs for virtually all modern NLP systems, enabling efficient representation learning and transfer across tasks.
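
As a minimal sketch of the Skip-Gram training described above, the snippet below assumes the gensim library and uses a hypothetical toy corpus; real embeddings are trained on far larger corpora, and the hyperparameters shown are illustrative only.

```python
# Minimal Word2Vec (Skip-Gram) sketch using gensim (assumed available).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (hypothetical example data).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "common", "pets"],
]

# sg=1 selects the Skip-Gram objective (sg=0 would use CBOW);
# vector_size sets the embedding dimensionality, window the context size.
model = Word2Vec(
    corpus,
    vector_size=50,
    window=2,
    min_count=1,
    sg=1,
    epochs=50,
)

# Words that appear in similar contexts end up with nearby vectors;
# most_similar ranks neighbors by cosine similarity.
print(model.wv.most_similar("cat", topn=3))
```

On a corpus this small the neighbors are noisy; the point is only to show the API shape: a tokenized corpus goes in, and a lookup table of dense vectors (model.wv) comes out, ready to serve as input features for downstream tasks.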