Module 1 · Foundations
Word Embeddings: GPS for Words
Learn how token IDs are transformed into rich, meaning-filled vectors that capture semantic relationships.
The Embedding Table
Inside every LLM there is a giant table: one row per token in the vocabulary, one column per embedding dimension. This table, called the embedding matrix, is learned during training. When you input a token, the model simply looks up the row for that token's ID.
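A minimal sketch of this lookup, using NumPy and toy sizes (the vocabulary size, dimension, and random initialization here are illustrative; real models use far larger tables with trained weights):

```python
import numpy as np

# Toy sizes for illustration; real models are far larger.
vocab_size = 1_000
embedding_dim = 768

# The embedding matrix: one row per token. In a real model these
# values are learned during training, not random.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim)).astype(np.float32)

def embed(token_id: int) -> np.ndarray:
    """Embedding lookup is just row indexing into the matrix."""
    return embedding_matrix[token_id]

vector = embed(123)
print(vector.shape)  # (768,)
```

The key point: there is no computation in the lookup itself. A token ID is an index, and the "embedding layer" is a table read.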
Key Concepts
Embedding
A dense vector of numbers representing a token's meaning in high-dimensional space.
Embedding Dimension
The length of each embedding vector. GPT-3, for example, uses 12,288 dimensions.
Semantic Similarity
Similar meaning → similar vectors. Measured by cosine similarity.
Embedding Matrix
A table of size [vocab_size × embedding_dim] learned during training.
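To make "similar meaning → similar vectors" concrete, here is a small sketch of cosine similarity. The 3-D vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D vectors standing in for real embeddings (illustrative values).
cat = np.array([0.9, 0.8, 0.1])
kitten = np.array([0.85, 0.75, 0.2])
car = np.array([0.1, 0.05, 0.95])

print(cosine_similarity(cat, kitten))  # close to 1: similar meanings
print(cosine_similarity(cat, car))     # much smaller: unrelated meanings
```

Cosine similarity compares direction rather than magnitude, which is why it is the standard measure here: two embeddings can differ in length yet still point the same way in meaning-space.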
Quick Check
What is the purpose of word embeddings?