Module 1 · Foundations

Word Embeddings: GPS for Words

Learn how token IDs are transformed into rich, meaning-filled vectors that capture semantic relationships.

⏱️ 10 min 🎯 Lesson 3 of 11
Slide 1 of 6

Token IDs Have No Meaning

After tokenization, we have numbers like 1024, 9604, 318. But these are just arbitrary IDs — like house numbers on a street.

Problem: Token IDs 1024 ("king") and 1025 ("queen") are only 1 apart numerically, yet their meanings are quite different. Meanwhile, "king" and "emperor" have similar meanings but very different IDs. The raw IDs encode nothing about meaning.
Solution: Transform each token ID into a dense vector of floating-point numbers that captures meaning. This is an embedding.
Slide 2 of 6

What is an Embedding?

An embedding is a list of numbers (a vector) that represents a token in a high-dimensional space. Words with similar meanings end up close together.

Each word → a vector (simplified to 4D here):

king = [0.95, 0.22, 0.87, 0.11]
queen = [0.91, 0.78, 0.85, 0.13]
man = [0.93, 0.21, 0.10, 0.08]
woman = [0.89, 0.77, 0.08, 0.10]
pizza = [0.10, 0.05, 0.02, 0.91]
Observation

king & queen: similar 1st and 3rd dims 👑

man & woman: similar 1st dim, different 2nd ♀️

pizza: completely different pattern 🍕

Real embeddings have 512, 768, or even 12,288 dimensions — not just 4!
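The "similar words have similar vectors" idea can be checked directly with cosine similarity, the standard measure mentioned later in this lesson. A minimal sketch using the toy 4-D vectors from this slide (the numbers are illustrative, not from a real model):

```python
import math

# Toy 4-D embeddings from the slide (real models use hundreds of dimensions)
embeddings = {
    "king":  [0.95, 0.22, 0.87, 0.11],
    "queen": [0.91, 0.78, 0.85, 0.13],
    "man":   [0.93, 0.21, 0.10, 0.08],
    "woman": [0.89, 0.77, 0.08, 0.10],
    "pizza": [0.10, 0.05, 0.02, 0.91],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (close in meaning)
print(cosine_similarity(embeddings["king"], embeddings["pizza"]))  # low (unrelated)
```

Running this, king/queen scores far higher than king/pizza, which is exactly the clustering behavior the slide describes.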

Slide 3 of 6

The GPS Analogy

🗺️

Words as locations on a map

Just like GPS coordinates tell you exactly where something is on Earth, an embedding tells you where a word "is" in meaning-space. Cities near each other on a map tend to be similar — words near each other in embedding space tend to have similar meanings.

In 2D, it might look like:

👑 Royalty cluster: king, queen, prince, princess — all close together
🐾 Animals cluster: dog, cat, wolf, lion — grouped nearby
💻 Tech cluster: computer, laptop, phone — positioned together
🍕 Food cluster: pizza, burger, sushi — in their own region
Slide 4 of 6

The Famous King − Man + Woman = Queen

One of the most mind-blowing properties of embeddings is that mathematical operations work semantically:

king − man + woman ≈ queen

This means the embedding space has captured the concept of "gender" as a direction in the space! Subtracting "man-ness" and adding "woman-ness" moves you from king to queen.

1. Start at: king = [0.95, 0.22, 0.87]
2. Subtract: man = [0.93, 0.21, 0.10] (removing the male concept)
3. Add: woman = [0.89, 0.77, 0.08] (adding the female concept)
4. Result ≈ queen = [0.91, 0.78, 0.85]
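The arithmetic above can be verified in a few lines. This sketch uses the slide's toy 3-D vectors (chosen so the analogy works exactly; real embeddings only give an approximate match, which is why the slide writes "≈"):

```python
# Toy 3-D vectors from the slide (illustrative values, not from a real model)
vectors = {
    "king":  [0.95, 0.22, 0.87],
    "queen": [0.91, 0.78, 0.85],
    "man":   [0.93, 0.21, 0.10],
    "woman": [0.89, 0.77, 0.08],
}

# king − man + woman, computed element-wise
result = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
print(result)  # ≈ [0.91, 0.78, 0.85], up to floating-point noise

# Which known word is the result closest to?
def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

closest = min(vectors, key=lambda w: distance(vectors[w], result))
print(closest)  # queen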
Slide 5 of 6

How Embeddings are Learned

Embeddings aren't programmed — they're learned during training by the neural network itself.

1. Start: every token gets a random vector of numbers
2. Training: the model tries to predict the next word millions of times
3. Learning: vectors that appear in similar contexts get nudged closer together
4. Result: similar words end up with similar vectors automatically!
💡 "You shall know a word by the company it keeps" — J.R. Firth, 1957. Words that appear in similar contexts have similar meanings. Embeddings capture exactly this!
Slide 6 of 6

Interactive: Embedding Space

Hover over the words to explore their positions. Notice how similar words cluster together:

This is a 2D projection of high-dimensional embeddings. Hover a dot to highlight it.

🗺️

The Embedding Table

Inside every LLM there's a giant table — one row per token in the vocabulary, one column per embedding dimension. This table (called the embedding matrix) is learned during training. When you input a token, the model just looks up its row.
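The lookup described above really is just row indexing. A minimal sketch with a made-up 5-token vocabulary and random placeholder numbers (in a real model the matrix entries are learned, and the sizes are far larger):

```python
import random

random.seed(42)

vocab_size = 5        # toy vocabulary (real models: tens of thousands of tokens)
embedding_dim = 4     # toy dimension (GPT-3: 12,288)

# The embedding matrix: one row per token, one column per dimension.
# These values are random placeholders; training would learn them.
embedding_matrix = [
    [random.uniform(-1, 1) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]

# "Embedding" a token is nothing more than a row lookup by token ID
token_id = 3
embedding = embedding_matrix[token_id]
print(len(embedding))  # 4 — one number per embedding dimension
```

This is why the first layer of an LLM is so cheap: no arithmetic, just an indexed read from a [vocab_size × embedding_dim] table.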

Key Concepts

📍 Embedding: A dense vector of numbers representing a token's meaning in high-dimensional space.
📐 Embedding Dimension: The length of the vector. GPT-3 uses 12,288 dimensions.
🧮 Semantic Similarity: Similar meaning → similar vectors. Measured by cosine similarity.
📊 Embedding Matrix: A table of size [vocab_size × embedding_dim] learned during training.

Quick Check

What is the purpose of word embeddings?

A. To compress text to save memory
B. To represent tokens as vectors where similar meanings are geometrically close
C. To translate text between languages
D. To remove punctuation from text