Module 1 · Foundations

Word Embeddings: GPS for Words

Learn how token IDs are transformed into rich, meaning-filled vectors that capture semantic relationships.

⏱️ 10 min 🎯 Lesson 3 of 11
Slide 1 of 6

Token IDs Have No Meaning

After tokenization, we have numbers like 1024, 9604, 318. But these are just arbitrary IDs — like house numbers on a street.

Problem: Token IDs 1024 ("king") and 1025 ("queen") are only 1 apart numerically, yet their meanings are quite different. Meanwhile, "king" and "emperor" have similar meanings but very different IDs. The raw IDs encode nothing about meaning.
Solution: Transform each token ID into a dense vector of floating-point numbers that captures meaning. This is an embedding.
Slide 2 of 6

What is an Embedding?

An embedding is a list of numbers (a vector) that represents a token in a high-dimensional space. Words with similar meanings end up close together.

Each word → a vector (simplified to 4D here):

king = [0.95, 0.22, 0.87, 0.11]
queen = [0.91, 0.78, 0.85, 0.13]
man = [0.93, 0.21, 0.10, 0.08]
woman = [0.89, 0.77, 0.08, 0.10]
pizza = [0.10, 0.05, 0.02, 0.91]
Observation

king & queen: similar 1st and 3rd dims 👑

man & woman: similar 1st dim, different 2nd ♀️

pizza: completely different pattern 🍕

Real embeddings have 512, 768, or even 12,288 dimensions — not just 4!
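The "similar words have similar vectors" idea can be checked directly with cosine similarity, the standard measure mentioned later in this lesson. A minimal sketch using the toy 4-D vectors from this slide (the numbers are illustrative, not from a real model):

```python
import math

# Toy 4-D embeddings from the slide (real models use hundreds of dimensions)
embeddings = {
    "king":  [0.95, 0.22, 0.87, 0.11],
    "queen": [0.91, 0.78, 0.85, 0.13],
    "man":   [0.93, 0.21, 0.10, 0.08],
    "woman": [0.89, 0.77, 0.08, 0.10],
    "pizza": [0.10, 0.05, 0.02, 0.91],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (close in meaning)
print(cosine_similarity(embeddings["king"], embeddings["pizza"]))  # low (unrelated)
```

Running this, king/queen scores far higher than king/pizza, which is exactly the clustering behavior the slide describes.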

Slide 3 of 6

The GPS Analogy

🗺️

Words as locations on a map

Just like GPS coordinates tell you exactly where something is on Earth, an embedding tells you where a word "is" in meaning-space. Cities near each other on a map tend to be similar — words near each other in embedding space tend to have similar meanings.

In 2D, it might look like:

👑 Royalty cluster: king, queen, prince, princess — all close together
🐾 Animals cluster: dog, cat, wolf, lion — grouped nearby
💻 Tech cluster: computer, laptop, phone — positioned together
🍕 Food cluster: pizza, burger, sushi — in their own region
Slide 4 of 6

The Famous King − Man + Woman = Queen

One of the most mind-blowing properties of embeddings is that mathematical operations work semantically:

king − man + woman ≈ queen

This means the embedding space has captured the concept of "gender" as a direction in the space! Subtracting "man-ness" and adding "woman-ness" moves you from king to queen.

1. Start at: king = [0.95, 0.22, 0.87]
2. Subtract: man = [0.93, 0.21, 0.10] (removing the male concept)
3. Add: woman = [0.89, 0.77, 0.08] (adding the female concept)
4. Result ≈ queen = [0.91, 0.78, 0.85]
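The arithmetic above can be verified in a few lines. This sketch uses the slide's toy 3-D vectors (chosen so the analogy works exactly; real embeddings only give an approximate match, which is why the slide writes "≈"):

```python
# Toy 3-D vectors from the slide (illustrative values, not from a real model)
vectors = {
    "king":  [0.95, 0.22, 0.87],
    "queen": [0.91, 0.78, 0.85],
    "man":   [0.93, 0.21, 0.10],
    "woman": [0.89, 0.77, 0.08],
}

# king − man + woman, computed element-wise
result = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
print(result)  # ≈ [0.91, 0.78, 0.85], up to floating-point noise

# Which known word is the result closest to?
def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

closest = min(vectors, key=lambda w: distance(vectors[w], result))
print(closest)  # queen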
Slide 5 of 6

How Embeddings are Learned

Embeddings aren't programmed — they're learned during training by the neural network itself.

1. Start: every token gets a random vector of numbers
2. Training: the model tries to predict the next word millions of times
3. Learning: vectors that appear in similar contexts get nudged closer together
4. Result: similar words end up with similar vectors automatically!
💡 "You shall know a word by the company it keeps" — J.R. Firth, 1957. Words that appear in similar contexts have similar meanings. Embeddings capture exactly this!
Slide 6 of 6

Interactive: Embedding Space

Hover over the words to explore their positions. Notice how similar words cluster together:

This is a 2D projection of high-dimensional embeddings. Hover a dot to highlight it.

🗺️

The Embedding Table

Inside every LLM there's a giant table — one row per token in the vocabulary, one column per embedding dimension. This table (called the embedding matrix) is learned during training. When you input a token, the model just looks up its row.
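The lookup described above really is just row indexing. A minimal sketch with a made-up 5-token vocabulary and random placeholder numbers (in a real model the matrix entries are learned, and the sizes are far larger):

```python
import random

random.seed(42)

vocab_size = 5        # toy vocabulary (real models: tens of thousands of tokens)
embedding_dim = 4     # toy dimension (GPT-3: 12,288)

# The embedding matrix: one row per token, one column per dimension.
# These values are random placeholders; training would learn them.
embedding_matrix = [
    [random.uniform(-1, 1) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]

# "Embedding" a token is nothing more than a row lookup by token ID
token_id = 3
embedding = embedding_matrix[token_id]
print(len(embedding))  # 4 — one number per embedding dimension
```

This is why the first layer of an LLM is so cheap: no arithmetic, just an indexed read from a [vocab_size × embedding_dim] table.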

Key Concepts

📍 Embedding: A dense vector of numbers representing a token's meaning in high-dimensional space.
📐 Embedding Dimension: The length of the vector. GPT-3 uses 12,288 dimensions.
🧮 Semantic Similarity: Similar meaning → similar vectors. Measured by cosine similarity.
📊 Embedding Matrix: A table of size [vocab_size × embedding_dim] learned during training.

Quick Check

What is the purpose of word embeddings?

A. To compress text to save memory
B. To represent tokens as vectors where similar meanings are geometrically close
C. To translate text between languages
D. To remove punctuation from text