Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
↖️ Vector embeddings
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• What are vector embeddings?
• Vector similarity space
• Vector search
• Vector distance metrics
• Vector quantization
• Dimension reduction
Vector embeddings 101
Want to follow along?
1. Open this GitHub repository:
https://github.com/pamelafox/vector-embeddings-demos
2. Use the "Code" button to create a GitHub Codespace
3. Wait a few minutes for the Codespace to start up
Vector embeddings
An embedding encodes an input as a list of floating-point numbers.
"dog" → [0.017198, -0.007493, -0.057982,…]
Different embedding models output different embeddings, with varying lengths.

Embedding model               | Encodes                  | Vector length | MTEB Avg.
word2vec                      | words                    | 300           | -
SBERT (Sentence-Transformers) | text (up to ~400 words)  | 768           | -
OpenAI text-embedding-ada-002 | text (up to 8191 tokens) | 1536          | 61.0%
OpenAI text-embedding-3-small | text (up to 8191 tokens) | 256 - 1536    | 62.3%
OpenAI text-embedding-3-large | text (up to 8191 tokens) | 256 - 3072    | 64.6%

MTEB: https://huggingface.co/spaces/mteb/leaderboard
Generating an embedding with the OpenAI SDK

Use the OpenAI SDK with OpenAI.com, Azure, Ollama, or GitHub Models:
import os
import openai

openai_client = openai.OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"]
)

Generate embeddings for single or multiple inputs:

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    dimensions=1536,
    input="hello world"
)
print(embeddings_response.data[0].embedding)
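To embed multiple inputs in one call, pass a list of strings (a minimal sketch, mirroring the single-input call above):

# One API call, one embedding per input, returned in the same order
embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    dimensions=1536,
    input=["hello world", "goodbye world"]
)
for item in embeddings_response.data:
    print(item.embedding[:3])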
Notebook: generate_embedding.ipynb
Vector embeddings vary across models

Embeddings of "queen" from three models:

word2vec-google-news-300 (300 dimensions):
[0.0052490234375, -0.1435546875, -0.0693359375, ...]

text-embedding-ada-002 (1536 dimensions):
[-0.00449855113402009, -0.006737332791090012, -0.002418933203443885, ...]

text-embedding-3-small (1536 dimensions):
[0.04379640519618988, -0.03982372209429741, -0.044741131365299225, ...]
Notebook: comparison.ipynb
Vector similarity
We compute embeddings so that we can calculate similarity between inputs.
The most common distance measurement is cosine similarity.
def cosine_similarity(v1, v2):
    dot_product = sum(
        [a * b for a, b in zip(v1, v2)])
    magnitude = (
        sum([a**2 for a in v1]) *
        sum([a**2 for a in v2])) ** 0.5
    return dot_product / magnitude
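A quick check with two short vectors (hypothetical values, just to exercise the function):

v1 = [0.1, 0.2, 0.3]
v2 = [0.12, 0.21, 0.29]
print(cosine_similarity(v1, v2))  # ~0.998: nearly identical directions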
Notebook: similarity.ipynb
Similarity space varies across models

Cosine similarity to "dog":

text-embedding-ada-002      text-embedding-3-small (1536)
word     cosine             word     cosine
dog      1.0000             dog      1.0000
animal   0.8855             animal   0.6619
god      0.8660             cat      0.6502
cat      0.8635             car      0.6185
fish     0.8566             horse    0.5927
bird     0.8555             boat     0.5737
Similarity values range across models
Cosine similarity of "dog" to 1000 other words across two models.
(Chart: distribution of similarity values for text-embedding-ada-002 vs. text-embedding-3-small (1536).)
Business uses for vector similarity
Recommendation system:
https://learn.microsoft.com/azure/postgresql/flexible-server/generative-ai-recommendation-system
Fraud detection:
https://www.redpanda.com/blog/fraud-detection-pipeline-redpanda-pinecone
Vector search
Vector search
1. Compute the embedding vector for the query
2. Find the K closest vectors for the query vector
   • Search exhaustively or using approximations

Query → compute embedding vector (e.g. OpenAI create embedding) → Query vector → search existing vectors → K closest vectors

"tortoise" → [-0.003335318, -0.0176891904, …] → search → [["snake", [-0.122, ..]], ["frog", [-0.045, ..]]]
Exhaustive vector search in Python
An exhaustive search checks every single vector for the closest one.

def exhaustive_search(query_vector, vectors):
    similarities = []
    for title, vector in vectors.items():
        similarity = cosine_similarity(query_vector, vector)
        similarities.append((title, similarity))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities
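For example (hypothetical toy vectors, just to show the expected input shape of a dict mapping titles to vectors):

vectors = {
    "snake": [0.9, 0.1, 0.0],
    "frog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # e.g. the embedding of "tortoise"
print(exhaustive_search(query_vector, vectors))  # most similar titles first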
Notebook: search.ipynb
ANN (Approximate Nearest Neighbor) search

There are multiple ANN search algorithms that can speed up search time.

Algorithm | Python package | Example database support
HNSW      | hnswlib        | PostgreSQL pgvector extension, Azure AI Search, Chromadb, Weaviate
DiskANN   | diskannpy      | Cosmos DB
IVFFlat   | faiss          | PostgreSQL pgvector extension
Faiss     | faiss          | None, in-memory index only
HNSW: Hierarchical Navigable Small Worlds

The HNSW algorithm is great for situations where your index may be frequently updated, and it scales logarithmically even with large indexes.

import hnswlib

p = hnswlib.Index(space='cosine', dim=1536)
p.init_index(
    max_elements=len(movies),
    ef_construction=200,
    M=16)
vectors = list(movies.values())
ids = list(range(len(vectors)))
p.add_items(vectors, ids)
p.set_ef(50)
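Once the index is built, querying it might look like this (a sketch; query_vector is assumed to be a 1536-dimension embedding generated as shown earlier):

# Find the 5 nearest neighbors of the query embedding
labels, distances = p.knn_query(query_vector, k=5)
# labels are the ids passed to add_items; map them back to movie titles
titles = list(movies.keys())
print([(titles[i], d) for i, d in zip(labels[0], distances[0])])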
From HNSW research paper:
https://github.com/nmslib/hnswlib
Business use: Retrieval Augmented Generation

Vector search can greatly improve the retrieval step in RAG.

Azure OpenAI + Azure AI Search + Azure AI Vision + Azure App Service

Code: aka.ms/ragchat
Demo: aka.ms/ragchat/demo
Join upcoming stream on RAG on 3/18! aka.ms/PythonAI/series
Vector distance metrics
Common distance metrics
Four common distance metrics between two vectors are:
1. Euclidean distance
2. Manhattan distance
3. Inner product
4. Cosine distance
The metric that we pick may depend on whether the vectors are unit vectors.
Notebook: distance_metrics.ipynb
Unit vectors
A unit vector is a vector with a magnitude of 1.
def magnitude(vector):
    return sum([a**2 for a in vector]) ** 0.5

Two vectors, each with a magnitude of 3.7416573867739413:
[1, 2, 3]
[3, 1, 2]

After normalization, two vectors, each with a magnitude of 1:
[0.26726124 0.53452248 0.80178373]
[0.80178373 0.26726124 0.53452248]
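The normalization step is a one-liner (a sketch using numpy; a normalize helper is not on the original slide):

import numpy as np

def normalize(vector):
    # Divide by the magnitude so the result has magnitude 1
    return np.array(vector) / magnitude(vector)

print(normalize([1, 2, 3]))  # [0.26726124 0.53452248 0.80178373]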
Euclidean distance
The straight-line distance between two points in Euclidean space.
def euclidean(v1, v2):
    return magnitude(v1 - v2)

euclidean(
    np.array([0.26726124, 0.53452248, 0.80178373]),
    np.array([0.80178373, 0.26726124, 0.53452248])
)
0.655
Manhattan distance
The "taxicab" distance between two points in Euclidean space.
def manhattan(v1, v2):
    return sum(abs(a - b)
               for a, b in zip(v1, v2))

manhattan(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
1.07
Dot product
The sum of products of corresponding vector elements: x1*y1 + x2*y2 + x3*y3

def dot_product(v1, v2):
    return sum(a * b
               for a, b in zip(v1, v2))

dot_product(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
0.786
Cosine distance
The complement of the cosine of the angle between two vectors in Euclidean space.

def cosine_similarity(v1, v2):
    return dot_product(v1, v2) / (magnitude(v1) * magnitude(v2))

def cosine_distance(v1, v2):
    return 1 - cosine_similarity(v1, v2)

cosine_distance(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
0.21
Cosine similarity vs. Dot product
For unit vectors, the cosine similarity is the same as the dot
product.
>>> cosine_similarity(v1, v2) == dot_product(v1, v2)
True
>>> 1 - cosine_distance(v1, v2) == dot_product(v1, v2)
True
In some vector databases, the dot product operator will be slightly faster
than cosine distance operators, since it does not need to calculate the
magnitude.
If your embeddings are unit vectors, consider using dot product as the metric.
OpenAI embedding models currently all output unit vectors!
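A quick way to sanity-check that in the notebook (a sketch, reusing the magnitude() helper and the embeddings_response from the earlier slide):

embedding = embeddings_response.data[0].embedding
print(magnitude(embedding))  # ~1.0 for OpenAI embedding models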
Vector quantization
Vector quantization
Most vector embeddings are stored as floating point numbers (64-bit in
Python). We can use quantization to reduce the size of the embeddings.
Scalar quantization: Reduce each number to an integer
[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [53, 40, 20, ...]

Binary quantization: Reduce each number to a single bit

[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [1, 1, 0, ...]
Notebook: quantization.ipynb
Scalar quantization: The process

float32                                             →  int8
[0.03265173360705376, 0.01370371412485838, ...]     →  [53, 40, ...]
[-0.00786194484680891, -0.018985141068696976, ...]  →  [27, 19, ...]
[-0.0039056178648024797, 0.019039113074541092, ...] →  [29, 44, ...]

1. Calculate the min/max of all the embeddings
2. Normalize each embedding's values to the [0, 1] range
3. Map the normalized values into integer buckets from -128 to +127 (the observed min maps to roughly -128, the observed max to roughly +127)
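A minimal numpy sketch of those three steps (my own illustration, not the notebook's exact code; embeddings is assumed to be a list of float vectors):

import numpy as np

def scalar_quantize(embeddings):
    embeddings = np.array(embeddings)
    # 1. Min/max across all values in all embeddings
    min_val, max_val = embeddings.min(), embeddings.max()
    # 2. Normalize every value to the [0, 1] range
    normalized = (embeddings - min_val) / (max_val - min_val)
    # 3. Map normalized values into int8 buckets from -128 to +127
    return np.round(normalized * 255 - 128).astype(np.int8)

print(scalar_quantize([[0.032, 0.013, -0.017], [-0.007, -0.018, 0.019]]))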
Scalar quantization: Before & after

"Moana"
float32: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
→ quantization →
int8: [53, 40, 20, ...]
Scalar quantization: Effects on similarity

float32                                             →  int8
[0.03265173360705, 0.013703...]                     →  [53, 40, ...]
[-0.00786194484680891, -0.0189...]                  →  [27, 19, ...]
[-0.0039056178648024797, 0.0190...]                 →  [29, 44, ...]

float32                                 int8
movie                      similarity   movie                        similarity
Moana                      1.000000     Moana                        1.000000
Mulan                      0.546800     ✅ Mulan                     0.903532
Lilo & Stitch              0.502114     The Little Mermaid           0.894227
The Little Mermaid         0.498209     Lilo & Stitch                0.893718
Big Hero 6                 0.491800     ✅ Big Hero 6                0.890959
Monsters University        0.484857     ✅ Monsters University       0.890915
The Princess and the Frog  0.471984     ✅ The Princess and the Frog 0.889009
Finding Dory               0.471386     ✅ Finding Dory              0.888350
Maleficent                 0.461029     Ice Princess                 0.885539
Ice Princess               0.457817     Maleficent                   0.885364
Binary quantization: The process

float32                                             →  bit
[0.03265173360705376, 0.01370371412485838, ...]     →  [1, 1, ...]
[-0.00786194484680891, -0.018985141068696976, ...]  →  [0, 0, ...]
[-0.0039056178648024797, 0.019039113074541092, ...] →  [0, 1, ...]

1. Pick a center C based on the average, a sample, or offline knowledge
2. If a value is >= C, map it to 1; otherwise map it to 0
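A minimal sketch of binary quantization (my own illustration; here the center C defaults to the mean over all values, one of the options listed above):

import numpy as np

def binary_quantize(embeddings, center=None):
    embeddings = np.array(embeddings)
    # Pick the center C (default: the average over all values)
    if center is None:
        center = embeddings.mean()
    # Values >= C become 1, values below C become 0
    return (embeddings >= center).astype(np.uint8)

print(binary_quantize([[0.032, 0.013, -0.017], [-0.007, -0.018, 0.019]]))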
Binary quantization: Before & after

"Moana"
float32: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
→ quantization →
bit: [1, 1, 0, ...]
Binary quantization: Effects on similarity

float32                              →  bit
[0.03265173360705, 0.013703...]      →  [1, 1, ...]
[-0.00786194484680891, -0.0189...]   →  [0, 0, ...]
[-0.0039056178648024797, 0.0190...]  →  [0, 1, ...]

float32                                 bit
movie                      similarity   movie                        similarity
Moana                      1.000000     Moana                        1.000000
Mulan                      0.546800     ✅ Mulan                     0.686634
Lilo & Stitch              0.502114     The Little Mermaid           0.666260
The Little Mermaid         0.498209     The Princess and the Frog    0.659825
Big Hero 6                 0.491800     Lilo & Stitch                0.657599
Monsters University        0.484857     ❌ Big Hero 6                0.655869
The Princess and the Frog  0.471984     Ice Princess                 0.648046
Finding Dory               0.471386     ✅ Finding Dory              0.643830
Maleficent                 0.461029     The Lion King                0.643088
Ice Princess               0.457817     Maleficent                   0.642270
Quantization: effects on storage size

                               float32                       int8           bit
                               [0.03265173360705,...]        [53, 40,...]   [1, 1,...]
                               [-0.00786194484680891,...]    [27, 19, ...]  [0, 0, ...]
                               [-0.00390561786480247,...]    [29, 44, ...]  [0, 1, ...]

Python built-in number type    12728                         12728          12728
numpy typed arrays             12400                         1648           1648

Databases with vector storage support can often save more space with bits, using techniques such as bit packing.
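A sketch of how sizes like these might be measured for a single 1536-dimension vector (my own illustration; the slide does not show the measurement code):

import sys
import numpy as np

float_vector = np.random.rand(1536)                              # float64 array
int8_vector = np.random.randint(-128, 128, 1536, dtype=np.int8)  # scalar-quantized
bit_vector = np.random.randint(0, 2, 1536, dtype=np.int8)        # binary-quantized
print(sys.getsizeof(float_vector))  # ~12400
print(sys.getsizeof(int8_vector))   # ~1648
print(sys.getsizeof(bit_vector))    # ~1648, one full byte per bit without packing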
Quantization: effects on index size in AI Search

Azure AI Search supports quantization as a way to reduce the vector storage space needed.

                          float32                      int8                          bit
                          [0.03265173360705,...]       [53, 40,...]                  [1, 1,...]
                          [-0.00786194484680891,...]   [27, 19, ...]                 [0, 0, ...]
                          [-0.00390561786480247,...]   [29, 44, ...]                 [0, 1, ...]

Vector index size (MB)    1177.12                      298.519 (74.64% reduction!)   41.8636 (96.44% reduction!)

AI Search has two storage locations for vectors: the HNSW index used for searching, and the actual data storage. The stats above are for index size.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
MRL dimension reduction

MRL: Matryoshka Representation Learning

MRL is a technique that lets you reduce the dimensions of a vector, while still retaining much of the original semantic representation.

The OpenAI text-embedding-3-large model has default dimensions of 3072, but can be truncated all the way down to 256 (3072 → 1024 → 512 → 256).

⚠️ Only some models support MRL!

You can truncate either:
• when first generating embeddings
• or when storing in a database (if supported)
Dimension reduction with OpenAI SDK
Specify dimensions when generating an embedding:
embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="hello world",
    dimensions=256
)
print(embeddings_response.data[0].embedding)
Notebook: dimension_reduction.ipynb
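If you instead truncate an embedding after generating it (the second option on the previous slide), the shortened vector should generally be re-normalized before using cosine or dot-product similarity. A minimal sketch, not from the notebook:

import numpy as np

def truncate_embedding(embedding, dim=256):
    truncated = np.array(embedding[:dim])
    # Re-normalize so the truncated vector is a unit vector again
    return truncated / np.linalg.norm(truncated)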
Dimension reduction: Before & after

"Moana"
dimensions=1536: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
dimensions=256:  [0.06316128373146057, 0.02650836855173111, -0.03433343395590782, ...]
Dimension reduction: Effects on similarity

dimensions=1536                              dimensions=256
[0.03265173360705376, 0.01370371412485838,   [0.03265173360705376, 0.01370371412485838,
 -0.017748944461345673,...]                   -0.017748944461345673,...]

dimensions=1536                         dimensions=256
movie                      similarity   movie                        similarity
Moana                      1.000000     Moana                        1.000000
Mulan                      0.546800     The Little Mermaid           0.587367
Lilo & Stitch              0.502114     Mulan                        0.583428
The Little Mermaid         0.498209     Lilo & Stitch                0.575990
Big Hero 6                 0.491800     ✅ Big Hero 6                0.574590
Monsters University        0.484857     ❌ The Princess and the Frog 0.568726
The Princess and the Frog  0.471984     Finding Dory                 0.549391
Finding Dory               0.471386     The Lion King                0.521125
Maleficent                 0.461029     Tangled                      0.513131
Ice Princess               0.457817     ❌ Maleficent                0.511412
Dimension reduction plus quantization

For maximum vector compression, combine both techniques!

1. MRL dimension reduction
2. Scalar or binary quantization

To keep high accuracy, only compress the vectors in the index, oversample when retrieving, and rescore using the originals. That's how Azure AI Search can handle billions of vectors.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
Dive even deeper into vector embeddings!

Vector embeddings 101:
• Embedding projector
• Why are Cosine Similarities of Text embeddings almost always positive?
• Expected Angular Differences in Embedding Random Text?
• Embeddings: What they are and why they matter

ANN algorithms:
• HNSW tutorial
• Video: HNSW for Vector Search Explained

Distance metrics:
• Two Forms of the Dot Product
• Is Cosine-Similarity of Embeddings Really About Similarity?

Quantization:
• Scalar quantization 101
• Product quantization 101
• Binary and scalar quantization

MRL dimension reduction:
• Unboxing Nomic Embed v1.5: Resizable Production Embeddings with MRL
• MRL from the Ground Up
Next steps

Join upcoming streams!
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh

Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!