Vector Databases
Databases designed to store, index, and query high-dimensional vectors (embeddings) so you can search by meaning rather than by exact match.
What is it?
A traditional database answers the question: “does this row match these exact criteria?” A vector database answers a fundamentally different question: “which records are most similar to this?”[^1] This distinction matters because a growing class of data — documents, images, audio, user behaviour — cannot be searched effectively by exact match. The right query is not “find this,” but “find what is close to this.”
Vector databases store embeddings — lists of numbers that encode the meaning of content — alongside optional metadata. When you search a vector database, you provide a query vector (the embedding of your question or input), and the database returns the stored vectors that are closest to it in high-dimensional space. “Close” means “similar in meaning.” A search for “how do I recover my account” would return results about “password reset” even though the two phrases share no words, because their embeddings sit near each other in vector space.[^1]
The parent concept, databases, covers how data is stored and queried for traditional use cases — exact lookups, filtering, aggregation. Vector databases extend this idea to a domain where exact lookup is the wrong tool. Where a relational database uses a B-tree index to find rows matching a condition, a vector database uses specialised index structures to find vectors that are geometrically close to a query vector.[^2] The core trade-off is precision: relational databases give you exact answers; vector databases give you approximate answers ranked by similarity, which is exactly what you want when dealing with meaning.
In plain terms
A vector database is like a librarian who understands what you mean, not just what you say. If you ask a regular librarian for “books about fixing broken glass,” they search the card catalogue for those exact words. A vector-database librarian understands you probably want the book titled “Window Repair Guide” even though it does not contain the word “broken” — because it is about the same thing.
At a glance
How vector search works
```mermaid
graph LR
    Q[Query Text] --> EM[Embedding Model]
    EM --> QV[Query Vector]
    QV --> IDX{ANN Index}
    IDX --> R1[Result 1 - similarity 0.94]
    IDX --> R2[Result 2 - similarity 0.89]
    IDX --> R3[Result 3 - similarity 0.85]
    D[Documents] --> EM2[Embedding Model]
    EM2 --> SV[Stored Vectors + Metadata]
    SV --> IDX
```

Key: Documents are converted to vectors by an embedding model and stored in the database. At query time, the query is also converted to a vector, and the ANN (approximate nearest neighbour) index finds the stored vectors closest to it. Results are ranked by similarity score.
How does it work?
1. Storing vectors with metadata
Every record in a vector database has three parts: the vector itself (the embedding), a unique identifier, and optional metadata (structured fields like author, date, category, or source URL).[^1]
For example, if you embed a collection of support articles, each record might look like this:
| Part | Example |
|---|---|
| ID | article-0042 |
| Vector | [0.12, -0.33, 0.91, ... ] (1,536 dimensions) |
| Metadata | { "category": "billing", "language": "en", "updated": "2025-11-01" } |
The metadata is important because it enables filtered search — combining semantic similarity with traditional attribute matching.
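Concretely, such a record can be sketched as a plain Python dictionary — the field names follow the table above, but every real client library defines its own schema — together with a minimal metadata filter:

```python
# Sketch of a vector-database record as a plain dict; field names mirror
# the table above and are illustrative, not any particular product's API.
record = {
    "id": "article-0042",
    "vector": [0.12, -0.33, 0.91],  # truncated; real embeddings have e.g. 1,536 dimensions
    "metadata": {
        "category": "billing",
        "language": "en",
        "updated": "2025-11-01",
    },
}

def matches_filter(rec, filters):
    # Filtered search: keep only candidates whose metadata matches,
    # then rank the survivors by vector similarity.
    return all(rec["metadata"].get(k) == v for k, v in filters.items())

print(matches_filter(record, {"category": "billing"}))  # True
print(matches_filter(record, {"language": "fr"}))       # False
```

Real databases apply such filters inside the index rather than scanning records one by one, but the semantics — attribute match first or alongside similarity ranking — are the same.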
Think of it like...
A music streaming service. Each song has an audio fingerprint (the vector) that captures its “mood,” plus metadata tags like genre, tempo, and release year. When you say “play something like this,” the system matches on the fingerprint. When you add “but only jazz from the 2020s,” it uses the metadata to filter.
2. Similarity search vs exact match
Relational databases find rows that match a condition exactly: `WHERE country = 'CH'`. Vector databases find records that are nearest to a query vector in high-dimensional space.[^2] The distance between two vectors is measured using metrics like:
- Cosine similarity — measures the angle between two vectors (the most common for text embeddings)
- Euclidean distance — measures straight-line distance between two points
- Dot product — fast and effective when vectors are normalised
The result is a ranked list of records ordered by similarity, not a binary match/no-match answer.
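All three metrics can be computed directly. A minimal sketch in plain Python, using made-up three-dimensional vectors in place of real embeddings:

```python
import math

def dot(a, b):
    # Dot product: fast, and equivalent to cosine when vectors are normalised.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Angle-based: 1.0 means same direction, 0 means unrelated (orthogonal).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
password_reset = [0.9, 0.1, 0.0]
account_recovery = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]

print(cosine_similarity(password_reset, account_recovery))  # high, ~0.98
print(cosine_similarity(password_reset, pizza_recipe))      # low, ~0.01
```

The similar phrases score near 1.0 and the unrelated one near 0 — exactly the ranked-by-similarity behaviour described above.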
Key distinction
Relational databases answer “does this exist?” with certainty. Vector databases answer “what is most similar?” with a ranked score. Neither is better — they solve different problems.
3. Approximate nearest neighbour (ANN) search
Finding the exact nearest neighbours to a query vector requires comparing the query against every stored vector — this is computationally impractical at scale. If you have 10 million vectors with 1,536 dimensions each, an exhaustive search means billions of floating-point operations per query.[^3]
Approximate nearest neighbour (ANN) algorithms solve this by trading a tiny amount of accuracy for massive gains in speed. Instead of comparing against every vector, they use clever indexing structures that narrow the search to a small subset of candidates. The result is typically 95-99% as accurate as an exhaustive search, at a fraction of the computation cost.[^3]
The word “approximate” often worries newcomers, but in practice the trade-off is favourable: the results are nearly identical to a brute-force search, and the queries that would take seconds now take milliseconds.
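The brute-force baseline that ANN indexes avoid is easy to state in code — a sketch of exhaustive top-k search over a small synthetic corpus:

```python
import heapq
import math
import random

random.seed(0)

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def exhaustive_search(query, vectors, k=3):
    # Compares the query against EVERY stored vector: O(n * d) work per query.
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return heapq.nlargest(k, scored)  # top-k by similarity, highest first

# A small synthetic corpus; the cost scales linearly from here.
dim, n = 64, 2_000
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
query = [random.gauss(0, 1) for _ in range(dim)]

top3 = exhaustive_search(query, vectors)
# 2,000 vectors x 64 dims is trivial; 10,000,000 x 1,536 is roughly
# 15 billion multiply-adds per query, which is why ANN indexes exist.
```

ANN structures replace the full scan in `exhaustive_search` with a guided walk over a small fraction of the corpus, at the cost of occasionally missing a true neighbour.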
Think of it like...
Looking up a word in a dictionary. You do not read every page from cover to cover. You jump to the right section based on the first letter, then narrow down. You might occasionally land one page off, but you find what you need almost instantly. ANN algorithms work the same way — they use structure to skip most of the data and zoom in on the relevant region.
4. Common index types — HNSW and IVF
Two indexing strategies dominate production vector databases:[^3]
HNSW (Hierarchical Navigable Small World) builds a multi-layered graph where each vector is a node connected to its nearest neighbours. The top layers are sparse, allowing fast long-range jumps across the dataset. The bottom layers are dense, enabling precise local search. At query time, the algorithm starts at the top, hops through the graph toward the target region, then drills down for accuracy. HNSW delivers excellent recall and fast queries but uses significant memory because the entire graph must be held in RAM.[^3]
IVF (Inverted File Index) divides the vector space into clusters (using a technique like k-means). Each cluster groups similar vectors together. At query time, the algorithm identifies the nearest clusters and searches only within those clusters, skipping the rest. IVF uses less memory than HNSW and works well when combined with compression techniques, but it requires a training step to build the clusters.[^3]
| Index type | Strengths | Trade-offs |
|---|---|---|
| HNSW | Fast queries, high recall | Memory-intensive |
| IVF | Lower memory, good with compression | Requires cluster training, slightly lower recall |
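To make the IVF idea concrete, here is a toy, pure-Python sketch: a small k-means "training" step builds the clusters, and each query probes only the few nearest ones. Production systems (Faiss, and the engines inside managed databases) do this in optimised native code — nothing below is a real API.

```python
import random

random.seed(1)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=5):
    # The "training" step IVF requires: pick centroids, refine with Lloyd iterations.
    centroids = random.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster emptied out
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def build_ivf(vectors, k):
    # Assign every vector to the inverted list of its nearest centroid.
    centroids = kmeans(vectors, k)
    lists = {j: [] for j in range(k)}
    for i, v in enumerate(vectors):
        lists[min(range(k), key=lambda j: sq_dist(v, centroids[j]))].append(i)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe=2, topk=3):
    # Probe only the nprobe nearest clusters instead of scanning the whole corpus.
    nearest = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))[:nprobe]
    candidates = [i for j in nearest for i in lists[j]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:topk]

dim, n = 8, 500
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
centroids, lists = build_ivf(vectors, k=10)
results = ivf_search(vectors[42], vectors, centroids, lists)  # query with a stored vector
```

Raising `nprobe` is the recall/speed dial: more clusters probed means fewer missed neighbours but more distance computations per query.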
Example: choosing an index strategy
Consider a RAG system for a knowledge base with 500,000 documents:
- HNSW would be the default choice if the server has enough RAM (roughly 3-6 GB for 500K vectors at 1,536 dimensions). Queries would return in single-digit milliseconds with 97%+ recall.
- IVF would be preferred if memory is constrained or the dataset is expected to grow to tens of millions of vectors, especially when combined with product quantisation (PQ) to compress the vectors.
Most managed vector databases (Pinecone, Weaviate) handle this decision for you. If you are self-hosting with a library like Faiss, you choose and tune the index explicitly.
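The RAM figure above can be sanity-checked with back-of-the-envelope arithmetic — raw float32 storage only, before the HNSW graph links add their overhead:

```python
# 500K vectors at 1,536 dimensions, 4 bytes per float32 component.
n, dim, bytes_per_float = 500_000, 1536, 4
raw_gb = n * dim * bytes_per_float / 1024**3
print(f"{raw_gb:.2f} GB of raw vectors")  # ~2.86 GB before index overhead
```

The graph structure and bookkeeping push the working set from ~2.9 GB toward the 3-6 GB quoted above.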
5. Hybrid search — combining vectors with metadata
Pure vector search returns the most semantically similar results globally. But in practice, you often want to combine meaning-based search with traditional filters: “find similar documents, but only from this user, created after this date.”[^3]
Hybrid search combines dense vector similarity with structured attribute filtering. Some systems also combine dense vectors with sparse retrieval methods like BM25 (traditional keyword matching) to get the best of both worlds — semantic understanding and keyword precision.[^3]
For example: a query for “GPT-5 release date” might semantically drift toward general AI topics. Adding keyword matching ensures the specific phrase is weighted, while semantic search captures conceptually related results.
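That blending can be sketched in a few lines. The `vec_sim` numbers below are made up for illustration, and the keyword scorer is a crude stand-in for a real sparse retriever like BM25:

```python
def keyword_score(query, doc_text):
    # Crude stand-in for BM25: the fraction of query terms
    # that appear verbatim in the document.
    q_terms = set(query.lower().split())
    d_terms = set(doc_text.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_score(vec_sim, kw, alpha=0.7):
    # Weighted blend of dense (semantic) and sparse (keyword) signals.
    return alpha * vec_sim + (1 - alpha) * kw

# Made-up similarity scores: the essay is semantically "closer" overall,
# but the news item contains the exact phrase the user typed.
docs = [
    {"text": "GPT-5 release date announced", "vec_sim": 0.80},
    {"text": "A history of artificial intelligence", "vec_sim": 0.85},
]
query = "GPT-5 release date"
ranked = sorted(
    docs,
    key=lambda d: hybrid_score(d["vec_sim"], keyword_score(query, d["text"])),
    reverse=True,
)
# The exact-phrase document now ranks first despite its lower vector score.
```

Production systems tune the blend (here, `alpha`) or use rank-fusion methods instead of a linear mix, but the principle is the same.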
Concept to explore
See rag for how hybrid search fits into retrieval-augmented generation, the dominant pattern for grounding LLM responses in real data.
6. The role in RAG architectures
Vector databases are the retrieval backbone of Retrieval-Augmented Generation. In a RAG pipeline:[^4]
- Documents are split into chunks and embedded into vectors
- The vectors are stored in a vector database alongside the source text
- When a user asks a question, the question is embedded into a query vector
- The vector database returns the most relevant document chunks
- Those chunks are fed to an LLM as context for generating the answer
Without vector databases, RAG cannot find semantically relevant documents at scale. They are what make it possible to ground LLM responses in real, specific data rather than relying solely on the model’s training knowledge.
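The five steps above can be strung together in a toy end-to-end sketch. A term-frequency vector stands in for a real embedding model, and the final LLM call is reduced to building the prompt — everything here is illustrative, not any real client API:

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy embedding: term-frequency over a fixed vocabulary.
    # A real pipeline would call a learned embedding model here.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# 1-2. Chunk the documents, embed them, store vectors alongside the text.
chunks = [
    "reset your password from the account login page",
    "billing cycles run monthly and renew automatically",
]
vocab = sorted({w for c in chunks for w in c.lower().split()})
store = [{"text": c, "vector": embed(c, vocab)} for c in chunks]

# 3-4. Embed the question and retrieve the most similar chunk.
question = "how do I reset my password"
query_vector = embed(question, vocab)
best = max(store, key=lambda r: cosine(query_vector, r["vector"]))

# 5. Feed the retrieved chunk to the LLM as grounding context.
prompt = f"Answer using only this context:\n{best['text']}\n\nQuestion: {question}"
```

The retrieval step picks the password chunk because its vector overlaps with the question's, even though a production system would use learned embeddings and an ANN index rather than this literal scan.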
Why do we use it?
Key reasons
1. Meaning-based retrieval. Vector databases let you search by what content means, not what words it contains. “Automobile repair” finds documents about “car maintenance” because the meanings are close, even though the words differ.[^1]
2. Scale for AI workloads. Modern AI applications — chatbots, recommendation engines, content search — need to compare a query against millions of vectors in real time. ANN indexing makes this possible with sub-100ms latency at production scale.[^3]
3. Foundation for RAG. Retrieval-Augmented Generation depends on vector databases to find relevant context for LLM responses. Without them, RAG systems would need to fall back on keyword search, missing semantically relevant content.[^4]
4. Multimodal flexibility. Vector databases store any kind of embedding — text, images, audio, code. The same database can power text search, image similarity, and cross-modal retrieval (searching images with text queries).[^2]
When do we use it?
- When building semantic search that finds results by meaning, not keywords
- When implementing RAG to ground LLM responses in a knowledge base
- When building recommendation systems (find products, articles, or content similar to what a user liked)
- When you need to deduplicate or cluster unstructured content (similar support tickets, near-duplicate documents)
- When building multimodal search (searching images by text description, or audio by mood)
Rule of thumb
If your users would describe what they want in natural language rather than typing exact keywords, a vector database is likely part of the solution.
How can I think about it?
The neighbourhood map
A vector database is like a city where every building is placed based on what it does, not its street address.
- Every document gets coordinates on this map (its vector) based on its meaning
- Restaurants cluster together in one district; hospitals cluster in another
- Searching means dropping a pin on the map (“I need something like this”) and finding the nearest buildings
- Approximate search means you check the buildings in your neighbourhood, not every building in the city — you might miss one on the edge, but you find the best matches quickly
- Metadata filtering is like searching only within a specific district (“restaurants, but only Italian ones open after 8 PM”)
- A relational database, by contrast, is like a phone book — perfect if you know the exact name, useless if you only know roughly what you are looking for
The wine sommelier
A vector database is like a sommelier who organises a wine cellar by flavour profile rather than alphabetically.
- Each wine’s position in the cellar is determined by its taste characteristics — body, acidity, sweetness, tannins — encoded as a vector
- Nearby wines taste similar: a Barolo and a Nebbiolo sit on adjacent shelves
- Asking for a wine “like this one” means the sommelier walks to that shelf and pulls the neighbours — no need to describe the grape, region, or year
- ANN search means the sommelier checks the most promising sections of the cellar first, rather than tasting every bottle
- Hybrid search is when you add constraints: “like this one, but under 30 francs and from France” — combining flavour similarity with metadata
- A regular database would be like organising the cellar alphabetically by label — useful if you know the name, but unhelpful for discovering wines you would enjoy
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| rag | Using vector retrieval to ground LLM responses in real documents | stub |
| knowledge-graphs | Structured representations of relationships, complementary to vector similarity | stub |
| embeddings | The numerical meaning representations that vector databases store and query | complete |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself
- Explain the fundamental difference between how a relational database and a vector database answer a query. Why does this difference matter for AI applications?
- Name the three parts of a record in a vector database and describe the role of each.
- Distinguish between HNSW and IVF indexing. What are the key trade-offs, and when might you choose one over the other?
- Interpret this scenario: a semantic search for “how to fix a broken window” returns a document titled “Glass Pane Replacement Guide” with a similarity score of 0.92, but misses a document titled “Window Settings in Operating Systems.” Why does this happen, and why is it desirable?
- Connect vector databases to RAG. What specific role does the vector database play in a RAG pipeline, and what would happen if you replaced it with a traditional keyword search?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    CSM[Client-Server Model] --> DB[Databases]
    DB --> RDB[Relational Databases]
    DB --> DDB[Document Databases]
    DB --> VDB[Vector Databases]
    DB --> SQL[SQL]
    DB --> SCH[Schemas]
    MRF[Machine-Readable Formats] --> EMB[Embeddings]
    EMB -.->|prerequisite| VDB
    style VDB fill:#4a9ede,color:#fff
```

Related concepts:
- rag — vector databases provide the retrieval layer that powers RAG, finding semantically relevant documents to feed to an LLM
- knowledge-graphs — represent explicit relationships between entities, complementing the implicit similarity relationships captured by vector embeddings
- machine-readable-formats — vector databases store embeddings, a specific type of machine-readable format optimised for meaning rather than structure
Sources
Further reading
Resources
- A Beginner’s Guide to Vector Database Principles (Abstract Algorithms) — Clear introduction covering similarity search, index types, and hybrid retrieval with practical examples
- Vector Databases Explained in 3 Levels of Difficulty (Machine Learning Mastery) — Progressive explanation from basic similarity search through ANN indexing algorithms and production architecture
- Vector Database vs Traditional Database (Redis) — Concise comparison showing when to use vector databases versus relational databases
- Vector DB vs Traditional Databases: Embeddings Guide (Blockchain Council) — Detailed guide covering embeddings, HNSW, IVF, and hybrid search patterns
- Vector Databases Compared: pgvector vs Pinecone vs Weaviate (BackendBytes) — Practical comparison of popular vector database options for production RAG systems
Footnotes
[^1]: Abstract Algorithms. (2026). A Beginner’s Guide to Vector Database Principles. Abstract Algorithms.
[^2]: Instaclustr. (2025). Vector Database vs. Relational Database: 7 Key Differences. Instaclustr.
[^3]: Priya C, B. (2026). Vector Databases Explained in 3 Levels of Difficulty. Machine Learning Mastery.
[^4]: Raizada, S. (2026). Vector DB vs Traditional Databases: Embeddings Guide. Blockchain Council.