Merkle trees are one of those concepts that quietly power some of the most important technologies we use today—Git, blockchains, and peer-to-peer systems like IPFS. They provide a compact way to prove that a piece of data belongs to a larger dataset, without needing to reveal or send everything.
But Merkle trees are often introduced only in the context of massive distributed systems. What if you just want to play with them locally, on your laptop, using tools you already know? That's where SQLite comes in.
In this tutorial, we'll build a Merkle tree in Python and persist it to SQLite. By the end, you'll have a working project that can:
- Build a Merkle tree from arbitrary strings.
- Store all nodes—leaves and internal hashes—in a relational database.
- Query inclusion proofs for any leaf.
- Verify proofs against the stored root.
- Inspect and visualize the tree with SQL or helper scripts.
This combination gives you a verifiable, tamper-evident, and queryable data structure inside a single .db
file.
All the code is available on GitHub.
Why SQLite?
Before we dive into code, let's answer the obvious question: why store a Merkle tree in SQLite at all?
SQLite is a lightweight, embeddable database. You don't need a server, a cluster, or even an internet connection—just a file. That makes it ideal for:
Tamper-evident logs
Store logs or events as leaves, and use the Merkle root to prove no row was modified. If someone changes even one character in the database, the root hash won't match.Lightweight blockchain-like systems
You can experiment with blockchain concepts without spinning up heavy infrastructure. Exchange Merkle roots and proofs between peers instead of full tables.Efficient synchronization
Two devices with the same dataset can exchange just their root hashes. If they differ, you can reconcile branch by branch instead of copying everything.Verifiable queries
SQL gives you speed and structure; Merkle trees give you proofs. Together, you can run a query and provide a cryptographic guarantee that the results belong to a specific database state.Education and demos
SQLite makes the concept tangible. You can see every node in a table, inspect it with SQL, and visualize the tree.
So while it may seem unusual, storing Merkle trees in SQLite is a neat way to bridge cryptographic structures with the databases we already use.
How Merkle Trees Work
A Merkle tree is a binary tree where:
- Leaves contain the hash of the raw data.
- Internal nodes contain the hash of the concatenation of their children.
- The root hash commits to the entire dataset.
If you want to prove that “gamma” is part of the dataset, you don't need to show every other value. Instead, you provide its sibling hashes along the path to the root. The verifier recomputes and checks if it matches the stored root hash.
The Project Structure
Our Python project is minimal and dependency-free. The main files are:
├── main.py # Example usage ├── merkle_tree.py # Core MerkleTree class ├── print_proof.py # CLI to print and verify proofs ├── visualize_tree.py # CLI to render tree as ASCII or Graphviz ├── pyproject.toml # Metadata and scripts └── README.md
Everything uses only the Python standard library (hashlib
, sqlite3
, pathlib
).
The SQLite Schema
The heart of persistence is a single table:
CREATE TABLE IF NOT EXISTS merkle_nodes ( id INTEGER PRIMARY KEY, parent_id INTEGER, left_child_id INTEGER, right_child_id INTEGER, hash TEXT NOT NULL, level INTEGER NOT NULL, is_leaf BOOLEAN NOT NULL, data TEXT, FOREIGN KEY(parent_id) REFERENCES merkle_nodes(id), FOREIGN KEY(left_child_id) REFERENCES merkle_nodes(id), FOREIGN KEY(right_child_id) REFERENCES merkle_nodes(id) );
Each row is one node. Leaves carry original data
; internal nodes have only hash
. Parent and child links let you traverse the tree in either direction.
The MerkleTree
Class
The core logic lives in merkle_tree.py
. It's wrapped in a class that manages both hashing and database storage.
Initialization and Context Manager
with MerkleTree("merkle_tree.db") as tree: root_id = tree.build_tree(["alpha", "beta", "gamma"])
The class opens and closes its SQLite connection automatically. Using a with
block ensures resources are cleaned up.
Building the Tree
build_tree(data_blocks)
does three things:
- Hashes the leaves with SHA-256.
- Iteratively pairs them, hashing concatenated child hashes until a root is formed. If there's an odd number, the last hash is duplicated (Bitcoin-style).
- Inserts each node into the database and links parent-child relationships.
The method returns the root node's ID for convenience.
Generating Proofs
get_proof(leaf_id)
walks from a leaf up to the root, collecting sibling hashes along the way. Each step also records whether the current node was a left or right child. That orientation matters because the concatenation order of hashes changes the result.
- If the current node was the left child, the sibling is on the right →
(sibling_hash, True)
. - If the current node was the right child, the sibling is on the left →
(sibling_hash, False)
.
This way, the verifier knows whether to compute sha256(current + sibling)
or sha256(sibling + current)
.
Example proof path:
[ ("hash_of_right_sibling", True), # current node was left ("hash_of_left_sibling", False), # current node was right ("hash_of_right_sibling", True) # current node was left ]
During verification, you start from the leaf's hash, then fold in each sibling in the correct order until you recompute the root.
Verification
Finally, verify_proof(data_item, proof_path, expected_root)
recomputes the path from a leaf to the root.
It works like this:
- Start with the SHA-256 hash of the original data.
- For each
(sibling_hash, is_left_child)
in the proof path:
-
If
is_left_child == True
, the current node was the left child, so compute:
current = sha256((current + sibling_hash).encode())
-
If
is_left_child == False
, the current node was the right child, so compute:
current = sha256((sibling_hash + current).encode())
- When you reach the top, compare the result to the stored root.
Example:
current = hashlib.sha256(data_item.encode()).hexdigest() for sibling_hash, is_left_child in proof_path: if is_left_child: current = hashlib.sha256((current + sibling_hash).encode()).hexdigest() else: current = hashlib.sha256((sibling_hash + current).encode()).hexdigest() print(current == expected_root) # True if inclusion is valid
This guarantees that only the correct leaf, combined with the right sequence of sibling hashes, can reconstruct the root.
Running the Demo
main.py
ties everything together:
uv run main.py
Output looks like:
Root node ID: 31 Leaf: 3 5f9a... Proof for 'gamma': [('4f4a...', True), ('0cb0...', False), ('673e...', True)] Verification: True Merkle tree stored in /absolute/path/merkle_tree.db
This shows:
- The root node was built.
- A proof for
"gamma"
was generated. - Verification succeeded.
- The full tree is persisted in
merkle_tree.db
.
Inspecting with SQL
Because everything is in SQLite, you can poke around yourself:
sqlite3 merkle_tree.db
View schema and sample nodes:
.schema merkle_nodes SELECT id, level, is_leaf, substr(hash,1,16) AS h, data FROM merkle_nodes ORDER BY level, id LIMIT 10;
Find the root:
SELECT id, hash FROM merkle_nodes WHERE parent_id IS NULL;
You can even extract inclusion proofs with a recursive CTE.
Printing and Verifying Proofs
The helper script print_proof.py
lets you fetch proofs from an existing database:
uv run python print_proof.py gamma
Output:
Data: gamma DB: merkle_tree.db Leaf: id=3 hash=be9d... Root: afc48e... Proof (sibling_hash, is_left_child): ('4f4a94...', True) ('0cb030...', False) ('673e72...', True) Verification: True
Visualizing the Tree
The optional visualize_tree.py
script shows the tree in ASCII or Graphviz DOT:
uv run python visualize_tree.py --format ascii
Example output:
Root: id=31 hash=abc123 Level 0: [N id=31 h=abc123] Level 1: [N id=25 h=1122aa] [N id=30 h=99ee77] Level 2: [L id=3 h=5f9a... data='gamma'] [L id=4 h=...]
How This Compares to Git and Bitcoin
It's worth noting how our simple SQLite-backed Merkle tree relates to real-world systems:
Git uses Merkle trees to track file versions. Every commit points to a tree object (directory), which points to blobs (files). Hashes are stored in a content-addressable database (
.git/objects
). Our implementation is simpler but follows the same principle: content → hash → parent hash → root.Bitcoin uses Merkle trees to commit to all transactions in a block. Each leaf is a transaction hash; the Merkle root is included in the block header. Our example uses strings like
"alpha"
instead of transactions, but the process—pairing, hashing, and duplicating odd leaves—is identical.
The difference is scale and purpose. Git and Bitcoin optimize for billions of objects or high-security consensus. Our SQLite example is educational: small, transparent, and easy to query.
Lessons Learned
This little project highlights some powerful ideas:
- Cryptography meets databases: By storing Merkle trees in SQLite, you blend verifiability with queryability.
- Proofs are compact: Verifying inclusion doesn't require downloading everything—just log(N) sibling hashes.
- SQLite is versatile: It's more than a toy database; you can model even cryptographic structures.
- Educational clarity: Inspecting the tree in SQL makes the concept far less abstract.
Where to Go Next
This is a minimal, educational implementation. Real-world extensions could include:
- Append-only logs for tamper-evident auditing.
- Multiple independent Merkle trees in one database.
- Performance tuning for large datasets.
- More advanced hash strategies (domain separation, byte-level concatenation).
- Integration with APIs that require verifiable responses.
Conclusion
Merkle trees are often taught as abstract cryptographic structures behind big systems like Bitcoin. But as this tutorial shows, you can implement them locally with just Python and SQLite.
The result is a verifiable, tamper-evident database in a single .db
file—no servers, no dependencies. Whether you're prototyping, teaching, or just exploring, it's a powerful way to make Merkle trees tangible.
Top comments (0)