DEV Community

Stephen Collins
Stephen Collins

Posted on

Merkle Trees in SQLite with Python: A Practical Tutorial

Blog Post Cover Image

Merkle trees are one of those concepts that quietly power some of the most important technologies we use today—Git, blockchains, and peer-to-peer systems like IPFS. They provide a compact way to prove that a piece of data belongs to a larger dataset, without needing to reveal or send everything.

But Merkle trees are often introduced only in the context of massive distributed systems. What if you just want to play with them locally, on your laptop, using tools you already know? That's where SQLite comes in.

In this tutorial, we'll build a Merkle tree in Python and persist it to SQLite. By the end, you'll have a working project that can:

  • Build a Merkle tree from arbitrary strings.
  • Store all nodes—leaves and internal hashes—in a relational database.
  • Query inclusion proofs for any leaf.
  • Verify proofs against the stored root.
  • Inspect and visualize the tree with SQL or helper scripts.

This combination gives you a verifiable, tamper-evident, and queryable data structure inside a single .db file.

All the code is available on GitHub.


Why SQLite?

Before we dive into code, let's answer the obvious question: why store a Merkle tree in SQLite at all?

SQLite is a lightweight, embeddable database. You don't need a server, a cluster, or even an internet connection—just a file. That makes it ideal for:

  • Tamper-evident logs
    Store logs or events as leaves, and use the Merkle root to prove no row was modified. If someone changes even one character in the database, the root hash won't match.

  • Lightweight blockchain-like systems
    You can experiment with blockchain concepts without spinning up heavy infrastructure. Exchange Merkle roots and proofs between peers instead of full tables.

  • Efficient synchronization
    Two devices with the same dataset can exchange just their root hashes. If they differ, you can reconcile branch by branch instead of copying everything.

  • Verifiable queries
    SQL gives you speed and structure; Merkle trees give you proofs. Together, you can run a query and provide a cryptographic guarantee that the results belong to a specific database state.

  • Education and demos
    SQLite makes the concept tangible. You can see every node in a table, inspect it with SQL, and visualize the tree.

So while it may seem unusual, storing Merkle trees in SQLite is a neat way to bridge cryptographic structures with the databases we already use.


How Merkle Trees Work

A Merkle tree is a binary tree where:

  • Leaves contain the hash of the raw data.
  • Internal nodes contain the hash of the concatenation of their children.
  • The root hash commits to the entire dataset.

If you want to prove that “gamma” is part of the dataset, you don't need to show every other value. Instead, you provide its sibling hashes along the path to the root. The verifier recomputes and checks if it matches the stored root hash.


The Project Structure

Our Python project is minimal and dependency-free. The main files are:

├── main.py # Example usage ├── merkle_tree.py # Core MerkleTree class ├── print_proof.py # CLI to print and verify proofs ├── visualize_tree.py # CLI to render tree as ASCII or Graphviz ├── pyproject.toml # Metadata and scripts └── README.md 
Enter fullscreen mode Exit fullscreen mode

Everything uses only the Python standard library (hashlib, sqlite3, pathlib).


The SQLite Schema

The heart of persistence is a single table:

CREATE TABLE IF NOT EXISTS merkle_nodes ( id INTEGER PRIMARY KEY, parent_id INTEGER, left_child_id INTEGER, right_child_id INTEGER, hash TEXT NOT NULL, level INTEGER NOT NULL, is_leaf BOOLEAN NOT NULL, data TEXT, FOREIGN KEY(parent_id) REFERENCES merkle_nodes(id), FOREIGN KEY(left_child_id) REFERENCES merkle_nodes(id), FOREIGN KEY(right_child_id) REFERENCES merkle_nodes(id) ); 
Enter fullscreen mode Exit fullscreen mode

Each row is one node. Leaves carry original data; internal nodes have only hash. Parent and child links let you traverse the tree in either direction.


The MerkleTree Class

The core logic lives in merkle_tree.py. It's wrapped in a class that manages both hashing and database storage.

Initialization and Context Manager

with MerkleTree("merkle_tree.db") as tree: root_id = tree.build_tree(["alpha", "beta", "gamma"]) 
Enter fullscreen mode Exit fullscreen mode

The class opens and closes its SQLite connection automatically. Using a with block ensures resources are cleaned up.

Building the Tree

build_tree(data_blocks) does three things:

  1. Hashes the leaves with SHA-256.
  2. Iteratively pairs them, hashing concatenated child hashes until a root is formed. If there's an odd number, the last hash is duplicated (Bitcoin-style).
  3. Inserts each node into the database and links parent-child relationships.

The method returns the root node's ID for convenience.

Generating Proofs

get_proof(leaf_id) walks from a leaf up to the root, collecting sibling hashes along the way. Each step also records whether the current node was a left or right child. That orientation matters because the concatenation order of hashes changes the result.

  • If the current node was the left child, the sibling is on the right → (sibling_hash, True).
  • If the current node was the right child, the sibling is on the left → (sibling_hash, False).

This way, the verifier knows whether to compute sha256(current + sibling) or sha256(sibling + current).

Example proof path:

[ ("hash_of_right_sibling", True), # current node was left  ("hash_of_left_sibling", False), # current node was right  ("hash_of_right_sibling", True) # current node was left ] 
Enter fullscreen mode Exit fullscreen mode

During verification, you start from the leaf's hash, then fold in each sibling in the correct order until you recompute the root.

Verification

Finally, verify_proof(data_item, proof_path, expected_root) recomputes the path from a leaf to the root.

It works like this:

  1. Start with the SHA-256 hash of the original data.
  2. For each (sibling_hash, is_left_child) in the proof path:
  • If is_left_child == True, the current node was the left child, so compute:

     current = sha256((current + sibling_hash).encode()) 
  • If is_left_child == False, the current node was the right child, so compute:

     current = sha256((sibling_hash + current).encode()) 
  1. When you reach the top, compare the result to the stored root.

Example:

current = hashlib.sha256(data_item.encode()).hexdigest() for sibling_hash, is_left_child in proof_path: if is_left_child: current = hashlib.sha256((current + sibling_hash).encode()).hexdigest() else: current = hashlib.sha256((sibling_hash + current).encode()).hexdigest() print(current == expected_root) # True if inclusion is valid 
Enter fullscreen mode Exit fullscreen mode

This guarantees that only the correct leaf, combined with the right sequence of sibling hashes, can reconstruct the root.


Running the Demo

main.py ties everything together:

uv run main.py 
Enter fullscreen mode Exit fullscreen mode

Output looks like:

Root node ID: 31 Leaf: 3 5f9a... Proof for 'gamma': [('4f4a...', True), ('0cb0...', False), ('673e...', True)] Verification: True Merkle tree stored in /absolute/path/merkle_tree.db 
Enter fullscreen mode Exit fullscreen mode

This shows:

  • The root node was built.
  • A proof for "gamma" was generated.
  • Verification succeeded.
  • The full tree is persisted in merkle_tree.db.

Inspecting with SQL

Because everything is in SQLite, you can poke around yourself:

sqlite3 merkle_tree.db 
Enter fullscreen mode Exit fullscreen mode

View schema and sample nodes:

.schema merkle_nodes SELECT id, level, is_leaf, substr(hash,1,16) AS h, data FROM merkle_nodes ORDER BY level, id LIMIT 10; 
Enter fullscreen mode Exit fullscreen mode

Find the root:

SELECT id, hash FROM merkle_nodes WHERE parent_id IS NULL; 
Enter fullscreen mode Exit fullscreen mode

You can even extract inclusion proofs with a recursive CTE.


Printing and Verifying Proofs

The helper script print_proof.py lets you fetch proofs from an existing database:

uv run python print_proof.py gamma 
Enter fullscreen mode Exit fullscreen mode

Output:

Data: gamma DB: merkle_tree.db Leaf: id=3 hash=be9d... Root: afc48e... Proof (sibling_hash, is_left_child): ('4f4a94...', True) ('0cb030...', False) ('673e72...', True) Verification: True 
Enter fullscreen mode Exit fullscreen mode

Visualizing the Tree

The optional visualize_tree.py script shows the tree in ASCII or Graphviz DOT:

uv run python visualize_tree.py --format ascii 
Enter fullscreen mode Exit fullscreen mode

Example output:

Root: id=31 hash=abc123 Level 0: [N id=31 h=abc123] Level 1: [N id=25 h=1122aa] [N id=30 h=99ee77] Level 2: [L id=3 h=5f9a... data='gamma'] [L id=4 h=...] 
Enter fullscreen mode Exit fullscreen mode

How This Compares to Git and Bitcoin

It's worth noting how our simple SQLite-backed Merkle tree relates to real-world systems:

  • Git uses Merkle trees to track file versions. Every commit points to a tree object (directory), which points to blobs (files). Hashes are stored in a content-addressable database (.git/objects). Our implementation is simpler but follows the same principle: content → hash → parent hash → root.

  • Bitcoin uses Merkle trees to commit to all transactions in a block. Each leaf is a transaction hash; the Merkle root is included in the block header. Our example uses strings like "alpha" instead of transactions, but the process—pairing, hashing, and duplicating odd leaves—is identical.

The difference is scale and purpose. Git and Bitcoin optimize for billions of objects or high-security consensus. Our SQLite example is educational: small, transparent, and easy to query.


Lessons Learned

This little project highlights some powerful ideas:

  • Cryptography meets databases: By storing Merkle trees in SQLite, you blend verifiability with queryability.
  • Proofs are compact: Verifying inclusion doesn't require downloading everything—just log(N) sibling hashes.
  • SQLite is versatile: It's more than a toy database; you can model even cryptographic structures.
  • Educational clarity: Inspecting the tree in SQL makes the concept far less abstract.

Where to Go Next

This is a minimal, educational implementation. Real-world extensions could include:

  • Append-only logs for tamper-evident auditing.
  • Multiple independent Merkle trees in one database.
  • Performance tuning for large datasets.
  • More advanced hash strategies (domain separation, byte-level concatenation).
  • Integration with APIs that require verifiable responses.

Conclusion

Merkle trees are often taught as abstract cryptographic structures behind big systems like Bitcoin. But as this tutorial shows, you can implement them locally with just Python and SQLite.

The result is a verifiable, tamper-evident database in a single .db file—no servers, no dependencies. Whether you're prototyping, teaching, or just exploring, it's a powerful way to make Merkle trees tangible.

Top comments (0)