Hello, I'm Maneshwar. I'm currently building FreeDevTools, **one place for all dev tools, cheat codes, and TLDRs**: a free, open-source hub where developers can quickly find and use tools without the hassle of searching all over the internet.
## SQLite is fast, but your Node app isn’t
You put data into SQLite, point your Node app at it, throw more traffic at the service… and CPU usage climbs while latency stays flat or gets worse.
SQLite itself can be very fast, but a single Node process with a single connection usually leaves a lot of performance on the table—especially on multi-core machines.
This post walks through a concrete setup that:
- Uses multiple processes and worker threads.
- Opens one `better-sqlite3` connection per worker.
- Tunes SQLite pragmas specifically for heavy read throughput.
- Backs everything with real benchmark numbers and logs.
## Context: a real SVG icons database
The test database (`svg-icons-db.db`) contains 51,898 rows in an `icon` table. Each row has an `id`, `cluster`, `name`, and `base64` payload.
The benchmark task is intentionally simple:
- Randomly pick an `id` between `1` and `maxId`.
- Run `SELECT id, cluster, name, base64 FROM icon WHERE id = ?`.
- Repeat this N times across different process/worker configurations.
We fix the total number of reads to 1,000 per run and only change how we spread those reads across processes and worker threads.
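As a quick sketch, this is the random-id pick that drives each read; both the `Math.floor` form and the bitwise-truncation form (used later in the worker code) produce the same range:

```javascript
// Uniform random integer id in [1, maxId], as used by the benchmark loop.
const maxId = 51898;

const idA = Math.floor(Math.random() * maxId) + 1;
const idB = (Math.random() * maxId | 0) + 1; // bitwise OR truncates the float, same range

console.log(idA >= 1 && idA <= maxId); // true
console.log(idB >= 1 && idB <= maxId); // true
```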
## Architecture: processes, workers, and connections

### High-level flow
- `cli.js` parses CLI flags and decides how many processes and workers per process to run.
- Each process runs `process.js`, which in turn spawns `worker.js` threads using `worker_threads`.
- Every worker opens its own `better-sqlite3` connection in read-only mode, runs its share of queries, measures time/CPU, and reports back up the chain.
At no point do we share a `Database` instance across threads.
### CLI: configuration and orchestration
The CLI exposes a small set of flags:
- `--queries` / `-q`: total number of queries for the whole run (default `10000`).
- `--cpus` / `-c`: how many child processes to spawn (default `2`).
- `--pins` / `-p`: optional explicit CPU pins, e.g. `0,1`.
- `--workers` / `-w`: comma-separated list of worker counts to benchmark, e.g. `1,2,4,8`.
Example:
```bash
node svg_benchmark/cli.js --queries 1000 --cpus 2 --workers 1,2,4,8
```

Internally, the CLI:
- Figures out `queriesPerWorker = floor(totalQueries / (processes × workers))`.
- Calls `runBenchmarkWithCpuPin()` with those numbers.
- Collects timing and CPU stats for each configuration.
- Prints a table sorted from fastest to slowest.
### Processes: `process.js`
Each process:
- Optionally pins itself to a CPU core (Linux `taskset`).
- Spawns `workerCount` `Worker` threads and passes them `workerData`: `dbPath`, `queriesPerWorker`, `maxId`, and a `workerId`.
- Waits for all workers, then aggregates `totalQueries`, `totalRows`, min/max/overall QPS, and per-worker metrics.
- Sends a single result object back to the CLI.
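The aggregation step can be sketched as a pure function over the per-worker result objects (field names follow the worker message described below; the helper name and exact result shape are assumptions):

```javascript
// Combine each worker's posted metrics into the single object the process
// sends back to the CLI.
function aggregateWorkerResults(workerResults) {
  const totalQueries = workerResults.reduce((sum, r) => sum + r.queryCount, 0);
  const totalRows = workerResults.reduce((sum, r) => sum + r.totalRowsReturned, 0);
  const qps = workerResults.map(r => r.queriesPerSecond);
  return {
    totalQueries,
    totalRows,
    minQps: Math.min(...qps),
    maxQps: Math.max(...qps),
    workers: workerResults, // per-worker metrics, passed through
  };
}

const summary = aggregateWorkerResults([
  { queryCount: 500, totalRowsReturned: 500, queriesPerSecond: 4100 },
  { queryCount: 500, totalRowsReturned: 499, queriesPerSecond: 4200 },
]);
console.log(summary.totalQueries, summary.minQps); // 1000 4100
```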
### Workers: `worker.js`
Each worker:
- Opens a new `better-sqlite3` connection with `readonly: true` and `fileMustExist: true`.
- Applies read-optimized PRAGMA settings (more on that below).
- Prepares the hot query: `SELECT id, cluster, name, base64 FROM icon WHERE id = ?`.
- Runs a tight loop of `queriesPerWorker` random reads.
- Measures wall time via `process.hrtime.bigint()` and CPU time via `process.cpuUsage(start)`.
- Closes the DB and posts back `queryCount`, `durationMs`, `queriesPerSecond`, `totalRowsReturned`, and `cpuPercent`.
This gives the CLI everything it needs to compute throughput and CPU usage per configuration.
## Code: minimal, runnable snippets

### CLI: flags, DB check, and optimizer
Stripped-down version of the important parts of `svg_benchmark/cli.js`:
```javascript
const Database = require('better-sqlite3');
const path = require('path');

function parseArgs() {
  const args = process.argv.slice(2);
  const config = {
    totalQueries: 10000,
    cpus: 2,
    cpuPins: [0, 1],
    workers: [1, 2, 4, 8],
    help: false
  };
  // ... parse -q/--queries, -c/--cpus, -p/--pins, -w/--workers ...
  return config;
}

async function checkDatabase(dbPath) {
  const db = new Database(dbPath, { readonly: true });
  const { count } = db.prepare('SELECT COUNT(*) AS count FROM icon').get();
  db.close();
  return count;
}

async function optimizeDatabase(dbPath) {
  const db = new Database(dbPath);
  db.pragma('journal_mode = WAL');
  db.pragma('synchronous = OFF');   // safe here because we're just benchmarking
  db.pragma('cache_size = -64000'); // 64 MB
  db.pragma('temp_store = MEMORY');
  db.pragma('mmap_size = 268435456'); // 256 MB
  db.pragma('page_size = 4096');
  db.exec('ANALYZE icon');
  db.pragma('optimize');
  db.close();
}
```

The full file also handles spawning processes and printing tables, but this is the core of the DB setup.
### Worker: one connection per thread

The essential logic from `worker.js`:
```javascript
const { parentPort, workerData } = require('worker_threads');
const Database = require('better-sqlite3');
const { cpuUsage } = require('process');

const { dbPath, queriesPerWorker, maxId } = workerData;

const cpuStart = cpuUsage();
const db = new Database(dbPath, { readonly: true, fileMustExist: true });

db.pragma('cache_size = -32000');   // 32 MB per worker
db.pragma('temp_store = MEMORY');
db.pragma('mmap_size = 1073741824'); // 1 GB mmap
db.pragma('query_only = ON');

const stmt = db.prepare('SELECT id, cluster, name, base64 FROM icon WHERE id = ?');

const start = process.hrtime.bigint();
let totalRowsReturned = 0;

for (let i = 0; i < queriesPerWorker; i++) {
  const id = (Math.random() * maxId | 0) + 1;
  if (stmt.get(id)) totalRowsReturned++;
}

const end = process.hrtime.bigint();
const durationMs = Number(end - start) / 1e6;

db.close();

const cpuEnd = cpuUsage(cpuStart);
const cpuPercent = ((cpuEnd.user + cpuEnd.system) / 1000) / durationMs * 100;

parentPort.postMessage({
  queryCount: queriesPerWorker,
  durationMs,
  totalRowsReturned,
  queriesPerSecond: (queriesPerWorker / durationMs) * 1000,
  cpuPercent
});
```

You can lift this pattern into your own project: pass `dbPath`, `maxId`, and `queriesPerWorker`, and keep the "one connection per worker" rule.
## Performance tuning: pragmas that actually matter

### Process-level tuning
Before any worker threads run, `optimizeDatabase()` configures the database:
- `journal_mode = WAL`: enables concurrent readers without blocking.
- `synchronous = OFF`: disables durability safeguards for maximum speed (OK in a benchmark, not OK for production writes).
- `cache_size = -64000`: ~64 MB cache; negative values are in KiB.
- `temp_store = MEMORY`: avoids temp files.
- `mmap_size = 268435456`: 256 MB of memory-mapped I/O.
- `page_size = 4096`: reasonable default for many workloads.
- `ANALYZE icon` + `PRAGMA optimize`: warms up query planning.
This is about making the database ready for heavy read traffic.
### Worker-level tuning
Each worker connection tweaks a smaller set of pragmas:
- `cache_size = -32000` (32 MB per worker).
- `temp_store = MEMORY`.
- `mmap_size = 1073741824` (1 GB).
- `query_only = ON` to guarantee no writes.
These keep per-thread handles light but fast, and keep the workload purely read-only.
## Results: how far do workers take us?
Command:
```bash
node cli.js --rows 50000 --queries 1000 --cpus 2 --workers 1,2,4,8
```

Output (trimmed):
- Database rows: `51,898`
- Fixed workload: `1,000` queries
- CPU pins: `[0, 1]`
- Worker configs: `[1, 2, 4, 8]`
Table (fastest to slowest):
- 2p × 1w (2 total workers, 500 queries/worker) → 0.121s, 8,274 QPS, avg query 120.86µs.
- 2p × 2w (4 workers, 250 queries/worker) → 0.192s, 5,199 QPS, avg query 192.35µs.
- 2p × 4w (8 workers, 125 queries/worker) → 0.342s, 2,922 QPS, avg query 342.24µs.
- 2p × 8w (16 workers, 62 queries/worker) → 0.609s, 1,629 QPS, avg query 613.73µs.
The CLI also prints:

- Per-configuration CPU usage across cores.
- A summary like:
  - Fastest config: `2p-1w → 0.121s (8,274.23 QPS)`
  - Slowest config: `2p-8w → 0.609s (1,629.39 QPS)`
  - Time saved: `0.488s` (403.7% faster)
### What this tells us
- Adding workers beyond available cores produces diminishing returns and eventually regressions.
- On this 2‑core machine, 2 processes × 1 worker is the sweet spot: about 5.04× faster than `2p-8w` for the same 1,000-query workload.
- Fixing the total number of queries makes configs directly comparable.
## Query patterns: why primary keys dominate
So far we’ve focused on how many threads are hitting SQLite. Just as important is what query pattern they’re running.
To show this, the repo includes a tiny script, `compare-queries.js`, that compares a primary-key lookup against a secondary-index lookup on the same table:
```javascript
const Database = require('better-sqlite3');

const db = new Database('benchmark.db', { readonly: true });
db.pragma('cache_size = -64000');
db.pragma('mmap_size = 268435456');
db.pragma('temp_store = MEMORY');

console.log('🔍 Comparing Query Performance (1000 queries each)\n');

// Test 1: Primary key lookup
const stmt1 = db.prepare('SELECT a, b, c, d FROM table1 WHERE id = ?');
let start = Date.now();
for (let i = 0; i < 1000; i++) {
  const id = Math.floor(Math.random() * 50000) + 1;
  stmt1.get(id);
}
const pkTime = Date.now() - start;

// Test 2: Index scan
const stmt2 = db.prepare('SELECT a, b, c, d FROM table1 WHERE x = ?');
start = Date.now();
for (let i = 0; i < 1000; i++) {
  const x = Math.floor(Math.random() * 100);
  stmt2.all(x);
}
const indexTime = Date.now() - start;
```

Sample output:
```
🔍 Comparing Query Performance (1000 queries each)

Primary Key Lookup (WHERE id = ?):
  Time: 6ms (6.0µs per query)
  QPS: 166667

Index Scan (WHERE x = ?):
  Time: 930ms (930.0µs per query)
  QPS: 1075

⚡ Speedup: 155.00x faster with primary key!
```

Same database, same table, same number of queries, just a different predicate:
- The primary-key lookup (`WHERE id = ?`) can jump directly to the row using the row's key in a B‑Tree.
- The secondary index scan (`WHERE x = ?`) has to touch many more rows per query and move more data through the CPU caches.
The big picture: if your workers are hammering the database with poor query shapes, no amount of threading or pragma tuning will save you.
## How this ties back to rowid, indexes, and WITHOUT ROWID

Under the hood, every "normal" SQLite table has a hidden `rowid` column:
- Rows are stored on disk in order of increasing `rowid`.
- A lookup like `SELECT rating FROM TopRatedFilm WHERE rowid = 2;` can use a B‑Tree binary search on `rowid` instead of scanning the whole table.
Primary keys and indexes build on top of that:
- A secondary index (for example, on `movie`) is just another B‑Tree that maps `movie → rowid`.
- A lookup like `WHERE movie = 'Starwars'`:
  - Uses the index B‑Tree to find matching rowids.
  - Uses `rowid` lookups on the main table to fetch the full rows.
That’s still two binary searches per row, but it’s far better than a full table scan.
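To make the "two binary searches" idea concrete, here is a toy in-memory model: sorted arrays stand in for B‑Tree pages, and the film data is made up for illustration:

```javascript
// Generic binary search over a sorted array, keyed by keyOf(item).
function binarySearch(arr, key, keyOf) {
  let lo = 0, hi = arr.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const k = keyOf(arr[mid]);
    if (k === key) return arr[mid];
    if (k < key) lo = mid + 1; else hi = mid - 1;
  }
  return null;
}

// Main table sorted by rowid; secondary "index" sorted by movie.
const table = [
  { rowid: 1, movie: 'Starwars', rating: 8.6 },
  { rowid: 2, movie: 'Inception', rating: 8.8 },
].sort((a, b) => a.rowid - b.rowid);

const index = table
  .map(r => ({ movie: r.movie, rowid: r.rowid }))
  .sort((a, b) => (a.movie < b.movie ? -1 : 1));

// Search #1: index B-Tree, movie → rowid.
const entry = binarySearch(index, 'Starwars', e => e.movie);
// Search #2: main table B-Tree, rowid → full row.
const row = binarySearch(table, entry.rowid, r => r.rowid);

console.log(row.rating); // 8.6
```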
SQLite also supports `WITHOUT ROWID` tables, where the `PRIMARY KEY` itself is the B‑Tree key:

```sql
CREATE TABLE TopRatedFilm(movie TEXT PRIMARY KEY, studio TEXT, rating REAL) WITHOUT ROWID;
```

- The main B‑Tree is keyed directly by `movie`, so a lookup by primary key uses one binary search instead of two.
- For the right workloads (non-integer or multi-column keys, relatively small rows), this can cut space roughly in half and speed up lookups.
For this project, the key takeaway is:
- Design hot paths to use primary-key (or rowid) lookups wherever possible.
- When your "real" key is 2–3 columns (`a`, `b`, `c`), hash them into a single `lookup_hash` column and query by that: in Node, build a canonical string like `${a.toLowerCase()}|${b}|${c}`, hash it (e.g. with SHA-256 or a 64‑bit hash), store it in `lookup_hash`, and make that column `UNIQUE` or `PRIMARY KEY` so workers can still do a single-column `WHERE lookup_hash = ?` lookup.
- Use secondary indexes when you need them, but understand that they always carry extra work compared to a clean primary-key (or hashed-key) lookup.
- If you have a natural non-integer primary key and read-heavy workloads, `WITHOUT ROWID` is a powerful optimization to experiment with in your benchmark harness, especially when combined with a fixed-size hash key.
## Takeaways
- Use one SQLite connection per worker thread. It keeps the threading model simple and avoids undefined behavior.
- Fix the total work, then vary the parallelism. This is the only way to compare configurations fairly.
- Tune pragmas based on workload. The settings here are great for read-only benchmarks, not for OLTP-style writes.
- Measure CPU and latency together. High CPU with flat QPS is a signal that your concurrency model is the bottleneck.
- Start with a small, real database. A 50k–100k row dataset like the SVG icons DB is enough to expose bad patterns without making iteration painful.
👉 Check out: FreeDevTools
Any feedback or contributors are welcome!
It’s online, open-source, and ready for anyone to use.
⭐ Star it on GitHub: freedevtools
