Hello, I'm Maneshwar. I'm currently building FreeDevTools, **one place for all dev tools, cheat codes, and TLDRs**: a free, open-source hub where developers can quickly find and use tools without the hassle of searching all over the internet.
## SQLite is fast, but your Node app isn’t
You put data into SQLite, point your Node app at it, throw more traffic at the service… and CPU usage climbs while latency stays flat or gets worse.
SQLite itself can be very fast, but a single Node process with a single connection usually leaves a lot of performance on the table—especially on multi-core machines.
This post walks through a concrete setup that:
- Uses multiple processes and worker threads.
- Opens one `better-sqlite3` connection per worker.
- Tunes SQLite pragmas specifically for heavy read throughput.
- Backs everything with real benchmark numbers and logs.
## Context: a real SVG icons database
The test database (`svg-icons-db.db`) contains 51,898 rows in an `icon` table. Each row has an `id`, `cluster`, `name`, and `base64` payload.
The benchmark task is intentionally simple:
- Randomly pick an `id` between `1` and `maxId`.
- Run `SELECT id, cluster, name, base64 FROM icon WHERE id = ?`.
- Repeat this N times across different process/worker configurations.
We fix the total number of reads to 1,000 per run and only change how we spread those reads across processes and worker threads.
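As a quick sketch, this is the random-id pick that drives each read; both the `Math.floor` form and the bitwise-truncation form (used later in the worker code) produce the same range:

```javascript
// Uniform random integer id in [1, maxId], as used by the benchmark loop.
const maxId = 51898;

const idA = Math.floor(Math.random() * maxId) + 1;
const idB = (Math.random() * maxId | 0) + 1; // bitwise OR truncates the float, same range

console.log(idA >= 1 && idA <= maxId); // true
console.log(idB >= 1 && idB <= maxId); // true
```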
## Architecture: processes, workers, and connections

### High-level flow
- `cli.js` parses CLI flags and decides how many processes and workers per process to run.
- Each process runs `process.js`, which in turn spawns `worker.js` threads using `worker_threads`.
- Every worker opens its own `better-sqlite3` connection in read-only mode, runs its share of queries, measures time/CPU, and reports back up the chain.
At no point do we share a `Database` instance across threads.
### CLI: configuration and orchestration
The CLI exposes a small set of flags:
- `--queries` / `-q`: total number of queries for the whole run (default `10000`).
- `--cpus` / `-c`: how many child processes to spawn (default `2`).
- `--pins` / `-p`: optional explicit CPU pins, e.g. `0,1`.
- `--workers` / `-w`: comma-separated list of worker counts to benchmark, e.g. `1,2,4,8`.
Example:
```bash
node svg_benchmark/cli.js --queries 1000 --cpus 2 --workers 1,2,4,8
```

Internally, the CLI:
- Figures out `queriesPerWorker = floor(totalQueries / (processes × workers))`.
- Calls `runBenchmarkWithCpuPin()` with those numbers.
- Collects timing and CPU stats for each configuration.
- Prints a table sorted from fastest to slowest.
### Processes: `process.js`
Each process:
- Optionally pins itself to a CPU core (Linux `taskset`).
- Spawns `workerCount` `Worker` threads and passes them `workerData`: `dbPath`, `queriesPerWorker`, `maxId`, and a `workerId`.
- Waits for all workers, then aggregates `totalQueries`, `totalRows`, min/max/overall QPS, and per-worker metrics.
- Sends a single result object back to the CLI.
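The aggregation step can be sketched as a pure function over the per-worker result objects (field names follow the worker message described below; the helper name and exact result shape are assumptions):

```javascript
// Combine each worker's posted metrics into the single object the process
// sends back to the CLI.
function aggregateWorkerResults(workerResults) {
  const totalQueries = workerResults.reduce((sum, r) => sum + r.queryCount, 0);
  const totalRows = workerResults.reduce((sum, r) => sum + r.totalRowsReturned, 0);
  const qps = workerResults.map(r => r.queriesPerSecond);
  return {
    totalQueries,
    totalRows,
    minQps: Math.min(...qps),
    maxQps: Math.max(...qps),
    workers: workerResults, // per-worker metrics, passed through
  };
}

const summary = aggregateWorkerResults([
  { queryCount: 500, totalRowsReturned: 500, queriesPerSecond: 4100 },
  { queryCount: 500, totalRowsReturned: 499, queriesPerSecond: 4200 },
]);
console.log(summary.totalQueries, summary.minQps); // 1000 4100
```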
### Workers: `worker.js`
Each worker:
- Opens a new `better-sqlite3` connection with `readonly: true` and `fileMustExist: true`.
- Applies read-optimized PRAGMA settings (more on that below).
- Prepares the hot query: `SELECT id, cluster, name, base64 FROM icon WHERE id = ?`.
- Runs a tight loop of `queriesPerWorker` random reads.
- Measures wall time via `process.hrtime.bigint()` and CPU time via `process.cpuUsage(start)`.
- Closes the DB and posts back `queryCount`, `durationMs`, `queriesPerSecond`, `totalRowsReturned`, and `cpuPercent`.
This gives the CLI everything it needs to compute throughput and CPU usage per configuration.
## Code: minimal, runnable snippets

### CLI: flags, DB check, and optimizer
Stripped-down version of the important parts of `svg_benchmark/cli.js`:
```javascript
const Database = require('better-sqlite3');
const path = require('path');

function parseArgs() {
  const args = process.argv.slice(2);
  const config = {
    totalQueries: 10000,
    cpus: 2,
    cpuPins: [0, 1],
    workers: [1, 2, 4, 8],
    help: false
  };
  // ... parse -q/--queries, -c/--cpus, -p/--pins, -w/--workers ...
  return config;
}

async function checkDatabase(dbPath) {
  const db = new Database(dbPath, { readonly: true });
  const { count } = db.prepare('SELECT COUNT(*) AS count FROM icon').get();
  db.close();
  return count;
}

async function optimizeDatabase(dbPath) {
  const db = new Database(dbPath);
  db.pragma('journal_mode = WAL');
  db.pragma('synchronous = OFF');   // safe here because we're just benchmarking
  db.pragma('cache_size = -64000'); // 64 MB
  db.pragma('temp_store = MEMORY');
  db.pragma('mmap_size = 268435456'); // 256 MB
  db.pragma('page_size = 4096');
  db.exec('ANALYZE icon');
  db.pragma('optimize');
  db.close();
}
```

The full file also handles spawning processes and printing tables, but this is the core of the DB setup.
### Worker: one connection per thread

The essential logic from `worker.js`:
```javascript
const { parentPort, workerData } = require('worker_threads');
const Database = require('better-sqlite3');
const { cpuUsage } = require('process');

const { dbPath, queriesPerWorker, maxId } = workerData;

const cpuStart = cpuUsage();
const db = new Database(dbPath, { readonly: true, fileMustExist: true });

db.pragma('cache_size = -32000');   // 32 MB per worker
db.pragma('temp_store = MEMORY');
db.pragma('mmap_size = 1073741824'); // 1 GB mmap
db.pragma('query_only = ON');

const stmt = db.prepare('SELECT id, cluster, name, base64 FROM icon WHERE id = ?');

const start = process.hrtime.bigint();
let totalRowsReturned = 0;

for (let i = 0; i < queriesPerWorker; i++) {
  const id = (Math.random() * maxId | 0) + 1;
  if (stmt.get(id)) totalRowsReturned++;
}

const end = process.hrtime.bigint();
const durationMs = Number(end - start) / 1e6;

db.close();

const cpuEnd = cpuUsage(cpuStart);
const cpuPercent = ((cpuEnd.user + cpuEnd.system) / 1000) / durationMs * 100;

parentPort.postMessage({
  queryCount: queriesPerWorker,
  durationMs,
  totalRowsReturned,
  queriesPerSecond: (queriesPerWorker / durationMs) * 1000,
  cpuPercent
});
```

You can lift this pattern into your own project: pass `dbPath`, `maxId`, and `queriesPerWorker`, and keep the "one connection per worker" rule.
## Performance tuning: pragmas that actually matter

### Process-level tuning
Before any worker threads run, `optimizeDatabase()` configures the database:
- `journal_mode = WAL`: enables concurrent readers without blocking.
- `synchronous = OFF`: disables durability safeguards for maximum speed (OK in a benchmark, not OK for production writes).
- `cache_size = -64000`: ~64 MB cache; negative values are in KiB.
- `temp_store = MEMORY`: avoids temp files.
- `mmap_size = 268435456`: 256 MB of memory-mapped I/O.
- `page_size = 4096`: reasonable default for many workloads.
- `ANALYZE icon` + `PRAGMA optimize`: warms up query planning.
This is about making the database ready for heavy read traffic.
### Worker-level tuning
Each worker connection tweaks a smaller set of pragmas:
- `cache_size = -32000` (32 MB per worker).
- `temp_store = MEMORY`.
- `mmap_size = 1073741824` (1 GB).
- `query_only = ON` to guarantee no writes.
These keep per-thread handles light but fast, and keep the workload purely read-only.
## Results: how far do workers take us?
Command:
```bash
node cli.js --rows 50000 --queries 1000 --cpus 2 --workers 1,2,4,8
```

Output (trimmed):
- Database rows: `51,898`
- Fixed workload: `1,000` queries
- CPU pins: `[0, 1]`
- Worker configs: `[1, 2, 4, 8]`
Table (fastest to slowest):
- 2p × 1w (2 total workers, 500 queries/worker) → 0.121s, 8,274 QPS, avg query 120.86µs.
- 2p × 2w (4 workers, 250 queries/worker) → 0.192s, 5,199 QPS, avg query 192.35µs.
- 2p × 4w (8 workers, 125 queries/worker) → 0.342s, 2,922 QPS, avg query 342.24µs.
- 2p × 8w (16 workers, 62 queries/worker) → 0.609s, 1,629 QPS, avg query 613.73µs.
The CLI also prints:

- Per-configuration CPU usage across cores.
- A summary like:
  - Fastest config: `2p-1w → 0.121s (8,274.23 QPS)`
  - Slowest config: `2p-8w → 0.609s (1,629.39 QPS)`
  - Time saved: `0.488s` (403.7% faster)
### What this tells us
- Adding workers beyond available cores produces diminishing returns and eventually regressions.
- On this 2‑core machine, 2 processes × 1 worker is the sweet spot: about 5.04× faster than `2p-8w` for the same 1,000-query workload.
- Fixing the total number of queries makes configs directly comparable.
## Query patterns: why primary keys dominate
So far we’ve focused on how many threads are hitting SQLite. Just as important is what query pattern they’re running.
To show this, the repo includes a tiny script, `compare-queries.js`, that compares a primary-key lookup against a secondary-index lookup on the same table:
```javascript
const Database = require('better-sqlite3');

const db = new Database('benchmark.db', { readonly: true });
db.pragma('cache_size = -64000');
db.pragma('mmap_size = 268435456');
db.pragma('temp_store = MEMORY');

console.log('🔍 Comparing Query Performance (1000 queries each)\n');

// Test 1: Primary key lookup
const stmt1 = db.prepare('SELECT a, b, c, d FROM table1 WHERE id = ?');
let start = Date.now();
for (let i = 0; i < 1000; i++) {
  const id = Math.floor(Math.random() * 50000) + 1;
  stmt1.get(id);
}
const pkTime = Date.now() - start;

// Test 2: Index scan
const stmt2 = db.prepare('SELECT a, b, c, d FROM table1 WHERE x = ?');
start = Date.now();
for (let i = 0; i < 1000; i++) {
  const x = Math.floor(Math.random() * 100);
  stmt2.all(x);
}
const indexTime = Date.now() - start;
```

Sample output:
```
🔍 Comparing Query Performance (1000 queries each)

Primary Key Lookup (WHERE id = ?):
  Time: 6ms (6.0µs per query)
  QPS: 166667

Index Scan (WHERE x = ?):
  Time: 930ms (930.0µs per query)
  QPS: 1075

⚡ Speedup: 155.00x faster with primary key!
```

Same database, same table, same number of queries, just a different predicate:
- The primary-key lookup (`WHERE id = ?`) can jump directly to the row using the row's key in a B‑Tree.
- The secondary index scan (`WHERE x = ?`) has to touch many more rows per query and move more data through the CPU caches.
The big picture: if your workers are hammering the database with poor query shapes, no amount of threading or pragma tuning will save you.
## How this ties back to rowid, indexes, and WITHOUT ROWID

Under the hood, every "normal" SQLite table has a hidden `rowid` column:
- Rows are stored on disk in order of increasing `rowid`.
- A lookup like `SELECT rating FROM TopRatedFilm WHERE rowid = 2;` can use a B‑Tree binary search on `rowid` instead of scanning the whole table.
Primary keys and indexes build on top of that:
- A secondary index (for example, on `movie`) is just another B‑Tree that maps `movie → rowid`.
- A lookup like `WHERE movie = 'Starwars'`:
  - Uses the index B‑Tree to find matching rowids.
  - Uses `rowid` lookups on the main table to fetch the full rows.
That’s still two binary searches per row, but it’s far better than a full table scan.
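To make the "two binary searches" idea concrete, here is a toy in-memory model: sorted arrays stand in for B‑Tree pages, and the film data is made up for illustration:

```javascript
// Generic binary search over a sorted array, keyed by keyOf(item).
function binarySearch(arr, key, keyOf) {
  let lo = 0, hi = arr.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const k = keyOf(arr[mid]);
    if (k === key) return arr[mid];
    if (k < key) lo = mid + 1; else hi = mid - 1;
  }
  return null;
}

// Main table sorted by rowid; secondary "index" sorted by movie.
const table = [
  { rowid: 1, movie: 'Starwars', rating: 8.6 },
  { rowid: 2, movie: 'Inception', rating: 8.8 },
].sort((a, b) => a.rowid - b.rowid);

const index = table
  .map(r => ({ movie: r.movie, rowid: r.rowid }))
  .sort((a, b) => (a.movie < b.movie ? -1 : 1));

// Search #1: index B-Tree, movie → rowid.
const entry = binarySearch(index, 'Starwars', e => e.movie);
// Search #2: main table B-Tree, rowid → full row.
const row = binarySearch(table, entry.rowid, r => r.rowid);

console.log(row.rating); // 8.6
```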
SQLite also supports `WITHOUT ROWID` tables, where the `PRIMARY KEY` itself is the B‑Tree key:

```sql
CREATE TABLE TopRatedFilm(movie TEXT PRIMARY KEY, studio TEXT, rating REAL) WITHOUT ROWID;
```

- The main B‑Tree is keyed directly by `movie`, so a lookup by primary key uses one binary search instead of two.
- For the right workloads (non-integer or multi-column keys, relatively small rows), this can cut space roughly in half and speed up lookups.
For this project, the key takeaway is:
- Design hot paths to use primary-key (or rowid) lookups wherever possible.
- When your "real" key is 2–3 columns (`a`, `b`, `c`), hash them into a single `lookup_hash` column and query by that: in Node, build a canonical string like `${a.toLowerCase()}|${b}|${c}`, hash it (e.g. with SHA-256 or a 64‑bit hash), store it in `lookup_hash`, and make that column `UNIQUE` or `PRIMARY KEY` so workers can still do a single-column `WHERE lookup_hash = ?` lookup.
- Use secondary indexes when you need them, but understand that they always carry extra work compared to a clean primary-key (or hashed-key) lookup.
- If you have a natural non-integer primary key and read-heavy workloads, `WITHOUT ROWID` is a powerful optimization to experiment with in your benchmark harness, especially when combined with a fixed-size hash key.
## Takeaways
- Use one SQLite connection per worker thread. It keeps the threading model simple and avoids undefined behavior.
- Fix the total work, then vary the parallelism. This is the only way to compare configurations fairly.
- Tune pragmas based on workload. The settings here are great for read-only benchmarks, not for OLTP-style writes.
- Measure CPU and latency together. High CPU with flat QPS is a signal that your concurrency model is the bottleneck.
- Start with a small, real database. A 50k–100k row dataset like the SVG icons DB is enough to expose bad patterns without making iteration painful.
👉 Check out: FreeDevTools
Any feedback or contributors are welcome!
It’s online, open-source, and ready for anyone to use.
⭐ Star it on GitHub: freedevtools
