Posted on Dec 28, 2021

100 Languages Speedrun: Episode 39: SQLite

SQLite is likely the world's most popular database. It can be embedded in apps, and so many apps use it.

For example if you use Google Chrome, then you can open your history with sqlite3 and see which URL you visited most often:

$ sqlite3 ~/'Library/Application Support/Google/Chrome/Default/History' SQLite version 3.32.3 2020-06-18 14:16:19 Enter ".help" for usage hints. sqlite> .headers on sqlite> select url, title from urls order by visit_count desc limit 1; url|title https://twitter.com/|Home / Twitter

At least if Google Chrome is not currently running. If it's running SQLite will tell you it's locked instead, as by design it only lets one application use each database file at a time, so copy that file elsewhere to view or modify it.

Anyway, this episode is not about using SQLite for its intended purpose. We're going to have some fun.

SQL

Contrary to very common misconception, SQL is not a language, no more than "Lisp" or "Shell" are a language. Every database has its own "SQL" and they're vastly incompatible for even the simplest tasks. Over years there's been some standards, and some attempts at bringing them closer together, but it's really all failed, and every database has its own "SQL" language.

If you try to run SQL for SQLite on PostgreSQL or MySQL or whichever other system, you'd have about as much luck as trying to run Clojure Lisp code in Racket Lisp, or PowerShell shell code on ZSH shell.

Hello, World!

SQLite wasn't really designed to run SQL scripts without database from #!, so we need to write a small header to trick it into our special mode. Here's Hello, World!:

#!/bin/bash tail -n +5 "$0" | sqlite3 exit $? select "Hello, World!";

Which we can run with command line as expected:

$ ./hello.sql Hello, World!

The "$0" in shell means means current file name - and we need to put it in "" in case path contains spaces, as that breaks shell scripts. tail -n +5 means print everything from a file from line 5 onwards. | sqlite3 means start SQLite and pipe that stuff into its input.

There are of course many other ways to run it. The more reasonable way would be to simply save that SQL to a fire and do sqlite3 <file.sql:

$ cat hello2.sql select "Hello, World!"; $ sqlite3 <hello2.sql Hello, World!

Or echo it from within shell - this is not really recommended for anything nontrivial, as we need to deal with shell quoting and evaluation issues:

$ echo 'select "Hello, World!";' | sqlite3 Hello, World! $ sqlite3 <<<'select "Hello, World!";' Hello, World!

Going on, I won't be listing the tail header in the examples, just the SQL part.

Loops

The first problem we run into is looping. We have no database, and we'd like to print some integers, and that's surprisingly hard in any shared SQL.

Some databases save us here, for example this works in PostgreSQL returning all numbers from 1 to 10:

select * from generate_series(1, 10);

In SQLite we can list all the numbers, but that's of course not reasonable beyond just a few:

select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 10;

The best way to do so seems to be "recursive Common Table Expressions" (recursive CTEs). SQL doesn't allow recursion in general, but it's possible to do so in some limited cases.

with recursive generate_series(value) as ( select 1 union all select value+1 from generate_series where value + 1 <= 10 ) select value from generate_series;

FizzBuzz

Once we have the hard problem of looping solved, FizzBuzz itself is very easy:

#!/bin/bash tail -n +5 "$0" | sqlite3 exit $? with recursive generate_series(value) as ( select 1 union all select value+1 from generate_series where value + 1 <= 100 ) select case when value % 15 = 0 then 'FizzBuzz' when value % 5 = 0 then 'Buzz' when value % 3 = 0 then 'Fizz' else value end from generate_series;

Fibonacci

We can use a recursive CTE again, with all the extra variables we want. We just only select the ones we care about afterwards (a), and ignore all the rest (b, i):

#!/bin/bash tail -n +5 "$0" | sqlite3 exit $? with recursive fib(a, b, i) as ( select 1, 1, 1 union all select b, a + b, i + 1 from fib where i + 1 <= 100 ) select a from fib;

SQLite uses floats so at some point it will lose precision:

$ ./fib.sql 1 1 2 3 5 8 13 21 34 55 .. 5.16807088548583e+19 8.36211434898484e+19 1.35301852344707e+20 2.18922995834555e+20 3.54224848179262e+20

CSV

SQL can import and export to CSV, but let's pretend it doesn't, and we need to process some CSV data.

I'll actually use temporary tables for this one, so I can do this step by step - but it all works as a monster expression.

I asked GitHub Copilot to generate some data of people and their favorite programming languages:

create table csv(data); insert into csv values('first name,last name,favorite language Alice,Smith,Ruby Bob,Smith,Python Charlie,Brown,JavaScript Daniel,Clark,PHP Ed,Jones,C Frank,Smith,HTML Gary,Johnson,CSS Heather,Williams,JavaScript Ivan,Smith,Ruby Jack,Jones,JavaScript John,Brown,PHP Karen,Clark,JavaScript Larry,Jones,HTML Mike,Williams,CSS Nathan,Smith,Ruby Oscar,Brown,JavaScript Peter,Clark,PHP Quinn,Jones,JavaScript Ralph,Smith,HTML Sally,Williams,CSS Tom,Smith,Ruby Will,Jones,JavaScript Xavier,Brown,PHP Yvonne,Clark,JavaScript Zachary,Jones,HTML ');

Let's say we want to get a result where each line is programming language, and then list of everyone who likes it.

First, let's split it into lines:

create table lines(lineno, line); insert into lines with recursive split(lineno, line, str) as ( select 0, null, data from csv union all select lineno + 1, substr(str, 0, instr(str, char(10))), substr(str, instr(str, char(10))+1) from split where str != '' ) select lineno, line from split where line is not null;

This code recursively turns "A\nB\nC\n" into "A" and "B\nC\n" - until result is empty. Then it throws out those strings.

SQLite doesn't have any way to escape \n, so we need to do char(10). It's also important that this input ends with \n, otherwise the expression will loop forever. We could add some more checks to deal with it, but let's just deal with the happy path.

Data looks like this (first line is SQL header which you can get with .headers on, SQL output is separated by | by default):

lineno|line 1|first name,last name,favorite language 2|Alice,Smith,Ruby 3|Bob,Smith,Python 4|Charlie,Brown,JavaScript 5|Daniel,Clark,PHP ...

Now let's do this again for each line. As lines don't end with , we need to (line||',') for our loop to work (|| is string concatenation, not logical or):

create table cells(lineno, colno, cell); insert into cells with recursive split(lineno, colno, cell, str) as ( select lineno, 0, null, (line||',') from lines union all select lineno, colno + 1, substr(str, 0, instr(str, ',')), substr(str, instr(str, ',')+1) from split where str != '' ) select lineno, colno, cell from split where cell is not null;

Data looks like this:

lineno|colno|cell 1|1|first name 1|2|last name 1|3|favorite language 2|1|Alice 2|2|Smith 2|3|Ruby 3|1|Bob 3|2|Smith ...

Now we could work with those column numbers, but I'd much prefer to transform it so we can see keys and values.

create table cellvals(lineno, k, v); insert into cellvals select c.lineno, h.cell, c.cell from cells h inner join cells c on c.colno = h.colno and h.lineno = 1 and c.lineno != 1;

Data looks like this now:

lineno|k|v 2|first name|Alice 2|last name|Smith 2|favorite language|Ruby 3|first name|Bob 3|last name|Smith 3|favorite language|Python ...

Now let's turn this into something SQL friendly:

create table preferences(full_name, language); insert into preferences select fn.v||' '||ln.v, fl.v from cellvals fn inner join cellvals ln on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno inner join cellvals fl on fl.k='favorite language' and fn.lineno = fl.lineno;

Data looks like this now:

full_name|language Alice Smith|Ruby Bob Smith|Python Charlie Brown|JavaScript ...

And finally we run the query we wanted:

select language || ',' || group_concat(full_name, ',') from preferences group by language;

After all this:

$ ./csv.sql C,Ed Jones CSS,Gary Johnson,Mike Williams,Sally Williams HTML,Frank Smith,Larry Jones,Ralph Smith,Zachary Jones JavaScript,Charlie Brown,Heather Williams,Jack Jones,Karen Clark,Oscar Brown,Quinn Jones,Will Jones,Yvonne Clark PHP,Daniel Clark,John Brown,Peter Clark,Xavier Brown Python,Bob Smith Ruby,Alice Smith,Ivan Smith,Nathan Smith,Tom Smith

The same as monster expression

Of course it would be more fun to do this without any temporary tables:

with csv(data) as ( select 'first name,last name,favorite language Alice,Smith,Ruby Bob,Smith,Python Charlie,Brown,JavaScript Daniel,Clark,PHP Ed,Jones,C Frank,Smith,HTML Gary,Johnson,CSS Heather,Williams,JavaScript Ivan,Smith,Ruby Jack,Jones,JavaScript John,Brown,PHP Karen,Clark,JavaScript Larry,Jones,HTML Mike,Williams,CSS Nathan,Smith,Ruby Oscar,Brown,JavaScript Peter,Clark,PHP Quinn,Jones,JavaScript Ralph,Smith,HTML Sally,Williams,CSS Tom,Smith,Ruby Will,Jones,JavaScript Xavier,Brown,PHP Yvonne,Clark,JavaScript Zachary,Jones,HTML ' ), lines(lineno, line) as ( with recursive split(lineno, line, str) as ( select 0, null, data from csv union all select lineno + 1, substr(str, 0, instr(str, char(10))), substr(str, instr(str, char(10))+1) from split where str != '' ) select lineno, line from split where line is not null ), cells(lineno, colno, cell) as ( with recursive split(lineno, colno, cell, str) as ( select lineno, 0, null, (line||',') from lines union all select lineno, colno + 1, substr(str, 0, instr(str, ',')), substr(str, instr(str, ',')+1) from split where str != '' ) select lineno, colno, cell from split where cell is not null ), cellvals(lineno, k, v) as ( select c.lineno, h.cell, c.cell from cells h inner join cells c on c.colno = h.colno and h.lineno = 1 and c.lineno != 1 ), preferences(full_name, language) as ( select fn.v||' '||ln.v, fl.v from cellvals fn inner join cellvals ln on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno inner join cellvals fl on fl.k='favorite language' and fn.lineno = fl.lineno ) select language || ',' || group_concat(full_name, ',') from preferences group by language;

Should you use SQLite?

Double Yes!

It's totally great for its intended purpose of having tiny databases embedded in your apps. It's far less hassle than setting up an SQL or NoSQL database server, is expressive enough, performant enough, and everyone knows basic SQL (even if of a different kind) so learning curve is very low.

As for doing programming in SQL, also yes. Unlike let's say CSS where "writing games in pure CSS" is a skill nearly orthogonal to regular use of making websites look pretty, with SQL all those silly exercises can help you when you write real queries.