One-liner for running queries against CSV files with SQLite

I figured out how to run a SQL query directly against a CSV file using the sqlite3 command-line utility:

sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' 

This uses the special :memory: filename to open an in-memory database. Then it uses two -cmd options to turn on CSV mode and import the taxi.csv file into a table called taxi. Then it runs the SQL query.

Instead of setting the mode with .mode you can use .import -csv like this (thanks, Mark Lawrence):

sqlite3 :memory: -cmd '.import -csv taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' 

You can get taxi.csv by downloading the compressed file from here and running:

7z e -aos taxi.csv.7z 

I figured this out while commenting on this issue.

The output looks like this:

"",128020,32.2371511482553 0,42228,17.0214016766151 1,1533197,17.6418833067999 2,286461,18.0975870711456 3,72852,17.9153958710923 4,25510,18.452774990196 5,50291,17.2709248175672 6,32623,17.6002964166367 7,2,87.17 8,2,95.705 9,1,113.6 

Add -cmd '.mode column' to output in columns instead:

$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode column' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' passenger_count COUNT(*) AVG(total_amount) --------------- -------- ----------------- 128020 32.2371511482553 0 42228 17.0214016766151 1 1533197 17.6418833067999 2 286461 18.0975870711456 3 72852 17.9153958710923 4 25510 18.452774990196 5 50291 17.2709248175672 6 32623 17.6002964166367 7 2 87.17 8 2 95.705 9 1 113.6 

Or use -cmd '.mode markdown' to get a Markdown table:

passenger_count COUNT(*) AVG(total_amount)
128020 32.2371511482553
0 42228 17.0214016766151
1 1533197 17.6418833067999
2 286461 18.0975870711456
3 72852 17.9153958710923
4 25510 18.452774990196
5 50291 17.2709248175672
6 32623 17.6002964166367
7 2 87.17
8 2 95.705
9 1 113.6

A full list of output modes can be seen like this:

% sqlite3 -cmd '.help mode' .mode MODE ?TABLE? Set output mode MODE is one of: ascii Columns/rows delimited by 0x1F and 0x1E box Tables using unicode box-drawing characters csv Comma-separated values column Output in columns. (See .width) html HTML <table> code insert SQL insert statements for TABLE json Results in a JSON array line One value per line list Values delimited by "|" markdown Markdown table format quote Escape answers as for SQL table ASCII-art table tabs Tab-separated values tcl TCL list elements 

Other options

There are a whole bunch of other tools that can be used for this kind of thing!

My own sqlite-utils memory command can load data from JSON, CSV or TSV into an in-memory database and run a query against it. It's a LOT slower than using sqlite3 directly though.

dsq is a tool that does this kind of thing (and a lot more). Author Phil Eaton compiled a collection of benchmarks of other similar tools, and his benchmarking script demonstrates how to use each one of them.

Created 2022-06-20T16:00:51-07:00, updated 2022-06-22T09:32:00-07:00 · History · Edit