fizz.today

sqlite3 :memory: is the CSV query engine you already have

I needed to query a CSV file streaming from S3 without downloading it, without writing a script, and without leaving the terminal. I wrote this alias years ago and I still use it daily:

alias csvq="sqlite3 :memory: -cmd '.mode csv' -cmd '.import /dev/stdin s3' '.mode json'"

Pipe any CSV into it, write SQL, get JSON:

aws s3 cp s3://$bucket/$key - | csvq "select * from s3 where status='failed'" | jq '.[]' -c

The alias creates an in-memory SQLite database, imports stdin as CSV into a table called s3 (the header row becomes the column names, and every column comes in as TEXT), switches output mode to JSON, and runs whatever query you pass. No temp files, no bespoke parser, no “real pipeline” required before you’re allowed to ask a question about your data.
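A minimal local demo with made-up data, no S3 required (assumes sqlite3 3.33 or newer, where `.mode json` was added):

```shell
# header row becomes column names; filter in SQL, get JSON out
printf 'id,status\n1,failed\n2,ok\n' \
  | sqlite3 :memory: -cmd '.mode csv' -cmd '.import /dev/stdin s3' '.mode json' \
    "select id from s3 where status='failed'"
# → [{"id":"1"}]
```

Note the values come back as strings: `.import` types every column as TEXT.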

The table name s3 is operationally honest, not academically pure. I use this alias mostly for files streaming from S3, so the table got the name that matches the mental model of the work.

TSV gets its own sibling:

alias tsvq="sqlite3 :memory: -cmd '.mode tabs' -cmd '.import /dev/stdin s3' '.mode json'"
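Same shape, tab-separated input (again with throwaway demo data):

```shell
# .mode tabs splits on tab characters instead of commas
printf 'id\tstatus\n1\tfailed\n' \
  | sqlite3 :memory: -cmd '.mode tabs' -cmd '.import /dev/stdin s3' '.mode json' \
    'select * from s3'
# → [{"id":"1","status":"failed"}]
```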

The gist explicitly keeps both SQL and jq available in the same pipeline. Some questions arrive in relational form (where status='failed'). Some arrive in structural form (.[] | select(.count > 100)). Good CLI design doesn’t force one way of thinking when two are useful.
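Both forms in one pipeline, on made-up data. One wrinkle worth knowing: since `.import` types everything as TEXT, a numeric jq comparison like `.count > 100` wants a `cast` on the SQL side first, or jq will be comparing strings to numbers:

```shell
# relational question in SQL, structural question in jq
printf 'name,count\napi,250\nweb,7\n' \
  | sqlite3 :memory: -cmd '.mode csv' -cmd '.import /dev/stdin s3' '.mode json' \
    "select name, cast(count as integer) as count from s3" \
  | jq -c '.[] | select(.count > 100)'
# → {"name":"api","count":250}
```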

The pattern underneath is: take a raw stream, give it temporary relational structure, query it with a real query language, return the result to the stream so it can keep moving. The whole operation is ordinary. You have data, you want to interrogate it, and the machine shouldn’t make that weird.

gist

#cli #sqlite #data #platformengineering