gitmyhub

q

Python ★ 10k updated 4mo ago

q - Run SQL directly on delimited files and multi-file sqlite databases

A command-line tool that lets you run SQL queries directly against CSV and text files, filter, group, join, and count data without importing it into a database.

Pythonsetup: easycomplexity 2/5

q is a command-line tool that lets you run SQL queries directly against CSV and other delimited text files, without importing them into a database first. SQL is the standard language used to ask questions of structured data, for example to filter rows, count things, group results, or join two datasets together. Normally you need a database program to do that. q brings those same capabilities to plain files sitting on your hard drive.

You run it from a terminal by writing a SQL statement and pointing it at a file path instead of a database table name. It detects column names and data types automatically, so you do not need to define a schema. It also lets you pipe the output of other commands into it, so for instance you could take the list of running processes and query it with SQL to find out which user is running the most processes.

For larger files, q supports a caching feature that pre-processes the file and stores an index. The difference in speed can be dramatic: on a 4.8 GB file with five million rows, a query that takes nearly five minutes without caching takes under two seconds with caching enabled. Caching is turned off by default because it uses extra disk space, but can be enabled with a flag.

Beyond CSV files, q can also query SQLite database files directly, including joining across tables that live in different database files. SQLite is a common format for storing structured data in a single file.

The project is open source and free. Installation instructions for Windows, Mac, and Linux are available on the project website. The author donates Patreon proceeds to a domestic violence prevention center in Ramla, Israel.

Where it fits