mirror of https://github.com/harelba/q.git synced 2024-10-03 22:39:52 +03:00
q - Run SQL directly on delimited files and multi-file sqlite databases


q - Text as Data

q's purpose is to bring SQL expressive power to the Linux command line and to provide easy access to text as actual data.

q allows the following:

  • Performing SQL-like statements directly on tabular text data, auto-caching the data in order to accelerate additional querying on the same file.
  • Performing SQL statements directly on multi-file sqlite3 databases, without having to merge them or load them into memory.
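As a rough illustration of the first point, the Python sketch below loads delimited text into an in-memory SQLite table and runs SQL against it. It is not q's implementation: `query_text` is a hypothetical helper, the table is simply called `t` (q uses the file name itself), and the `c1`, `c2`, ... naming for header-less input only mirrors the `c3` column used in the usage examples.

```python
import csv
import io
import sqlite3


def query_text(text, sql, delimiter=",", has_header=False, table="t"):
    """Load delimited text into an in-memory SQLite table and run `sql` on it.

    Hypothetical helper for illustration only, not part of q. Header-less
    input gets the column names c1, c2, ...; with has_header=True the first
    row supplies the names instead.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        columns, rows = rows[0], rows[1:]
    else:
        columns = [f"c{i + 1}" for i in range(len(rows[0]))]
    conn = sqlite3.connect(":memory:")
    try:
        # NUMERIC affinity makes SQLite store numeric-looking strings as numbers.
        col_defs = ", ".join(f'"{name}" NUMERIC' for name in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = "a,1,40.5\nb,2,10\nc,3,33\n"
    print(query_text(data, "SELECT COUNT(*) FROM t WHERE c3 > 32.3"))  # [(2,)]
```

Declaring the columns with NUMERIC affinity lets SQLite convert numeric-looking strings on insert, so a filter such as `c3 > 32.3` compares numbers rather than text.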

The following table shows the impact of using caching:

Rows      | Columns | File Size | Query time without caching | Query time with caching | Speed Improvement
----------|---------|-----------|----------------------------|-------------------------|------------------
5,000,000 | 100     | 4.8GB     | 4 minutes, 47 seconds      | 1.92 seconds            | x149
1,000,000 | 100     | 983MB     | 50.9 seconds               | 0.461 seconds           | x110
1,000,000 | 50      | 477MB     | 27.1 seconds               | 0.272 seconds           | x99
100,000   | 100     | 99MB      | 5.2 seconds                | 0.141 seconds           | x36
100,000   | 50      | 48MB      | 2.7 seconds                | 0.105 seconds           | x25
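The "Speed Improvement" column is the ratio of the uncached to the cached query time. A quick check in Python, using the table's values converted to seconds, suggests the listed figures are truncated rather than rounded (for example, 287 / 1.92 ≈ 149.5 is listed as x149). The `speedup` function and `BENCHMARKS` list are illustrative names only.

```python
# (rows, columns, seconds without caching, seconds with caching, listed speedup)
BENCHMARKS = [
    (5_000_000, 100, 4 * 60 + 47, 1.92, 149),
    (1_000_000, 100, 50.9, 0.461, 110),
    (1_000_000, 50, 27.1, 0.272, 99),
    (100_000, 100, 5.2, 0.141, 36),
    (100_000, 50, 2.7, 0.105, 25),
]


def speedup(without_cache, with_cache):
    """Ratio of uncached to cached query time, truncated to a whole number."""
    return int(without_cache / with_cache)


for rows, cols, slow, fast, listed in BENCHMARKS:
    print(f"{rows:>9,} rows x {cols:>3} cols: x{speedup(slow, fast)} (listed x{listed})")
```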

Note that in the current version caching is not enabled by default, since the caches take up disk space. Use -C readwrite or -C read to enable it for a query, or add caching_mode to .qrc to set a new default.
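The general idea behind such a cache can be sketched as follows: parse the text file once into an SQLite database file stored next to it, and on later queries reuse that file as long as it is at least as new as the source. This is a simplified, hypothetical illustration, not q's actual cache format or invalidation logic. The `cached_query` name and the `.qsql` suffix are assumptions made for the example, and the two `mode` values are only loosely modeled on the `-C readwrite` and `-C read` options above; q's exact semantics may differ.

```python
import csv
import os
import sqlite3


def cached_query(csv_path, sql, mode="readwrite"):
    """Run `sql` against table t built from csv_path, using an SQLite cache file.

    Hypothetical sketch: mode "readwrite" creates or refreshes the cache,
    mode "read" only uses a cache that already exists and is up to date.
    Returns (rows, used_cache).
    """
    cache_path = csv_path + ".qsql"  # assumed naming, for illustration only
    fresh = (os.path.exists(cache_path)
             and os.path.getmtime(cache_path) >= os.path.getmtime(csv_path))
    if fresh:
        conn = sqlite3.connect(cache_path)
        try:
            return conn.execute(sql).fetchall(), True
        finally:
            conn.close()

    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    columns = [f"c{i + 1}" for i in range(len(rows[0]))]
    # Write to the cache file only in readwrite mode; otherwise stay in memory.
    target = cache_path if mode == "readwrite" else ":memory:"
    if target != ":memory:" and os.path.exists(target):
        os.remove(target)  # discard a stale cache before rebuilding it
    conn = sqlite3.connect(target)
    try:
        conn.execute("CREATE TABLE t (%s)" % ", ".join(f"{c} NUMERIC" for c in columns))
        conn.executemany("INSERT INTO t VALUES (%s)" % ", ".join("?" * len(columns)), rows)
        conn.commit()
        return conn.execute(sql).fetchall(), False
    finally:
        conn.close()
```

The speedups in the table come from skipping the parsing step entirely: once the cache exists, a query only opens an SQLite file instead of re-reading and re-typing every row of the text file.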

q's web site is https://harelba.github.io/q/ or https://q.textasdata.wiki. It contains everything you need to download and use q immediately.

Usage Examples

q treats ordinary files as database tables and supports all SQL constructs, such as WHERE, GROUP BY and JOINs. It supports automatic column name and type detection, and provides full support for multiple character encodings.
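Automatic type detection can be approximated by checking whether every value in a column parses as an integer, then as a float, and otherwise falling back to text. The sketch below shows only that general idea; q's real inference rules are not reproduced here, and `infer_column_type` is a hypothetical name.

```python
def _parses_as(value, cast):
    """Return True if cast(value) succeeds."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Guess an SQLite column type for a list of string values.

    Empty strings are ignored, so a column with missing values can still
    be numeric. Hypothetical sketch, not q's actual detection logic.
    """
    present = [v for v in values if v != ""]
    if not present:
        return "TEXT"
    if all(_parses_as(v, int) for v in present):
        return "INTEGER"
    if all(_parses_as(v, float) for v in present):
        return "REAL"
    return "TEXT"


print(infer_column_type(["1", "2", ""]))  # INTEGER
print(infer_column_type(["1.5", "2"]))    # REAL
print(infer_column_type(["abc", "2"]))    # TEXT
```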

Here are some example commands to get the idea:

$ q "SELECT COUNT(*) FROM ./clicks_file.csv WHERE c3 > 32.3"

$ ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"

$ q "select count(*) from some_db.sqlite3:::albums a left join another_db.sqlite3:::tracks t on (a.album_id = t.album_id)"
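The last example joins tables that live in two separate SQLite files. Plain SQLite offers a comparable capability through `ATTACH DATABASE`, and the Python sketch below uses it to join across two database files without merging them. This only illustrates the underlying SQLite feature; it does not claim that q's `:::` syntax is implemented this way. The file, table and column names mirror the example, and the data and function names are made up.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def build_example_dbs(directory):
    """Create two separate SQLite files, loosely mirroring the albums/tracks example."""
    some_db = os.path.join(directory, "some_db.sqlite3")
    another_db = os.path.join(directory, "another_db.sqlite3")
    with closing(sqlite3.connect(some_db)) as conn:
        conn.execute("CREATE TABLE albums (album_id INTEGER, title TEXT)")
        conn.executemany("INSERT INTO albums VALUES (?, ?)", [(1, "A"), (2, "B")])
        conn.commit()
    with closing(sqlite3.connect(another_db)) as conn:
        conn.execute("CREATE TABLE tracks (track_id INTEGER, album_id INTEGER)")
        conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
        conn.commit()
    return some_db, another_db


def count_album_tracks(some_db, another_db):
    """LEFT JOIN albums in one file with tracks in another via ATTACH DATABASE."""
    with closing(sqlite3.connect(some_db)) as conn:
        conn.execute("ATTACH DATABASE ? AS other", (another_db,))
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM albums a "
            "LEFT JOIN other.tracks t ON (a.album_id = t.album_id)"
        ).fetchone()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(count_album_tracks(*build_example_dbs(d)))  # 3
```

Because the attached file is queried in place, neither database is copied or loaded into memory as a whole, which matches the second capability listed at the top of this README.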

Detailed examples are available on q's web site.

Installation

New major version 3.1.6 is out, with many significant additions.

Instructions for all operating systems are available on q's web site.

The previous version, 2.0.19, can still be downloaded from here.

Contact

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

LinkedIn: Harel Ben Attia

Twitter: @harelba

Email: harelba@gmail.com

q on twitter: #qtextasdata

Patreon: harelba - All the money received is donated to the Center for the Prevention and Treatment of Domestic Violence in my hometown - Ramla, Israel.