q - Text as Data

q is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).

q treats ordinary files as database tables and supports all SQL constructs, such as WHERE, GROUP BY and JOINs. It supports automatic detection of column names and types, and provides full support for multiple character encodings.
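To make the "files as tables" idea concrete, here is a minimal sketch using only the Python standard library. It is not q's implementation, and q's own parsing and type detection are more elaborate. It loads delimited text that has a header row into an in-memory SQLite table. The column names come from the header. Each column gets INTEGER, REAL or TEXT depending on whether all of its values parse as int, then float. The helper names `detect_type` and `load_table` and the sample data are hypothetical. The sketch also assumes well-formed input with at least one data row.

```python
import csv
import io
import sqlite3


def detect_type(values):
    """Pick an SQLite column type for a column of strings: INTEGER, REAL or TEXT."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "REAL"
    return "TEXT"


def load_table(conn, name, text, delimiter=","):
    """Load delimited text with a header row into a new SQLite table `name`."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # column-wise view of the data rows
    types = [detect_type(col) for col in columns]
    col_defs = ", ".join(f'"{h}" {t}' for h, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{name}" ({col_defs})')
    placeholders = ", ".join("?" for _ in header)
    # SQLite's column affinity converts numeric strings into INTEGER/REAL values.
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


sample = """region,product,amount
north,apples,10
south,pears,7.5
north,pears,2.5
east,apples,4
"""

conn = sqlite3.connect(":memory:")
load_table(conn, "sales", sample)

totals = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
(big_sales,) = conn.execute("SELECT COUNT(*) FROM sales WHERE amount > 5").fetchone()
print(totals, big_sales)
```

Because `amount` is detected as REAL, the WHERE clause compares numbers rather than strings. This is the practical reason type detection matters for queries on text files.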

q's website is http://harelba.github.io/q/ or https://q.textasdata.wiki. It contains everything you need to download and use q immediately.

New beta version 3.1.0-beta is available

Full details are here, and an example of the caching is here.

The following is the list of new and changed functionality in this version. These are large changes, so please make sure to read it, and the details link as well.

  • Automatic Immutable Caching - Automatic caching of data files (into <my-csv-filename>.qsql files), with huge speedups for medium/large files. Enabled through -C readwrite or -C read.
  • Direct querying of standard sqlite databases - Just use the database as a table name in the query. The format is select ... from <sqlitedb_filename>:::<table_name>, or just <sqlitedb_filename> if the database contains only one table. Multiple separate sqlite databases are fully supported in the same query.
  • Direct querying of the qsql cache files - You can query the qsql files directly, removing the need for the original files. Just use select ... from <my-csv-filename>.qsql. Please wait until the non-beta version is out before thinking about deleting any of your original files...
  • Revamped .qrc mechanism - Allows opting in to caching without specifying it in every query. By default, caching is disabled, both for backward compatibility and to help find usability issues.
  • Save-to-db is now reusable for queries - The --save-db-to-disk option (-S) has been enhanced to match the new capabilities. You can query the resulting file directly through q, using the method mentioned above (it's just a standard sqlite database).
  • Only python3 is supported from now on - This shouldn't be an issue, since q is a self-contained binary executable with its own embedded Python. Internally, q is now packaged with Python 3.8. After everything cools down, I'll probably bump this to 3.9/3.10.
  • Minimal Linux Version Bumped - Works with CentOS 8, Ubuntu 18.04+ and Debian 10+. Currently x86_64 only, and requires glibc 2.25+. It hasn't been tested on other architectures yet; builds for other architectures will be possible later on.
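The caching idea in the first bullet can be sketched with the Python standard library. The sketch is hypothetical and is not q's actual cache format or invalidation logic. It copies a CSV file into a `<file>.qsql` SQLite file next to it and records the source file's size and modification time in a made-up `_cache_meta` table. It reuses the cache only while those values still match. Because the cache is an ordinary SQLite file, it can be opened directly with `sqlite3` without the original CSV. That is the same property the bullets above rely on for `.qsql` files and `-S` output. The function `build_cache` and the table name `data` are illustrative assumptions.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import closing


def build_cache(csv_path):
    """Copy csv_path (with a header row) into csv_path + '.qsql', or reuse it.

    Returns (cache_path, built), where built is False on a cache hit.
    Hypothetical scheme: the cache is valid while the source size/mtime match.
    """
    cache_path = csv_path + ".qsql"
    stat = os.stat(csv_path)
    signature = (stat.st_size, stat.st_mtime_ns)

    if os.path.exists(cache_path):
        with closing(sqlite3.connect(cache_path)) as conn:
            row = conn.execute("SELECT size, mtime_ns FROM _cache_meta").fetchone()
        if row == signature:
            return cache_path, False  # cache hit: the source is unchanged
        os.remove(cache_path)  # stale cache: rebuild it from scratch

    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{h}"' for h in header)
    marks = ", ".join("?" for _ in header)

    with closing(sqlite3.connect(cache_path)) as conn:
        conn.execute(f"CREATE TABLE data ({cols})")
        conn.executemany(f"INSERT INTO data VALUES ({marks})", data)
        conn.execute("CREATE TABLE _cache_meta (size INTEGER, mtime_ns INTEGER)")
        conn.execute("INSERT INTO _cache_meta VALUES (?, ?)", signature)
        conn.commit()
    return cache_path, True


workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "users.csv")
with open(csv_path, "w", newline="") as f:
    f.write("name,city\nalice,Paris\nbob,Oslo\n")

path1, built1 = build_cache(csv_path)  # first run builds users.csv.qsql
path2, built2 = build_cache(csv_path)  # second run reuses the cache

# The cache is a plain SQLite database, so it can be queried on its own.
with closing(sqlite3.connect(path2)) as conn:
    names = [r[0] for r in conn.execute("SELECT name FROM data ORDER BY name")]
print(built1, built2, names)
```

The key design point the sketch illustrates is that the expensive parsing happens once. Later queries read an already-built database file, which is where the speedups for medium and large files come from.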

Full details on the changes and the new usage are here.

This version is still in early testing, for two reasons:

  • Completely new build and packaging flow - using PyOxidizer
  • It's a very large change in functionality, which might surface issues, both new ones and backward-compatibility ones

Please don't use it in production until the final non-beta version is out.

If you're testing it out, I'd be more than happy to get any feedback. Please write all your feedback in this issue instead of opening separate issues. That would really help me manage this.

Installation

The standard installation currently installs the latest production version, 2.0.19. See below if you want to download the 3.1.0-beta version.

Installing the current production version (2.0.19) is extremely simple.

Instructions for all OSs are here.

Installation of the new beta release

For now, only Linux RPM, DEB, Mac OSX and Windows are supported. Packages for additional Linux distros will be added later (this should be rather easy now, thanks to the use of fpm).

The beta OSX version is not in Homebrew yet; you'll need to download the macos-q executable, put it somewhere on your filesystem and chmod +x it.

Note: For some reason, showing the q manual (man q) does not work on Debian, even though it's packaged in the DEB file. I'll get around to fixing it later. If you have any thoughts about this, please drop me a line.

Download the relevant files directly from the Beta Release Assets.

Examples

q "SELECT COUNT(*) FROM ./clicks_file.csv WHERE c3 > 32.3"

ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
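The two examples rely on q's column naming. The first refers to c3 because, without -H, q names columns c1, c2, and so on. The second passes -H, so names such as UID come from the header line of ps's output, and - as the table name means standard input. The Python sketch below imitates only that naming rule; it is not how q works internally. The helper `to_table` and the sample data are made up, and the ps-like text is heavily simplified. The sketch splits lines on runs of whitespace. It declares every column NUMERIC so that SQLite stores numeric-looking values as numbers, which makes `c3 > 32.3` a numeric comparison.

```python
import sqlite3


def to_table(conn, name, text, has_header):
    """Load whitespace-separated text into a new SQLite table `name`.

    has_header=True: the first line supplies the column names (like -H).
    has_header=False: the columns are named c1, c2, ... as q does.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    if has_header:
        header, data = lines[0], lines[1:]
    else:
        header = [f"c{i}" for i in range(1, len(lines[0]) + 1)]
        data = lines
    # NUMERIC affinity turns well-formed numeric strings into INTEGER/REAL values.
    cols = ", ".join(f'"{h}" NUMERIC' for h in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', data)


clicks = """
1 home 12.5
2 search 40.1
3 cart 33.0
4 home 5.2
"""

ps_output = """
UID PID CMD
root 1 init
alice 200 vim
alice 201 bash
bob 300 python
"""

conn = sqlite3.connect(":memory:")

# Headerless input: columns are c1, c2, c3, as in the first example.
to_table(conn, "clicks", clicks, has_header=False)
(count,) = conn.execute("SELECT COUNT(*) FROM clicks WHERE c3 > 32.3").fetchone()

# Input with a header line: column names come from the first line, as with -H.
to_table(conn, "procs", ps_output, has_header=True)
top = conn.execute(
    "SELECT UID, COUNT(*) cnt FROM procs GROUP BY UID ORDER BY cnt DESC, UID LIMIT 3"
).fetchall()
print(count, top)
```

The extra UID sort key only makes ties deterministic in this sketch. The q example itself orders by cnt alone.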

Go here for more examples.

Benchmark

I have created a preliminary benchmark comparing q's speed under Python 2 and Python 3, and comparing both to textql and octosql.

Your input about the validity of the benchmark and about the results would be greatly appreciated. More details are here.

Contact

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

LinkedIn: Harel Ben Attia

Twitter: @harelba

Email: harelba@gmail.com

q on Twitter: #qtextasdata