[//]: # ([![Go Coverage](https://github.com/neilotoole/sq/wiki/coverage.svg)](https://raw.githack.com/wiki/neilotoole/sq/coverage.html))
[![Go Reference](https://pkg.go.dev/badge/github.com/neilotoole/sq.svg)](https://pkg.go.dev/github.com/neilotoole/sq)
![Main pipeline](https://github.com/neilotoole/sq/actions/workflows/main.yml/badge.svg)
[![Go Report Card](https://goreportcard.com/badge/github.com/neilotoole/sq)](https://goreportcard.com/report/github.com/neilotoole/sq)
# sq data wrangler
`sq` is a command line tool that provides jq-style access to
structured data sources: SQL databases, or document formats like CSV or Excel.

![sq](.images/splash.png)

`sq` executes jq-like [queries](https://sq.io/docs/query), or database-native [SQL](https://sq.io/docs/cmd/sql/).
It can perform cross-source [joins](https://sq.io/docs/query/#cross-source-joins).

`sq` outputs to a multitude of [formats](https://sq.io/docs/output#formats)
including [JSON](https://sq.io/docs/output#json),
[Excel](https://sq.io/docs/output#xlsx), [CSV](https://sq.io/docs/output#csv),
[HTML](https://sq.io/docs/output#html), [Markdown](https://sq.io/docs/output#markdown)
and [XML](https://sq.io/docs/output#xml), and can [insert](https://sq.io/docs/output#insert) query
results directly to a SQL database.

`sq` can also [inspect](https://sq.io/docs/cmd/inspect) sources to view metadata about the source structure (tables,
columns, size) and has commands for common database operations to
[copy](https://sq.io/docs/cmd/tbl-copy), [truncate](https://sq.io/docs/cmd/tbl-truncate),
and [drop](https://sq.io/docs/cmd/tbl-drop) tables.

Find out more at [sq.io](https://sq.io).

## Install
### macOS

```shell
brew install neilotoole/sq/sq
```

### Linux

```shell
/bin/sh -c "$(curl -fsSL https://sq.io/install.sh)"
```

### Windows

```shell
scoop bucket add sq https://github.com/neilotoole/sq
scoop install sq
```

### Go

```shell
go install github.com/neilotoole/sq@latest
```

See other [install options](https://sq.io/docs/install/).

## Overview
Use `sq help` to see command help. Docs are over at [sq.io](https://sq.io).
Read the [overview](https://sq.io/docs/overview/) and
[tutorial](https://sq.io/docs/tutorial/). The [cookbook](https://sq.io/docs/cookbook/) has
recipes for common tasks, and the [query guide](https://sq.io/docs/query) covers `sq`'s query language.

The major concept is: `sq` operates on data sources, which are treated as SQL databases (even if the
source is really a CSV or XLSX file etc.).

In a nutshell, you [`sq add`](https://sq.io/docs/cmd/add) a source (giving it a [`handle`](https://sq.io/docs/concepts#handle)), and then execute commands against the
source.

### Sources
Initially there are no [sources](https://sq.io/docs/source).

```shell
$ sq ls
```

Let's [add](https://sq.io/docs/cmd/add) a source. First we'll add a [SQLite](https://sq.io/docs/drivers/sqlite)
database, but this could also be [Postgres](https://sq.io/docs/drivers/postgres),
[SQL Server](https://sq.io/docs/drivers/sqlserver), [Excel](https://sq.io/docs/drivers/xlsx), etc.
Download the sample DB, and `sq add` the source.

```shell
$ wget https://sq.io/testdata/sakila.db

$ sq add ./sakila.db
@sakila  sqlite3  sakila.db

$ sq ls -v
HANDLE   ACTIVE  DRIVER   LOCATION                         OPTIONS
@sakila  active  sqlite3  sqlite3:///Users/demo/sakila.db

$ sq ping @sakila
@sakila  1ms  pong

$ sq src
@sakila  sqlite3  sakila.db
```

The [`sq ping`](https://sq.io/docs/cmd/ping) command simply pings the source
to verify that it's available.

[`sq src`](https://sq.io/docs/cmd/src) lists the [_active source_](https://sq.io/docs/source#active-source), which in our
case is `@sakila`.
You can change the active source using `sq src @other_src`.
When there's an active source specified, you can usually omit the handle from `sq` commands.
Thus you could instead do:

```shell
$ sq ping
@sakila  1ms  pong
```
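
In scripts, it can be handy to check that a source is reachable before running queries against it. The following is a minimal sketch, not part of `sq` itself: the function name `require_source` is hypothetical, and it assumes that `sq ping` exits with a non-zero status when the source is unavailable.

```shell
# Hypothetical helper: stop early if a source cannot be pinged.
# Assumption: `sq ping` returns a non-zero exit status on failure.
require_source() {
  handle="$1"
  if ! sq ping "$handle" >/dev/null 2>&1; then
    echo "source $handle is not reachable" >&2
    return 1
  fi
}
```

For example, `require_source @sakila && sq '.actor | .[0:3]'` would run the query only if the ping succeeds.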
### Query

Fundamentally, `sq` is for querying data. The jq-style syntax is covered in
detail in the [query guide](https://sq.io/docs/query).

```shell
$ sq '.actor | .actor_id < 100 | .[0:3]'
actor_id  first_name  last_name  last_update
1         PENELOPE    GUINESS    2020-02-15T06:59:28Z
2         NICK        WAHLBERG   2020-02-15T06:59:28Z
3         ED          CHASE      2020-02-15T06:59:28Z
```

The above query selected some rows from the `actor` table. You could also
use [native SQL](https://sq.io/docs/cmd/sql), e.g.:

```shell
$ sq sql 'SELECT * FROM actor WHERE actor_id < 100 LIMIT 3'
actor_id  first_name  last_name  last_update
1         PENELOPE    GUINESS    2020-02-15T06:59:28Z
2         NICK        WAHLBERG   2020-02-15T06:59:28Z
3         ED          CHASE      2020-02-15T06:59:28Z
```

But we're flying a bit blind here: how did we know about the `actor` table?
### Inspect

[`sq inspect`](https://sq.io/docs/cmd/inspect) is your friend (output abbreviated):

```shell
$ sq inspect
HANDLE   DRIVER   NAME       FQ NAME         SIZE   TABLES  LOCATION
@sakila  sqlite3  sakila.db  sakila.db/main  5.6MB  21      sqlite3:///Users/demo/sakila.db

TABLE     ROWS  COL NAMES
actor     200   actor_id, first_name, last_name, last_update
address   603   address_id, address, address2, district, city_id, postal_code, phone, last_update
category  16    category_id, name, last_update
```

Use [`sq inspect -v`](https://sq.io/docs/output#verbose) to see more detail.
Or use [`-j`](https://sq.io/docs/output#json) to get JSON output:

![sq inspect -j](https://sq.io/images/sq_inspect_sakila_sqlite_json.png)

Combine `sq inspect` with [jq](https://stedolan.github.io/jq/) for some useful capabilities.
Here's how to [list](https://sq.io/docs/cookbook/#list-table-names)
all the table names in the active source:

```shell
$ sq inspect -j | jq -r '.tables[] | .name'
actor
address
category
city
country
customer
[...]
```

And here's how you could [export](https://sq.io/docs/cookbook/#export-all-table-data-to-csv) each table
to a CSV file:

```shell
$ sq inspect -j | jq -r '.tables[] | .name' | xargs -I % sq .% --csv --output %.csv
$ ls
actor.csv     city.csv      customer_list.csv  film_category.csv  inventory.csv  rental.csv                  staff.csv
address.csv   country.csv   film.csv           film_list.csv      language.csv   sales_by_film_category.csv  staff_list.csv
category.csv  customer.csv  film_actor.csv     film_text.csv      payment.csv    sales_by_store.csv          store.csv
```
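
The same export can be written as a loop when per-table error reporting or a target directory is wanted. This is a sketch only: the function name `export_tables_csv` and its optional directory argument are hypothetical, while the `sq` and `jq` invocations are the ones shown above. Because `sq` can read piped input, each call inside the loop is given `/dev/null` as stdin so that it does not consume the remaining table names.

```shell
# Hypothetical helper: export every table in the active source to <dir>/<table>.csv.
export_tables_csv() {
  out_dir="${1:-.}"
  mkdir -p "$out_dir" || return 1
  sq inspect -j | jq -r '.tables[] | .name' | while IFS= read -r tbl; do
    # Redirect stdin so sq does not read the rest of the table list from the pipe.
    sq ".$tbl" --csv --output "$out_dir/$tbl.csv" </dev/null \
      || echo "failed to export $tbl" >&2
  done
}
```

Calling `export_tables_csv ./export` would write one CSV file per table into `./export`.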
Note that you can also inspect an individual table:

```shell
$ sq inspect @sakila.actor
TABLE  ROWS  TYPE   SIZE  NUM COLS  COL NAMES
actor  200   table  -     4         actor_id, first_name, last_name, last_update
```

### Diff

Use [`sq diff`](https://sq.io/docs/diff) to compare source metadata, or row data.

![sq diff](.images/sq_diff_table_data.png)

### Insert query results

`sq` query results can be [output](https://sq.io/docs/output) in various formats
(JSON, XML, CSV, etc.), and can also be "outputted" as an
[*insert*](https://sq.io/docs/output#insert) into database sources.

That is, you can use `sq` to insert results from a Postgres query into a MySQL table,
or copy an Excel worksheet into a SQLite table, or push a CSV file into
a SQL Server table, etc.

> **Note:** If you want to copy a table inside the same (database) source,
> use [`sq tbl copy`](https://sq.io/docs/cmd/tbl-copy) instead, which uses the database's native table copy functionality.

For this example, we'll insert an Excel worksheet into our `@sakila`
SQLite database. First, we
download the XLSX file, and `sq add` it as a source.

```shell
$ wget https://sq.io/testdata/xl_demo.xlsx

$ sq add ./xl_demo.xlsx --ingest.header=true
@xl_demo  xlsx  xl_demo.xlsx

$ sq @xl_demo.person
uid  username    email                  address_id
1    neilotoole  neilotoole@apache.org  1
2    ksoze       kaiser@soze.org        2
3    kubla       kubla@khan.mn          NULL
[...]
```

Now, execute the same query, but this time `sq` inserts the results into a new
table (`person`) in the SQLite `@sakila` source:

```shell
$ sq @xl_demo.person --insert @sakila.person
Inserted 7 rows into @sakila.person

$ sq inspect @sakila.person
TABLE   ROWS  COL NAMES
person  7     uid, username, email, address_id

$ sq @sakila.person
uid  username    email                  address_id
1    neilotoole  neilotoole@apache.org  1
2    ksoze       kaiser@soze.org        2
3    kubla       kubla@khan.mn          NULL
[...]
```
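
The insert-then-inspect sequence above can be wrapped for reuse. This is a rough sketch under stated assumptions: `copy_table` is a hypothetical name, the destination table is given the same name as the source table, and `sq` is assumed to exit non-zero if the insert fails.

```shell
# Hypothetical helper: copy <table> from one source to another using --insert,
# then inspect the destination table to confirm the result.
copy_table() {
  src="$1"
  dest="$2"
  tbl="$3"
  sq "$src.$tbl" --insert "$dest.$tbl" && sq inspect "$dest.$tbl"
}
```

With the sources from this example, `copy_table @xl_demo @sakila person` runs the same two commands shown above.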
### Cross-source join

`sq` has rudimentary support for cross-source [joins](https://sq.io/docs/query#join). That is, you can join an Excel worksheet with a
CSV file, or Postgres table, etc.

See the [tutorial](https://sq.io/docs/tutorial/#join) for further details, but
given an Excel source `@xl_demo` and a CSV source `@csv_demo`, you can do:

```shell
$ sq '@csv_demo.data, @xl_demo.address | join(.D == .address_id) | .C, .city'
C                      city
neilotoole@apache.org  Washington
kaiser@soze.org        Ulan Bator
nikola@tesla.rs        Washington
augustus@caesar.org    Ulan Bator
plato@athens.gr        Washington
```
### Table commands

`sq` provides several handy commands for working with tables:
[`tbl copy`](https://sq.io/docs/cmd/tbl-copy), [`tbl truncate`](https://sq.io/docs/cmd/tbl-truncate)
and [`tbl drop`](https://sq.io/docs/cmd/tbl-drop).
Note that these commands work directly
against SQL database sources, using their native SQL commands.

```shell
$ sq tbl copy .actor .actor_copy
Copied table: @sakila.actor --> @sakila.actor_copy (200 rows copied)

$ sq tbl truncate .actor_copy
Truncated 200 rows from @sakila.actor_copy

$ sq tbl drop .actor_copy
Dropped table @sakila.actor_copy
```
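
Since truncating is destructive, one possible pattern is to take a copy first. The sketch below is illustrative only: `safe_truncate` and the `_bak` suffix are hypothetical choices, it operates on the active source, and it assumes `sq tbl copy` exits non-zero when the copy cannot be made.

```shell
# Hypothetical helper: copy a table to <table>_bak, then truncate the original.
# The truncate is skipped if the copy step fails.
safe_truncate() {
  tbl="$1"
  backup="${tbl}_bak"
  sq tbl copy ".$tbl" ".$backup" || return 1
  sq tbl truncate ".$tbl"
}
```

For instance, `safe_truncate actor_copy` would create `actor_copy_bak` before emptying `actor_copy`.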

### UNIX pipes

For file-based sources (such as CSV or XLSX), you can `sq add` the source file,
but you can also pipe it:

```shell
$ cat ./example.xlsx | sq .Sheet1
```

Similarly, you can inspect:

```shell
$ cat ./example.xlsx | sq inspect
```
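
Piping combines naturally with an output format flag. As a small sketch (the function name `sheet_to_csv` is hypothetical, and `--csv` is the output flag listed under Output formats), a file can be converted without first adding it as a source:

```shell
# Hypothetical helper: pipe a file into sq and print one sheet/table as CSV.
sheet_to_csv() {
  file="$1"
  sheet="$2"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  cat "$file" | sq ".$sheet" --csv
}
```

For example, `sheet_to_csv ./example.xlsx Sheet1 > sheet1.csv`.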
## Drivers

`sq` knows how to deal with a data source type via a [driver](https://sq.io/docs/drivers)
implementation. To view the installed/supported drivers:

```shell
$ sq driver ls
DRIVER     DESCRIPTION
sqlite3    SQLite
postgres   PostgreSQL
sqlserver  Microsoft SQL Server / Azure SQL Edge
mysql      MySQL
csv        Comma-Separated Values
tsv        Tab-Separated Values
json       JSON
jsona      JSON Array: LF-delimited JSON arrays
jsonl      JSON Lines: LF-delimited JSON objects
xlsx       Microsoft Excel XLSX
```
## Output formats

`sq` has many [output formats](https://sq.io/docs/output):

- `--text`: [Text](https://sq.io/docs/output#text)
- `--json`: [JSON](https://sq.io/docs/output#json)
- `--jsona`: [JSON Array](https://sq.io/docs/output#jsona)
- `--jsonl`: [JSON Lines](https://sq.io/docs/output#jsonl)
- `--csv` / `--tsv`: [CSV](https://sq.io/docs/output#csv) / [TSV](https://sq.io/docs/output#tsv)
- `--xlsx`: [XLSX](https://sq.io/docs/output#xlsx) (Microsoft Excel)
- `--html`: [HTML](https://sq.io/docs/output#html)
- `--xml`: [XML](https://sq.io/docs/output#xml)
- `--yaml`: [YAML](https://sq.io/docs/output#yaml)
- `--markdown`: [Markdown](https://sq.io/docs/output#markdown)
- `--raw`: [Raw](https://sq.io/docs/output#raw) (bytes)
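
To illustrate how the format flags combine with `--output` (used earlier in the CSV export example), here is a minimal sketch that writes one query result in several formats. The function name `export_formats`, the chosen set of formats, and the use of the flag name as the file extension are all assumptions made for illustration.

```shell
# Hypothetical helper: run one query and save the result as JSON, CSV and Markdown.
# Files are named <base>.json, <base>.csv and <base>.markdown.
export_formats() {
  query="$1"
  base="$2"
  for fmt in json csv markdown; do
    sq "$query" "--$fmt" --output "$base.$fmt" || return 1
  done
}
```

For example, `export_formats '.actor | .[0:3]' actors` would produce `actors.json`, `actors.csv` and `actors.markdown`.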

## CHANGELOG

See [CHANGELOG.md](./CHANGELOG.md).

## Acknowledgements

- Thanks to [Diego Souza](https://github.com/diegosouza) for creating
  the [Arch Linux package](https://aur.archlinux.org/packages/sq-bin).
- Much inspiration is owed to [jq](https://stedolan.github.io/jq/).
- See [`go.mod`](https://github.com/neilotoole/sq/blob/master/go.mod) for a list of third-party
  packages.
- Additionally, `sq` incorporates modified versions of:
  - [`olekukonko/tablewriter`](https://github.com/olekukonko/tablewriter)
  - [`segmentio/encoding`](https://github.com/segmentio/encoding) for JSON encoding.
- The [_Sakila_](https://dev.mysql.com/doc/sakila/en/) example databases were lifted
  from [jOOQ](https://github.com/jooq/jooq), which in turn owe their heritage to earlier work on
  Sakila.
- Date rendering via [`ncruces/go-strftime`](https://github.com/ncruces/go-strftime).

## Similar, related, or noteworthy projects

- [usql](https://github.com/xo/usql)
- [textql](https://github.com/dinedal/textql)
- [golang-migrate](https://github.com/golang-migrate/migrate)
- [octosql](https://github.com/cube2222/octosql)
- [rq](https://github.com/dflemstr/rq)