[//]: # ([![Go Coverage](https://github.com/neilotoole/sq/wiki/coverage.svg)](https://raw.githack.com/wiki/neilotoole/sq/coverage.html))
[![Go Reference](https://pkg.go.dev/badge/github.com/neilotoole/sq.svg)](https://pkg.go.dev/github.com/neilotoole/sq)
[![Go Report Card](https://goreportcard.com/badge/neilotoole/sq)](https://goreportcard.com/report/neilotoole/sq)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/neilotoole/sq/blob/master/LICENSE)
![Main pipeline](https://github.com/neilotoole/sq/actions/workflows/main.yml/badge.svg)

# sq data wrangler
`sq` is a command line tool that provides jq-style access to
structured data sources: SQL databases, or document formats like CSV or Excel.

![sq](.images/splash.png)

`sq` executes jq-like [queries](https://sq.io/docs/query), or database-native [SQL](https://sq.io/docs/cmd/sql/).
It can [join](https://sq.io/docs/query#cross-source-joins) across sources: join a CSV file to a Postgres table, or
MySQL with Excel.

`sq` outputs to a multitude of [formats](https://sq.io/docs/output#formats)
including [JSON](https://sq.io/docs/output#json),
[Excel](https://sq.io/docs/output#xlsx), [CSV](https://sq.io/docs/output#csv),
[HTML](https://sq.io/docs/output#html), [Markdown](https://sq.io/docs/output#markdown)
and [XML](https://sq.io/docs/output#xml), and can [insert](https://sq.io/docs/output#insert) query
results directly to a SQL database.

`sq` can also [inspect](https://sq.io/docs/inspect) sources to view metadata about the source structure (tables,
columns, size). You can use [`sq diff`](https://sq.io/docs/diff) to compare tables, or
entire databases. `sq` has commands for common database operations to
[copy](https://sq.io/docs/cmd/tbl-copy), [truncate](https://sq.io/docs/cmd/tbl-truncate),
and [drop](https://sq.io/docs/cmd/tbl-drop) tables.

Find out more at [sq.io](https://sq.io).

## Install

### macOS

```shell
brew install neilotoole/sq/sq
```

### Linux

```shell
/bin/sh -c "$(curl -fsSL https://sq.io/install.sh)"
```

### Windows

```shell
scoop bucket add sq https://github.com/neilotoole/sq
scoop install sq
```

### Go

```shell
go install github.com/neilotoole/sq@latest
```

See other [install options](https://sq.io/docs/install/).

## Overview

Use `sq help` to see command help. Docs are over at [sq.io](https://sq.io).
Read the [overview](https://sq.io/docs/overview/), and
[tutorial](https://sq.io/docs/tutorial/). The [cookbook](https://sq.io/docs/cookbook/) has
recipes for common tasks, and the [query guide](https://sq.io/docs/query) covers `sq`'s query language.

The major concept is: `sq` operates on data sources, which are treated as SQL databases (even if the
source is really a CSV or XLSX file etc.).

In a nutshell, you [`sq add`](https://sq.io/docs/cmd/add) a source (giving it a [`handle`](https://sq.io/docs/concepts#handle)), and then execute commands against the
source.

### Sources
Initially there are no [sources](https://sq.io/docs/source).

```shell
$ sq ls
```
Let's [add](https://sq.io/docs/cmd/add) a source. First we'll add a [SQLite](https://sq.io/docs/drivers/sqlite)
database, but this could also be [Postgres](https://sq.io/docs/drivers/postgres),
[SQL Server](https://sq.io/docs/drivers/sqlserver), [Excel](https://sq.io/docs/drivers/xlsx), etc.
Download the sample DB, and `sq add` the source.

```shell
$ wget https://sq.io/testdata/sakila.db

$ sq add ./sakila.db
@sakila  sqlite3  sakila.db

$ sq ls -v
HANDLE   ACTIVE  DRIVER   LOCATION                         OPTIONS
@sakila  active  sqlite3  sqlite3:///Users/demo/sakila.db

$ sq ping @sakila
@sakila  1ms  pong

$ sq src
@sakila  sqlite3  sakila.db
```
The [`sq ping`](https://sq.io/docs/cmd/ping) command simply pings the source
to verify that it's available.

[`sq src`](https://sq.io/docs/cmd/src) lists the [_active source_](https://sq.io/docs/source#active-source), which in our
case is `@sakila`.
You can change the active source using `sq src @other_src`.
When there's an active source specified, you can usually omit the handle from `sq` commands.
Thus you could instead do:

```shell
$ sq ping
@sakila 1ms pong
```
### Query
Fundamentally, `sq` is for querying data. The jq-style syntax is covered in
detail in the [query guide](https://sq.io/docs/query).

![sq query where slq](./.images/sq_query_where_slq.png)

The above query selected some rows from the `actor` table. You could also
use [native SQL](https://sq.io/docs/cmd/sql), e.g.:

![sq query where sql](./.images/sq_query_where_sql.png)
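
For readers who can't see the screenshots, the two styles of query can be sketched in text form roughly as follows. This assumes `@sakila` is the active source. The `where(...)` filter follows the query guide, the `actor_id < 4` predicate is only illustrative (the screenshots may use a different one), and the function names `actors_slq` and `actors_sql` are hypothetical wrappers, not part of `sq`.

```shell
# Illustrative wrappers (hypothetical names); both assume @sakila is the active source.

# sq's jq-style query language: filter rows, then pick columns.
actors_slq() {
  sq '.actor | where(.actor_id < 4) | .first_name, .last_name'
}

# A comparable query in the database's native SQL, via `sq sql`.
actors_sql() {
  sq sql 'SELECT first_name, last_name FROM actor WHERE actor_id < 4'
}
```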

But we're flying a bit blind here: how did we know about the `actor` table?

### Inspect
[`sq inspect`](https://sq.io/docs/inspect) is your friend.

![sq inspect](./.images/sq_inspect_source_text.png)

Use [`sq inspect -v`](https://sq.io/docs/cmd/inspect) to see more detail.
Or use [`-j`](https://sq.io/docs/output#json) to get JSON output:

![sq inspect -j](./.images/sq_inspect_sakila_sqlite_json.png)

Combine `sq inspect` with [jq](https://jqlang.github.io/jq/) for some useful capabilities.
Here's how to [list](https://sq.io/docs/cookbook#list-table-names)
all the table names in the active source:

```shell
$ sq inspect -j | jq -r '.tables[] | .name'
actor
address
category
city
country
customer
[...]
```

And here's how you could [export](https://sq.io/docs/cookbook#export-all-table-data-to-csv) each table
to a CSV file:

```shell
$ sq inspect -j | jq -r '.tables[] | .name' | xargs -I % sq .% --csv --output %.csv
$ ls
actor.csv     city.csv      customer_list.csv  film_category.csv  inventory.csv  rental.csv                  staff.csv
address.csv   country.csv   film.csv           film_list.csv      language.csv   sales_by_film_category.csv  staff_list.csv
category.csv  customer.csv  film_actor.csv     film_text.csv      payment.csv    sales_by_store.csv          store.csv
```
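
If you want the export to stop at the first failing table, or to write into a separate directory, the one-liner above could be rewritten as a small shell function. This is a sketch, not part of `sq`: the function name `export_tables_csv` and its output-directory argument are assumptions, while the `sq` flags are the ones already used above. Because `sq` can read a data source from a pipe, the loop redirects each `sq` call's stdin from `/dev/null` so that it does not consume the list of table names.

```shell
# export_tables_csv [OUT_DIR]
# Reads table names from stdin (one per line) and writes each table to
# OUT_DIR/NAME.csv (OUT_DIR defaults to the current directory).
# Hypothetical helper; stops at the first table that fails to export.
export_tables_csv() {
  out_dir="${1:-.}"
  while IFS= read -r tbl; do
    [ -n "$tbl" ] || continue   # skip blank lines
    # Detach stdin so sq doesn't treat the table-name list as piped input.
    sq ".$tbl" --csv --output "$out_dir/$tbl.csv" < /dev/null || return 1
  done
}

# Usage, with @sakila as the active source:
#   sq inspect -j | jq -r '.tables[] | .name' | export_tables_csv ./export
```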

Note that you can also inspect an individual table:

![sq inspect actor verbose](./.images/sq_inspect_actor_verbose.png)

Read more about [`sq inspect`](https://sq.io/docs/inspect).

### Diff

Use [`sq diff`](https://sq.io/docs/diff) to compare metadata, or row data, for sources, or individual tables.
The default behavior is to diff table schema and row counts. Table row data is not compared in this mode.

![sq diff](.images/sq_diff_src_default.png)

Use [`--data`](https://sq.io/docs/diff#--data) to compare row data.

![sq diff data](.images/sq_diff_table_data.png)

There are many more options available. See the [diff docs](https://sq.io/docs/diff).
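
As a text-only sketch of the two modes shown in the screenshots, a table-level diff might be wrapped like this. The helper name `diff_table` and the handles in the usage comment are hypothetical, and the `@handle.table` argument form is an assumption based on how tables are addressed elsewhere in this README. Only `sq diff` and its `--data` flag come from the text above.

```shell
# diff_table SRC1 SRC2 TABLE [data]
# Compares one table across two sources (hypothetical helper).
# Without a 4th argument, sq diff runs in its default mode (schema and row
# counts); pass "data" to compare row data instead.
diff_table() {
  if [ "${4:-}" = "data" ]; then
    sq diff "$1.$3" "$2.$3" --data
  else
    sq diff "$1.$3" "$2.$3"
  fi
}

# Usage (hypothetical handles):
#   diff_table @sakila_prod @sakila_staging actor
#   diff_table @sakila_prod @sakila_staging actor data
```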
### Insert query results

`sq` query results can be [output](https://sq.io/docs/output) in various formats
([`text`](https://sq.io/docs/output#text),
[`json`](https://sq.io/docs/output#json),
[`csv`](https://sq.io/docs/output#csv), etc.). Those results can also be "outputted"
as an [*insert*](https://sq.io/docs/output#insert) into a database table.

That is, you can use `sq` to insert results from a Postgres query into a MySQL table,
or copy an Excel worksheet into a SQLite table, or push a CSV file into
a SQL Server table etc.

> **Note:** If you want to copy a table inside the same (database) source,
> use [`sq tbl copy`](https://sq.io/docs/cmd/tbl-copy) instead, which uses the database's native table copy functionality.

Here we query a CSV file, and insert the results into a Postgres table.

![sq query insert inspect](./.images/sq_query_insert_inspect.png)
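
In text form, an insert of this kind might look roughly like the sketch below. It assumes the `--insert` flag described in the [output docs](https://sq.io/docs/output#insert) takes a `@handle.table` destination. The helper name `copy_into`, both handles, and the `.data` table name for the CSV source are hypothetical and may differ from what the screenshot shows.

```shell
# copy_into QUERY DEST
# Runs QUERY and inserts the resulting rows into DEST, a @handle.table
# destination (hypothetical helper; assumes sq's --insert flag).
copy_into() {
  sq "$1" --insert "$2"
}

# Usage (hypothetical handles; ".data" is an assumed CSV table name):
#   copy_into '@actor_csv.data' '@sakila_pg.actor_import'
```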
### Cross-source joins

`sq` can perform the usual [joins](https://sq.io/docs/query#joins). Here's how you would
join tables `actor`, `film_actor`, and `film`:

```shell
$ sq '.actor | join(.film_actor, .actor_id) | join(.film, .film_id) | .first_name, .last_name, .title'
```

But `sq` can also join across data sources. That is, you can join an Excel worksheet with a
Postgres table, or join a CSV file with MySQL, and so on.

This example joins a Postgres database, an Excel worksheet, and a CSV file.

![sq join multi source](./.images/sq_join_multi_source.png)

Read more about cross-source joins in the [query guide](https://sq.io/docs/query#joins).

### Table commands

`sq` provides several handy commands for working with tables:
[`tbl copy`](https://sq.io/docs/cmd/tbl-copy), [`tbl truncate`](https://sq.io/docs/cmd/tbl-truncate)
and [`tbl drop`](https://sq.io/docs/cmd/tbl-drop).
Note that these commands work directly
against SQL database sources, using their native SQL commands.

```shell
$ sq tbl copy .actor .actor_copy
Copied table: @sakila.actor --> @sakila.actor_copy (200 rows copied)

$ sq tbl truncate .actor_copy
Truncated 200 rows from @sakila.actor_copy

$ sq tbl drop .actor_copy
Dropped table @sakila.actor_copy
```
### UNIX pipes

For file-based sources (such as CSV or XLSX), you can `sq add` the source file,
but you can also pipe it:

```shell
$ cat ./example.xlsx | sq .Sheet1
```

Similarly, you can inspect:

```shell
$ cat ./example.xlsx | sq inspect
```
## Drivers
`sq` knows how to deal with a data source type via a [driver](https://sq.io/docs/drivers)
implementation. To view the installed/supported drivers:
```shell
$ sq driver ls
DRIVER     DESCRIPTION
sqlite3    SQLite
postgres   PostgreSQL
sqlserver  Microsoft SQL Server / Azure SQL Edge
mysql      MySQL
csv        Comma-Separated Values
tsv        Tab-Separated Values
json       JSON
jsona      JSON Array: LF-delimited JSON arrays
jsonl      JSON Lines: LF-delimited JSON objects
xlsx       Microsoft Excel XLSX
```
## Output formats
`sq` has many [output formats](https://sq.io/docs/output):
- `--text`: [Text](https://sq.io/docs/output#text)
- `--json`: [JSON](https://sq.io/docs/output#json)
- `--jsona`: [JSON Array](https://sq.io/docs/output#jsona)
- `--jsonl`: [JSON Lines](https://sq.io/docs/output#jsonl)
- `--csv` / `--tsv` : [CSV](https://sq.io/docs/output#csv) / [TSV](https://sq.io/docs/output#tsv)
- `--xlsx`: [XLSX](https://sq.io/docs/output#xlsx) (Microsoft Excel)
- `--html`: [HTML](https://sq.io/docs/output#html)
- `--xml`: [XML](https://sq.io/docs/output#xml)
- `--yaml`: [YAML](https://sq.io/docs/output#yaml)
- `--markdown`: [Markdown](https://sq.io/docs/output#markdown)
- `--raw`: [Raw](https://sq.io/docs/output#raw) (bytes)
## CHANGELOG
See [CHANGELOG.md](./CHANGELOG.md).
## Acknowledgements

- Thanks to [Diego Souza](https://github.com/diegosouza) for creating
  the [Arch Linux package](https://aur.archlinux.org/packages/sq-bin).
- Much inspiration is owed to [jq](https://stedolan.github.io/jq/).
- See [`go.mod`](https://github.com/neilotoole/sq/blob/master/go.mod) for a list of third-party
  packages.
- Additionally, `sq` incorporates modified versions of:
    - [`olekukonko/tablewriter`](https://github.com/olekukonko/tablewriter)
    - [`segmentio/encoding`](https://github.com/segmentio/encoding) for JSON encoding.
- The [_Sakila_](https://dev.mysql.com/doc/sakila/en/) example databases were lifted
  from [jOOQ](https://github.com/jooq/jooq), which in turn owe their heritage to earlier work on
  Sakila.
- Date rendering via [`ncruces/go-strftime`](https://github.com/ncruces/go-strftime).

## Similar, related, or noteworthy projects
- [usql](https://github.com/xo/usql)
- [textql](https://github.com/dinedal/textql)
- [golang-migrate](https://github.com/golang-migrate/migrate)
- [octosql](https://github.com/cube2222/octosql)
- [rq](https://github.com/dflemstr/rq)