mirror of https://github.com/wader/fq.git synced 2024-10-06 00:17:58 +03:00

## Basic usage

fq tries to behave the same way as jq as much as possible, so you can do:

    fq . file
    fq < file
    cat file | fq
    fq . < file
    fq . *.png *.mp3
    fq '.frames[0]' *.mp3

### Common usages

    # recursively display decode tree but truncate long arrays
    fq d file
    # same as
    fq display file
    # display all bytes for each value
    fq 'd({display_bytes: 0})' file
    # display 200 bytes for each value
    fq 'd({display_bytes: 200})' file

    # recursively display decode tree without truncating
    fq da file
    # recursively and verbosely display decode tree
    fq dv file
    # JSON representation for whole file
    fq tovalue file
    # recursively look for decode value roots for a format
    fq '.. | select(format=="jpeg")' file
    # can also use grep_by
    fq 'grep_by(format=="jpeg")' file
    # recursively look for first decode value root for a format
    fq 'first(.. | select(format=="jpeg"))' file
    fq 'first(grep_by(format=="jpeg"))' file
    # recursively look for objects fulfilling condition
    fq '.. | select(.type=="trak")?' file
    fq 'grep_by(.type=="trak")' file
    # grep whole tree
    fq 'grep("^prefix")' file
    fq 'grep(123)' file
    fq 'grep_by(. >= 100 and . <= 100)' file
    # decode file as mp4 and return a result even if there are some errors
    fq -d mp4 file.mp4
    # decode file as mp4 and also ignore validity assertions
    fq -o force=true -d mp4 file.mp4

### Display output

`display` or `d` is the main function for displaying values and is also the function that is used if no other output function is explicitly used. If its input is a decode value it outputs a hex dump and tree structure, otherwise it outputs JSON.

Some example usages are shown below.

The first and second examples do the same thing: they input `"hello"` to `display`.

![fq demo](display_json.svg)

In the next few examples we select the first "edit list" box in an mp4 file, which is a list of which parts of a media track to include during playback, and display it in various ways.

By default, when `display` is not used explicitly, only the root level is shown:

![fq demo](display_decode_value.svg)

The first row shows a ruler with the byte offset into the line and the JSON path for the value.

The columns are:
- Start address for the line. For example we see that `type` starts at `0xd60`+`0x09`.
- Hex representation of input bits for value. Will show the whole byte even if the value only partially uses bits from it.
- ASCII representation of input bits for value. Will show the whole byte even if the value only partially uses bits from it.
- Tree structure of decoded value, symbolic value and description.
  - `{}` value is an object that might have nested values.
  - `[start:end]` value is an array with index starting at `start` and ending at `end` (exclusive).

With `display` or `d` it will recursively show the whole tree:

![fq demo](display_decode_value_d.svg)

Same but verbose `dv`:

![fq demo](display_decode_value_dv.svg)

In verbose mode bit ranges and array element names are shown.

Bit range uses `bytes.bits` notation. For example `type` starts at byte `0xd69` bit `0` (left out if zero) and ends at `0xd6c` bit `7` (inclusive) and has a byte size of `4`.
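
The numbers in the `type` example can be checked with plain shell arithmetic. This is only a sketch that reproduces the values quoted above (`0xd60`, `0x09`, `0xd6c`); the variable names are made up for illustration and fq itself is not involved.

```shell
line_start=$((0xd60))   # start address of the dump line
column=$((0x09))        # byte offset into that line, read from the ruler
first=$((line_start + column))
last=$((0xd6c))         # last byte of the range, inclusive

printf 'first byte: 0x%x\n' "$first"                 # first byte: 0xd69
printf 'size in bytes: %d\n' "$((last - first + 1))" # size in bytes: 4
```

Because the end of the range is inclusive, the size is `last - first + 1` rather than `last - first`.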

There are also some other `display` aliases:
- `da` same as `display({array_truncate: 0})` which will not truncate long arrays.
- `dd` same as `display({array_truncate: 0, display_bytes: 0})` which will not truncate long arrays or byte ranges.
- `dv` same as `display({array_truncate: 0, verbose: true})` which will not truncate long arrays and displays verbosely.
- `ddv` same as `display({array_truncate: 0, display_bytes: 0, verbose: true})` which will not truncate long arrays or byte ranges and displays verbosely.

## Interactive REPL

The interactive [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop)
has auto completion and nested REPL support:

    # start REPL with null input
    $ fq -i
    # same as
    $ fq -ni
    # in the REPL you will see a prompt indicating current input and you can type jq expressions to evaluate.
    # start REPL with one file as input
    $ fq -i . doc/file.mp3

An example session:

    $ fq -i . doc/file.mp3
    # basic arithmetic and jq expressions
    mp3> 1+1
    mp3> 1, 2, 3 | . * 2
    mp3> [1, 2, 3] | add
    # "." is the identity function which just returns current input, the mp3 file.
    mp3> .
    # access the first frame in the mp3 file
    mp3> .frames[0]
    # start a new nested REPL with first frame as input
    mp3> .frames[0] | repl
    # prompt shows "path" to current input and that it's an mp3_frame.
    # Ctrl-D to exit REPL or to shell if last REPL
    > .frames[0] mp3_frame> ^D
    # "jq" value of layer in first frame
    mp3> .frames[0].header.layer | tovalue
    mp3> .frames[0].header.layer * 2
    # symbolic value, same as "jq" value
    mp3> .frames[0].header.layer | tosym
    # actual underlying decoded value
    mp3> .frames[0].header.layer | toactual
    # description of value
    mp3> .frames[0].header.layer | todescription
    "MPEG Layer 3"
    mp3> ^D

Use Ctrl-D to exit and Ctrl-C to interrupt current evaluation.

## Example usages
#### Second mp3 frame header as JSON

    fq '.frames[1].header | tovalue' file.mp3

#### Byte start position for the first 10 mp3 frames in an array

    fq '.frames[0:10] | map(tobytesrange.start)' file.mp3

#### Decode at range

    # decode byte range 100 to end
    fq -d raw 'tobytes[100:] | mp3_frame | d' file.mp3
    # decode byte range 10 bytes into .somefield and preserve relative position in file
    fq '.somefield | tobytesrange[10:] | mp3_frame | d' file.mp3

#### Show AVC SPS difference between two mp4 files

`-n` tells fq not to use an implicit `input`. `f` is a function that selects some interesting value. `diff` is called with two arguments: the decoded values for `a.mp4` and `b.mp4`, each filtered through `f`.

    fq -n 'def f: .. | select(format=="avc_sps"); diff(input|f; input|f)' a.mp4 b.mp4

#### Extract first JPEG found in file

Recursively look for first value that is a `jpeg` decode value root. Use `tobytes` to get bytes for value. Redirect bytes to a file.

    fq 'first(.. | select(format=="jpeg")) | tobytes' file > file.jpeg

#### Sample size histogram

Recursively look for all sample size boxes "stsz" and use `?` to ignore errors when doing `.type` on arrays etc. Save a reference to the box, count unique values, save the max, output the path to the box and output a histogram scaled to 0-100.

    fq '.. | select(.type=="stsz")? as $stsz | .entries | count | max_by(.[1])[1] as $m | ($stsz | topath | path_to_expr), (.[] | "\(.[0]): \((100*.[1]/$m)*"=") \(.[1])") | println' file.mp4
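
The scaling step in the expression above, `100*.[1]/$m` followed by repeating `"="`, can be sketched in plain shell. The `hist` function and the `value:count` pairs are hypothetical and only stand in for what `count` would produce; fq is not involved, and integer division is used here, so bar lengths are rounded down.

```shell
# hist VALUE:COUNT... prints one bar per pair, scaled so the largest count is 100 wide
hist() {
  max=0
  for p in "$@"; do
    c=${p#*:}
    if [ "$c" -gt "$max" ]; then max=$c; fi
  done
  for p in "$@"; do
    v=${p%%:*}
    c=${p#*:}
    n=$((100 * c / max))                      # same role as 100*.[1]/$m
    bar=$(printf '%*s' "$n" '' | tr ' ' '=')  # n copies of "="
    printf '%s: %s %s\n' "$v" "$bar" "$c"
  done
}

# made-up sample sizes and how often each occurs
hist 1024:3 2048:12 4096:6
```

With these made-up counts the bars are 25, 100 and 50 characters wide.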
#### Find TCP streams that look like HTTP GET requests in a PCAP file

Use `grep` to recursively find strings matching a regexp.

    fq '.tcp_connections | grep("GET /.* HTTP/1.?")' file.pcap

#### Use representation of a format

Some formats like `msgpack`, `bson` etc. are used to represent some data structure. In those cases the `torepr`
function can be used to get the representation.

    # whole represented value
    fq -d msgpack torepr file.msgpack
    # value of the key "field" from the represented value
    fq -d msgpack 'torepr.field' file.msgpack
    # query or transform represented value
    fq -d msgpack 'torepr | ...' file.msgpack

#### Widest PNG in a directory

    $ fq -rn '[inputs | [input_filename, first(.chunks[] | select(.type=="IHDR") | .width)]] | max_by(.[1]) | .[0]' *.png

#### What values include the byte at position 0x123

    $ fq '.. | select(scalars and in_bytes_range(0x123))' file

## Supported formats

See [formats](formats.md)

## The jq language

fq is based on the [jq language](https://stedolan.github.io/jq/) and for basic usage its syntax
is similar to how object and array access looks in JavaScript or JSON path, `.food[10]` etc., but
it can do much more and is a very expressive language.

To get the most out of fq it's recommended to learn more about jq. Here are some good starting points:

- [jq manual](https://stedolan.github.io/jq/manual/)
- [Peter Koppstein's A Stream oriented Introduction to jq](https://github.com/pkoppstein/jq/wiki/A-Stream-oriented-Introduction-to-jq)
- [jq wiki: Language Description](https://github.com/stedolan/jq/wiki/jq-Language-Description)
- [jq wiki: Cookbook](https://github.com/stedolan/jq/wiki/Cookbook)
- [jq wiki: Pitfalls](https://github.com/stedolan/jq/wiki/How-to:-Avoid-Pitfalls)
- [FAQ](https://github.com/stedolan/jq/wiki/FAQ)

Common beginner gotchas are:

- jq's use of `;` and `,`. jq uses `;` as argument separator
and `,` as output separator. To call a function `f` with two arguments use `f(1; 2)`. If you do `f(1, 2)` you pass a
single argument `1, 2` (a lambda expression that outputs `1` and then outputs `2`) to `f`.
- Expressions can return or "output" zero or more values. This is how loops, foreach etc. are achieved.
- Expressions have one implicit input and output value. This is how pipelines like `1 | . * 2` work.
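
A minimal sketch of the first gotcha, using `-n` so that no input file is needed. The function names `f` and `g` and the wrapper `jq_separators_demo` are made up for illustration; the wrapper only exists so that nothing runs until it is called on a machine where fq is installed.

```shell
jq_separators_demo() {
  # ";" separates arguments: f gets two arguments and outputs 3
  fq -n 'def f(a; b): a + b; f(1; 2)'
  # "," separates outputs: g gets the single argument "1, 2",
  # so its body runs once per output and gives 10 and then 20
  fq -n 'def g(a): a * 10; g(1, 2)'
}
```

Call `jq_separators_demo` in a shell to see both results.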
## Types specific to fq

fq has two additional types compared to jq, decode value and binary. In standard jq expressions they will in most cases behave as some standard jq type.

### Decode value

This type is returned by decoders and is used to represent parts of the decoded input. It can act as all JSON types, object, array, number, string etc.

Each decode value has these properties:
- A bit range in the input
  - Can be accessed as a binary using `tobits`/`tobytes`. Use the `start` and `size` keys to get position and size.
  - `.name` as bytes `.name | tobytes`
  - Bit 4-8 of `.name` as bits `.name | tobits[4:8]`

Each non-compound decode value has these properties:
- An actual value:
  - This is the decoded representation of the bits, a number, string, bool etc.
  - Can be accessed using `toactual`.
- An optional symbolic value:
  - Is usually a mapping of the actual to symbolic value, ex: map number to a string value.
  - Can be accessed using `tosym`.
- An optional description:
  - Can be accessed using `todescription`

The JSON value of a decode value is the symbolic value if available, otherwise the actual value. To explicitly access the JSON value use `tovalue`. In most expressions this is not needed as it will be done automatically.

### Binary

Raw bits with a unit size, 1 (bits) or 8 (bytes). Will act as a string in standard jq expressions.

Are created using `tobits`/`tobytes` functions from decode values or binary arrays.
Can be sliced using the jq `[start:end]` slice syntax.

#### Binary array

Is an array of numbers, strings, binaries or other nested binary arrays. When used as input to `tobits`/`tobytes` the following rules are used:
- Number is a byte so has to be 0-255
- String is its UTF-8 encoded bytes
- Binary as is
- Binary array used recursively

Similar to and inspired by Erlang io-lists.

Some examples:

- `[0, 123, 255] | tobytes` will be 3 bytes 0, 123 and 255
- `[0, [123, 255]] | tobytes` same as above
- `[0, 1, 1, 0, 0, 1, 1, 0 | tobits]` will be 1 byte 0x66 an "f"
- `[(.a | tobytes[-10:]), 255, (.b | tobits[:10])]` the concatenation of the last 10 bytes of `.a`, a byte with value 255 and the first 10 bits of `.b`.
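
The arithmetic behind the last two examples can be checked without fq. This sketch assumes the bits are packed most significant bit first, which is what the stated result `0x66` implies; the variable names are made up for illustration.

```shell
# pack the eight bits 0 1 1 0 0 1 1 0 into one byte, first bit is the highest
byte=0
for bit in 0 1 1 0 0 1 1 0; do
  byte=$((byte * 2 + bit))
done
printf '0x%x\n' "$byte"                  # 0x66
printf "\\$(printf '%03o' "$byte")\n"    # f

# last example: 10 bytes, one byte and 10 bits
total_bits=$((10 * 8 + 8 + 10))
printf '%d bits, %d more than a whole number of bytes\n' "$total_bits" "$((total_bits % 8))"
```

The last line prints 98 bits with 2 left over, so that binary is not byte aligned.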
TODO: padding and alignment
## Functions

- All standard library functions from jq
- Adds a few new general functions:
  - `print`, `println`, `printerr`, `printerrln` prints to stdout and stderr.
  - `streaks`, `streaks_by(f)` like `group` but groups streaks based on condition.
  - `count`, `count_by(f)` like `group` but counts group lengths.
  - `debug(f)` like `debug` but uses arg to produce debug message. `{a: 123} | debug({a}) | ...`.
  - `path_to_expr` from `["key", 1]` to `".key[1]"`.
  - `expr_to_path` from `".key[1]"` to `["key", 1]`.
  - `diff($a; $b)` produce diff object between two values.
  - `delta`, `delta_by(f)`, array with difference between all consecutive pairs.
  - `chunk(f)`, split array or string into even chunks
  - Bitwise functions `band`, `bor`, `bxor`, `bsl`, `bsr` and `bnot`. They work the same way as jq math functions:
    unary functions use the input, and functions with more than one argument take all operands as arguments and ignore the input. Ex: `1 | bnot`, `bsl(1; 3)`
- Adds some decode value specific functions:
  - `root` tree root for value
  - `buffer_root` root value of buffer for value
  - `format_root` root value of format for value
  - `parent` parent value
  - `parents` output parents of value
  - `topath` path of value. Use `path_to_expr` to get a string representation.
  - `tovalue`, `tovalue($opts)` symbolic value if available otherwise actual value
  - `toactual` actual value (decoded etc)
  - `tosym` symbolic value (mapped etc)
  - `todescription` description of value
  - `torepr` convert decode value into what it represents. For example convert msgpack decode value
    into a value representing its JSON representation.
- All regexp functions work with binary as input and pattern argument with these differences
  compared to when using string input:
  - All offsets and lengths will be in bytes.
  - For `capture` the `.string` value is a binary.
  - If pattern is a binary it will be matched literally and not as a regexp.
  - If pattern is a binary or flags include "b" each input byte will be read as a separate code point
    instead of possibly multi-byte UTF-8 codepoints. This allows matching raw bytes. Ex: `match("\u00ff"; "b")`
    will match the byte `0xff` and not the UTF-8 encoded codepoint for 255, `match("[^\u00ff]"; "b")` will match
    all non-`0xff` bytes.
- `scan_toend($v)`, `scan_toend($v; $flags)` works the same as `scan` but the output binaries are from start of match to
  end of binary.
- `grep` functions take 1 or 2 arguments. The first is a scalar to match, where a string is
  treated as a regexp and a binary is matched as exact bytes. The second argument is regexp
  flags, with the addition that "b" will treat each byte in the input binary as a code point, which
  makes it possible to match exact bytes.
  - `grep($v)`, `grep($v; $flags)` recursively match value and binary
  - `vgrep($v)`, `vgrep($v; $flags)` recursively match value
  - `bgrep($v)`, `bgrep($v; $flags)` recursively match binary
  - `fgrep($v)`, `fgrep($v; $flags)` recursively match field name
- `grep_by(f)` recursively match using a filter. Ex: `grep_by(. > 180 and . < 200)`, `first(grep_by(format == "id3v2"))`.
- Binary:
  - `tobits` - Transform input to binary with bit as unit, does not preserve source range, will start at zero.
  - `tobitsrange` - Transform input to binary with bit as unit, preserves source range if possible.
  - `tobytes` - Transform input to binary with byte as unit, does not preserve source range, will start at zero.
  - `tobytesrange` - Transform input to binary with byte as unit, preserves source range if possible.
  - `.[start:end]`, `.[:end]`, `.[start:]` - Slice binary from start to end preserving source range.
- `open` open file for reading
- All decode functions take an optional options argument. The only option currently is `force` to ignore decoder asserts.
  For example to decode as mp3 and ignore asserts do `mp3({force: true})` or `decode("mp3"; {force: true})`, from the command line
  you can do `fq -d raw 'mp3({force: true})' file`.
  - `decode`, `decode($format)`, `decode($format; $opts)` decode format
  - `probe`, `probe($opts)` probe and decode format
  - `mp3`, `mp3($opts)`, ..., `<name>`, `<name>($opts)` same as `decode(<name>)`, `decode(<name>; $opts)`, decode as format
- Display shows hexdump/ASCII/tree for decode values and JSON for other values.
  - `d`/`d($opts)` display value and truncate long arrays and binaries
  - `da`/`da($opts)` display value and don't truncate arrays
  - `dd`/`dd($opts)` display value and don't truncate arrays or binaries
  - `dv`/`dv($opts)` verbosely display value and don't truncate arrays but truncate binaries
  - `ddv`/`ddv($opts)` verbosely display value and don't truncate arrays or binaries
- `p`/`preview` show preview of field tree
- `hd`/`hexdump` hexdump value
- `repl` nested REPL, must be last in a pipeline. `1 | repl`, can "slurp" outputs `1, 2, 3 | repl`.
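
The "b" flag in the regexp notes above matters because one code point can be more than one byte. A sketch with standard tools only, no fq involved, showing the UTF-8 encoding of the code point 255 (`\u00ff`) next to the single raw byte `0xff`. The `hex` helper is made up for illustration.

```shell
# print stdin as a compact hex string
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# U+00FF encoded as UTF-8 is two bytes
printf '\303\277' | hex   # c3bf
# the raw byte 0xff is one byte
printf '\377' | hex       # ff
```

A pattern read as UTF-8 code points describes the first sequence, while with "b" it describes the second.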
## Color and unicode output

fq by default tries to use colors if possible; this can be disabled with `-M`. You can also
enable usage of unicode characters for improved output by setting the environment
variable `CLIUNICODE`.

## Configuration

To add your own functions you can use `init.jq`, which will be read from:
- `$HOME/Library/Application Support/fq/init.jq` on macOS
- `$HOME/.config/fq/init.jq` on Linux, BSD etc
- `%AppData%\fq\init.jq` on Windows
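
A sketch of adding a function using the Linux location from the list above. The helper name `first_jpeg` is hypothetical and its body is taken from the examples earlier in this document; the script appends to the file so that an existing `init.jq` is kept.

```shell
conf_dir="$HOME/.config/fq"   # Linux, BSD etc. location
mkdir -p "$conf_dir"

# append rather than overwrite, in case the file already exists
cat >> "$conf_dir/init.jq" <<'EOF'
# hypothetical helper: first jpeg decode value root found in the input
def first_jpeg: first(grep_by(format == "jpeg"));
EOF
```

Afterwards the function should be usable like any other, for example `fq 'first_jpeg | tobytes' file > file.jpeg`.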
## Use as script interpreter

fq can be used as a script interpreter:

    #!/usr/bin/env fq -d mp3 -rf
    [.frames[].header | .sample_count / .sample_rate] | add
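
A sketch of turning the two lines above into an executable script. The file name `duration.fq` is made up for illustration, and running it as `./duration.fq file.mp3` is an assumption about how the input file is passed.

```shell
# write the script exactly as shown above
cat > duration.fq <<'EOF'
#!/usr/bin/env fq -d mp3 -rf
[.frames[].header | .sample_count / .sample_rate] | add
EOF

# make it executable so the shebang line is used
chmod +x duration.fq
```

Note that Linux passes everything after the interpreter path as one single argument, so on Linux a shebang with several options may need `env -S` (`#!/usr/bin/env -S fq -d mp3 -rf`) where the installed `env` supports it.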
## Differences to jq

- [gojq's differences to jq](https://github.com/itchyny/gojq#difference-to-jq),
notable is support for arbitrary-precision integers.
- Supports hexadecimal `0xab`, octal `0o77` and binary `0b101` integer literals.
- Try include, `include "file?";`, that doesn't fail if the file is missing.
- Some values can act as an object with keys even when they are an array, number etc.
- There can be keys hidden from `keys` and `[]`.
- Some values are readonly and can't be updated.

## Decoded values

When you decode something you will get a decode value. Decode values work like
normal jq values but have special abilities and are used to represent a tree structure of the decoded
binary data. Each value always has a name, type and a bit range.

A value has these special keys (TODO: remove, are internal)

- `_name` name of value
- `_value` jq value of value
- `_start` bit range start
- `_stop` bit range stop
- `_len` bit range length (TODO: rename)
- `_bits` bits in range as a binary
- `_bytes` bits in range as binary using byte units
- `_path` jq path to value
- `_unknown` value is un-decoded gap
- `_symbol` symbolic string representation of value (optional)
- `_description` longer description of value (optional)
- `_format` name of decoded format (optional)
- `_error` error message (optional)
- TODO: unknown gaps
## Own decoders and use as library
## Known issues and useful tricks
### Run interactive mode with no input

    fq -i

### `select` fails with `expected an ... but got: ...`

Try adding `?` after the call, as in `select(...)?`, to catch and ignore type errors in the select expression.

### Manual decode

Sometimes fq fails to decode or you know there is valid data buried inside some binary or maybe
you know the format of some unknown value. Then you can decode manually.

    # try to decode a `mp3_frame` that failed to decode
    $ fq -d mp3 '.unknown0 | mp3_frame' file.mp3
    # skip first 10 bytes then decode as `mp3_frame`
    $ fq -d raw 'tobytes[10:] | mp3_frame' file.mp3

### Use `.` as input and in a positional argument
The expression `.a | f(.b)` might not work as expected. `.` is `.a` when evaluating the arguments so
the positional argument will end up being `.a.b`. Instead do `. as $c | .a | f($c.b)`.
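
A minimal sketch of the difference, using `-n` and a literal object so no file is needed. The function `f`, the sample object and the wrapper `dot_in_argument_demo` are made up for illustration; the wrapper only exists so that nothing runs until it is called on a machine where fq is installed.

```shell
dot_in_argument_demo() {
  # "." inside the argument is .a, so .b means .a.b: outputs "inner"
  fq -n 'def f(x): x; {a: {b: "inner"}, b: "outer"} | .a | f(.b)'
  # bind the outer input first and refer to it by name: outputs "outer"
  fq -n 'def f(x): x; {a: {b: "inner"}, b: "outer"} | . as $c | .a | f($c.b)'
}
```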
### Building array is slow
Try to use `map` or `foreach` to avoid rebuilding the whole array for each append.
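
A sketch of the two styles, both producing the same array. The wrapper name `array_building_demo` and the small range are made up for illustration; the wrapper only exists so that nothing runs until it is called on a machine where fq is installed.

```shell
array_building_demo() {
  # appends one element at a time, creating a new array on every step
  fq -n 'reduce range(5) as $i ([]; . + [$i * 2])'
  # builds the array once and transforms it with map
  fq -n '[range(5)] | map(. * 2)'
}
```

Both expressions give `[0, 2, 4, 6, 8]`; any difference in speed should only become noticeable for large arrays.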
### Use `print` and `println` to produce more friendly compact output

    > [[0,"a"],[1,"b"]]
    > [[0,"a"],[1,"b"]] | .[] | "\(.[0]): \(.[1])" | println
    0: a
    1: b

### `repl` argument using function or variable causes `variable not defined`

`true as $verbose | repl({verbose: $verbose})` will currently fail as `repl` is
implemented by rewriting the query to `map(true as $verbose | .) | repl({verbose: $verbose})`,
which leaves `$verbose` out of scope in the `repl` call.

### `error` produces no output
`null | error` behaves as `empty`.