# fq

jq for binary formats

Mattias Wadman
mattias.wadman@gmail.com
https://github.com/wader/fq
@mwader

## Background

.html style.html

- Use various tools to extract data
  - ffprobe, gm identify, mp4dump, mediainfo, wireshark, one-off programs, ...
- Convert to usable format and do queries
  - jq, grep, sqlite, sort, awk, sed, one-off programs, ...
- Digging into and slicing binaries
  - HexFiend, hexdump, dd, cat, one-off programs, ...

## Wishlist

"Want to see everything about this picture except the picture"

- A very verbose version of file(1)
- gdb for files
- Select and query things using a language
- Make all parts of a file symbolically addressable
- Nested formats and binaries
- Convenient bit-oriented decoder DSL

## Experiments and prototypes

- Decoder DSL
  - Tcl, Lisp, tengo, Starlark, JavaScript, Go
- Query language
  - JSONPath, SQL, jq, JavaScript
- How to use
  - IR-JSON: `fq file | jq ... | fq`
  - Extend existing project
  - Decode and query in same tool

## Result

Go

- Tests showed it is fast enough to decode big files
- Found gojq
- Previous good experience
- Good tooling

## jq

"The JSON indenter"

- JSON in/out
- Syntax is roughly a superset of JSON, with the same types
- Functional language based on generators and backtracking
  - Expressions can return or "output" zero, one or more values
  - When there are no more outputs, evaluation backtracks
- Implicit input and output similar to shell pipes
- Extraordinary iteration and combinatorial abilities
- Great for traversing tree structures

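## jq

The generator idea can be sketched in Go. This is only a callback-based model made up for illustration, not how jq or gojq are implemented; the names `Filter`, `Pipe`, `Each`, `Select` and `Collect` are hypothetical. A filter calls `yield` once per output, and returning from the callback plays the role of backtracking to the next alternative.

```go
package main

import "fmt"

// Filter models a jq expression: for one input it may emit zero, one or
// many outputs by calling yield once per output.
type Filter func(in any, yield func(any))

// Pipe models jq's `a | b`: every output of a is fed to b as its input.
func Pipe(a, b Filter) Filter {
	return func(in any, yield func(any)) {
		a(in, func(v any) { b(v, yield) })
	}
}

// Each models `.[]`: it emits every element of an array input.
func Each(in any, yield func(any)) {
	if arr, ok := in.([]any); ok {
		for _, v := range arr {
			yield(v)
		}
	}
}

// Select models `select(f)`: it emits its input once, or not at all.
func Select(pred func(any) bool) Filter {
	return func(in any, yield func(any)) {
		if pred(in) {
			yield(in)
		}
	}
}

// Collect runs a filter and gathers all outputs, like jq's `[f]`.
func Collect(f Filter, in any) []any {
	var out []any
	f(in, func(v any) { out = append(out, v) })
	return out
}

func main() {
	isEven := func(v any) bool { n, ok := v.(int); return ok && n%2 == 0 }
	// Roughly `[.[] | select(. % 2 == 0)]` applied to [1, 2, 3, 4].
	q := Pipe(Each, Select(isEven))
	fmt.Println(Collect(q, []any{1, 2, 3, 4})) // [2 4]
}
```
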
## Examples

.code jq1

## Examples

.code jq2

## Examples

.code jq3

## Examples

.code jq4

## fq

"The binary indenter"

- Superset of jq
- Re-implements most of jq's CLI
- 83 input formats, 22 of which support probing
- Additional standard library functions
- Additional types that act as standard jq types but have special abilities
  - _Decode value_ has bit range, actual and symbolic value, description, ...
  - _Binary_ has a unit size, bit or bytes, and can be sliced
- Output fancy hexdump, JSON and binary
- Interactive REPL with completion and sub-REPL support

##

.image formats.svg _ 1024

## Usage

- Basic usage
  - `fq . file`, `cat file | fq`
- Multiple input files
  - `fq 'grep_by(format == "exif")' *.png *.jpeg`
- Hexdump, JSON and binary output
  - `fq '.frames[10] | d' file.mp3`
  - `fq '[grep_by(format == "dns").questions[].name.value]' file.pcap`
  - `fq 'first(grep_by(format == "jpeg")) | tobytes' file > file.jpeg`
- Interactive REPL
  - `fq -i . *.png`

##

.background data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAACCAYAAABytg0kAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAWJQAAFiUBSVIk8AAAABNJREFUCB1jZGBg+A/EDEwgAgQADigBA//q6GsAAAAASUVORK5CYII%3D
.image usage.svg _ 900

## fq specific functions

- Standard library
  - `streaks`, `count`, `delta`, `chunk`, `diff`, `grep`, `grep_by`, ...
  - `toradix`, `fromradix`, `hex`, `base64`, ...
- Decode value
  - `display` (aliases `d`, `dv`, `da`, ...)
  - `parent`, `format`, ...
  - `tobytes`, `tovalue`, `toactual`, ...
  - `torepr`, ...
- Binary
  - Regexp functions `test`, `match`, ...
  - Decode functions `probe`, `mp3_frame`, ...

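## fq specific functions

For readers more at home in Go, the radix helpers can be approximated with `strconv`. This is a rough analogue, assuming `toradix` and `fromradix` convert between a number and its string form in a given base; `toRadix` and `fromRadix` are hypothetical names, not fq's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// toRadix formats n as a string in the given base (2 to 36).
func toRadix(n int64, base int) string {
	return strconv.FormatInt(n, base)
}

// fromRadix parses s as a number written in the given base (2 to 36).
func fromRadix(s string, base int) (int64, error) {
	return strconv.ParseInt(s, base, 64)
}

func main() {
	fmt.Println(toRadix(255, 2))  // 11111111
	fmt.Println(toRadix(255, 16)) // ff
	n, err := fromRadix("ff", 16)
	fmt.Println(n, err) // 255 <nil>
}
```
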
## Binary and binary array

- A binary is created using `tobits`, `tobytes`, `tobitsrange` or `tobytesrange`.
  - From decode value `.frames[1] | tobytes`
  - String or number `"hello" | tobits`
  - Binary array `[0xab, ["hello", .name]] | tobytes`
- Can be sliced using normal jq slice syntax.
  - `"hello" | tobits[8:8+16]` are the bits for `"el"`
- Can be decoded
  - `[tobytes[-10:], 0, 0, 0, 0] | flac_frame`

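## Binary and binary array

The bit slice `"hello" | tobits[8:8+16]` can be illustrated with a small Go sketch. `sliceBits` is a hypothetical helper, not fq's code, and the choice to zero-pad the last byte on the right is an assumption made for this example; fq's own padding rules may differ.

```go
package main

import "fmt"

// sliceBits returns bits [start, stop) of data, most significant bit
// first, packed into bytes. If the bit count is not a multiple of 8 the
// last byte is zero-padded on the right (an assumption of this sketch).
// It returns nil for an out-of-range request.
func sliceBits(data []byte, start, stop int) []byte {
	if start < 0 || stop > len(data)*8 || start > stop {
		return nil
	}
	out := make([]byte, (stop-start+7)/8)
	for i := start; i < stop; i++ {
		bit := (data[i/8] >> (7 - uint(i%8))) & 1
		j := i - start
		out[j/8] |= bit << (7 - uint(j%8))
	}
	return out
}

func main() {
	// Bits 8..24 of "hello" are the second and third bytes.
	fmt.Printf("%q\n", sliceBits([]byte("hello"), 8, 8+16)) // "el"
}
```
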
## Example queries

- Slice and decode
  - `tobits[8:8+8000] | mp3_frame | d`
  - `match([0xff,0xd8]) as $m | tobytes[$m.offset:] | jpeg`
- ASN1 BER, CBOR, msgpack, BSON, ... have `torepr` support
  - `fq -d cbor torepr file.cbor`
  - `fq -d msgpack '[torepr.items[].name]' file.msgpack`
- PCAP with TCP reassembly, look for GET requests
  - `fq 'grep("GET .*")' file.pcap`
- Parent of scalar value that includes bit 100
  - `grep_by(scalars and in_bits_range(100)) | parent`

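## Example queries

The "find a magic number, then slice from its offset" pattern in `match([0xff,0xd8]) as $m | tobytes[$m.offset:]` looks roughly like this in Go. `fromMagic` is a hypothetical helper and the input bytes are made up; unlike the jq `match`, which can output every match, this sketch only uses the first one and does not decode the result.

```go
package main

import (
	"bytes"
	"fmt"
)

// fromMagic returns the tail of data starting at the first occurrence
// of magic, or nil if magic does not occur.
func fromMagic(data, magic []byte) []byte {
	off := bytes.Index(data, magic)
	if off < 0 {
		return nil
	}
	return data[off:]
}

func main() {
	// Made-up input: two junk bytes followed by a JPEG start-of-image marker.
	data := []byte{0x00, 0x01, 0xff, 0xd8, 0xff, 0xe0}
	fmt.Printf("% x\n", fromMagic(data, []byte{0xff, 0xd8})) // ff d8 ff e0
}
```
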
## Use as script interpreter

.code fqscript

## Use as script interpreter

.code fqscriptout

## Implementation

- Library of jq functions implemented in Go
  - Decoders, decode value, binary, bit reader, IO, tty, ...
- CLI and REPL are mostly written in jq
```
( open
| decode
| if $repl then repeat(read as $expr | eval($expr) | print)
  else eval($arg) | print
  end
)
```
- All current decoders in Go
- Uses a forked version of gojq
  - Helped add native functions and iterators support
  - JQValue interface, bin/hex/oct literals, reflection, query AST functions, ...

## Decode API

SPS HRD parameters from ITU-T H.264 specification

.code avc_sps_hdr_params.go

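## Decode API

To give a feel for what a bit-oriented decoder does underneath, here is a minimal standalone sketch. It is not fq's decode API: `bitReader`, `u` and `ue` are hypothetical names, and the input bytes are made up. It reads the first three HRD fields using the H.264 descriptors `ue(v)` (unsigned Exp-Golomb) and `u(n)` (n-bit unsigned).

```go
package main

import (
	"errors"
	"fmt"
)

// bitReader reads bits most significant bit first from a byte slice.
type bitReader struct {
	data []byte
	pos  int // current bit offset
}

// u reads an n-bit unsigned integer, u(n) in H.264 notation.
func (r *bitReader) u(n int) (uint64, error) {
	if r.pos+n > len(r.data)*8 {
		return 0, errors.New("unexpected end of data")
	}
	var v uint64
	for i := 0; i < n; i++ {
		bit := (r.data[r.pos/8] >> (7 - uint(r.pos%8))) & 1
		v = v<<1 | uint64(bit)
		r.pos++
	}
	return v, nil
}

// ue reads an unsigned Exp-Golomb code, ue(v) in H.264 notation:
// count leading zero bits, skip the 1 bit, then read that many bits.
func (r *bitReader) ue() (uint64, error) {
	zeros := 0
	for {
		b, err := r.u(1)
		if err != nil {
			return 0, err
		}
		if b == 1 {
			break
		}
		zeros++
		if zeros > 32 {
			return 0, errors.New("exp-golomb prefix too long")
		}
	}
	rest, err := r.u(zeros)
	if err != nil {
		return 0, err
	}
	return (uint64(1) << uint(zeros)) - 1 + rest, nil
}

func main() {
	// Made-up bits: ue=0 ("1"), u(4)=3 ("0011"), u(4)=5 ("0101"), padding.
	r := &bitReader{data: []byte{0x9a, 0x80}}
	cpbCntMinus1, _ := r.ue()
	bitRateScale, _ := r.u(4)
	cpbSizeScale, _ := r.u(4)
	fmt.Println(cpbCntMinus1, bitRateScale, cpbSizeScale) // 0 3 5
}
```
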
## Decode API

.image avc_sps_hdr_params.png _ 900

## Decode API

Formats can use other formats. Simplified version of mp3 decoder:

.code mp3.go

## Future

- Declarative decoding like Kaitai Struct, decoder in jq
- Nicer way to handle checksums, encoding, validation, etc.
- Schemas for ASN1, protobuf, ...
- Better support for modifying data
- More formats like tls, http, http2, grpc, filesystems, ...
- Encoders
- More efficient, lazy decoding, smarter representation
- GUI
- Streaming input, read network traffic `tap("eth0") | select(...)`?
- Hope for more contributors

## Thanks and useful tools

- @itchyny for gojq
- Stephen Dolan and others for jq
- HexFiend
- GNU poke
- Kaitai Struct
- Wireshark
- [vscode-jq](https://github.com/wader/vscode-jq)
- [jq-lsp](https://github.com/wader/jq-lsp)