1
1
mirror of https://github.com/wader/fq.git synced 2024-10-04 23:48:00 +03:00
fq/doc/presentations/bts2022/fq.slide
2022-03-08 18:20:18 +01:00

239 lines
5.4 KiB
Plaintext

# fq
jq for binary formats
Mattias Wadman
mattias.wadman@gmail.com
https://github.com/wader/fq
@mwader
## Background
.html style.html
- Use various tools to extract data
- ffprobe, gm identify, mp4dump, mediainfo, wireshark, one off programs, ...
- Convert to usable format and do queries
- jq, grep, sqlite, sort, awk, sed, one off programs, ...
- Digging into and slicing binaries
- Hexfiend, hexdump, dd, cat, one off programs, ...
## Wishlist
"Want to see everything about this picture except the picture"
- A very verbose version of file(1)
- gdb for files
- Select and query things using a language
- Make all parts of a file symbolically addressable
- Nested formats and binaries
- Convenient bit-oriented decoder DSL
## Experiments and prototypes
- Decoder DSL
- TCL, lisp, tengo, Starlark, JavaScript, Go
- Query language
- JSONPath, SQL, jq, JavaScript
- How to use
- IR-JSON: `fq file | jq ... | fq`
- Extend existing project
- Decode and query in same tool
## Result
Go
- Tests showed fast enough to decode big files
- Found gojq
- Previous good experience
- Good tooling
## jq
"The JSON indenter"
- JSON in/out
- Syntax kind of a superset of JSON with same types
- Functional language based on generators and backtracking
- Expressions can return or "output" zero, one or more values
- No more outputs backtracks
- Implicit input and output similar to shell pipes
- Extraordinary iteration and combinatorial abilities
- Great for traversing tree structures
## Examples
.code jq1
## Examples
.code jq2
## Examples
.code jq3
## Examples
.code jq4
## fq
"The binary indenter"
- Superset of jq
- Re-implements most of jq's CLI interface
- 83 input formats, 22 supports probe
- Additional standard library functions
- Additional types that act as standard jq types but has special abilities
- _Decode value_ has bit range, actual and symbolic value, description, ...
- _Binary_ has a unit size, bit or bytes, and can be sliced
- Output fancy hexdump, JSON and binary
- Interactive REPL with completion and sub-REPL support
##
.image formats.svg _ 1024
## Usage
- Basic usage
- `fq . file`, `cat file | fq`
- Multiple input files
- `fq 'grep_by(format == "exif")' *.png *.jpeg`
- Hexdump, JSON and binary output
- `fq '.frames[10] | d' file.mp3`
- `fq '[grep_by(format == "dns").questions[].name.value]' file.pcap`
- `fq 'first(grep_by(format == "jpeg")) | tobytes' file > file.jpeg`
- Interactive REPL
- `fq -i . *.png`
##
.background data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAACCAYAAABytg0kAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAWJQAAFiUBSVIk8AAAABNJREFUCB1jZGBg+A/EDEwgAgQADigBA//q6GsAAAAASUVORK5CYII%3D
.image usage.svg _ 900
## fq specific functions
- Standard library
- `streaks`, `count`, `delta`, `chunk`, `diff`, `grep`, `grep_by`, ...
- `toradix`, `fromradix`, `hex`, `base64`, ...
- Decode value
- `display` (alias `d`, `dv`, `da` ...)
- `parent`, `format`, ...
- `tobytes`, `tovalue`, `toactual`, ...
- `torepr`, ...
- Binary
- Regexp functions `test`, `match`, ...
- Decode functions `probe`, `mp3_frame`, ...
## Binary and binary array
- A binary is created using `tobits`, `tobytes`, `tobitsrange` or `tobytesrange`.
- From decode value `.frames[1] | tobytes`
- String or number `"hello" | tobits`
- Binary array `[0xab, ["hello", .name]] | tobytes`
- Can be sliced using normal jq slice syntax.
- `"hello" | tobits[8:8+16]` are the bits for `"el"`
- Can be decoded
- `[tobytes[-10:], 0, 0, 0, 0] | flac_frame`
## Example queries
- Slice and decode
- `tobits[8:8+8000] | mp3_frame | d`
- `match([0xff,0xd8]) as $m | tobytes[$m.offset:] | jpeg`
- ASN1 BER, CBOR, msgpack, BSON, ... has `torepr` support
- `fq -d cbor torepr file.cbor`
- `fq -d msgpack '[torepr.items[].name]' file.msgpack`
- PCAP with TCP reassembly, look for GET requests
- `fq 'grep("GET .*")' file.pcap`
- Parent of scalar value that includes bit 100
- `grep_by(scalars and in_bits_range(100)) | parent`
## Use as script interpreter
.code fqscript
## Use as script interpreter
.code fqscriptout
## Implementation
- Library of jq function implemented in Go
- Decoders, decode value, binary, bit reader, IO, tty, ...
- CLI and REPL is mostly written in jq
```
( open
| decode
| if $repl then repeat(read as $expr | eval($expr) | print)
else eval($arg) | print
end
)
```
- All current decoders in Go
- Uses a forked version of gojq
- Helped add native functions and iterators support
- JQValue interface, bin/hex/oct literals, reflection, query AST functions, ...
## Decode API
SPS HRD parameters from ITU-T H.264 specification
.code avc_sps_hdr_params.go
## Decode API
.image avc_sps_hdr_params.png _ 900
## Decode API
Formats can use other formats. Simplified version of mp3 decoder:
.code mp3.go
## Future
- Declarative decoding like kaitai struct, decoder in jq
- Nicer way to handle checksums, encoding, validation etc
- Schemas for ASN1, protobuf, ...
- Better support for modifying data
- More formats like tls, http, http2, grpc, filesystems, ...
- Encoders
- More efficient, lazy decoding, smarter representation
- GUI
- Streaming input, read network traffic `tap("eth0") | select(...)`?
- Hope for more contributors
## Thanks and useful tools
- @itchyny for gojq
- Stephen Dolan and others for jq
- HexFiend
- GNU poke
- Kaitai struct
- Wireshark
- [vscode-jq](https://github.com/wader/vscode-jq)
- [jq-lsp](https://github.com/wader/jq-lsp)