## Implement a decoder
### Steps to add a new decoder
- Create a directory `format/<name>`
- Copy some similar decoder to `format/<name>/<name>.go`; `format/bson/bson.go` is quite small.
- Clean up and fill in the register struct, rename `format.BSON` and add it
to `format/format.go`, and don't forget to change the string constant.
- Add an import to `format/all/all.go`
### Some general tips
- The main goal is to produce a tree structure that is user-friendly and easy to work with.
Prefer a nice and easy to query tree structure over a nice decoder implementation.
- Use the same names, symbols, constant number bases etc. as in the specification,
but maybe in lowercase to be jq/JSON-ish.
- Only decode ranges you know the meaning of. If possible, let the "parent" decide what to do with unknown
bits by using the `*Decode*Len/Range/Limit` functions. fq will also automatically add "unknown" fields if
it finds gaps.
- Try not to decode too much as one value.
A length encoded int could be two fields, but maybe a length prefixed string should be one.
Flags can be a struct with bit-fields.
- Map as many values as possible to more symbolic values.
- Endian is inherited inside one format decoder and defaults to big endian for a new format decoder.
- Make sure that zero length input, no frames found etc. fails decoding.
- If the format is in the probe group, make sure to validate the input so that it is not ambiguous with other decoders.
- Try to keep decoder code as declarative as possible.
- Split into multiple sub formats if possible. That makes it possible to use them separately.
- Validate/Assert
- Error/Fatal/panic
- Is the format probeable or not?
- Can new formats be added to other formats?
- Does the new format include existing formats?
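
The automatic "unknown" fields mentioned above boil down to finding bit ranges that no decoded field covers. A minimal standalone sketch of that idea, assuming a hypothetical `Range` type with a start and a length in bits (this is not fq's implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// Range is a hypothetical bit range: Len bits starting at bit Start.
type Range struct {
	Start int64
	Len   int64
}

// End returns the first bit position after the range.
func (r Range) End() int64 { return r.Start + r.Len }

// unknownGaps returns the parts of [0, totalBits) not covered by any decoded
// range. Decoded ranges may be unsorted or overlapping.
func unknownGaps(decoded []Range, totalBits int64) []Range {
	rs := append([]Range(nil), decoded...) // copy so the caller's slice is not reordered
	sort.Slice(rs, func(i, j int) bool { return rs[i].Start < rs[j].Start })

	var gaps []Range
	var pos int64 // first bit not known to be covered so far
	for _, r := range rs {
		if r.Start > pos {
			gaps = append(gaps, Range{Start: pos, Len: r.Start - pos})
		}
		if r.End() > pos {
			pos = r.End()
		}
	}
	if pos < totalBits {
		gaps = append(gaps, Range{Start: pos, Len: totalBits - pos})
	}
	return gaps
}

func main() {
	// 64 bits of input where only a 32 bit magic and an 8 bit type were decoded.
	decoded := []Range{{Start: 0, Len: 32}, {Start: 40, Len: 8}}
	for _, g := range unknownGaps(decoded, 64) {
		fmt.Printf("unknown: start=%d len=%d\n", g.Start, g.Len)
	}
}
```

Running it reports two gaps, bits 32-39 and bits 48-63, which is where a decoder would get "unknown" fields.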
### Decoder API
`*decode.D` reader methods use this naming convention:

`<Field>?(<reader><length>?|<type>Fn)(...[, scalar.Mapper...]) <type>`

- If the method name starts with `Field`, a field will be added and the first argument will be the name of the field. If not, it will just read.
- `<reader><length>?|<type>Fn` is a reader or a reader function:
  - `<reader><length>?` reads bits using some decoder, for example:
    - `U16` unsigned 16 bit integer.
    - `UTF8` UTF8 with byte length as argument.
  - `<type>Fn` reads using a `func(d *decode.D) <type>` function.
    This can be used to implement your own custom readers.

All `Field` functions take variadic `scalar.Mapper` arguments that will be applied after reading.

`<type>` is one of these types:

| `<type>` | Go type | jq type |
| -------- | ------- | ------- |
| U | uint64 | number |
| S | int64 | number |
| F | float64 | number |
| Str | string | string |
| Bool | bool | boolean |
| Nil | nil | null |
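
To illustrate the naming convention, here is a toy decoder over a byte slice. `D`, `U8`, `FieldU8`, `UTF8`, `FieldUTF8` and `FieldUFn` only mimic the shape of the real `*decode.D` methods; they are hypothetical, byte aligned and leave out mappers, error handling and bit ranges:

```go
package main

import "fmt"

// Field is a named decoded value (hypothetical, much simpler than *decode.Value).
type Field struct {
	Name  string
	Value interface{}
}

// D is a toy decoder that reads whole bytes from buf.
type D struct {
	buf    []byte
	pos    int // byte position, real fq tracks bit positions
	Fields []Field
}

// U8 reads an unsigned 8 bit integer without adding a field.
func (d *D) U8() uint64 {
	v := uint64(d.buf[d.pos])
	d.pos++
	return v
}

// UTF8 reads nBytes bytes as a string without adding a field.
func (d *D) UTF8(nBytes int) string {
	s := string(d.buf[d.pos : d.pos+nBytes])
	d.pos += nBytes
	return s
}

// FieldU8 reads like U8 and also adds a field named name.
func (d *D) FieldU8(name string) uint64 {
	v := d.U8()
	d.Fields = append(d.Fields, Field{Name: name, Value: v})
	return v
}

// FieldUTF8 reads like UTF8 and also adds a field named name.
func (d *D) FieldUTF8(name string, nBytes int) string {
	s := d.UTF8(nBytes)
	d.Fields = append(d.Fields, Field{Name: name, Value: s})
	return s
}

// FieldUFn reads using a custom reader function and adds the result as a field.
func (d *D) FieldUFn(name string, fn func(d *D) uint64) uint64 {
	v := fn(d)
	d.Fields = append(d.Fields, Field{Name: name, Value: v})
	return v
}

func main() {
	d := &D{buf: []byte("abcd\x01\x02\x03\x04")}
	d.FieldUTF8("magic", 4)
	d.U8() // read and skipped, no field added
	// custom reader: two bytes as a big endian 16 bit value
	d.FieldUFn("pair", func(d *D) uint64 { return d.U8()<<8 | d.U8() })
	d.FieldU8("last")
	fmt.Printf("%+v\n", d.Fields)
}
```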

TODO: there are some more (BitBuf etc, should be renamed)
To add a struct or array use `d.FieldStruct(...)` and `d.FieldArray(...)`.
TODO: nested formats, buffers, own decoders, scalar mappers
TODO: seeking, framed/limited/range decode
For example, this decoder:
```go
// read a 4 byte UTF8 string and add it as "magic", returns a string
d.FieldUTF8("magic", 4)
// create a new struct and add it as "headers", returns a *decode.D
d.FieldStruct("headers", func(d *decode.D) {
	// read an 8 bit unsigned integer, map it and add it as "type", returns a uint64
	d.FieldU8("type", scalar.UToSymStr{
		1: "start",
		// ...
	})
})
```
will produce something like this:
```go
*decode.Value{
	Parent: nil,
	V: *decode.Compound{
		IsArray: false, // is struct
		Children: []*decode.Value{
			*decode.Value{
				Name: "magic",
				V: scalar.S{
					Actual: "abcd", // read and set by UTF8 reader
				},
				Range: ranges.Range{Start: 0, Len: 32},
			},
			*decode.Value{
				Parent: &..., // reference to the parent *decode.Value
				Name:   "headers",
				V: *decode.Compound{
					IsArray: false, // is struct
					Children: []*decode.Value{
						*decode.Value{
							Name: "type",
							V: scalar.S{
								Actual: uint64(1), // read and set by U8 reader
								Sym:    "start",   // set by UToSymStr scalar.Mapper
							},
							Range: ranges.Range{Start: 32, Len: 8},
						},
					},
				},
				Range: ranges.Range{Start: 32, Len: 8},
			},
		},
	},
	Range: ranges.Range{Start: 0, Len: 40},
}
```
and will look like this in jq/JSON:
```json
{
"magic": "abcd",
"headers": {
"type": "start"
}
}
```
#### `*decode.D` type
This is the main type used during decoding. It keeps track of:
- A current array or struct [`*decode.Value`](#decodevalue-type) where fields will be added.
- Current bit reader
- Current default endian
- Decode options

New [`*decode.D`](#decoded-type) values are created during decoding when `d.FieldStruct` etc. are used. It is also a kitchen sink of all kinds of functions for reading various standard number and string encodings.
Decoder authors do not have to create them.
#### `*decode.Value` type

It is what [`*decode.D`](#decoded-type) produces and is used to represent the decoded structure. It can be an array, struct, number, string etc. It is the underlying type used by `interp.DecodeValue`, which implements `gojq.JQValue` to expose it as various jq types, which in turn are used to produce JSON.
It stores:
- Parent [`*decode.Value`](#decodevalue-type) unless it's a root.
- A decoded value, a [`scalar.S`](#scalars-type) or [`*decode.Compound`](#decodecompound-type) (struct or array)
- Name in parent struct or array. If the parent is a struct the name is unique.
- Index in parent array. Not used if the parent is a struct.
- A bit range. Structs and arrays also have a range that spans the min/max range of their children.
- A bit reader where the bit range can be read from.

Decoder authors will probably not have to create them.
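
The struct/array range rule above can be sketched as computing the smallest range that covers all children. With the `magic` (bits 0-31) and `headers` (bits 32-39) children from the earlier example this gives the root range `{Start: 0, Len: 40}`. `Range` here is a hypothetical standalone type, not `ranges.Range`:

```go
package main

import "fmt"

// Range is a hypothetical bit range: Len bits starting at bit Start.
type Range struct {
	Start int64
	Len   int64
}

// End returns the first bit position after the range.
func (r Range) End() int64 { return r.Start + r.Len }

// coverRange returns the range from the minimum start to the maximum end of
// the children, or a zero Range if there are no children.
func coverRange(children []Range) Range {
	if len(children) == 0 {
		return Range{}
	}
	lo, hi := children[0].Start, children[0].End()
	for _, c := range children[1:] {
		if c.Start < lo {
			lo = c.Start
		}
		if c.End() > hi {
			hi = c.End()
		}
	}
	return Range{Start: lo, Len: hi - lo}
}

func main() {
	magic := Range{Start: 0, Len: 32}
	headers := Range{Start: 32, Len: 8}
	fmt.Printf("%+v\n", coverRange([]Range{magic, headers})) // {Start:0 Len:40}
}
```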
#### `scalar.S` type
Keeps track of:
- Actual value. The decoded value represented using a Go type like `uint64`, `string` etc. For example, a value read by a UTF-8 or a UTF-16 reader will in both cases end up as a `string`.
- Symbolic value. An optional symbolic representation of the actual value. For example, a `scalar.UToSymStr` would map an actual `uint64` to a symbolic `string`.
- String description of the value.
- Number representation

The `scalar` package has `scalar.Mapper` implementations for all types, either to map an actual value to a whole [`scalar.S`](#scalars-type) value (`scalar.<type>ToScalar`) or to just set the symbolic value (`scalar.<type>ToSym<type>`). There are also mappers to just set values or to change the number representation, like `scalar.Hex`/`scalar.SymHex` etc.

Decoder authors will probably not have to create them, but you might implement your own `scalar.Mapper` to modify them.
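
A simplified model of how mappers could be applied to a scalar after reading. `S`, `Mapper`, `UToSymStr`, `Hex` and `applyMappers` below are hypothetical stand-ins named after the concepts in this section; the real `scalar` package types and method signatures may differ:

```go
package main

import "fmt"

// DisplayFormat is a hypothetical number representation setting.
type DisplayFormat int

const (
	NumberDecimal DisplayFormat = iota
	NumberHex
)

// S is a hypothetical scalar: actual value, optional symbolic value,
// description and number representation.
type S struct {
	Actual        interface{}
	Sym           interface{}
	Description   string
	DisplayFormat DisplayFormat
}

// Mapper modifies a scalar after it has been read.
type Mapper interface {
	MapScalar(s S) (S, error)
}

// UToSymStr maps an actual uint64 to a symbolic string.
type UToSymStr map[uint64]string

func (m UToSymStr) MapScalar(s S) (S, error) {
	if u, ok := s.Actual.(uint64); ok {
		if sym, ok := m[u]; ok {
			s.Sym = sym
		}
	}
	return s, nil
}

// hexMapper only changes how the number should be displayed.
type hexMapper struct{}

func (hexMapper) MapScalar(s S) (S, error) {
	s.DisplayFormat = NumberHex
	return s, nil
}

// Hex is a mapper value, mimicking how a mapper is passed to Field functions.
var Hex Mapper = hexMapper{}

// applyMappers runs the mappers in order, stopping at the first error.
func applyMappers(s S, ms ...Mapper) (S, error) {
	for _, m := range ms {
		var err error
		if s, err = m.MapScalar(s); err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	s, err := applyMappers(S{Actual: uint64(1)}, UToSymStr{1: "start"}, Hex)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

Returning a new value instead of mutating through a pointer keeps each mapper easy to test on its own; unmapped values simply keep a nil symbolic value.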
#### `*decode.Compound` type
Used to store a struct or array of [`*decode.Value`](#decodevalue-type).
Decoder authors do not have to create them.
## Development tips
I usually use `-d <format>` and `dv` while developing; that way you get a decode tree
even if decoding fails. `dv` gives verbose output and also includes a stacktrace.
```sh
go run fq.go -d <format> dv file
```
If the format is inside some other format, it can be handy to first extract the bits and run
the decoder directly. For example, if working on an `aac_frame` decoder issue:
```sh
fq '.tracks[0].samples[1234] | tobytes' file.mp4 > aac_frame_1234
fq -d aac_frame dv aac_frame_1234
```
If nested decoding fails, a workaround can be to temporarily change the parent decoder to
use `d.RawLen()` etc. instead of `d.FormatLen()` etc. to extract the bits. Hopefully
there will be an option to do this in the future.
When researching or investigating something I can recommend using `watchexec`, `modd` etc. to
make things more comfortable. Using vscode/delve for debugging should also work fine once
the launch `args` are set up etc.
```sh
watchexec "go run fq.go -d aac_frame dv aac_frame"
```
Some different ways to run tests:
```sh
# run all tests
make test
# run all go tests
go test ./...
# run all tests for one format
go test -run TestFQTests/mp4 ./format/
# write all actual outputs
WRITE_ACTUAL=1 go test ./...
# write actual output for specific tests
WRITE_ACTUAL=1 go test -run ...
# color diff
DIFF_COLOR=1 go test ...
```
To lint the source use:
```sh
make lint
```
Generate documentation. Requires [FFmpeg](https://github.com/FFmpeg/FFmpeg) and [Graphviz](https://gitlab.com/graphviz/graphviz):
```sh
make doc
```
TODO: `make fuzz`
## Debug
To split debug and normal output, even when using the REPL, write `log` package output and stderr to a file that can be followed with `tail -f` in another terminal:
```sh
LOGFILE=/tmp/log go run fq.go ... 2>>/tmp/log
```
gojq execution debug:
```sh
GOJQ_DEBUG=1 go run -tags debug fq.go ...
```
Memory and CPU profile (will open a browser):
```sh
make memprof ARGS=". file"
make cpuprof ARGS=". test.mp3"
```
## From start to decoded value
```
main:main()
  cli.Main(default registry)
    interp.New(registry, std os interp implementation)
    interp.(*Interp).Main()
      interp.jq _main/0:
        args.jq _args_parse/2
        populate filenames for input/0
        interp.jq inputs/0
          foreach valid input/0 output
            interp.jq open
              funcs.go _open
            interp.jq decode
              funcs.go _decode
                decode.go Decode(...)
                  ...
            interp.jq eval expr
              funcs.go _eval
            interp.jq display
              funcs.go _display
                for interp.(decodeValueBase).Display()
                  dump.go
                    print tree
          empty output
```
## bitio and other io packages
```
*os.File, *bytes.Buffer
^
ctxreadseeker.Reader defers blocking io operations to a goroutine to make them cancellable
^
progressreadseeker.Reader approximates how much of a file has been read
^
aheadreadseeker.Reader does readahead caching
^
| (io.ReadSeeker interface)
|
bitio.IOBitReader (implements bitio.Bit* interfaces)
SectionBitReader
MultiBitReader
```
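
At the bottom of this stack, bit readers read a number of bits at arbitrary bit positions. A minimal in-memory sketch of such a reader, assuming most significant bit first order; `BitReader`, `ReadBits` and `ErrShortRead` are hypothetical names, and unlike the `bitio` readers it works on a byte slice instead of an `io.ReadSeeker`:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrShortRead is returned when fewer bits remain than requested.
var ErrShortRead = errors.New("not enough bits left")

// BitReader reads bits most significant bit first from an in-memory buffer.
type BitReader struct {
	buf []byte
	pos int64 // current position in bits
}

// ReadBits reads nBits (0-64) and returns them right-aligned in a uint64.
func (r *BitReader) ReadBits(nBits int) (uint64, error) {
	if nBits < 0 || nBits > 64 {
		return 0, fmt.Errorf("invalid bit count %d", nBits)
	}
	if r.pos+int64(nBits) > int64(len(r.buf))*8 {
		return 0, ErrShortRead
	}
	var v uint64
	for i := 0; i < nBits; i++ {
		b := r.buf[r.pos/8]
		bit := (b >> (7 - uint(r.pos%8))) & 1 // MSB of each byte comes first
		v = v<<1 | uint64(bit)
		r.pos++
	}
	return v, nil
}

func main() {
	r := &BitReader{buf: []byte{0xac, 0xff}} // 1010 1100 1111 1111
	a, _ := r.ReadBits(3)                    // 101
	b, _ := r.ReadBits(5)                    // 01100
	c, _ := r.ReadBits(8)                    // 11111111
	_, err := r.ReadBits(1)
	fmt.Println(a, b, c, err) // 5 12 255 not enough bits left
}
```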
## jq oddities
```
jq -n '[1,2,3,4] | .[null:], .[null:2], .[2:null], .[:null]'
```
## Setup docker desktop with golang windows container
```sh
git clone https://github.com/StefanScherer/windows-docker-machine.git
cd windows-docker-machine
vagrant up 2016-box
cd ../fq
docker --context 2016-box run --rm -ti -v "C:${PWD//\//\\}:C:${PWD//\//\\}" -w "$PWD" golang:1.17.5-windowsservercore-ltsc2016
```
## Implementation details
- fq uses a gojq fork that can be found at https://github.com/wader/gojq/tree/fq (the "fq" branch)
- fq uses a readline fork that can be found at https://github.com/wader/readline/tree/fq (the "fq" branch)
- The CLI readline uses raw mode to block ctrl-c from becoming a SIGINT
## Dependencies and source origins
- [gojq](https://github.com/itchyny/gojq) fork that can be found at https://github.com/wader/gojq/tree/fq<br>
Issues and PR:s related to fq:<br>
[#43](https://github.com/itchyny/gojq/issues/43) Support for functions written in go when used as a library<br>
[#46](https://github.com/itchyny/gojq/pull/46) Support custom internal functions<br>
[#56](https://github.com/itchyny/gojq/issues/56) String format query with no operator using %#v or %#+v panics<br>
[#65](https://github.com/itchyny/gojq/issues/65) Try-catch with custom function<br>
[#67](https://github.com/itchyny/gojq/pull/67) Add custom iterator function support which enables implementing a REPL in jq<br>
[#81](https://github.com/itchyny/gojq/issues/81) path/1 behaviour and path expression question<br>
[#86](https://github.com/itchyny/gojq/issues/86) ER: basic TCO<br>
[#109](https://github.com/itchyny/gojq/issues/109) jq halt_error behaviour difference<br>
[#113](https://github.com/itchyny/gojq/issues/113) error/0 and error/1 behavior difference<br>
[#117](https://github.com/itchyny/gojq/issues/117) Negative number modulus *big.Int behaves differently to int<br>
[#118](https://github.com/itchyny/gojq/issues/118) Regression introduced by "remove fork analysis from tail call optimization (ref #86)"<br>
[#122](https://github.com/itchyny/gojq/issues/122) Slow performance for large error values that ends up using typeErrorPreview()<br>
[#125](https://github.com/itchyny/gojq/pull/125) improve performance of join by make it internal<br>
[#141](https://github.com/itchyny/gojq/issues/141) Empty array flatten regression since "improve flatten performance by reducing copy"
- [readline](https://github.com/chzyer/readline) fork that can be found at https://github.com/wader/readline/tree/fq
- [gopacket](https://github.com/google/gopacket) for TCP and IPv4 reassembly
- [mapstructure](https://github.com/mitchellh/mapstructure) for convenient JSON/map conversion
- [go-difflib](https://github.com/pmezard/go-difflib) for diff tests
- [golang.org/x/text](https://pkg.go.dev/golang.org/x/text) for text encoding conversions
- [float16.go](https://android.googlesource.com/platform/tools/gpu/+/gradle_2.0.0/binary/float16.go) to convert bits into 16-bit floats
## Release process
Run and follow the instructions:
```sh
make release VERSION=1.2.3
```
List commits since a release:
```sh
git log --no-decorate --no-merges --oneline v0.0.4..wader/master | sort -t " " -k 2 | sed 's/\(.*\)/* \1/'
```