As decoder now can know they are decoding as part of probing we can now
use some heuristics to see if we should decode as html.
The reason heuristics is needed is that x/html parser will alwaus succeed.
Add lazyre package to help delay compile of RE and make it concurrency safe.
Replaces []Format with a Group type.
A bit more type safe.
Breaking change for RegisterFormat, now takes a first argument that is a "single" format group.
Lots of naming cleanup.
This is also preparation for decode group argument which will enable doing intresting
probing, ex a format decoder could know it's decode as part of probe group (html could
be probed possibly), or have "arg probe" group for decoder who inspect args to know
if they should probe (-d /path/to/schema etc) to enable nice CLI-ergonomics.
encoding/xml and github.com/BurntSushi/toml both reads a lot before detecting
that it can't decode. Now we instead read one UTF-8 and make sure it's valid
xml or toml.
Should speed up probing
Related to #586 bigzero-zip.zip
This will allow passing both cli options and format options to sub decoder.
Ex: pass keylog option to a tls decoder when decoding a pcap.
Ex: pass decode options to a format inside a http body inside a pcap.
Add ArgAs method to lookup argument based on type. This also makes the format
decode function have same signature as sub decoders in the decode API.
This change decode.Format a bit:
DecodeFn is now just func(d *D) any
DecodeInArg renamed to DefaultInArg
Feels less cluttered, easier to read and more consistent.
Still keep tovalue, tobytes etc that are more basic functions this
only renamed format related functions.
Also there is an exceptin for to/fromjson as it comes from jq.
Also fixes lots of spelling errors while reading thru.
Preparation to make decoder use less memory and API more type safe.
Now each scalar type has it's own struct type so it can store different
things and enables to have a scalar interface.
Also own types will enable experimenting with decode DLS designs like
using chained methods that are type aware.
raw format was a hack to skip decoding to be able to get a binary using tobyte etc.
Now you can do fq -d bytes ... instead of fq -d raw 'tobytes | ...'
Markdown is used as is in online documentation and in cli the markdown decoder
is used to decode and the some jq code massages it into something cli friendly.
Was just too much of a mess to have doc in jq.
json, yaml, toml, xml, html, csv are now normal formats and most of them also particiate
in probing (not html and csv).
Also fixes a bunch of bugs in to/fromxml, to/fromjq etc.