As decoders can now know when they are decoding as part of probing, we can
use some heuristics to decide whether to decode as html.
Heuristics are needed because the x/html parser will always succeed.
Add lazyre package to help delay compilation of regular expressions and make it concurrency safe.
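A minimal sketch of the lazy-compile idea, assuming a sync.Once based wrapper. The names LazyRE, New and Must are hypothetical and are not taken from the actual lazyre package:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// LazyRE is a hypothetical wrapper that compiles its pattern on first use.
type LazyRE struct {
	pattern string
	once    sync.Once
	re      *regexp.Regexp
}

// New records the pattern without compiling it, so package-level
// variables do not pay the compile cost at program start.
func New(pattern string) *LazyRE {
	return &LazyRE{pattern: pattern}
}

// Must returns the compiled regexp, compiling it at most once.
// sync.Once makes concurrent first calls safe.
// regexp.MustCompile panics if the pattern is invalid.
func (l *LazyRE) Must() *regexp.Regexp {
	l.once.Do(func() {
		l.re = regexp.MustCompile(l.pattern)
	})
	return l.re
}

// htmlTagRE is an illustrative pattern only, not the heuristic used by the decoder.
var htmlTagRE = New(`(?i)<html[\s>]`)

func main() {
	fmt.Println(htmlTagRE.Must().MatchString("<HTML lang=\"en\">"))
}
```

The pattern is only compiled when Must is first called, so formats that are never probed never compile their expressions.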
Replaces []Format with a Group type.
A bit more type safe.
Breaking change for RegisterFormat, now takes a first argument that is a "single" format group.
Lots of naming cleanup.
This is also preparation for a decode group argument, which will enable interesting
probing, e.g. a format decoder could know it is decoding as part of a probe group (html could
possibly be probed), or there could be an "arg probe" group for decoders that inspect args to know
if they should probe (-d /path/to/schema etc) to enable nice CLI ergonomics.
This preserves the callstack on non-recoverable panics so that using
a debugger and fuzzing is much easier.
Add vscode debug config.
Remove fuzz stacktrace log workaround.
What it can do:
- Decodes records and most standard messages and extensions.
- Decrypts records and reassembles the application data stream if a keylog is provided
and the cipher suite is supported.
- Supports most recommended and used ciphers and a bunch of older ones.
What it can't do:
- SSL v3 may be supported as it is similar to TLS 1.0, but it is not tested.
- Decryption and renegotiation/cipher change.
- Record defragmentation is not supported, seems rare over TCP.
- TLS 1.3
- SSL v2, but the v2 compat header is supported.
- Some key exchange messages are not decoded yet
Decryption code is heavily based on golang crypto/tls and zmap/zcrypto.
Will be the base for decoding http2 and other TLS-based protocols.
Fixes #587
This will allow passing both cli options and format options to sub decoders.
Ex: pass keylog option to a tls decoder when decoding a pcap.
Ex: pass decode options to a format inside a http body inside a pcap.
Add ArgAs method to look up an argument based on type. This also makes the format
decode function have the same signature as sub decoders in the decode API.
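A rough sketch of type-based argument lookup, modelled on how errors.As works. The D struct, the args field, the option structs and the exact ArgAs signature here are assumptions for illustration and may differ from the real decode API:

```go
package main

import (
	"fmt"
	"reflect"
)

// TLSIn and HTTPIn are hypothetical option structs used only in this sketch.
type TLSIn struct{ Keylog string }
type HTTPIn struct{ MaxBody int }

// D stands in for the decode context; only the argument list is modelled.
type D struct {
	args []any
}

// ArgAs finds the first argument assignable to the type that target points to,
// stores it in *target and reports whether one was found.
// target must be a non-nil pointer, similar to errors.As.
func (d *D) ArgAs(target any) bool {
	tv := reflect.ValueOf(target)
	if tv.Kind() != reflect.Pointer || tv.IsNil() {
		panic("ArgAs: target must be a non-nil pointer")
	}
	want := tv.Type().Elem()
	for _, a := range d.args {
		if a == nil {
			continue
		}
		if reflect.TypeOf(a).AssignableTo(want) {
			tv.Elem().Set(reflect.ValueOf(a))
			return true
		}
	}
	return false
}

func main() {
	// Both format options are passed along; each decoder picks out its own type.
	d := &D{args: []any{HTTPIn{MaxBody: 1024}, TLSIn{Keylog: "keys.log"}}}
	var ti TLSIn
	if d.ArgAs(&ti) {
		fmt.Println("keylog:", ti.Keylog)
	}
}
```

Looking arguments up by type lets options for several nested formats travel together without each decoder needing to know about the others.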
This changes decode.Format a bit:
DecodeFn is now just func(d *D) any
DecodeInArg renamed to DefaultInArg
Preparation to make the decoder use less memory and the API more type safe.
Now each scalar type has its own struct type so it can store different
things, which also makes it possible to have a scalar interface.
Also, own types will enable experimenting with decode DSL designs like
using chained methods that are type aware.
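One way the per-type scalar idea could look, as a sketch only. The Scalar interface, the Uint and Str structs, their fields and the WithSym method are all hypothetical names chosen for illustration:

```go
package main

import "fmt"

// Scalar is a hypothetical interface shared by every per-type scalar struct.
type Scalar interface {
	ScalarActual() any
	ScalarSym() any
}

// Uint stores an unsigned integer as its concrete type instead of boxing it.
type Uint struct {
	Actual uint64
	Sym    any
}

func (s Uint) ScalarActual() any { return s.Actual }
func (s Uint) ScalarSym() any    { return s.Sym }

// WithSym is a type-aware chained method: the map key type must be uint64,
// so passing a mapping for another scalar type fails at compile time.
func (s Uint) WithSym(m map[uint64]string) Uint {
	if v, ok := m[s.Actual]; ok {
		s.Sym = v
	}
	return s
}

// Str is a separate struct, so it can carry different fields than Uint.
type Str struct {
	Actual string
	Sym    any
}

func (s Str) ScalarActual() any { return s.Actual }
func (s Str) ScalarSym() any    { return s.Sym }

func main() {
	// Illustrative mapping only.
	contentTypes := map[uint64]string{22: "handshake", 23: "application_data"}
	u := Uint{Actual: 22}.WithSym(contentTypes)

	// Code that does not care about the concrete type uses the interface.
	for _, s := range []Scalar{u, Str{Actual: "abc"}} {
		fmt.Println(s.ScalarActual(), s.ScalarSym())
	}
}
```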
Refactor mp4 decoder to be simpler and have a fallback for unknown box types
Cleanup some old ilst hacks
Add generic string reader to decode API that takes an encoding parameter
The raw format was a hack to skip decoding to be able to get a binary using tobytes etc.
Now you can do fq -d bytes ... instead of fq -d raw 'tobytes | ...'
Will make it faster for structs with lots of fields and seems to
not cause any significant difference for small structs.
All this really needs a rewrite somehow, maybe refactor into interfaces? It is getting messy.
Split fat macho into its own decoder macho_fat. This also fixes an issue with section
offsets etc not being correct as they are relative to the start of each embedded file.
Make all address and offset fields be in hex.
Decode __cstring, __ustring and __cfstring sections.
Fix LC_ENCRYPTION_INFO_64 missing padding issue.
Skip ranging for __bss and __common as they don't have any data in the file.
Simplified magic handling a bit and added symbols.
Simplified state struct field, had a redundant struct.
Doing it through a property in the decode fn feels a bit hidden and will
also not get set on failed decoding.
Arrays are now not range sorted, the logic is that you care about index number and ordering.
Structs are range sorted as you will prefer to refer to fields by name.
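To illustrate the difference, a small sketch assuming "range sorted" means ordering children by the start of their bit range. The Field type and SortByRange helper are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a hypothetical decoded child with a name and a bit range.
type Field struct {
	Name  string
	Start int64 // start of range in bits
	Len   int64 // length of range in bits
}

// SortByRange orders fields by where they start in the input.
// The sort is stable, so fields with equal start keep their decode order.
func SortByRange(fields []Field) {
	sort.SliceStable(fields, func(i, j int) bool {
		return fields[i].Start < fields[j].Start
	})
}

func main() {
	// Decode order differs from input order, e.g. a footer decoded first.
	decoded := []Field{
		{Name: "footer", Start: 96, Len: 32},
		{Name: "header", Start: 0, Len: 32},
		{Name: "body", Start: 32, Len: 64},
	}

	// Struct: fields are looked up by name, so sort them by range.
	structFields := append([]Field(nil), decoded...)
	SortByRange(structFields)
	fmt.Println("struct:", structFields)

	// Array: index and ordering matter, so keep decode order untouched.
	fmt.Println("array: ", decoded)
}
```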
json, yaml, toml, xml, html, csv are now normal formats and most of them also participate
in probing (not html and csv).
Also fixes a bunch of bugs in to/fromxml, to/fromjq etc.