mirror of https://github.com/wader/fq.git synced 2024-11-22 07:16:49 +03:00

format,interp: Refactor json, yaml, etc into formats; also move out related functions

json, yaml, toml, xml, html, csv are now normal formats, and most of them also participate
in probing (html and csv do not).

Also fixes a bunch of bugs in to/fromxml, to/fromjq, etc.
Mattias Wadman 2022-06-01 16:55:55 +02:00
parent 8b5cc89641
commit cae288e6be
95 changed files with 2826 additions and 2048 deletions

.gitattributes vendored

@ -2,3 +2,4 @@
*.fqtest eol=lf
*.json eol=lf
*.jq eol=lf
*.xml eol=lf


@ -61,6 +61,7 @@ bsd_loopback_frame,
[bson](doc/formats.md#bson),
bzip2,
[cbor](doc/formats.md#cbor),
[csv](doc/formats.md#csv),
dns,
dns_tcp,
elf,
@ -82,6 +83,7 @@ hevc_nalu,
hevc_pps,
hevc_sps,
hevc_vps,
[html](doc/formats.md#html),
icc_profile,
icmp,
icmpv6,
@ -120,6 +122,7 @@ sll_packet,
tar,
tcp_segment,
tiff,
toml,
udp_datagram,
vorbis_comment,
vorbis_packet,
@ -130,6 +133,8 @@ vpx_ccr,
wav,
webp,
xing,
[xml](doc/formats.md#xml),
yaml,
[zip](doc/formats.md#zip)
[#]: sh-end
@ -286,3 +291,4 @@ Licenses of direct dependencies:
- golang/snappy https://github.com/golang/snappy/blob/master/LICENSE (BSD)
- github.com/BurntSushi/toml https://github.com/BurntSushi/toml/blob/master/COPYING (MIT)
- gopkg.in/yaml.v3 https://github.com/go-yaml/yaml/blob/v3/LICENSE (MIT)
- github.com/creasty/defaults https://github.com/creasty/defaults/blob/master/LICENSE (MIT)


@ -31,6 +31,7 @@
|[`bson`](#bson) |Binary&nbsp;JSON |<sub></sub>|
|`bzip2` |bzip2&nbsp;compression |<sub>`probe`</sub>|
|[`cbor`](#cbor) |Concise&nbsp;Binary&nbsp;Object&nbsp;Representation |<sub></sub>|
|[`csv`](#csv) |Comma&nbsp;separated&nbsp;values |<sub></sub>|
|`dns` |DNS&nbsp;packet |<sub></sub>|
|`dns_tcp` |DNS&nbsp;packet&nbsp;(TCP) |<sub></sub>|
|`elf` |Executable&nbsp;and&nbsp;Linkable&nbsp;Format |<sub></sub>|
@ -52,6 +53,7 @@
|`hevc_pps` |H.265/HEVC&nbsp;Picture&nbsp;Parameter&nbsp;Set |<sub></sub>|
|`hevc_sps` |H.265/HEVC&nbsp;Sequence&nbsp;Parameter&nbsp;Set |<sub></sub>|
|`hevc_vps` |H.265/HEVC&nbsp;Video&nbsp;Parameter&nbsp;Set |<sub></sub>|
|[`html`](#html) |HyperText&nbsp;Markup&nbsp;Language |<sub></sub>|
|`icc_profile` |International&nbsp;Color&nbsp;Consortium&nbsp;profile |<sub></sub>|
|`icmp` |Internet&nbsp;Control&nbsp;Message&nbsp;Protocol |<sub></sub>|
|`icmpv6` |Internet&nbsp;Control&nbsp;Message&nbsp;Protocol&nbsp;v6 |<sub></sub>|
@ -61,7 +63,7 @@
|`ipv4_packet` |Internet&nbsp;protocol&nbsp;v4&nbsp;packet |<sub>`ip_packet`</sub>|
|`ipv6_packet` |Internet&nbsp;protocol&nbsp;v6&nbsp;packet |<sub>`ip_packet`</sub>|
|`jpeg` |Joint&nbsp;Photographic&nbsp;Experts&nbsp;Group&nbsp;file |<sub>`exif` `icc_profile`</sub>|
|`json` |JSON |<sub></sub>|
|`json` |JavaScript&nbsp;Object&nbsp;Notation |<sub></sub>|
|[`macho`](#macho) |Mach-O&nbsp;macOS&nbsp;executable |<sub></sub>|
|[`matroska`](#matroska) |Matroska&nbsp;file |<sub>`aac_frame` `av1_ccr` `av1_frame` `avc_au` `avc_dcr` `flac_frame` `flac_metadatablocks` `hevc_au` `hevc_dcr` `image` `mp3_frame` `mpeg_asc` `mpeg_pes_packet` `mpeg_spu` `opus_packet` `vorbis_packet` `vp8_frame` `vp9_cfm` `vp9_frame`</sub>|
|[`mp3`](#mp3) |MP3&nbsp;file |<sub>`id3v2` `id3v1` `id3v11` `apev2` `mp3_frame`</sub>|
@ -90,6 +92,7 @@
|`tar` |Tar&nbsp;archive |<sub>`probe`</sub>|
|`tcp_segment` |Transmission&nbsp;control&nbsp;protocol&nbsp;segment |<sub></sub>|
|`tiff` |Tag&nbsp;Image&nbsp;File&nbsp;Format |<sub>`icc_profile`</sub>|
|`toml` |Tom's&nbsp;Obvious,&nbsp;Minimal&nbsp;Language |<sub></sub>|
|`udp_datagram` |User&nbsp;datagram&nbsp;protocol |<sub>`udp_payload`</sub>|
|`vorbis_comment` |Vorbis&nbsp;comment |<sub>`flac_picture`</sub>|
|`vorbis_packet` |Vorbis&nbsp;packet |<sub>`vorbis_comment`</sub>|
@ -100,12 +103,14 @@
|`wav` |WAV&nbsp;file |<sub>`id3v2` `id3v1` `id3v11`</sub>|
|`webp` |WebP&nbsp;image |<sub>`vp8_frame`</sub>|
|`xing` |Xing&nbsp;header |<sub></sub>|
|[`xml`](#xml) |Extensible&nbsp;Markup&nbsp;Language |<sub></sub>|
|`yaml` |YAML&nbsp;Ain't&nbsp;Markup&nbsp;Language |<sub></sub>|
|[`zip`](#zip) |ZIP&nbsp;archive |<sub>`probe`</sub>|
|`image` |Group |<sub>`gif` `jpeg` `mp4` `png` `tiff` `webp`</sub>|
|`inet_packet` |Group |<sub>`ipv4_packet` `ipv6_packet`</sub>|
|`ip_packet` |Group |<sub>`icmp` `icmpv6` `tcp_segment` `udp_datagram`</sub>|
|`link_frame` |Group |<sub>`bsd_loopback_frame` `ether8023_frame` `sll2_packet` `sll_packet`</sub>|
|`probe` |Group |<sub>`adts` `ar` `avro_ocf` `bitcoin_blkdat` `bzip2` `elf` `flac` `gif` `gzip` `jpeg` `json` `macho` `matroska` `mp3` `mp4` `mpeg_ts` `ogg` `pcap` `pcapng` `png` `tar` `tiff` `wav` `webp` `zip`</sub>|
|`probe` |Group |<sub>`adts` `ar` `avro_ocf` `bitcoin_blkdat` `bzip2` `elf` `flac` `gif` `gzip` `jpeg` `json` `macho` `matroska` `mp3` `mp4` `mpeg_ts` `ogg` `pcap` `pcapng` `png` `tar` `tiff` `toml` `wav` `webp` `xml` `yaml` `zip`</sub>|
|`tcp_stream` |Group |<sub>`dns` `rtmp`</sub>|
|`udp_payload` |Group |<sub>`dns`</sub>|
@ -280,6 +285,27 @@ Supports `torepr`
- https://en.wikipedia.org/wiki/CBOR
- https://www.rfc-editor.org/rfc/rfc8949.html
### csv
#### Options
|Name |Default|Description|
|- |- |-|
|`comma` |, |Separator character|
|`comment`|# |Comment line character|
#### Examples
Decode file using csv options
```
$ fq -d csv -o comma="," -o comment="#" . file
```
Decode value as csv
```
... | csv({comma:",",comment:"#"})
```
### flac_frame
#### Options
@ -320,6 +346,27 @@ Decode value as hevc_au
... | hevc_au({length_size:4})
```
### html
#### Options
|Name |Default|Description|
|- |- |-|
|`array`|false |Decode as nested arrays|
|`seq` |false |Use seq attribute to preserve element order|
#### Examples
Decode file using html options
```
$ fq -d html -o array=false -o seq=false . file
```
Decode value as html
```
... | html({array:false,seq:false})
```
### macho
Supports decoding vanilla and FAT Mach-O binaries.
@ -456,6 +503,27 @@ Current only supports plain RTMP (not RTMPT or encrypted variants etc) with AMF0
- https://rtmp.veriskope.com/docs/spec/
- https://rtmp.veriskope.com/pdf/video_file_format_spec_v10.pdf
### xml
#### Options
|Name |Default|Description|
|- |- |-|
|`array`|false |Decode as nested arrays|
|`seq` |false |Use seq attribute to preserve element order|
#### Examples
Decode file using xml options
```
$ fq -d xml -o array=false -o seq=false . file
```
Decode value as xml
```
... | xml({array:false,seq:false})
```
### zip
Supports ZIP64.

File diff suppressed because it is too large

Image changed: 125 KiB before, 128 KiB after


@ -620,14 +620,14 @@ zip> ^D
- `fromxmlentities` Decode XML entities.
- `toxmlentities` Encode XML entities.
- `fromurlpath` Decode URL path component.
- `tourlpath` Encode URL path component.
- `tourlpath` Encode URL path component. Whitespace is encoded as %20.
- `fromurlencode` Decode URL query encoding.
- `tourlencode` Encode URL to query encoding.
- `tourlencode` Encode URL to query encoding. Whitespace is encoded as "+".
- `fromurlquery` Decode URL query into object. For duplicate keys the value will be an array.
- `tourlquery` Encode object into query string.
- `fromurl` Decode URL into object.
```jq
> "schema://user:pass@host/path?key=value#fragement" | fromurl
> "schema://user:pass@host/path?key=value#fragment" | fromurl
{
"fragment": "fragment",
"host": "host",


@ -21,8 +21,11 @@ $ fq -n _registry.groups.probe
"tiff",
"webp",
"zip",
"mp3",
"mpeg_ts",
"wav",
"mp3",
"json"
"json",
"toml",
"xml",
"yaml"
]


@ -13,6 +13,8 @@ import (
_ "github.com/wader/fq/format/bson"
_ "github.com/wader/fq/format/bzip2"
_ "github.com/wader/fq/format/cbor"
_ "github.com/wader/fq/format/crypto"
_ "github.com/wader/fq/format/csv"
_ "github.com/wader/fq/format/dns"
_ "github.com/wader/fq/format/elf"
_ "github.com/wader/fq/format/fairplay"
@ -25,6 +27,7 @@ import (
_ "github.com/wader/fq/format/jpeg"
_ "github.com/wader/fq/format/json"
_ "github.com/wader/fq/format/macho"
_ "github.com/wader/fq/format/math"
_ "github.com/wader/fq/format/matroska"
_ "github.com/wader/fq/format/mp3"
_ "github.com/wader/fq/format/mp4"
@ -38,10 +41,14 @@ import (
_ "github.com/wader/fq/format/raw"
_ "github.com/wader/fq/format/rtmp"
_ "github.com/wader/fq/format/tar"
_ "github.com/wader/fq/format/text"
_ "github.com/wader/fq/format/tiff"
_ "github.com/wader/fq/format/toml"
_ "github.com/wader/fq/format/vorbis"
_ "github.com/wader/fq/format/vpx"
_ "github.com/wader/fq/format/wav"
_ "github.com/wader/fq/format/webp"
_ "github.com/wader/fq/format/xml"
_ "github.com/wader/fq/format/yaml"
_ "github.com/wader/fq/format/zip"
)


@ -250,6 +250,20 @@ out ... | cbor | torepr
out References and links
out https://en.wikipedia.org/wiki/CBOR
out https://www.rfc-editor.org/rfc/rfc8949.html
"help(csv)"
out csv: Comma separated values decoder
out Options:
out comma=, Separator character
out comment=# Comment line character
out Examples:
out # Decode file as csv
out $ fq -d csv . file
out # Decode value as csv
out ... | csv
out # Decode file using csv options
out $ fq -d csv -o comma="," -o comment="#" . file
out # Decode value as csv
out ... | csv({comma:",",comment:"#"})
"help(dns)"
out dns: DNS packet decoder
out Examples:
@ -409,6 +423,20 @@ out # Decode file as hevc_vps
out $ fq -d hevc_vps . file
out # Decode value as hevc_vps
out ... | hevc_vps
"help(html)"
out html: HyperText Markup Language decoder
out Options:
out array=false Decode as nested arrays
out seq=false Use seq attribute to preserve element order
out Examples:
out # Decode file as html
out $ fq -d html . file
out # Decode value as html
out ... | html
out # Decode file using html options
out $ fq -d html -o array=false -o seq=false . file
out # Decode value as html
out ... | html({array:false,seq:false})
"help(icc_profile)"
out icc_profile: International Color Consortium profile decoder
out Examples:
@ -473,7 +501,7 @@ out $ fq -d jpeg . file
out # Decode value as jpeg
out ... | jpeg
"help(json)"
out json: JSON decoder
out json: JavaScript Object Notation decoder
out Examples:
out # Decode file as json
out $ fq -d json . file
@ -726,6 +754,13 @@ out # Decode file as tiff
out $ fq -d tiff . file
out # Decode value as tiff
out ... | tiff
"help(toml)"
out toml: Tom's Obvious, Minimal Language decoder
out Examples:
out # Decode file as toml
out $ fq -d toml . file
out # Decode value as toml
out ... | toml
"help(udp_datagram)"
out udp_datagram: User datagram protocol decoder
out Examples:
@ -796,6 +831,27 @@ out # Decode file as xing
out $ fq -d xing . file
out # Decode value as xing
out ... | xing
"help(xml)"
out xml: Extensible Markup Language decoder
out Options:
out array=false Decode as nested arrays
out seq=false Use seq attribute to preserve element order
out Examples:
out # Decode file as xml
out $ fq -d xml . file
out # Decode value as xml
out ... | xml
out # Decode file using xml options
out $ fq -d xml -o array=false -o seq=false . file
out # Decode value as xml
out ... | xml({array:false,seq:false})
"help(yaml)"
out yaml: YAML Ain't Markup Language decoder
out Examples:
out # Decode file as yaml
out $ fq -d yaml . file
out # Decode value as yaml
out ... | yaml
"help(zip)"
out zip: ZIP archive decoder
out Supports ZIP64.


@ -1,8 +1,5 @@
# appendix_a.json from https://github.com/cbor/test-vectors
# TODO: "w0kBAAAAAAAAAAA=" "wkkBAAAAAAAAAAA=" semantic bigint
# NOTE: "O///////////" test uses bigint and is correct but test success currently relies on -18446744073709551616
# in input json being turned into a float as it can't be represented in json and cbor decoded bigint will also be
# converted to a float when comparing.
$ fq -i -d json . appendix_a.json
json> length
82
@ -16,7 +13,7 @@ json> map(select(.decoded) | (.cbor | frombase64 | cbor | torepr) as $a | select
},
"test": {
"cbor": "wkkBAAAAAAAAAAA=",
"decoded": 18446744073709552000,
"decoded": 18446744073709551616,
"hex": "c249010000000000000000",
"roundtrip": true
}
@ -29,7 +26,7 @@ json> map(select(.decoded) | (.cbor | frombase64 | cbor | torepr) as $a | select
},
"test": {
"cbor": "w0kBAAAAAAAAAAA=",
"decoded": -18446744073709552000,
"decoded": -18446744073709551617,
"hex": "c349010000000000000000",
"roundtrip": true
}

format/crypto/hash.go Normal file

@ -0,0 +1,80 @@
package crypto
import (
"crypto/md5"
//nolint: gosec
"crypto/sha1"
"crypto/sha256"
"crypto/sha512"
"embed"
"fmt"
"hash"
"io"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/interp"
//nolint: staticcheck
"golang.org/x/crypto/md4"
"golang.org/x/crypto/sha3"
)
//go:embed hash.jq
var hashFS embed.FS
func init() {
interp.RegisterFunc1("_tohash", toHash)
interp.RegisterFS(hashFS)
}
func hashFn(s string) hash.Hash {
switch s {
case "md4":
return md4.New()
case "md5":
return md5.New()
case "sha1":
return sha1.New()
case "sha256":
return sha256.New()
case "sha512":
return sha512.New()
case "sha3_224":
return sha3.New224()
case "sha3_256":
return sha3.New256()
case "sha3_384":
return sha3.New384()
case "sha3_512":
return sha3.New512()
default:
return nil
}
}
type toHashOpts struct {
Name string
}
func toHash(_ *interp.Interp, c any, opts toHashOpts) any {
inBR, err := interp.ToBitReader(c)
if err != nil {
return err
}
h := hashFn(opts.Name)
if h == nil {
return fmt.Errorf("unknown hash function %s", opts.Name)
}
if _, err := io.Copy(h, bitio.NewIOReader(inBR)); err != nil {
return err
}
outBR := bitio.NewBitReader(h.Sum(nil), -1)
bb, err := interp.NewBinaryFromBitReader(outBR, 8, 0)
if err != nil {
return err
}
return bb
}

format/crypto/hash.jq Normal file

@ -0,0 +1,9 @@
def tomd4: _tohash({name: "md4"});
def tomd5: _tohash({name: "md5"});
def tosha1: _tohash({name: "sha1"});
def tosha256: _tohash({name: "sha256"});
def tosha512: _tohash({name: "sha512"});
def tosha3_224: _tohash({name: "sha3_224"});
def tosha3_256: _tohash({name: "sha3_256"});
def tosha3_384: _tohash({name: "sha3_384"});
def tosha3_512: _tohash({name: "sha3_512"});

format/crypto/pem.go Normal file

@ -0,0 +1,14 @@
package crypto
import (
"embed"
"github.com/wader/fq/pkg/interp"
)
//go:embed pem.jq
var pemFS embed.FS
func init() {
interp.RegisterFS(pemFS)
}

format/crypto/pem.jq Normal file

@ -0,0 +1,20 @@
# https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail
def frompem:
( tobytes
| tostring
| capture("-----BEGIN(.*?)-----(?<s>.*?)-----END(.*?)-----"; "mg").s
| _frombase64({encoding: "std"})
) // error("no pem header or footer found");
def topem($label):
( tobytes
| _tobase64({encoding: "std"})
| ($label | if $label != "" then " " + $label end) as $label
| [ "-----BEGIN\($label)-----"
, .
, "-----END\($label)-----"
, ""
]
| join("\n")
);
def topem: topem("");

format/crypto/testdata/pem.fqtest vendored Normal file

@ -0,0 +1,7 @@
$ fq -i
null> "abc" | topem
"-----BEGIN-----\nYWJj\n-----END-----\n"
null> "abc" | topem | "before" + . + "between" + . + "after" | frompem | tostring
"abc"
"abc"
null> ^D

format/csv/csv.go Normal file

@ -0,0 +1,106 @@
package csv
import (
"bytes"
"embed"
"encoding/csv"
"errors"
"fmt"
"io"
"github.com/wader/fq/format"
"github.com/wader/fq/internal/gojqextra"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
)
//go:embed csv.jq
var csvFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.CSV,
Description: "Comma separated values",
ProbeOrder: format.ProbeOrderText,
DecodeFn: decodeCSV,
DecodeInArg: format.CSVLIn{
Comma: ",",
Comment: "#",
},
Functions: []string{"_todisplay"},
Files: csvFS,
})
interp.RegisterFunc1("_tocsv", toCSV)
}
func decodeCSV(d *decode.D, in any) any {
ci, _ := in.(format.CSVLIn)
var rvs []any
br := d.RawLen(d.Len())
r := csv.NewReader(bitio.NewIOReader(br))
r.TrimLeadingSpace = true
r.LazyQuotes = true
if ci.Comma != "" {
r.Comma = rune(ci.Comma[0])
}
if ci.Comment != "" {
r.Comment = rune(ci.Comment[0])
}
for {
r, err := r.Read()
if errors.Is(err, io.EOF) {
break
} else if err != nil {
return err
}
var vs []any
for _, s := range r {
vs = append(vs, s)
}
rvs = append(rvs, vs)
}
d.Value.V = &scalar.S{Actual: rvs}
d.Value.Range.Len = d.Len()
return nil
}
type ToCSVOpts struct {
Comma string
}
func toCSV(_ *interp.Interp, c []any, opts ToCSVOpts) any {
b := &bytes.Buffer{}
w := csv.NewWriter(b)
if opts.Comma != "" {
w.Comma = rune(opts.Comma[0])
}
for _, row := range c {
rs, ok := gojqextra.Cast[[]any](row)
if !ok {
return fmt.Errorf("expected row to be an array, got %s", gojqextra.TypeErrorPreview(row))
}
vs, ok := gojqextra.NormalizeToStrings(rs).([]any)
if !ok {
panic("not array")
}
var ss []string
for _, v := range vs {
s, ok := v.(string)
if !ok {
return fmt.Errorf("expected row record to be scalars, got %s", gojqextra.TypeErrorPreview(v))
}
ss = append(ss, s)
}
if err := w.Write(ss); err != nil {
return err
}
}
w.Flush()
return b.String()
}

format/csv/csv.jq Normal file

@ -0,0 +1,3 @@
def tocsv($opts): _tocsv($opts);
def tocsv: _tocsv(null);
def _csv__todisplay: tovalue;


@ -1,3 +1,13 @@
/test:
1,2,3
$ fq -d csv . /test
[
[
"1",
"2",
"3"
]
]
$ fq -i
null> "a,b,c,d" | fromcsv | ., tocsv
[
@ -27,4 +37,11 @@ null> "a\t\"b\t c\"\td" | fromcsv({comma:"\t"}) | ., tocsv({comma: "\t"})
]
]
"a\t\"b\t c\"\td\n"
null> [[bsl(1;100)]] | tocsv | ., fromcsv
"1267650600228229401496703205376\n"
[
[
"1267650600228229401496703205376"
]
]
null> ^D


@ -1,5 +1,13 @@
package format
// TODO: do before-format somehow and topology sort?
const (
ProbeOrderBinUnique = 0 // binary with unlikely overlap
ProbeOrderBinFuzzy = 50 // binary with possible overlap
ProbeOrderText = 100 // text format
)
// TODO: change to CamelCase?
//nolint:revive
const (
ALL = "all"
@ -39,6 +47,7 @@ const (
BSON = "bson"
BZIP2 = "bzip2"
CBOR = "cbor"
CSV = "csv"
DNS = "dns"
DNS_TCP = "dns_tcp"
ELF = "elf"
@ -61,6 +70,7 @@ const (
HEVC_PPS = "hevc_pps"
HEVC_SPS = "hevc_sps"
HEVC_VPS = "hevc_vps"
HTML = "html"
ICC_PROFILE = "icc_profile"
ICMP = "icmp"
ICMPV6 = "icmpv6"
@ -99,6 +109,7 @@ const (
TAR = "tar"
TCP_SEGMENT = "tcp_segment"
TIFF = "tiff"
TOML = "toml"
UDP_DATAGRAM = "udp_datagram"
VORBIS_COMMENT = "vorbis_comment"
VORBIS_PACKET = "vorbis_packet"
@ -109,6 +120,8 @@ const (
WAV = "wav"
WEBP = "webp"
XING = "xing"
XML = "xml"
YAML = "yaml"
ZIP = "zip"
)
@ -274,3 +287,18 @@ type Mp4In struct {
type ZipIn struct {
Uncompress bool `doc:"Uncompress and probe files"`
}
type XMLIn struct {
Seq bool `doc:"Use seq attribute to preserve element order"`
Array bool `doc:"Decode as nested arrays"`
}
type HTMLIn struct {
Seq bool `doc:"Use seq attribute to preserve element order"`
Array bool `doc:"Decode as nested arrays"`
}
type CSVLIn struct {
Comma string `doc:"Separator character"`
Comment string `doc:"Comment line character"`
}

format/json/jq.go Normal file

@ -0,0 +1,14 @@
package json
import (
"embed"
"github.com/wader/fq/pkg/interp"
)
//go:embed jq.jq
var jqFS embed.FS
func init() {
interp.RegisterFS(jqFS)
}

format/json/jq.jq Normal file

@ -0,0 +1,96 @@
# to jq-flavoured json
def _tojq($opts):
def _is_ident: test("^[a-zA-Z_][a-zA-Z_0-9]*$");
def _key: if _is_ident | not then tojson end;
def _f($opts; $indent):
def _r($prefix):
( type as $t
| if $t == "null" then tojson
elif $t == "string" then tojson
elif $t == "number" then tojson
elif $t == "boolean" then tojson
elif $t == "array" then
if length == 0 then "[]"
else
[ "[", $opts.compound_newline
, ( [ .[]
| $prefix, $indent
, _r($prefix+$indent), $opts.array_sep
]
| .[0:-1]
)
, $opts.compound_newline
, $prefix, "]"
]
end
elif $t == "object" then
if length == 0 then "{}"
else
[ "{", $opts.compound_newline
, ( [ to_entries[]
| $prefix, $indent
, (.key | _key), $opts.key_sep
, (.value | _r($prefix+$indent)), $opts.object_sep
]
| .[0:-1]
)
, $opts.compound_newline
, $prefix, "}"
]
end
else error("unknown type \($t)")
end
);
_r("");
( _f($opts; $opts.indent * " ")
| if _is_array then flatten | join("") end
);
def tojq($opts):
_tojq(
( { indent: 0,
key_sep: ":",
object_sep: ",",
array_sep: ",",
compound_newline: "",
} + $opts
| if .indent > 0 then
( .key_sep = ": "
| .object_sep = ",\n"
| .array_sep = ",\n"
| .compound_newline = "\n"
)
end
)
);
def tojq: tojq(null);
# from jq-flavoured json
def fromjq:
def _f:
( . as $v
| .term.type
| if . == "TermTypeNull" then null
elif . == "TermTypeTrue" then true
elif . == "TermTypeFalse" then false
elif . == "TermTypeString" then $v.term.str.str
elif . == "TermTypeNumber" then $v.term.number | tonumber
elif . == "TermTypeObject" then
( $v.term.object.key_vals // []
| map(
{ key: (.key // .key_string.str),
value: (.val.queries[0] | _f)
}
)
| from_entries
)
elif . == "TermTypeArray" then
( def _a: if .op then .left, .right | _a end;
[$v.term.array.query // empty | _a | _f]
)
else error("unknown term")
end
);
try
(_query_fromstring | _f)
catch
error("fromjq only supports constant literals");


@ -1,46 +1,93 @@
package json
import (
"bytes"
"embed"
stdjson "encoding/json"
"errors"
"fmt"
"io"
"math/big"
"github.com/wader/fq/format"
"github.com/wader/fq/internal/colorjson"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
"github.com/wader/gojq"
)
// TODO: should read multiple json values or just one?
// TODO: root not array/struct how to add unknown gaps?
// TODO: ranges not end up correct
// TODO: use jd.InputOffset() * 8?
//go:embed json.jq
var jsonFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.JSON,
Description: "JSON",
ProbeOrder: 100, // last
Description: "JavaScript Object Notation",
ProbeOrder: format.ProbeOrderText,
Groups: []string{format.PROBE},
DecodeFn: decodeJSON,
Functions: []string{"_todisplay"},
Files: jsonFS,
})
interp.RegisterFunc1("_tojson", toJSON)
}
func decodeJSON(d *decode.D, _ any) any {
br := d.RawLen(d.Len())
// keep in sync with gojq fromJSON
jd := stdjson.NewDecoder(bitio.NewIOReader(br))
jd.UseNumber()
var s scalar.S
if err := jd.Decode(&s.Actual); err != nil {
d.Fatalf(err.Error())
}
switch s.Actual.(type) {
case map[string]any,
[]any:
default:
d.Fatalf("root not object or array")
if err := jd.Decode(new(any)); !errors.Is(err, io.EOF) {
d.Fatalf("trailing data after top-level value")
}
s.Actual = gojq.NormalizeNumbers(s.Actual)
// switch s.Actual.(type) {
// case map[string]any,
// []any:
// default:
// d.Fatalf("top-level not object or array")
// }
d.Value.V = &s
d.Value.Range.Len = d.Len()
return nil
}
type ToJSONOpts struct {
Indent int
}
func toJSON(_ *interp.Interp, c any, opts ToJSONOpts) any {
// TODO: share
cj := colorjson.NewEncoder(
false,
false,
opts.Indent,
func(v any) any {
switch v := v.(type) {
case gojq.JQValue:
return v.JQValueToGoJQ()
case nil, bool, float64, int, string, *big.Int, map[string]any, []any:
return v
default:
panic(fmt.Sprintf("toValue not a JQValue value: %#v %T", v, v))
}
},
colorjson.Colors{},
)
bb := &bytes.Buffer{}
if err := cj.Marshal(c, bb); err != nil {
return err
}
return bb.String()
}

format/json/json.jq Normal file

@ -0,0 +1,3 @@
def tojson($opts): _tojson($opts);
def tojson: _tojson(null);
def _json__todisplay: tovalue;

format/json/testdata/bigint.fqtest vendored Normal file

@ -0,0 +1,10 @@
$ fq -n "{a: bsl(1;100)} | tojq | ., fromjq"
"{a:1267650600228229401496703205376}"
{
"a": 1267650600228229401496703205376
}
$ fq -n "{a: bsl(1;100)} | tojson | ., fromjson"
"{\"a\":1267650600228229401496703205376}"
{
"a": 1267650600228229401496703205376
}


@ -134,3 +134,15 @@ string
"white space": 123
}
----
[]
[]
----
[]
[]
----
{}
{}
----
{}
{}
----


@ -1,60 +1,158 @@
$ fq -d json . test.json
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
0x00|7b 0a 20 20 20 20 22 61 22 3a 20 31 32 33 2c 0a|{. "a": 123,.|.: {} (json)
* |until 0x74.7 (end) (117) | |
$ fq -d json tovalue test.json
/probe.json:
{"a": 123}
/probe_scalar.json:
123
$ fq . /probe.json
{
"a": 123,
"b": [
1,
2,
3
"a": 123
}
$ fq . /probe_scalar.json
123
$ fq -rRs 'fromjson[] | (tojson | ., fromjson), "----", (tojson({indent:2}) | ., fromjson), "----"' variants.json
null
null
----
null
null
----
true
true
----
true
true
----
false
false
----
false
false
----
123
123
----
123
123
----
123.123
123.123
----
123.123
123.123
----
"string"
string
----
"string"
string
----
[1,2,3]
[
1,
2,
3
]
----
[
1,
2,
3
]
[
1,
2,
3
]
----
{"array":[true,false,null,1.2,"string",[1.2,3],{"a":1}],"escape \\\"":456,"false":false,"null":null,"number":1.2,"object":{"a":1},"string":"string","true":true,"white space":123}
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"c:": "string",
"d": null,
"e": 123.4
}
$ fq . test.json
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
0x00|7b 0a 20 20 20 20 22 61 22 3a 20 31 32 33 2c 0a|{. "a": 123,.|.: {} (json)
* |until 0x74.7 (end) (117) | |
$ fq .b[1] test.json
2
$ fq . json.gz
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: json.gz (gzip)
0x00|1f 8b |.. | identification: raw bits (valid)
0x00| 08 | . | compression_method: "deflate" (8)
0x00| 00 | . | flags{}:
0x00| 65 0a 08 61 | e..a | mtime: 1627916901 (2021-08-02T15:08:21Z)
0x00| 00 | . | extra_flags: 0
0x00| 03 | . | os: "unix" (3)
0x0|7b 22 61 22 3a 20 31 32 33 7d 0a| |{"a": 123}.| | uncompressed: {} (json)
0x00| ab 56 4a 54 b2 52| .VJT.R| compressed: raw bits
0x10|30 34 32 ae e5 02 00 |042.... |
0x10| 20 ac d2 9c | ... | crc32: 0x9cd2ac20 (valid)
0x10| 0b 00 00 00| | ....|| isize: 11
$ fq tovalue json.gz
{
"compressed": "<13>q1ZKVLJSMDQyruUCAA==",
"compression_method": "deflate",
"crc32": 2631052320,
"extra_flags": 0,
"flags": {
"comment": false,
"extra": false,
"header_crc": false,
"name": false,
"reserved": 0,
"text": false
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"identification": "<2>H4s=",
"isize": 11,
"mtime": 1627916901,
"os": "unix",
"uncompressed": {
"a": 123
}
"string": "string",
"true": true,
"white space": 123
}
$ fq .uncompressed json.gz
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
0x0|7b 22 61 22 3a 20 31 32 33 7d 0a| |{"a": 123}.| |.uncompressed: {} (json)
----
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"string": "string",
"true": true,
"white space": 123
}
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"string": "string",
"true": true,
"white space": 123
}
----
[]
[]
----
[]
[]
----
{}
{}
----
{}
{}
----

format/json/testdata/tofromjson.fqtest vendored Normal file

@ -0,0 +1,77 @@
$ fq -d json . test.json
{
"a": 123,
"b": [
1,
2,
3
],
"c:": "string",
"d": null,
"e": 123.4
}
$ fq -d json tovalue test.json
{
"a": 123,
"b": [
1,
2,
3
],
"c:": "string",
"d": null,
"e": 123.4
}
$ fq . test.json
{
"a": 123,
"b": [
1,
2,
3
],
"c:": "string",
"d": null,
"e": 123.4
}
$ fq .b[1] test.json
2
$ fq . json.gz
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: json.gz (gzip)
0x00|1f 8b |.. | identification: raw bits (valid)
0x00| 08 | . | compression_method: "deflate" (8)
0x00| 00 | . | flags{}:
0x00| 65 0a 08 61 | e..a | mtime: 1627916901 (2021-08-02T15:08:21Z)
0x00| 00 | . | extra_flags: 0
0x00| 03 | . | os: "unix" (3)
0x0|7b 22 61 22 3a 20 31 32 33 7d 0a| |{"a": 123}.| | uncompressed: {} (json)
0x00| ab 56 4a 54 b2 52| .VJT.R| compressed: raw bits
0x10|30 34 32 ae e5 02 00 |042.... |
0x10| 20 ac d2 9c | ... | crc32: 0x9cd2ac20 (valid)
0x10| 0b 00 00 00| | ....|| isize: 11
$ fq tovalue json.gz
{
"compressed": "<13>q1ZKVLJSMDQyruUCAA==",
"compression_method": "deflate",
"crc32": 2631052320,
"extra_flags": 0,
"flags": {
"comment": false,
"extra": false,
"header_crc": false,
"name": false,
"reserved": 0,
"text": false
},
"identification": "<2>H4s=",
"isize": 11,
"mtime": 1627916901,
"os": "unix",
"uncompressed": {
"a": 123
}
}
$ fq .uncompressed json.gz
{
"a": 123
}

format/json/testdata/trailing.fqtest vendored Normal file

@ -0,0 +1,4 @@
$ fq -n '"123 trailing" | fromjson._error.error'
exitcode: 5
stderr:
error: error at position 0xc: trailing data after top-level value


@ -16,5 +16,7 @@
"string": "string",
"true": true,
"white space": 123
}
},
[],
{}
]

format/math/radix.go Normal file

@ -0,0 +1,14 @@
package math
import (
"embed"
"github.com/wader/fq/pkg/interp"
)
//go:embed radix.jq
var radixFS embed.FS
func init() {
interp.RegisterFS(radixFS)
}

format/math/radix.jq Normal file

@ -0,0 +1,45 @@
def fromradix($base; $table):
( if _is_string | not then error("cannot fromradix convert: \(.)") end
| split("")
| reverse
| map($table[.])
| if . == null then error("invalid char \(.)") end
# state: [power, ans]
| reduce .[] as $c ([1,0];
( (.[0] * $base) as $b
| [$b, .[1] + (.[0] * $c)]
)
)
| .[1]
);
def fromradix($base):
fromradix($base; {
"0": 0, "1": 1, "2": 2, "3": 3,"4": 4, "5": 5, "6": 6, "7": 7, "8": 8, "9": 9,
"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15, "g": 16,
"h": 17, "i": 18, "j": 19, "k": 20, "l": 21, "m": 22, "n": 23,
"o": 24, "p": 25, "q": 26, "r": 27, "s": 28, "t": 29, "u": 30,
"v": 31, "w": 32, "x": 33, "y": 34, "z": 35,
"A": 36, "B": 37, "C": 38, "D": 39, "E": 40, "F": 41, "G": 42,
"H": 43, "I": 44, "J": 45, "K": 46, "L": 47, "M": 48, "N": 49,
"O": 50, "P": 51, "Q": 52, "R": 53, "S": 54, "T": 55, "U": 56,
"V": 57, "W": 58, "X": 59, "Y": 60, "Z": 61,
"@": 62, "_": 63,
});
def toradix($base; $table):
( if type != "number" then error("cannot toradix convert: \(.)") end
| if . == 0 then "0"
else
( [ recurse(if . > 0 then _intdiv(.; $base) else empty end) | . % $base]
| reverse
| .[1:]
| if $base <= ($table | length) then
map($table[.]) | join("")
else
error("base too large")
end
)
end
);
def toradix($base):
toradix($base; "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_");


@ -1,10 +1,4 @@
$ fq -i
null> "abc" | topem
"-----BEGIN-----\nYWJj\n-----END-----\n"
null> "abc" | topem | "before" + . + "between" + . + "after" | frompem | tostring
"abc"
"abc"
null>
null> (0,1,1024,99999999999999999999) as $n | (2,8,16,62,64) as $r | "\($r): \($n) \($n | toradix($r)) \($n | toradix($r) | fromradix($r))" | println
2: 0 0 0
8: 0 0 0


@ -17,7 +17,7 @@ var mp3Frame decode.Group
func init() {
interp.RegisterFormat(decode.Format{
Name: format.MP3,
ProbeOrder: 20, // after most others (silent samples and jpeg header can look like mp3 sync)
ProbeOrder: format.ProbeOrderBinFuzzy, // after most others (silent samples and jpeg header can look like mp3 sync)
Description: "MP3 file",
Groups: []string{format.PROBE},
DecodeFn: mp3Decode,


@ -10,7 +10,7 @@ import (
func init() {
interp.RegisterFormat(decode.Format{
Name: format.MPEG_TS,
ProbeOrder: 10, // make sure to be after gif, both start with 0x47
ProbeOrder: format.ProbeOrderBinFuzzy, // make sure to be after gif, both start with 0x47
Description: "MPEG Transport Stream",
Groups: []string{format.PROBE},
DecodeFn: tsDecode,


@ -16,7 +16,7 @@ func init() {
}
// transform to binary using fn
func makeBinaryTransformFn(fn func(r io.Reader) (io.Reader, error)) func(i *interp.Interp, c any) any {
func makeBinaryTransformFn(fn func(r io.Reader) (io.Reader, error)) func(_ *interp.Interp, c any) any {
return func(_ *interp.Interp, c any) any {
inBR, err := interp.ToBitReader(c)
if err != nil {

format/text/encoding.go Normal file

@ -0,0 +1,242 @@
package text
import (
"bytes"
"embed"
"encoding/base64"
"encoding/hex"
"fmt"
"io"
"strings"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/interp"
"golang.org/x/text/encoding"
"golang.org/x/text/encoding/charmap"
"golang.org/x/text/encoding/unicode"
)
//go:embed encoding.jq
var textFS embed.FS
func init() {
interp.RegisterFunc0("fromhex", func(_ *interp.Interp, c string) any {
b, err := hex.DecodeString(c)
if err != nil {
return err
}
bb, err := interp.NewBinaryFromBitReader(bitio.NewBitReader(b, -1), 8, 0)
if err != nil {
return err
}
return bb
})
interp.RegisterFunc0("tohex", func(_ *interp.Interp, c string) any {
br, err := interp.ToBitReader(c)
if err != nil {
return err
}
buf := &bytes.Buffer{}
if _, err := io.Copy(hex.NewEncoder(buf), bitio.NewIOReader(br)); err != nil {
return err
}
return buf.String()
})
// TODO: other encodings and share?
base64Encoding := func(enc string) *base64.Encoding {
switch enc {
case "url":
return base64.URLEncoding
case "rawstd":
return base64.RawStdEncoding
case "rawurl":
return base64.RawURLEncoding
default:
return base64.StdEncoding
}
}
type fromBase64Opts struct {
Encoding string
}
interp.RegisterFunc1("_frombase64", func(_ *interp.Interp, c string, opts fromBase64Opts) any {
b, err := base64Encoding(opts.Encoding).DecodeString(c)
if err != nil {
return err
}
bin, err := interp.NewBinaryFromBitReader(bitio.NewBitReader(b, -1), 8, 0)
if err != nil {
return err
}
return bin
})
type toBase64Opts struct {
Encoding string
}
interp.RegisterFunc1("_tobase64", func(_ *interp.Interp, c string, opts toBase64Opts) any {
br, err := interp.ToBitReader(c)
if err != nil {
return err
}
bb := &bytes.Buffer{}
wc := base64.NewEncoder(base64Encoding(opts.Encoding), bb)
if _, err := io.Copy(wc, bitio.NewIOReader(br)); err != nil {
return err
}
wc.Close()
return bb.String()
})
strEncoding := func(s string) encoding.Encoding {
switch s {
case "UTF8":
return unicode.UTF8
case "UTF16":
return unicode.UTF16(unicode.LittleEndian, unicode.UseBOM)
case "UTF16LE":
return unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
case "UTF16BE":
return unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
case "CodePage037":
return charmap.CodePage037
case "CodePage437":
return charmap.CodePage437
case "CodePage850":
return charmap.CodePage850
case "CodePage852":
return charmap.CodePage852
case "CodePage855":
return charmap.CodePage855
case "CodePage858":
return charmap.CodePage858
case "CodePage860":
return charmap.CodePage860
case "CodePage862":
return charmap.CodePage862
case "CodePage863":
return charmap.CodePage863
case "CodePage865":
return charmap.CodePage865
case "CodePage866":
return charmap.CodePage866
case "CodePage1047":
return charmap.CodePage1047
case "CodePage1140":
return charmap.CodePage1140
case "ISO8859_1":
return charmap.ISO8859_1
case "ISO8859_2":
return charmap.ISO8859_2
case "ISO8859_3":
return charmap.ISO8859_3
case "ISO8859_4":
return charmap.ISO8859_4
case "ISO8859_5":
return charmap.ISO8859_5
case "ISO8859_6":
return charmap.ISO8859_6
case "ISO8859_6E":
return charmap.ISO8859_6E
case "ISO8859_6I":
return charmap.ISO8859_6I
case "ISO8859_7":
return charmap.ISO8859_7
case "ISO8859_8":
return charmap.ISO8859_8
case "ISO8859_8E":
return charmap.ISO8859_8E
case "ISO8859_8I":
return charmap.ISO8859_8I
case "ISO8859_9":
return charmap.ISO8859_9
case "ISO8859_10":
return charmap.ISO8859_10
case "ISO8859_13":
return charmap.ISO8859_13
case "ISO8859_14":
return charmap.ISO8859_14
case "ISO8859_15":
return charmap.ISO8859_15
case "ISO8859_16":
return charmap.ISO8859_16
case "KOI8R":
return charmap.KOI8R
case "KOI8U":
return charmap.KOI8U
case "Macintosh":
return charmap.Macintosh
case "MacintoshCyrillic":
return charmap.MacintoshCyrillic
case "Windows874":
return charmap.Windows874
case "Windows1250":
return charmap.Windows1250
case "Windows1251":
return charmap.Windows1251
case "Windows1252":
return charmap.Windows1252
case "Windows1253":
return charmap.Windows1253
case "Windows1254":
return charmap.Windows1254
case "Windows1255":
return charmap.Windows1255
case "Windows1256":
return charmap.Windows1256
case "Windows1257":
return charmap.Windows1257
case "Windows1258":
return charmap.Windows1258
case "XUserDefined":
return charmap.XUserDefined
default:
return nil
}
}
type toStrEncodingOpts struct {
Encoding string
}
interp.RegisterFunc1("_tostrencoding", func(_ *interp.Interp, c string, opts toStrEncodingOpts) any {
h := strEncoding(opts.Encoding)
if h == nil {
return fmt.Errorf("unknown string encoding %s", opts.Encoding)
}
bb := &bytes.Buffer{}
if _, err := io.Copy(h.NewEncoder().Writer(bb), strings.NewReader(c)); err != nil {
return err
}
outBR := bitio.NewBitReader(bb.Bytes(), -1)
bin, err := interp.NewBinaryFromBitReader(outBR, 8, 0)
if err != nil {
return err
}
return bin
})
type fromStrEncodingOpts struct {
Encoding string
}
interp.RegisterFunc1("_fromstrencoding", func(_ *interp.Interp, c any, opts fromStrEncodingOpts) any {
inBR, err := interp.ToBitReader(c)
if err != nil {
return err
}
h := strEncoding(opts.Encoding)
if h == nil {
return fmt.Errorf("unknown string encoding %s", opts.Encoding)
}
bb := &bytes.Buffer{}
if _, err := io.Copy(bb, h.NewDecoder().Reader(bitio.NewIOReader(inBR))); err != nil {
return err
}
return bb.String()
})
interp.RegisterFS(textFS)
}

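The base64 variant selection in encoding.go above can be sketched as a standalone program (same helper logic, fq plumbing omitted): "url", "rawstd" and "rawurl" pick the corresponding stdlib encodings, and anything else (including "std") falls back to standard base64.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Encoding mirrors the variant switch in encoding.go above.
func base64Encoding(enc string) *base64.Encoding {
	switch enc {
	case "url":
		return base64.URLEncoding
	case "rawstd":
		return base64.RawStdEncoding
	case "rawurl":
		return base64.RawURLEncoding
	default:
		return base64.StdEncoding
	}
}

func main() {
	in := []byte{0xfb, 0xff}
	for _, enc := range []string{"std", "url", "rawurl"} {
		// url variants map +/ to -_ and the raw variants drop the = padding
		fmt.Printf("%s: %s\n", enc, base64Encoding(enc).EncodeToString(in))
	}
}
```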
19
format/text/encoding.jq Normal file

@ -0,0 +1,19 @@
def toiso8859_1: _tostrencoding({encoding: "ISO8859_1"});
def fromiso8859_1: _fromstrencoding({encoding: "ISO8859_1"});
def toutf8: _tostrencoding({encoding: "UTF8"});
def fromutf8: _fromstrencoding({encoding: "UTF8"});
def toutf16: _tostrencoding({encoding: "UTF16"});
def fromutf16: _fromstrencoding({encoding: "UTF16"});
def toutf16le: _tostrencoding({encoding: "UTF16LE"});
def fromutf16le: _fromstrencoding({encoding: "UTF16LE"});
def toutf16be: _tostrencoding({encoding: "UTF16BE"});
def fromutf16be: _fromstrencoding({encoding: "UTF16BE"});
def frombase64($opts): _frombase64({encoding: "std"} + $opts);
def frombase64: _frombase64(null);
def tobase64($opts): _tobase64({encoding: "std"} + $opts);
def tobase64: _tobase64(null);
# TODO: compat: remove at some point
def hex: _binary_or_orig(tohex; fromhex);
def base64: _binary_or_orig(tobase64; frombase64);

153
format/text/url.go Normal file

@ -0,0 +1,153 @@
package text
import (
"net/url"
"github.com/wader/fq/internal/gojqextra"
"github.com/wader/fq/pkg/interp"
)
func init() {
interp.RegisterFunc0("fromurlencode", func(_ *interp.Interp, c string) any {
u, err := url.QueryUnescape(c)
if err != nil {
return err
}
return u
})
interp.RegisterFunc0("tourlencode", func(_ *interp.Interp, c string) any {
return url.QueryEscape(c)
})
interp.RegisterFunc0("fromurlpath", func(_ *interp.Interp, c string) any {
u, err := url.PathUnescape(c)
if err != nil {
return err
}
return u
})
interp.RegisterFunc0("tourlpath", func(_ *interp.Interp, c string) any {
return url.PathEscape(c)
})
fromURLValues := func(q url.Values) any {
qm := map[string]any{}
for k, v := range q {
if len(v) > 1 {
vm := []any{}
for _, v := range v {
vm = append(vm, v)
}
qm[k] = vm
} else {
qm[k] = v[0]
}
}
return qm
}
interp.RegisterFunc0("fromurlquery", func(_ *interp.Interp, c string) any {
q, err := url.ParseQuery(c)
if err != nil {
return err
}
return fromURLValues(q)
})
toURLValues := func(c map[string]any) url.Values {
qv := url.Values{}
for k, v := range c {
if va, ok := gojqextra.Cast[[]any](v); ok {
var ss []string
for _, s := range va {
if s, ok := gojqextra.Cast[string](s); ok {
ss = append(ss, s)
}
}
qv[k] = ss
} else if vs, ok := gojqextra.Cast[string](v); ok {
qv[k] = []string{vs}
}
}
return qv
}
interp.RegisterFunc0("tourlquery", func(_ *interp.Interp, c map[string]any) any {
// TODO: nicer
c, ok := gojqextra.NormalizeToStrings(c).(map[string]any)
if !ok {
panic("not map")
}
return toURLValues(c).Encode()
})
interp.RegisterFunc0("fromurl", func(_ *interp.Interp, c string) any {
u, err := url.Parse(c)
if err != nil {
return err
}
m := map[string]any{}
if u.Scheme != "" {
m["scheme"] = u.Scheme
}
if u.User != nil {
um := map[string]any{
"username": u.User.Username(),
}
if p, ok := u.User.Password(); ok {
um["password"] = p
}
m["user"] = um
}
if u.Host != "" {
m["host"] = u.Host
}
if u.Path != "" {
m["path"] = u.Path
}
if u.RawPath != "" {
m["rawpath"] = u.RawPath
}
if u.RawQuery != "" {
m["rawquery"] = u.RawQuery
m["query"] = fromURLValues(u.Query())
}
if u.Fragment != "" {
m["fragment"] = u.Fragment
}
return m
})
interp.RegisterFunc0("tourl", func(_ *interp.Interp, c map[string]any) any {
// TODO: nicer
c, ok := gojqextra.NormalizeToStrings(c).(map[string]any)
if !ok {
panic("not map")
}
str := func(v any) string { s, _ := gojqextra.Cast[string](v); return s }
u := url.URL{
Scheme: str(c["scheme"]),
Host: str(c["host"]),
Path: str(c["path"]),
Fragment: str(c["fragment"]),
}
if um, ok := gojqextra.Cast[map[string]any](c["user"]); ok {
username, password := str(um["username"]), str(um["password"])
if username != "" {
if password == "" {
u.User = url.User(username)
} else {
u.User = url.UserPassword(username, password)
}
}
}
if s, ok := gojqextra.Cast[string](c["rawquery"]); ok {
u.RawQuery = s
}
if qm, ok := gojqextra.Cast[map[string]any](c["query"]); ok {
u.RawQuery = toURLValues(qm).Encode()
}
return u.String()
})
}

5
format/toml/testdata/bigint.fqtest vendored Normal file

@ -0,0 +1,5 @@
$ fq -n "{a: bsl(1;100)} | totoml | ., fromtoml"
"a = \"1267650600228229401496703205376\"\n"
{
"a": "1267650600228229401496703205376"
}


@ -1,20 +1,28 @@
/probe.toml:
[test]
key = 123
$ fq . probe.toml
{
"test": {
"key": 123
}
}
# toml does not support null in arrays
# TODO: add uint64 norm test
$ fq -rRs 'fromjson[] | (walk(if type == "array" then map(select(. != null)) end) | try (totoml | ., fromtoml) catch .), "----"' variants.json
{}
totoml cannot be applied to: null
----
totoml cannot be applied to: boolean (true)
toml: top-level values must be Go maps or structs
----
totoml cannot be applied to: boolean (false)
toml: top-level values must be Go maps or structs
----
totoml cannot be applied to: number (123)
toml: top-level values must be Go maps or structs
----
totoml cannot be applied to: number (123.123)
toml: top-level values must be Go maps or structs
----
totoml cannot be applied to: string ("string")
toml: top-level values must be Go maps or structs
----
totoml cannot be applied to: array ([1,2,3])
toml: top-level values must be Go maps or structs
----
array = [true, false, 1.2, "string", [1.2, 3], {a = 1}]
"escape \\\"" = 456
@ -52,3 +60,12 @@ true = true
"white space": 123
}
----
toml: top-level values must be Go maps or structs
----
error at position 0x0: root object has no values
----
$ fq -n '"" | fromtoml'
exitcode: 5
stderr:
error: error at position 0x0: root object has no values

4
format/toml/testdata/trailing.fqtest vendored Normal file

@ -0,0 +1,4 @@
$ fq -n '"[a] trailing" | fromtoml._error.error'
exitcode: 5
stderr:
error: error at position 0xc: toml: line 1 (last key "a"): expected a top-level item to end with a newline, comment, or EOF, but got 't' instead

22
format/toml/testdata/variants.json vendored Normal file

@ -0,0 +1,22 @@
[
null,
true,
false,
123,
123.123,
"string",
[1, 2, 3],
{
"array": [ true, false, null, 1.2, "string", [1.2, 3], {"a": 1} ],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {"a": 1},
"string": "string",
"true": true,
"white space": 123
},
[],
{}
]

69
format/toml/toml.go Normal file

@ -0,0 +1,69 @@
package toml
import (
"bytes"
"embed"
"github.com/BurntSushi/toml"
"github.com/wader/fq/format"
"github.com/wader/fq/internal/gojqextra"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
)
//go:embed toml.jq
var tomlFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.TOML,
Description: "Tom's Obvious, Minimal Language",
ProbeOrder: format.ProbeOrderText,
Groups: []string{format.PROBE},
DecodeFn: decodeTOML,
Functions: []string{"_todisplay"},
Files: tomlFS,
})
interp.RegisterFunc0("totoml", toTOML)
}
func decodeTOML(d *decode.D, _ any) any {
br := d.RawLen(d.Len())
var r any
if _, err := toml.NewDecoder(bitio.NewIOReader(br)).Decode(&r); err != nil {
d.Fatalf("%s", err)
}
var s scalar.S
s.Actual = gojqextra.Normalize(r)
// TODO: better way to handle that an empty file is valid toml and parsed as an object
switch v := s.Actual.(type) {
case map[string]any:
if len(v) == 0 {
d.Fatalf("root object has no values")
}
case []any:
default:
d.Fatalf("root not object or array")
}
d.Value.V = &s
d.Value.Range.Len = d.Len()
return nil
}
func toTOML(_ *interp.Interp, c any) any {
if c == nil {
return gojqextra.FuncTypeError{Name: "totoml", V: c}
}
b := &bytes.Buffer{}
if err := toml.NewEncoder(b).Encode(gojqextra.Normalize(c)); err != nil {
return err
}
return b.String()
}

1
format/toml/toml.jq Normal file

@ -0,0 +1 @@
def _toml__todisplay: tovalue;


@ -23,7 +23,7 @@ var footerFormat decode.Group
func init() {
interp.RegisterFormat(decode.Format{
Name: format.WAV,
ProbeOrder: 10, // after most others (overlap some with webp)
ProbeOrder: format.ProbeOrderBinFuzzy, // after most others (overlap some with webp)
Description: "WAV file",
Groups: []string{format.PROBE},
DecodeFn: wavDecode,

206
format/xml/html.go Normal file

@ -0,0 +1,206 @@
package xml
import (
"embed"
"strings"
"github.com/wader/fq/format"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
"golang.org/x/net/html"
)
//go:embed html.jq
var htmlFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.HTML,
Description: "HyperText Markup Language",
DecodeFn: decodeHTML,
DecodeInArg: format.HTMLIn{
Seq: false,
Array: false,
},
Functions: []string{"_todisplay"},
Files: htmlFS,
})
}
func fromHTMLObject(n *html.Node, hi format.HTMLIn) any {
var f func(n *html.Node, seq int) any
f = func(n *html.Node, seq int) any {
attrs := map[string]any{}
switch n.Type {
case html.ElementNode:
for _, a := range n.Attr {
attrs["-"+a.Key] = a.Val
}
default:
// skip
}
nNodes := 0
for c := n.FirstChild; c != nil; c = c.NextSibling {
if c.Type == html.ElementNode {
nNodes++
}
}
nSeq := -1
if nNodes > 1 {
nSeq = 0
}
var textSb *strings.Builder
var commentSb *strings.Builder
for c := n.FirstChild; c != nil; c = c.NextSibling {
switch c.Type {
case html.ElementNode:
if e, ok := attrs[c.Data]; ok {
if ea, ok := e.([]any); ok {
attrs[c.Data] = append(ea, f(c, nSeq))
} else {
attrs[c.Data] = []any{e, f(c, nSeq)}
}
} else {
attrs[c.Data] = f(c, nSeq)
}
if nNodes > 1 {
nSeq++
}
case html.TextNode:
if !whitespaceRE.MatchString(c.Data) {
if textSb == nil {
textSb = &strings.Builder{}
}
textSb.WriteString(c.Data)
}
case html.CommentNode:
if !whitespaceRE.MatchString(c.Data) {
if commentSb == nil {
commentSb = &strings.Builder{}
}
commentSb.WriteString(c.Data)
}
default:
// skip other nodes
}
if textSb != nil {
attrs["#text"] = strings.TrimSpace(textSb.String())
}
if commentSb != nil {
attrs["#comment"] = strings.TrimSpace(commentSb.String())
}
}
if hi.Seq && seq != -1 {
attrs["#seq"] = seq
}
if len(attrs) == 0 {
return ""
} else if len(attrs) == 1 && attrs["#text"] != nil {
return attrs["#text"]
}
return attrs
}
return f(n, -1)
}
func fromHTMLArray(n *html.Node) any {
var f func(n *html.Node) any
f = func(n *html.Node) any {
attrs := map[string]any{}
switch n.Type {
case html.ElementNode:
for _, a := range n.Attr {
attrs[a.Key] = a.Val
}
default:
// skip
}
nodes := []any{}
var textSb *strings.Builder
var commentSb *strings.Builder
for c := n.FirstChild; c != nil; c = c.NextSibling {
switch c.Type {
case html.ElementNode:
nodes = append(nodes, f(c))
case html.TextNode:
if !whitespaceRE.MatchString(c.Data) {
if textSb == nil {
textSb = &strings.Builder{}
}
textSb.WriteString(c.Data)
}
case html.CommentNode:
if !whitespaceRE.MatchString(c.Data) {
if commentSb == nil {
commentSb = &strings.Builder{}
}
commentSb.WriteString(c.Data)
}
default:
// skip other nodes
}
}
if textSb != nil {
attrs["#text"] = strings.TrimSpace(textSb.String())
}
if commentSb != nil {
attrs["#comment"] = strings.TrimSpace(commentSb.String())
}
elm := []any{n.Data}
if len(attrs) > 0 {
elm = append(elm, attrs)
}
if len(nodes) > 0 {
elm = append(elm, nodes)
}
return elm
}
return f(n.FirstChild)
}
func decodeHTML(d *decode.D, in any) any {
hi, _ := in.(format.HTMLIn)
br := d.RawLen(d.Len())
var r any
// disabled scripting makes the parser descend into noscript tags etc
n, err := html.ParseWithOptions(bitio.NewIOReader(br), html.ParseOptionEnableScripting(false))
if err != nil {
d.Fatalf("%s", err)
}
if hi.Array {
r = fromHTMLArray(n)
} else {
r = fromHTMLObject(n, hi)
}
var s scalar.S
s.Actual = r
d.Value.V = &s
d.Value.Range.Len = d.Len()
return nil
}

1
format/xml/html.jq Normal file

@ -0,0 +1 @@
def _html__todisplay: tovalue;

5
format/xml/testdata/bigint.fqtest vendored Normal file

@ -0,0 +1,5 @@
$ fq -n "{a: bsl(1;100)} | toxml | ., fromxml"
"<a>1267650600228229401496703205376</a>"
{
"a": "1267650600228229401496703205376"
}

4
format/xml/testdata/defaultns.xml vendored Normal file

@ -0,0 +1,4 @@
<elm xmlns="a:b:c">
<ns1:aaa ns1:attr1="v1">1</ns1:aaa>
<bbb key="value">3</bbb>
</elm>


@ -1,4 +1,13 @@
$ fq -d raw -ni . all.xml multi_diff.xml multi_same.xml ns.xml simple.xml escape.xml
/test:
test
$ fq -d html . /test
{
"html": {
"body": "test",
"head": ""
}
}
$ fq -d raw -ni . all.xml multi_diff.xml multi_same.xml ns.xml simple.xml escape.xml noscript.html
null> inputs | {name: input_filename, str: (tobytes | tostring)} | slurp("files")
null> spew("files") | .name, (.str | fromhtml | ., (toxml({indent: 2}) | println))
"all.xml"
@ -146,6 +155,25 @@ null> spew("files") | .name, (.str | fromhtml | ., (toxml({indent: 2}) | println
</body>
<head></head>
</html>
"noscript.html"
{
"html": {
"body": {
"a": "text"
},
"head": {
"noscript": ""
}
}
}
<html>
<body>
<a>text</a>
</body>
<head>
<noscript></noscript>
</head>
</html>
null> spew("files") | .name, (.str | fromhtml({seq: true}) | ., (toxml({indent: 2}) | println))
"all.xml"
{
@ -331,6 +359,27 @@ null> spew("files") | .name, (.str | fromhtml({seq: true}) | ., (toxml({indent:
<a attr="&amp;&lt;&gt;">&amp;&lt;&gt;</a>
</body>
</html>
"noscript.html"
{
"html": {
"body": {
"#seq": 1,
"a": "text"
},
"head": {
"#seq": 0,
"noscript": ""
}
}
}
<html>
<head>
<noscript></noscript>
</head>
<body>
<a>text</a>
</body>
</html>
null> spew("files") | .name, (.str | fromhtml({array: true}) | ., (toxml({indent: 2}) | println))
"all.xml"
[
@ -565,4 +614,37 @@ null> spew("files") | .name, (.str | fromhtml({array: true}) | ., (toxml({indent
<a attr="&amp;&lt;&gt;">&amp;&lt;&gt;</a>
</body>
</html>
"noscript.html"
[
"html",
[
[
"head",
[
[
"noscript"
]
]
],
[
"body",
[
[
"a",
{
"#text": "text"
}
]
]
]
]
]
<html>
<head>
<noscript></noscript>
</head>
<body>
<a>text</a>
</body>
</html>
null> ^D

3
format/xml/testdata/noscript.html vendored Normal file

@ -0,0 +1,3 @@
<noscript>
<a>text</a>
</noscript>

4
format/xml/testdata/trailing.fqtest vendored Normal file

@ -0,0 +1,4 @@
$ fq -n '"<a></a> trailing" | fromxml'
exitcode: 5
stderr:
error: error at position 0xa: root element has trailing data


@ -1,6 +1,12 @@
/probe.xml:
<a></a>
$ fq . probe.xml
{
"a": ""
}
$ fq -d raw -ni . all.xml decl.xml multi_diff.xml multi_same.xml ns.xml simple.xml escape.xml
null> inputs | {name: input_filename, str: (tobytes | tostring)} | slurp("files")
null> spew("files") | .name, (.str | fromxml | ., (toxml({indent: 2}) | println))
null> spew("files") | .name, try (.str | fromxml | ., (toxml({indent: 2}) | println)) catch .
"all.xml"
{
"elm": {
@ -29,15 +35,9 @@ null> spew("files") | .name, (.str | fromxml | ., (toxml({indent: 2}) | println)
}
<elm></elm>
"multi_diff.xml"
{
"elm1": ""
}
<elm1></elm1>
"error at position 0x10: root element has trailing data"
"multi_same.xml"
{
"elm": ""
}
<elm></elm>
"error at position 0xe: root element has trailing data"
"ns.xml"
{
"elm": {
@ -51,14 +51,14 @@ null> spew("files") | .name, (.str | fromxml | ., (toxml({indent: 2}) | println)
"ns2:aaa": {
"#text": "2",
"-ns2:attr2": "v2",
"ccc": {
"-ns2:attr5": "v5"
},
"ns1:ccc": {
"-ns1:attr3": "v3"
},
"ns2:ccc": {
"-ns2:attr4": "v4"
},
"ns3:ccc": {
"-ns2:attr5": "v5"
}
}
}
@ -67,9 +67,9 @@ null> spew("files") | .name, (.str | fromxml | ., (toxml({indent: 2}) | println)
<aaa>3</aaa>
<ns1:aaa ns1:attr1="v1">1</ns1:aaa>
<ns2:aaa ns2:attr2="v2">2
<ccc ns2:attr5="v5"></ccc>
<ns1:ccc ns1:attr3="v3"></ns1:ccc>
<ns2:ccc ns2:attr4="v4"></ns2:ccc>
<ns3:ccc ns2:attr5="v5"></ns3:ccc>
</ns2:aaa>
</elm>
"simple.xml"
@ -85,7 +85,7 @@ null> spew("files") | .name, (.str | fromxml | ., (toxml({indent: 2}) | println)
}
}
<a attr="&amp;&lt;&gt;">&amp;&lt;&gt;</a>
null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2}) | println))
null> spew("files") | .name, try (.str | fromxml({seq: true}) | ., (toxml({indent: 2}) | println)) catch .
"all.xml"
{
"elm": {
@ -119,15 +119,9 @@ null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2
}
<elm></elm>
"multi_diff.xml"
{
"elm1": ""
}
<elm1></elm1>
"error at position 0x10: root element has trailing data"
"multi_same.xml"
{
"elm": ""
}
<elm></elm>
"error at position 0xe: root element has trailing data"
"ns.xml"
{
"elm": {
@ -146,6 +140,10 @@ null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2
"#seq": 1,
"#text": "2",
"-ns2:attr2": "v2",
"ccc": {
"#seq": 2,
"-ns2:attr5": "v5"
},
"ns1:ccc": {
"#seq": 0,
"-ns1:attr3": "v3"
@ -153,10 +151,6 @@ null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2
"ns2:ccc": {
"#seq": 1,
"-ns2:attr4": "v4"
},
"ns3:ccc": {
"#seq": 2,
"-ns2:attr5": "v5"
}
}
}
@ -166,7 +160,7 @@ null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2
<ns2:aaa ns2:attr2="v2">2
<ns1:ccc ns1:attr3="v3"></ns1:ccc>
<ns2:ccc ns2:attr4="v4"></ns2:ccc>
<ns3:ccc ns2:attr5="v5"></ns3:ccc>
<ccc ns2:attr5="v5"></ccc>
</ns2:aaa>
<aaa>3</aaa>
</elm>
@ -183,7 +177,7 @@ null> spew("files") | .name, (.str | fromxml({seq: true}) | ., (toxml({indent: 2
}
}
<a attr="&amp;&lt;&gt;">&amp;&lt;&gt;</a>
null> spew("files") | .name, (.str | fromxml({array: true}) | ., (toxml({indent: 2}) | println))
null> spew("files") | .name, try (.str | fromxml({array: true}) | ., (toxml({indent: 2}) | println)) catch .
"all.xml"
[
"elm",
@ -224,15 +218,9 @@ null> spew("files") | .name, (.str | fromxml({array: true}) | ., (toxml({indent:
]
<elm></elm>
"multi_diff.xml"
[
"elm1"
]
<elm1></elm1>
"error at position 0x10: root element has trailing data"
"multi_same.xml"
[
"elm"
]
<elm></elm>
"error at position 0xe: root element has trailing data"
"ns.xml"
[
"elm",
@ -268,7 +256,7 @@ null> spew("files") | .name, (.str | fromxml({array: true}) | ., (toxml({indent:
}
],
[
"ns3:ccc",
"ccc",
{
"ns2:attr5": "v5"
}
@ -288,7 +276,7 @@ null> spew("files") | .name, (.str | fromxml({array: true}) | ., (toxml({indent:
<ns2:aaa ns2:attr2="v2">2
<ns1:ccc ns1:attr3="v3"></ns1:ccc>
<ns2:ccc ns2:attr4="v4"></ns2:ccc>
<ns3:ccc ns2:attr5="v5"></ns3:ccc>
<ccc ns2:attr5="v5"></ccc>
</ns2:aaa>
<aaa>3</aaa>
</elm>
@ -337,5 +325,5 @@ null> {a: ["b", "c"]} | toxml
null> {a: [123, null, true, false]} | toxml
"<doc><a>123</a><a></a><a>true</a><a>false</a></doc>"
null> 123 | toxml
error: toxml cannot be applied to: string ("123")
error: toxml cannot be applied to: number (123)
null> ^D

448
format/xml/xml.go Normal file

@ -0,0 +1,448 @@
package xml
// object mode inspired by https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html
// TODO: keep <?xml>? root #desc?
// TODO: xml default indent?
import (
"bytes"
"embed"
"encoding/xml"
"html"
"regexp"
"sort"
"strconv"
"strings"
"github.com/wader/fq/format"
"github.com/wader/fq/internal/gojqextra"
"github.com/wader/fq/internal/proxysort"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
)
//go:embed xml.jq
var xmlFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.XML,
Description: "Extensible Markup Language",
ProbeOrder: format.ProbeOrderText,
Groups: []string{format.PROBE},
DecodeFn: decodeXML,
DecodeInArg: format.XMLIn{
Seq: false,
Array: false,
},
Functions: []string{"_todisplay"},
Files: xmlFS,
})
interp.RegisterFunc1("toxml", toXML)
interp.RegisterFunc0("fromxmlentities", func(_ *interp.Interp, c string) any {
return html.UnescapeString(c)
})
interp.RegisterFunc0("toxmlentities", func(_ *interp.Interp, c string) any {
return html.EscapeString(c)
})
}
var whitespaceRE = regexp.MustCompile(`^\s*$`)
type xmlNode struct {
XMLName xml.Name
Attrs []xml.Attr `xml:",attr"`
Chardata []byte `xml:",chardata"`
Comment []byte `xml:",comment"`
Nodes []xmlNode `xml:",any"`
}
func (n *xmlNode) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
n.Attrs = start.Attr
type node xmlNode
return d.DecodeElement((*node)(n), &start)
}
type xmlNS struct {
name string
url string
}
// xmlNNStack is used to undo namespace url resolving: after decoding, Name.Space holds the namespace url, not the "alias" name
type xmlNNStack []xmlNS
func (nss xmlNNStack) lookup(name xml.Name) string {
for i := len(nss) - 1; i >= 0; i-- {
ns := nss[i]
if name.Space == ns.url {
return ns.name
}
}
return ""
}
func (nss xmlNNStack) push(name string, url string) xmlNNStack {
n := append([]xmlNS{}, nss...)
n = append(n, xmlNS{name: name, url: url})
return xmlNNStack(n)
}
func fromXMLArray(n xmlNode) any {
var f func(n xmlNode, nss xmlNNStack) []any
f = func(n xmlNode, nss xmlNNStack) []any {
attrs := map[string]any{}
for _, a := range n.Attrs {
local, space := a.Name.Local, a.Name.Space
name := local
if space != "" {
if space == "xmlns" {
nss = nss.push(local, a.Value)
} else {
space = nss.lookup(a.Name)
}
name = space + ":" + local
}
attrs[name] = a.Value
}
if attrs["#text"] == nil && !whitespaceRE.Match(n.Chardata) {
attrs["#text"] = strings.TrimSpace(string(n.Chardata))
}
if attrs["#comment"] == nil && !whitespaceRE.Match(n.Comment) {
attrs["#comment"] = strings.TrimSpace(string(n.Comment))
}
nodes := []any{}
for _, c := range n.Nodes {
nodes = append(nodes, f(c, nss))
}
name, space := n.XMLName.Local, n.XMLName.Space
if space != "" {
space = nss.lookup(n.XMLName)
}
// only add if ns is found and not default ns
if space != "" {
name = space + ":" + name
}
elm := []any{name}
if len(attrs) > 0 {
elm = append(elm, attrs)
}
if len(nodes) > 0 {
elm = append(elm, nodes)
}
return elm
}
return f(n, nil)
}
func fromXMLObject(n xmlNode, xi format.XMLIn) any {
var f func(n xmlNode, seq int, nss xmlNNStack) any
f = func(n xmlNode, seq int, nss xmlNNStack) any {
attrs := map[string]any{}
for _, a := range n.Attrs {
local, space := a.Name.Local, a.Name.Space
name := local
if space != "" {
if space == "xmlns" {
nss = nss.push(local, a.Value)
} else {
space = nss.lookup(a.Name)
}
name = space + ":" + local
}
attrs["-"+name] = a.Value
}
for i, nn := range n.Nodes {
nSeq := i
if len(n.Nodes) == 1 {
nSeq = -1
}
local, space := nn.XMLName.Local, nn.XMLName.Space
name := local
if space != "" {
space = nss.lookup(nn.XMLName)
}
// only add if ns is found and not default ns
if space != "" {
name = space + ":" + name
}
if e, ok := attrs[name]; ok {
if ea, ok := e.([]any); ok {
attrs[name] = append(ea, f(nn, nSeq, nss))
} else {
attrs[name] = []any{e, f(nn, nSeq, nss)}
}
} else {
attrs[name] = f(nn, nSeq, nss)
}
}
if xi.Seq && seq != -1 {
attrs["#seq"] = seq
}
if attrs["#text"] == nil && !whitespaceRE.Match(n.Chardata) {
attrs["#text"] = strings.TrimSpace(string(n.Chardata))
}
if attrs["#comment"] == nil && !whitespaceRE.Match(n.Comment) {
attrs["#comment"] = strings.TrimSpace(string(n.Comment))
}
if len(attrs) == 0 {
return ""
} else if len(attrs) == 1 && attrs["#text"] != nil {
return attrs["#text"]
}
return attrs
}
return map[string]any{
n.XMLName.Local: f(n, -1, nil),
}
}
var wsRE *regexp.Regexp
func decodeXML(d *decode.D, in any) any {
xi, _ := in.(format.XMLIn)
br := d.RawLen(d.Len())
var r any
xd := xml.NewDecoder(bitio.NewIOReader(br))
xd.Strict = false
var n xmlNode
if err := xd.Decode(&n); err != nil {
d.Fatalf("%s", err)
}
if xi.Array {
r = fromXMLArray(n)
} else {
r = fromXMLObject(n, xi)
}
var s scalar.S
s.Actual = r
switch s.Actual.(type) {
case map[string]any,
[]any:
default:
d.Fatalf("root not object or array")
}
d.SeekAbs(xd.InputOffset() * 8)
if d.RE(&wsRE, `^\s*$`) == nil {
d.Fatalf("root element has trailing data")
}
d.Value.V = &s
d.Value.Range.Len = d.Len()
return nil
}
type ToXMLOpts struct {
Indent int
}
func toXMLObject(c any, opts ToXMLOpts) any {
var f func(name string, content any) (xmlNode, int)
f = func(name string, content any) (xmlNode, int) {
n := xmlNode{
XMLName: xml.Name{Local: name},
}
seq := -1
var orderSeqs []int
var orderNames []string
switch v := content.(type) {
case string:
n.Chardata = []byte(v)
case map[string]any:
for k, v := range v {
switch {
case k == "#seq":
seq, _ = strconv.Atoi(v.(string))
case k == "#text":
s, _ := v.(string)
n.Chardata = []byte(s)
case k == "#comment":
s, _ := v.(string)
n.Comment = []byte(s)
case strings.HasPrefix(k, "-"):
s, _ := v.(string)
n.Attrs = append(n.Attrs, xml.Attr{
Name: xml.Name{Local: k[1:]},
Value: s,
})
default:
switch v := v.(type) {
case []any:
if len(v) > 0 {
for _, c := range v {
nn, nseq := f(k, c)
n.Nodes = append(n.Nodes, nn)
orderNames = append(orderNames, k)
orderSeqs = append(orderSeqs, nseq)
}
} else {
nn, nseq := f(k, "")
n.Nodes = append(n.Nodes, nn)
orderNames = append(orderNames, k)
orderSeqs = append(orderSeqs, nseq)
}
default:
nn, nseq := f(k, v)
n.Nodes = append(n.Nodes, nn)
orderNames = append(orderNames, k)
orderSeqs = append(orderSeqs, nseq)
}
}
}
}
// if one #seq was found, assume all have them, otherwise sort by name
if len(orderSeqs) > 0 && orderSeqs[0] != -1 {
proxysort.Sort(orderSeqs, n.Nodes, func(ss []int, i, j int) bool { return ss[i] < ss[j] })
} else {
proxysort.Sort(orderNames, n.Nodes, func(ss []string, i, j int) bool { return ss[i] < ss[j] })
}
sort.Slice(n.Attrs, func(i, j int) bool {
a, b := n.Attrs[i].Name, n.Attrs[j].Name
return a.Space < b.Space || (a.Space == b.Space && a.Local < b.Local)
})
return n, seq
}
n, _ := f("doc", c)
if len(n.Nodes) == 1 && len(n.Attrs) == 0 && n.Comment == nil && n.Chardata == nil {
n = n.Nodes[0]
}
bb := &bytes.Buffer{}
e := xml.NewEncoder(bb)
e.Indent("", strings.Repeat(" ", opts.Indent))
if err := e.Encode(n); err != nil {
return err
}
if err := e.Flush(); err != nil {
return err
}
return bb.String()
}
// ["elm", {attrs}, [children]] -> <elm attrs...>children...</elm>
func toXMLArray(c any, opts ToXMLOpts) any {
var f func(elm []any) (xmlNode, bool)
f = func(elm []any) (xmlNode, bool) {
var name string
var attrs map[string]any
var children []any
for _, v := range elm {
switch v := v.(type) {
case string:
if name == "" {
name = v
}
case map[string]any:
if attrs == nil {
attrs = v
}
case []any:
if children == nil {
children = v
}
}
}
if name == "" {
return xmlNode{}, false
}
n := xmlNode{
XMLName: xml.Name{Local: name},
}
for k, v := range attrs {
switch k {
case "#comment":
s, _ := v.(string)
n.Comment = []byte(s)
case "#text":
s, _ := v.(string)
n.Chardata = []byte(s)
default:
s, _ := v.(string)
n.Attrs = append(n.Attrs, xml.Attr{
Name: xml.Name{Local: k},
Value: s,
})
}
}
sort.Slice(n.Attrs, func(i, j int) bool {
a, b := n.Attrs[i].Name, n.Attrs[j].Name
return a.Space < b.Space || (a.Space == b.Space && a.Local < b.Local)
})
for _, c := range children {
c, ok := c.([]any)
if !ok {
continue
}
if cn, ok := f(c); ok {
n.Nodes = append(n.Nodes, cn)
}
}
return n, true
}
ca, ok := c.([]any)
if !ok {
return gojqextra.FuncTypeError{Name: "toxml", V: c}
}
n, ok := f(ca)
if !ok {
// TODO: better error
return gojqextra.FuncTypeError{Name: "toxml", V: c}
}
bb := &bytes.Buffer{}
e := xml.NewEncoder(bb)
e.Indent("", strings.Repeat(" ", opts.Indent))
if err := e.Encode(n); err != nil {
return err
}
if err := e.Flush(); err != nil {
return err
}
return bb.String()
}
func toXML(_ *interp.Interp, c any, opts ToXMLOpts) any {
if v, ok := gojqextra.Cast[map[string]any](c); ok {
return toXMLObject(gojqextra.NormalizeToStrings(v), opts)
} else if v, ok := gojqextra.Cast[[]any](c); ok {
return toXMLArray(gojqextra.NormalizeToStrings(v), opts)
}
return gojqextra.FuncTypeError{Name: "toxml", V: c}
}
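The `["elm", {attrs}, [children]]` convention that `toXMLArray` walks above can be sketched outside fq. A minimal stand-alone Go rendering (a hypothetical helper, not fq's actual encoder) that interprets the same shape: first string is the element name, first object its attributes, first array its children.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// render walks the ["elm", {attrs}, [children]] array form and emits
// XML, sorting attributes by name like toXMLArray does.
func render(elm []any) string {
	var name string
	var attrs map[string]any
	var children []any
	for _, v := range elm {
		switch v := v.(type) {
		case string:
			if name == "" {
				name = v
			}
		case map[string]any:
			if attrs == nil {
				attrs = v
			}
		case []any:
			if children == nil {
				children = v
			}
		}
	}
	var sb strings.Builder
	sb.WriteString("<" + name)
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Fprintf(&sb, " %s=%q", k, attrs[k])
	}
	sb.WriteString(">")
	for _, c := range children {
		if c, ok := c.([]any); ok {
			sb.WriteString(render(c))
		}
	}
	sb.WriteString("</" + name + ">")
	return sb.String()
}

func main() {
	doc := []any{"a", map[string]any{"id": "1"}, []any{[]any{"b"}}}
	fmt.Println(render(doc))
}
```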

2
format/xml/xml.jq Normal file
View File

@ -0,0 +1,2 @@
def toxml: toxml(null);
def _xml__todisplay: tovalue;

5
format/yaml/testdata/bigint.fqtest vendored Normal file
View File

@ -0,0 +1,5 @@
$ fq -n "{a: bsl(1;100)} | toyaml | ., fromyaml"
"a: \"1267650600228229401496703205376\"\n"
{
"a": "1267650600228229401496703205376"
}

4
format/yaml/testdata/trailing.fqtest vendored Normal file
View File

@ -0,0 +1,4 @@
$ fq -n '"- a\ntrailing" | fromyaml._error.error'
exitcode: 5
stderr:
error: error at position 0xc: yaml: line 2: could not find expected ':'

22
format/yaml/testdata/variants.json vendored Normal file
View File

@ -0,0 +1,22 @@
[
null,
true,
false,
123,
123.123,
"string",
[1, 2, 3],
{
"array": [ true, false, null, 1.2, "string", [1.2, 3], {"a": 1} ],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {"a": 1},
"string": "string",
"true": true,
"white space": 123
},
[],
{}
]

View File

@ -1,28 +1,37 @@
/probe.yaml:
test:
key: 123
$ fq . probe.yaml
{
"test": {
"key": 123
}
}
# TODO: add uint64 norm test
$ fq -rRs 'fromjson[] | (try (toyaml | ., fromyaml) catch .), "----"' variants.json
null
null
error at position 0x5: root not object or array
----
true
true
error at position 0x5: root not object or array
----
false
false
error at position 0x6: root not object or array
----
123
123
error at position 0x4: root not object or array
----
123.123
123.123
error at position 0x8: root not object or array
----
string
string
error at position 0x7: root not object or array
----
- 1
- 2
@ -80,3 +89,11 @@ white space: 123
"white space": 123
}
----
[]
[]
----
{}
{}
----

62
format/yaml/yaml.go Normal file
View File

@ -0,0 +1,62 @@
package yaml
// TODO: yaml type eval? walk eval?
import (
"embed"
"github.com/wader/fq/format"
"github.com/wader/fq/internal/gojqextra"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/decode"
"github.com/wader/fq/pkg/interp"
"github.com/wader/fq/pkg/scalar"
"gopkg.in/yaml.v3"
)
//go:embed yaml.jq
var yamlFS embed.FS
func init() {
interp.RegisterFormat(decode.Format{
Name: format.YAML,
Description: "YAML Ain't Markup Language",
ProbeOrder: format.ProbeOrderText,
Groups: []string{format.PROBE},
DecodeFn: decodeYAML,
Functions: []string{"_todisplay"},
Files: yamlFS,
})
interp.RegisterFunc0("toyaml", toYAML)
}
func decodeYAML(d *decode.D, _ any) any {
br := d.RawLen(d.Len())
var r any
if err := yaml.NewDecoder(bitio.NewIOReader(br)).Decode(&r); err != nil {
d.Fatalf("%s", err)
}
var s scalar.S
s.Actual = r
switch s.Actual.(type) {
case map[string]any,
[]any:
default:
d.Fatalf("root not object or array")
}
d.Value.V = &s
d.Value.Range.Len = d.Len()
return nil
}
func toYAML(_ *interp.Interp, c any) any {
b, err := yaml.Marshal(gojqextra.Normalize(c))
if err != nil {
return err
}
return string(b)
}
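The root-type check in `decodeYAML` above is what lets YAML (and the other new text formats) participate in probing safely: a file whose decoded root is a bare scalar such as `123` or `true` is rejected, so arbitrary files are not misdetected as YAML. The rule in isolation, on plain Go values:

```go
package main

import "fmt"

// rootOK mirrors decodeYAML's root check: only an object or an array
// root counts as a successful decode for probing purposes.
func rootOK(v any) bool {
	switch v.(type) {
	case map[string]any, []any:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(rootOK(map[string]any{"a": 1})) // true
	fmt.Println(rootOK([]any{1, 2}))            // true
	fmt.Println(rootOK(123))                    // false
}
```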

1
format/yaml/yaml.jq Normal file
View File

@ -0,0 +1 @@
def _yaml__todisplay: tovalue;

6
go.mod
View File

@ -14,6 +14,10 @@ require (
// bump: gomod-BurntSushi/toml command go get -d github.com/BurntSushi/toml@v$LATEST && go mod tidy
// bump: gomod-BurntSushi/toml link "Source diff $CURRENT..$LATEST" https://github.com/BurntSushi/toml/compare/v$CURRENT..v$LATEST
github.com/BurntSushi/toml v1.2.0
// bump: gomod-creasty-defaults /github\.com\/creasty\/defaults v(.*)/ https://github.com/creasty/defaults.git|^1
// bump: gomod-creasty-defaults command go get -d github.com/creasty/defaults@v$LATEST && go mod tidy
// bump: gomod-creasty-defaults link "Source diff $CURRENT..$LATEST" https://github.com/creasty/defaults/compare/v$CURRENT..v$LATEST
github.com/creasty/defaults v1.6.0
// bump: gomod-golang-snappy /github\.com\/golang\/snappy v(.*)/ https://github.com/golang/snappy.git|^0
// bump: gomod-golang-snappy command go get -d github.com/golang/snappy@v$LATEST && go mod tidy
// bump: gomod-golang-snappy link "Source diff $CURRENT..$LATEST" https://github.com/golang/snappy/compare/v$CURRENT..v$LATEST
@ -53,5 +57,7 @@ require (
require (
github.com/itchyny/timefmt-go v0.1.3 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e // indirect
golang.org/x/sys v0.0.0-20220627191245-f75cf1eec38b // indirect
gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f // indirect
)

10
go.sum
View File

@ -1,17 +1,24 @@
github.com/BurntSushi/toml v1.2.0 h1:Rt8g24XnyGTyglgET/PRUNlrUeu9F5L+7FilkXfZgs0=
github.com/BurntSushi/toml v1.2.0/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
github.com/creasty/defaults v1.6.0 h1:ltuE9cfphUtlrBeomuu8PEyISTXnxqkBIoQfXgv7BSc=
github.com/creasty/defaults v1.6.0/go.mod h1:iGzKe6pbEHnpMPtfDXZEr0NVxWnPTjb1bbDy08fPzYM=
github.com/golang/snappy v0.0.4 h1:yAGX7huGHXlcLOEtBnF4w7FQwA26wojNCwOYAEhLjQM=
github.com/golang/snappy v0.0.4/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
github.com/google/gopacket v1.1.19 h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8=
github.com/google/gopacket v1.1.19/go.mod h1:iJ8V8n6KS+z2U1A8pUwu8bW5SyEMkXJB8Yo/Vo+TKTo=
github.com/itchyny/timefmt-go v0.1.3 h1:7M3LGVDsqcd0VZH2U+x393obrzZisp7C0uEe921iRkU=
github.com/itchyny/timefmt-go v0.1.3/go.mod h1:0osSSCQSASBJMsIZnhAaF1C2fCBTJZXrnj37mG8/c+A=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=
github.com/mitchellh/copystructure v1.2.0/go.mod h1:qLl+cE2AmVv+CoeAwDPye/v+N2HKCj9FbZEVFJRxO9s=
github.com/mitchellh/mapstructure v1.5.0 h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY=
github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo=
github.com/mitchellh/reflectwalk v1.0.2 h1:G2LzWKi524PWgd3mLHV8Y5k7s6XUvT0Gef6zxSIeXaQ=
github.com/mitchellh/reflectwalk v1.0.2/go.mod h1:mSTlrgnPZtwu0c4WaC2kGObEpuNDbx0jmZXqmk4esnw=
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e h1:fD57ERR4JtEqsWbfPhv4DMiApHyliiK5xCTNVSPiaAs=
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/wader/gojq v0.12.1-0.20220703094036-0eed2734a1d7 h1:3IQ6iYU/tkMcEpYu64CfhzQZNemPevlQyOsiga5uN2o=
@ -39,7 +46,8 @@ golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f h1:BLraFXnmrev5lT+xlilqcH8XK9/i0At2xKjWk4p6zsU=
gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

View File

@ -8,6 +8,7 @@ import (
"regexp"
"strings"
"github.com/creasty/defaults"
"github.com/mitchellh/mapstructure"
)
@ -21,6 +22,7 @@ func CamelToSnake(s string) string {
}
func ToStruct(m any, v any) error {
_ = defaults.Set(v)
ms, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{
MatchName: func(mapKey, fieldName string) bool {
return CamelToSnake(fieldName) == mapKey
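The `MatchName` hook above matches snake_case map keys against Go field names via `CamelToSnake`. A stand-alone sketch of such a conversion (a hypothetical implementation for illustration; fq's actual `CamelToSnake` may differ):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// camelToSnake inserts an underscore between a lower-case letter or
// digit and the following upper-case letter, then lower-cases the result.
var camelRE = regexp.MustCompile(`([a-z0-9])([A-Z])`)

func camelToSnake(s string) string {
	return strings.ToLower(camelRE.ReplaceAllString(s, "${1}_${2}"))
}

func main() {
	fmt.Println(camelToSnake("ProbeOrder")) // probe_order
}
```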

View File

@ -7,8 +7,10 @@ import (
"io"
"io/ioutil"
"math/big"
"regexp"
"github.com/wader/fq/internal/bitioextra"
"github.com/wader/fq/internal/ioextra"
"github.com/wader/fq/internal/recoverfn"
"github.com/wader/fq/pkg/bitio"
"github.com/wader/fq/pkg/ranges"
@ -104,8 +106,8 @@ func decode(ctx context.Context, br bitio.ReaderAtSeeker, group Group, opts Opti
switch vv := d.Value.V.(type) {
case *Compound:
// TODO: hack, changes V
vv.Err = formatErr
d.Value.V = vv
d.Value.Err = formatErr
}
if len(group) != 1 {
@ -171,7 +173,6 @@ func newDecoder(ctx context.Context, format Format, br bitio.ReaderAtSeeker, opt
RangeSorted: true,
Children: nil,
Description: opts.Description,
Format: &format,
}
return &D{
@ -184,6 +185,7 @@ func newDecoder(ctx context.Context, format Format, br bitio.ReaderAtSeeker, opt
RootReader: br,
Range: ranges.Range{Start: 0, Len: 0},
IsRoot: opts.IsRoot,
Format: &format,
},
Options: opts,
@ -1226,3 +1228,56 @@ func (d *D) FieldScalarFn(name string, sfn scalar.Fn, sms ...scalar.Mapper) *sca
}
return v
}
func (d *D) RE(reRef **regexp.Regexp, reStr string) []ranges.Range {
if *reRef == nil {
*reRef = regexp.MustCompile(reStr)
}
startPos := d.Pos()
rr := ioextra.ByteRuneReader{RS: bitio.NewIOReadSeeker(d.bitBuf)}
locs := (*reRef).FindReaderSubmatchIndex(rr)
if locs == nil {
return nil
}
d.SeekAbs(startPos)
var rs []ranges.Range
l := len(locs) / 2
for i := 0; i < l; i++ {
loc := locs[i*2 : i*2+2]
if loc[0] == -1 {
rs = append(rs, ranges.Range{Start: -1})
} else {
rs = append(rs, ranges.Range{
Start: startPos + int64(loc[0]*8),
Len: int64((loc[1] - loc[0]) * 8)},
)
}
}
return rs
}
func (d *D) FieldRE(reRef **regexp.Regexp, reStr string, mRef *map[string]string, sms ...scalar.Mapper) {
if *reRef == nil {
*reRef = regexp.MustCompile(reStr)
}
subexpNames := (*reRef).SubexpNames()
rs := d.RE(reRef, reStr)
if rs == nil {
return
}
for i, r := range rs {
if i == 0 || r.Start == -1 {
continue
}
d.SeekAbs(r.Start)
name := subexpNames[i]
value := d.FieldUTF8(name, int(r.Len/8), sms...)
if mRef != nil {
(*mRef)[name] = value
}
}
d.SeekAbs(rs[0].Stop())
}
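The new `d.RE` helper above runs a regexp against the decoder's reader and converts the byte offsets that `FindReaderSubmatchIndex` returns into bit ranges (offset×8, length×8), since fq's decoder tracks positions in bits. The offset-to-bit-range conversion in isolation, using only the standard library:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bitRanges returns [startBits, lenBits] per regexp submatch, mirroring
// how d.RE scales FindReaderSubmatchIndex's byte offsets by 8.
func bitRanges(reStr, s string) [][2]int {
	re := regexp.MustCompile(reStr)
	locs := re.FindReaderSubmatchIndex(strings.NewReader(s))
	var rs [][2]int
	for i := 0; i < len(locs)/2; i++ {
		start, end := locs[i*2], locs[i*2+1]
		rs = append(rs, [2]int{start * 8, (end - start) * 8})
	}
	return rs
}

func main() {
	for i, r := range bitRanges(`(\w+)=(\w+)`, "  foo=bar") {
		fmt.Printf("group %d: bit start=%d len=%d\n", i, r[0], r[1])
	}
}
```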

View File

@ -1,8 +1,5 @@
package decode
// TODO: Encoding, u16le, varint etc, encode?
// TODO: Value/Compound interface? can have per type and save memory
import (
"errors"
"sort"
@ -16,20 +13,23 @@ type Compound struct {
IsArray bool
RangeSorted bool
Children []*Value
Description string
Format *Format
Err error
}
// TODO: Encoding, u16le, varint etc, encode?
// TODO: Value/Compound interface? can have per type and save memory
// TODO: Make some fields optional somehow? map/slice?
type Value struct {
Parent *Value
Name string
V any // scalar.S or Compound (array/struct)
Index int // index in parent array/struct
Range ranges.Range
RootReader bitio.ReaderAtSeeker
IsRoot bool // TODO: rework?
Parent *Value
Name string
V any // scalar.S or Compound (array/struct)
Index int // index in parent array/struct
Range ranges.Range
RootReader bitio.ReaderAtSeeker
IsRoot bool // TODO: rework?
Format *Format // TODO: rework
Description string
Err error
}
type WalkFn func(v *Value, rootV *Value, depth int, rootDepth int) error
@ -148,12 +148,8 @@ func (v *Value) root(findSubRoot bool, findFormatRoot bool) *Value {
if findSubRoot && rootV.IsRoot {
break
}
if findFormatRoot {
if c, ok := rootV.V.(*Compound); ok {
if c.Format != nil {
break
}
}
if findFormatRoot && rootV.Format != nil {
break
}
rootV = rootV.Parent
@ -167,12 +163,9 @@ func (v *Value) FormatRoot() *Value { return v.root(true, true) }
func (v *Value) Errors() []error {
var errs []error
_ = v.WalkPreOrder(func(_ *Value, rootV *Value, _ int, _ int) error {
switch vv := rootV.V.(type) {
case *Compound:
if vv.Err != nil {
errs = append(errs, vv.Err)
}
_ = v.WalkPreOrder(func(v *Value, _ *Value, _ int, _ int) error {
if v.Err != nil {
errs = append(errs, v.Err)
}
return nil
})

View File

@ -561,17 +561,11 @@ func (dvb decodeValueBase) JQValueKey(name string) any {
case "_path":
return valuePath(dv)
case "_error":
switch vv := dv.V.(type) {
case *decode.Compound:
var formatErr decode.FormatError
if errors.As(vv.Err, &formatErr) {
return formatErr.Value()
}
return vv.Err
default:
return nil
var formatErr decode.FormatError
if errors.As(dv.Err, &formatErr) {
return formatErr.Value()
}
return nil
case "_bits":
return Binary{
br: dv.RootReader,
@ -585,23 +579,10 @@ func (dvb decodeValueBase) JQValueKey(name string) any {
unit: 8,
}
case "_format":
switch vv := dv.V.(type) {
case *decode.Compound:
if vv.Format != nil {
return vv.Format.Name
}
return nil
case *scalar.S:
// TODO: hack, Scalar interface?
switch vv.Actual.(type) {
case map[string]any, []any:
return "json"
default:
return nil
}
default:
return nil
if dv.Format != nil {
return dv.Format.Name
}
return nil
case "_out":
return dvb.out
case "_unknown":

View File

@ -142,18 +142,13 @@ func dumpEx(v *decode.Value, ctx *dumpCtx, depth int, rootV *decode.Value, rootD
if vv.Description != "" {
cfmt(colField, " %s", deco.Value.F(vv.Description))
}
if vv.Format != nil {
cfmt(colField, " (%s)", deco.Value.F(vv.Format.Name))
}
valueErr = vv.Err
case *scalar.S:
// TODO: rethink scalar array/struct (json format)
switch av := vv.Actual.(type) {
case map[string]any:
cfmt(colField, ": %s (%s)", deco.Object.F("{}"), deco.Value.F("json"))
cfmt(colField, ": %s", deco.Object.F("{}"))
case []any:
cfmt(colField, ": %s%s:%s%s (%s)", deco.Index.F("["), deco.Number.F("0"), deco.Number.F(strconv.Itoa(len(av))), deco.Index.F("]"), deco.Value.F("json"))
// TODO: format?
cfmt(colField, ": %s%s:%s%s", deco.Index.F("["), deco.Number.F("0"), deco.Number.F(strconv.Itoa(len(av))), deco.Index.F("]"))
default:
cprint(colField, ":")
if vv.Sym == nil {
@ -162,20 +157,23 @@ func dumpEx(v *decode.Value, ctx *dumpCtx, depth int, rootV *decode.Value, rootD
cfmt(colField, " %s", deco.ValueColor(vv.Sym).F(previewValue(vv.Sym, vv.SymDisplay)))
cfmt(colField, " (%s)", deco.ValueColor(vv.Actual).F(previewValue(vv.Actual, vv.ActualDisplay)))
}
}
if opts.Verbose && isInArray {
cfmt(colField, " %s", v.Name)
}
// TODO: similar to struct/array?
if vv.Description != "" {
cfmt(colField, fmt.Sprintf(" (%s)", deco.Value.F(vv.Description)))
}
if opts.Verbose && isInArray {
cfmt(colField, " %s", v.Name)
}
if vv.Description != "" {
cfmt(colField, " (%s)", deco.Value.F(vv.Description))
}
default:
panic(fmt.Sprintf("unreachable vv %#+v", vv))
}
if v.Format != nil {
cfmt(colField, " (%s)", deco.Value.F(v.Format.Name))
}
valueErr = v.Err
innerRange := v.InnerRange()
if opts.Verbose {

File diff suppressed because it is too large

View File

@ -1,240 +0,0 @@
include "internal";
include "binary";
# convert all scalars to strings, null as empty string (same as @csv)
def _walk_tostring:
walk(
if _is_null then ""
elif _is_scalar then tostring
end
);
# overloads builtin tojson to have options
def tojson($opts): _tojson({} + $opts);
def tojson: tojson(null);
def fromxml($opts): _fromxml({} + $opts);
def fromxml: _fromxml(null);
def toxml($opts): _walk_tostring | _toxml({} + $opts);
def toxml: toxml(null);
def fromhtml($opts): _fromhtml({} + $opts);
def fromhtml: fromhtml(null);
def fromyaml: _fromyaml;
def toyaml: _toyaml;
def fromtoml: _fromtoml;
def totoml: _totoml;
def fromcsv($opts): _fromcsv({comma: ",", comment: "#"} + $opts);
def fromcsv: fromcsv(null);
def tocsv($opts): _walk_tostring | _tocsv({comma: ","} + $opts);
def tocsv: tocsv(null);
def fromxmlentities: _fromxmlentities;
def toxmlentities: _toxmlentities;
def fromurlpath: _fromurlpath;
def tourlpath: _tourlpath;
def fromurlencode: _fromurlencode;
def tourlencode: _tourlencode;
def fromurlquery: _fromurlquery;
def tourlquery: _tourlquery;
def fromurl: _fromurl;
def tourl: _tourl;
def fromhex: _fromhex;
def tohex: _tohex;
def frombase64($opts): _frombase64({encoding: "std"} + $opts);
def frombase64: _frombase64(null);
def tobase64($opts): _tobase64({encoding: "std"} + $opts);
def tobase64: _tobase64(null);
def tomd4: _tohash({name: "md4"});
def tomd5: _tohash({name: "md5"});
def tosha1: _tohash({name: "sha1"});
def tosha256: _tohash({name: "sha256"});
def tosha512: _tohash({name: "sha512"});
def tosha3_224: _tohash({name: "sha3_224"});
def tosha3_256: _tohash({name: "sha3_256"});
def tosha3_384: _tohash({name: "sha3_384"});
def tosha3_512: _tohash({name: "sha3_512"});
# _tostrencoding/_fromstrencoding can do more but not exposed as functions yet
def toiso8859_1: _tostrencoding({encoding: "ISO8859_1"});
def fromiso8859_1: _fromstrencoding({encoding: "ISO8859_1"});
def toutf8: _tostrencoding({encoding: "UTF8"});
def fromutf8: _fromstrencoding({encoding: "UTF8"});
def toutf16: _tostrencoding({encoding: "UTF16"});
def fromutf16: _fromstrencoding({encoding: "UTF16"});
def toutf16le: _tostrencoding({encoding: "UTF16LE"});
def fromutf16le: _fromstrencoding({encoding: "UTF16LE"});
def toutf16be: _tostrencoding({encoding: "UTF16BE"});
def fromutf16be: _fromstrencoding({encoding: "UTF16BE"});
# https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail
# TODO: add test
def frompem:
( tobytes
| tostring
| capture("-----BEGIN(.*?)-----(?<s>.*?)-----END(.*?)-----"; "mg").s
| frombase64
) // error("no pem header or footer found");
def topem($label):
( tobytes
| tobase64
| ($label | if $label != "" then " " + $label end) as $label
| [ "-----BEGIN\($label)-----"
, .
, "-----END\($label)-----"
, ""
]
| join("\n")
);
def topem: topem("");
def fromradix($base; $table):
( if _is_string | not then error("cannot fromradix convert: \(.)") end
| split("")
| reverse
| map($table[.])
| if . == null then error("invalid char \(.)") end
# state: [power, ans]
| reduce .[] as $c ([1,0];
( (.[0] * $base) as $b
| [$b, .[1] + (.[0] * $c)]
)
)
| .[1]
);
def fromradix($base):
fromradix($base; {
"0": 0, "1": 1, "2": 2, "3": 3,"4": 4, "5": 5, "6": 6, "7": 7, "8": 8, "9": 9,
"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15, "g": 16,
"h": 17, "i": 18, "j": 19, "k": 20, "l": 21, "m": 22, "n": 23,
"o": 24, "p": 25, "q": 26, "r": 27, "s": 28, "t": 29, "u": 30,
"v": 31, "w": 32, "x": 33, "y": 34, "z": 35,
"A": 36, "B": 37, "C": 38, "D": 39, "E": 40, "F": 41, "G": 42,
"H": 43, "I": 44, "J": 45, "K": 46, "L": 47, "M": 48, "N": 49,
"O": 50, "P": 51, "Q": 52, "R": 53, "S": 54, "T": 55, "U": 56,
"V": 57, "W": 58, "X": 59, "Y": 60, "Z": 61,
"@": 62, "_": 63,
});
def toradix($base; $table):
( if type != "number" then error("cannot toradix convert: \(.)") end
| if . == 0 then "0"
else
( [ recurse(if . > 0 then _intdiv(.; $base) else empty end) | . % $base]
| reverse
| .[1:]
| if $base <= ($table | length) then
map($table[.]) | join("")
else
error("base too large")
end
)
end
);
def toradix($base):
toradix($base; "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_");
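The `fromradix` reduction above keeps a `[power, sum]` pair while walking the reversed digits; walking most-significant-first and multiplying the accumulator by the base computes the same value. A stand-alone Go sketch of that equivalence (hexadecimal digit table only, for brevity):

```go
package main

import "fmt"

// digits maps a character to its value, like fromradix's $table.
var digits = map[rune]int{
	'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7,
	'8': 8, '9': 9, 'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,
}

// fromRadix folds left to right: n = n*base + digit, equivalent to
// summing digit*power over the reversed string.
func fromRadix(s string, base int) int {
	n := 0
	for _, r := range s {
		n = n*base + digits[r]
	}
	return n
}

func main() {
	fmt.Println(fromRadix("ff", 16)) // 255
	fmt.Println(fromRadix("101", 2)) // 5
}
```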
# to jq-flavoured json
def _tojq($opts):
def _is_ident: test("^[a-zA-Z_][a-zA-Z_0-9]*$");
def _key: if _is_ident | not then tojson end;
def _f($opts; $indent):
def _r($prefix):
( type as $t
| if $t == "null" then tojson
elif $t == "string" then tojson
elif $t == "number" then tojson
elif $t == "boolean" then tojson
elif $t == "array" then
[ "[", $opts.compound_newline
, ( [ .[]
| $prefix, $indent
, _r($prefix+$indent), $opts.array_sep
]
| .[0:-1]
)
, $opts.compound_newline
, $prefix, "]"
]
elif $t == "object" then
[ "{", $opts.compound_newline
, ( [ to_entries[]
| $prefix, $indent
, (.key | _key), $opts.key_sep
, (.value | _r($prefix+$indent)), $opts.object_sep
]
| .[0:-1]
)
, $opts.compound_newline
, $prefix, "}"
]
else error("unknown type \($t)")
end
);
_r("");
( _f($opts; $opts.indent * " ")
| if _is_array then flatten | join("") end
);
def tojq($opts):
_tojq(
( { indent: 0,
key_sep: ":",
object_sep: ",",
array_sep: ",",
compound_newline: "",
} + $opts
| if .indent > 0 then
( .key_sep = ": "
| .object_sep = ",\n"
| .array_sep = ",\n"
| .compound_newline = "\n"
)
end
)
);
def tojq: tojq(null);
# from jq-flavoured json
def fromjq:
def _f:
( . as $v
| .term.type
| if . == "TermTypeNull" then null
elif . == "TermTypeTrue" then true
elif . == "TermTypeFalse" then false
elif . == "TermTypeString" then $v.term.str.str
elif . == "TermTypeNumber" then $v.term.number | tonumber
elif . == "TermTypeObject" then
( $v.term.object.key_vals
| map(
{ key: (.key // .key_string.str),
value: (.val.queries[0] | _f)
}
)
| from_entries
)
elif . == "TermTypeArray" then
( def _a: if .op then .left, .right | _a end;
[$v.term.array.query | _a | _f]
)
else error("unknown term")
end
);
try
(_query_fromstring | _f)
catch
error("fromjq only supports constant literals");
# TODO: compat remove at some point
def hex: _binary_or_orig(tohex; fromhex);
def base64: _binary_or_orig(tobase64; frombase64);

View File

@ -6,4 +6,6 @@
| select(.key != "all")
| "def \(.key)($opts): decode(\(.key | tojson); $opts);"
, "def \(.key): decode(\(.key | tojson); {});"
, "def from\(.key)($opts): decode(\(.key | tojson); $opts) | if ._error then error(._error.error) end;"
, "def from\(.key): from\(.key)({});"
] | join("\n")

View File

@ -2,8 +2,6 @@ include "internal";
include "options";
include "binary";
include "decode";
include "encoding";
def intdiv(a; b): _intdiv(a; b);

View File

@ -36,7 +36,6 @@ import (
//go:embed interp.jq
//go:embed internal.jq
//go:embed options.jq
//go:embed encoding.jq
//go:embed binary.jq
//go:embed decode.jq
//go:embed format_decode.jq

View File

@ -5,11 +5,17 @@ include "decode";
def _display_default_opts:
options({depth: 1});
def _display_default_opts:
options({depth: 1});
def _todisplay:
( format as $f
# TODO: not sure about the error check here
| if $f == null or ._error != null then error("value is not a format root or has errors") end
| _format_func($f; "_todisplay")
);
def display($opts):
( options($opts) as $opts
( . as $c
| options($opts) as $opts
| try _todisplay catch $c
| if _can_display then _display($opts)
else
( if _is_string and $opts.raw_string then print

View File

@ -122,6 +122,7 @@ bsd_loopback_frame BSD loopback frame
bson Binary JSON
bzip2 bzip2 compression
cbor Concise Binary Object Representation
csv Comma separated values
dns DNS packet
dns_tcp DNS packet (TCP)
elf Executable and Linkable Format
@ -143,6 +144,7 @@ hevc_nalu H.265/HEVC Network Access Layer Unit
hevc_pps H.265/HEVC Picture Parameter Set
hevc_sps H.265/HEVC Sequence Parameter Set
hevc_vps H.265/HEVC Video Parameter Set
html HyperText Markup Language
icc_profile International Color Consortium profile
icmp Internet Control Message Protocol
icmpv6 Internet Control Message Protocol v6
@ -152,7 +154,7 @@ id3v2 ID3v2 metadata
ipv4_packet Internet protocol v4 packet
ipv6_packet Internet protocol v6 packet
jpeg Joint Photographic Experts Group file
json JSON
json JavaScript Object Notation
macho Mach-O macOS executable
matroska Matroska file
mp3 MP3 file
@ -181,6 +183,7 @@ sll_packet Linux cooked capture encapsulation
tar Tar archive
tcp_segment Transmission control protocol segment
tiff Tag Image File Format
toml Tom's Obvious, Minimal Language
udp_datagram User datagram protocol
vorbis_comment Vorbis comment
vorbis_packet Vorbis packet
@ -191,6 +194,8 @@ vpx_ccr VPX Codec Configuration Record
wav WAV file
webp WebP image
xing Xing header
xml Extensible Markup Language
yaml YAML Ain't Markup Language
zip ZIP archive
$ fq -X
exitcode: 2

View File

@ -1,36 +0,0 @@
$ fq -i
null> {a: bsl(1;100)} | repl
> object> tojq | ., fromjq
"{a:1267650600228229401496703205376}"
{
"a": 1267650600228229401496703205376
}
> object> tojson | ., fromjson
"{\"a\":1267650600228229401496703205376}"
{
"a": 1267650600228229401496703205376
}
> object> toyaml | ., fromyaml
"a: \"1267650600228229401496703205376\"\n"
{
"a": "1267650600228229401496703205376"
}
> object> totoml | ., fromtoml
"a = \"1267650600228229401496703205376\"\n"
{
"a": "1267650600228229401496703205376"
}
> object> toxml | ., fromxml
"<a>1267650600228229401496703205376</a>"
{
"a": "1267650600228229401496703205376"
}
> object> ^D
null> [[bsl(1;100)]] | tocsv | ., fromcsv
"1267650600228229401496703205376\n"
[
[
"1267650600228229401496703205376"
]
]
null> ^D

View File

@ -1,136 +0,0 @@
$ fq -rRs 'fromjson[] | (tojson | ., fromjson), "----", (tojson({indent:2}) | ., fromjson), "----"' variants.json
null
null
----
null
null
----
true
true
----
true
true
----
false
false
----
false
false
----
123
123
----
123
123
----
123.123
123.123
----
123.123
123.123
----
"string"
string
----
"string"
string
----
[1,2,3]
[
1,
2,
3
]
----
[
1,
2,
3
]
[
1,
2,
3
]
----
{"array":[true,false,null,1.2,"string",[1.2,3],{"a":1}],"escape \\\"":456,"false":false,"null":null,"number":1.2,"object":{"a":1},"string":"string","true":true,"white space":123}
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"string": "string",
"true": true,
"white space": 123
}
----
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"string": "string",
"true": true,
"white space": 123
}
{
"array": [
true,
false,
null,
1.2,
"string",
[
1.2,
3
],
{
"a": 1
}
],
"escape \\\"": 456,
"false": false,
"null": null,
"number": 1.2,
"object": {
"a": 1
},
"string": "string",
"true": true,
"white space": 123
}
----

View File

@ -1 +0,0 @@
# TODO

View File

@ -1,7 +1,6 @@
$ fq -i -n '"[]" | json'
json> (.) | ., tovalue, type, length?
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
0x0|5b 5d| |[]| |.: [0:0] (json)
[]
[]
"array"
0

View File

@ -1,7 +1,6 @@
$ fq -i -n '"{}" | json'
json> (.) | ., tovalue, type, length?
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
0x0|7b 7d| |{}| |.: {} (json)
{}
{}
"object"
0