fq/formats.md at 1a0ea163631f6ee60e8d01a8c872e8ec5ba6d9c8

wader/fq

mirror of https://github.com/wader/fq.git synced 2024-12-23 13:22:58 +03:00

Mattias Wadman 2fc16ae22a doc: Add some padding margin to formats table to make it less likely to cause git conflicts

2022-12-05 12:25:00 +01:00

Supported formats

Name	Description	Dependencies
`aac_frame`	Advanced Audio Coding frame
`adts`	Audio Data Transport Stream	`adts_frame`
`adts_frame`	Audio Data Transport Stream frame	`aac_frame`
`amf0`	Action Message Format 0
`apev2`	APEv2 metadata tag	`image`
`ar`	Unix archive	`probe`
`asn1_ber`	ASN1 BER (basic encoding rules, also CER and DER)
`av1_ccr`	AV1 Codec Configuration Record
`av1_frame`	AV1 frame	`av1_obu`
`av1_obu`	AV1 Open Bitstream Unit
`avc_annexb`	H.264/AVC Annex B	`avc_nalu`
`avc_au`	H.264/AVC Access Unit	`avc_nalu`
`avc_dcr`	H.264/AVC Decoder Configuration Record	`avc_nalu`
`avc_nalu`	H.264/AVC Network Abstraction Layer Unit	`avc_sps` `avc_pps` `avc_sei`
`avc_pps`	H.264/AVC Picture Parameter Set
`avc_sei`	H.264/AVC Supplemental Enhancement Information
`avc_sps`	H.264/AVC Sequence Parameter Set
`avi`	Audio Video Interleaved	`avc_au` `hevc_au` `mp3_frame` `flac_frame`
`avro_ocf`	Avro object container file
`bencode`	BitTorrent bencoding
`bitcoin_blkdat`	Bitcoin blk.dat	`bitcoin_block`
`bitcoin_block`	Bitcoin block	`bitcoin_transaction`
`bitcoin_script`	Bitcoin script
`bitcoin_transaction`	Bitcoin transaction	`bitcoin_script`
`bits`	Raw bits
`bplist`	Apple Binary Property List
`bsd_loopback_frame`	BSD loopback frame	`inet_packet`
`bson`	Binary JSON
`bytes`	Raw bytes
`bzip2`	bzip2 compression	`probe`
`cbor`	Concise Binary Object Representation
`csv`	Comma separated values
`dns`	DNS packet
`dns_tcp`	DNS packet (TCP)
`elf`	Executable and Linkable Format
`ether8023_frame`	Ethernet 802.3 frame	`inet_packet`
`exif`	Exchangeable Image File Format
`fairplay_spc`	FairPlay Server Playback Context
`flac`	Free Lossless Audio Codec file	`flac_metadatablocks` `flac_frame`
`flac_frame`	FLAC frame
`flac_metadatablock`	FLAC metadatablock	`flac_streaminfo` `flac_picture` `vorbis_comment`
`flac_metadatablocks`	FLAC metadatablocks	`flac_metadatablock`
`flac_picture`	FLAC metadatablock picture	`image`
`flac_streaminfo`	FLAC streaminfo
`gif`	Graphics Interchange Format
`gzip`	gzip compression	`probe`
`hevc_annexb`	H.265/HEVC Annex B	`hevc_nalu`
`hevc_au`	H.265/HEVC Access Unit	`hevc_nalu`
`hevc_dcr`	H.265/HEVC Decoder Configuration Record	`hevc_nalu`
`hevc_nalu`	H.265/HEVC Network Abstraction Layer Unit	`hevc_vps` `hevc_pps` `hevc_sps`
`hevc_pps`	H.265/HEVC Picture Parameter Set
`hevc_sps`	H.265/HEVC Sequence Parameter Set
`hevc_vps`	H.265/HEVC Video Parameter Set
`html`	HyperText Markup Language
`icc_profile`	International Color Consortium profile
`icmp`	Internet Control Message Protocol
`icmpv6`	Internet Control Message Protocol v6
`id3v1`	ID3v1 metadata
`id3v11`	ID3v1.1 metadata
`id3v2`	ID3v2 metadata	`image`
`ipv4_packet`	Internet protocol v4 packet	`ip_packet`
`ipv6_packet`	Internet protocol v6 packet	`ip_packet`
`jpeg`	Joint Photographic Experts Group file	`exif` `icc_profile`
`json`	JavaScript Object Notation
`jsonl`	JavaScript Object Notation Lines
`macho`	Mach-O macOS executable
`macho_fat`	Fat Mach-O macOS executable (multi-architecture)	`macho`
`markdown`	Markdown
`matroska`	Matroska file	`aac_frame` `av1_ccr` `av1_frame` `avc_au` `avc_dcr` `flac_frame` `flac_metadatablocks` `hevc_au` `hevc_dcr` `image` `mp3_frame` `mpeg_asc` `mpeg_pes_packet` `mpeg_spu` `opus_packet` `vorbis_packet` `vp8_frame` `vp9_cfm` `vp9_frame`
`mp3`	MP3 file	`id3v2` `id3v1` `id3v11` `apev2` `mp3_frame`
`mp3_frame`	MPEG audio layer 3 frame	`mp3_frame_tags`
`mp3_frame_tags`	MP3 frame info/xing tags
`mp4`	ISOBMFF, QuickTime and similar	`aac_frame` `av1_ccr` `av1_frame` `avc_au` `avc_dcr` `flac_frame` `flac_metadatablocks` `hevc_au` `hevc_dcr` `icc_profile` `id3v2` `image` `jpeg` `mp3_frame` `mpeg_es` `mpeg_pes_packet` `opus_packet` `png` `prores_frame` `protobuf_widevine` `pssh_playready` `vorbis_packet` `vp9_frame` `vpx_ccr`
`mpeg_asc`	MPEG-4 Audio Specific Config
`mpeg_es`	MPEG Elementary Stream	`mpeg_asc` `vorbis_packet`
`mpeg_pes`	MPEG Packetized elementary stream	`mpeg_pes_packet` `mpeg_spu`
`mpeg_pes_packet`	MPEG Packetized elementary stream packet
`mpeg_spu`	Sub Picture Unit (DVD subtitle)
`mpeg_ts`	MPEG Transport Stream
`msgpack`	MessagePack
`ogg`	OGG file	`ogg_page` `vorbis_packet` `opus_packet` `flac_metadatablock` `flac_frame`
`ogg_page`	OGG page
`opus_packet`	Opus packet	`vorbis_comment`
`pcap`	PCAP packet capture	`link_frame` `tcp_stream` `ipv4_packet`
`pcapng`	PCAPNG packet capture	`link_frame` `tcp_stream` `ipv4_packet`
`png`	Portable Network Graphics file	`icc_profile` `exif`
`prores_frame`	Apple ProRes frame
`protobuf`	Protobuf
`protobuf_widevine`	Widevine protobuf	`protobuf`
`pssh_playready`	PlayReady PSSH
`rtmp`	Real-Time Messaging Protocol	`amf0` `mpeg_asc`
`sll2_packet`	Linux cooked capture encapsulation v2	`inet_packet`
`sll_packet`	Linux cooked capture encapsulation	`inet_packet`
`tar`	Tar archive	`probe`
`tcp_segment`	Transmission control protocol segment
`tiff`	Tag Image File Format	`icc_profile`
`toml`	Tom's Obvious, Minimal Language
`tzif`	Time Zone Information Format
`udp_datagram`	User datagram protocol	`udp_payload`
`vorbis_comment`	Vorbis comment	`flac_picture`
`vorbis_packet`	Vorbis packet	`vorbis_comment`
`vp8_frame`	VP8 frame
`vp9_cfm`	VP9 Codec Feature Metadata
`vp9_frame`	VP9 frame
`vpx_ccr`	VPX Codec Configuration Record
`wasm`	WebAssembly Binary Format
`wav`	WAV file	`id3v2` `id3v1` `id3v11`
`webp`	WebP image	`vp8_frame`
`xml`	Extensible Markup Language
`yaml`	YAML Ain't Markup Language
`zip`	ZIP archive	`probe`
`image`	Group	`gif` `jpeg` `mp4` `png` `tiff` `webp`
`inet_packet`	Group	`ipv4_packet` `ipv6_packet`
`ip_packet`	Group	`icmp` `icmpv6` `tcp_segment` `udp_datagram`
`link_frame`	Group	`bsd_loopback_frame` `ether8023_frame` `sll2_packet` `sll_packet`
`probe`	Group	`adts` `ar` `avi` `avro_ocf` `bitcoin_blkdat` `bplist` `bzip2` `elf` `flac` `gif` `gzip` `jpeg` `json` `jsonl` `macho` `macho_fat` `matroska` `mp3` `mp4` `mpeg_ts` `ogg` `pcap` `pcapng` `png` `tar` `tiff` `toml` `tzif` `wasm` `wav` `webp` `xml` `yaml` `zip`
`tcp_stream`	Group	`dns_tcp` `rtmp`
`udp_payload`	Group	`dns`

Global format options

Currently the only global option is force, which is used to ignore some format assertion errors. It can be used as a decode option or as a CLI -o option:

$ fq -d mp4 -o force=true file.mp4
$ fq -d bytes 'mp4({force: true})' file.mp4

Format details

aac_frame

Options

Name	Default	Description
`object_type`	1	Audio object type

Examples

Decode file using aac_frame options

$ fq -d aac_frame -o object_type=1 . file

Decode value as aac_frame

... | aac_frame({object_type:1})

asn1_ber

Supports decoding BER, CER and DER (X.690).

Currently no extra validation is done for CER and DER.
Does not support specifying a schema.
Supports torepr but without schema all sequences and sets will be arrays.

Can be used to decode certificates etc

$ fq -d bytes 'frompem | asn1_ber | d' cert.pem

Can decode nested values

$ fq -d asn1_ber '.constructed[1].value | asn1_ber' file.ber

Manual schema

$ fq -d asn1_ber 'torepr as $r | ["version", "modulus", "public_exponent", "private_exponent", "prime1", "prime2", "exponent1", "exponent2", "coefficient"] | with_entries({key: .value, value: $r[.key]})' pkcs1.der
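
The manual schema above works by pairing each field name with the value at the same position in the decoded sequence. A minimal Python sketch of that pairing, assuming the field order of a PKCS #1 RSAPrivateKey; `label_sequence` is a hypothetical helper and the numbers are a toy key, not output from fq:

```python
# Field names of a PKCS #1 RSAPrivateKey, in sequence order.
FIELD_NAMES = [
    "version", "modulus", "public_exponent", "private_exponent",
    "prime1", "prime2", "exponent1", "exponent2", "coefficient",
]


def label_sequence(values, names=FIELD_NAMES):
    """Pair each positional value with the field name at the same index."""
    return dict(zip(names, values))


# Toy values standing in for the array a schema-less decode would give.
values = [0, 3233, 17, 2753, 61, 53, 53, 49, 38]
labeled = label_sequence(values)
print(labeled["modulus"], labeled["private_exponent"])
```

Because the mapping is purely positional, it only makes sense when the input really is the structure the name list assumes.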

References

avc_au

Options

Name	Default	Description
`length_size`	0	Length value size

Examples

Decode file using avc_au options

$ fq -d avc_au -o length_size=0 . file

Decode value as avc_au

... | avc_au({length_size:0})

avi

Options

Name	Default	Description
`decode_samples`	true	Decode supported media samples

Examples

Decode file using avi options

$ fq -d avi -o decode_samples=true . file

Decode value as avi

... | avi({decode_samples:true})

Samples

AVI has many redundant ways to index samples, so currently .streams[].samples will only include samples found via the most "modern" index used in the file. The order of preference is stream super index, movi ix index, then idx1 index.
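
The preference order can be pictured as a first-match lookup. This is a rough Python sketch, not fq's implementation; the dictionary keys and `pick_sample_index` are hypothetical names used only for illustration:

```python
# Preference order described above; the key names are hypothetical labels.
INDEX_PREFERENCE = ["stream_super_index", "movi_ix_index", "idx1_index"]


def pick_sample_index(available):
    """Return the first index kind, in preference order, that has entries."""
    for kind in INDEX_PREFERENCE:
        if available.get(kind):
            return kind
    return None


# A file that only carries the legacy idx1 index (made-up entry).
indexes = {"stream_super_index": [], "movi_ix_index": [], "idx1_index": [(0, 417)]}
print(pick_sample_index(indexes))
```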

Extract samples for stream 1

$ fq '.streams[1].samples[] | tobytes' file.avi > stream01.mp3

Show stream summary

$ fq -o decode_samples=false '[.chunks[0] | grep_by(.id=="LIST" and .type=="strl") | grep_by(.id=="strh") as {$type} | grep_by(.id=="strf") as {$format_tag, $compression} | {$type,$format_tag,$compression}]' *.avi

References

avro_ocf

Supports reading Avro Object Container Format (OCF) files based on the 1.11.0 specification.

Capable of handling null, deflate, and snappy codecs for data compression.

Limitations:

Schema does not support self-referential types, only built-in types.
Decimal logical types are not supported for decoding and are treated as their primitive type.

References

https://avro.apache.org/docs/current/spec.html#Object+Container+Files

Authors

Xentripetal xentripetal@fastmail.com @xentripetal

bencode

Convert represented value to JSON

$ fq -d bencode torepr file.torrent
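
For orientation, bencoding has four value kinds: integers (i...e), byte strings (length:bytes), lists (l...e) and dictionaries (d...e). The Python sketch below is a minimal recursive decoder showing what a "represented value" looks like; `bdecode` is a hypothetical helper, it is not how fq decodes, and it does no validation beyond what is needed to walk the input:

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<values>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


value, end = bdecode(b"d3:bar4:spam3:fooi42ee")
print(value)  # {b'bar': b'spam', b'foo': 42}
```

Strings stay as bytes here because bencoded strings are not guaranteed to be valid text.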

References

https://wiki.theory.org/BitTorrentSpecification#Bencoding

bitcoin_block

Options

Name	Default	Description
`has_header`	false	Has blkdat header

Examples

Decode file using bitcoin_block options

$ fq -d bitcoin_block -o has_header=false . file

Decode value as bitcoin_block

... | bitcoin_block({has_header:false})

bits

Decode to a slice and indexable binary of bits.

Slice and decode bit range

$ echo 'some {"a":1} json' | fq -d bits '.[40:-48] | fromjson'
{
  "a": 1
}

Index bits

$ echo 'hello' | fq -d bits '.[4]'
1
$ echo 'hello' | fq -c -d bits '[.[range(8)]]'
[0,1,1,0,1,0,0,0]
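
The offsets in these examples are counted in bits, most significant bit first. A small Python sketch (`to_bits` and `from_bits` are hypothetical helpers, not fq functions) reproduces the numbers and shows where 40 and -48 come from:

```python
import json


def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack a list of bits (length divisible by 8) back into bytes."""
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[n:n + 8]))
        for n in range(0, len(bits), 8)
    )


bits = to_bits(b"hello\n")            # echo appends a newline
print(bits[4])                        # 1
print(bits[:8])                       # [0, 1, 1, 0, 1, 0, 0, 0]

# "some " is 5 bytes (40 bits) and " json\n" is 6 bytes (48 bits).
payload = from_bits(to_bits(b'some {"a":1} json\n')[40:-48])
print(json.loads(payload))            # {'a': 1}
```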

bplist

Show full decoding

$ fq d Info.plist

Timestamps

Timestamps in Apple Binary Property Lists are encoded as Cocoa Core Data timestamps, where the raw value is the floating point number of seconds since January 1, 2001. By default, fq will render the raw floating point value. To get the raw value or a string description, use the tovalue and todescription functions:

$ fq 'torepr.SomeTimeStamp | tovalue' Info.plist
685135328
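
The raw value is an offset from 2001-01-01 00:00:00 UTC rather than from the Unix epoch. A minimal Python sketch of the conversion, assuming whole seconds and UTC (`cocoa_to_iso` is a hypothetical helper, not an fq function):

```python
from datetime import datetime, timedelta, timezone

# Cocoa Core Data timestamps count seconds from this instant.
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def cocoa_to_iso(seconds):
    """Convert seconds since the Cocoa epoch to an ISO 8601 UTC string."""
    moment = COCOA_EPOCH + timedelta(seconds=seconds)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(cocoa_to_iso(685135328))  # 2022-09-17T19:22:08Z
```

The format string drops any fractional part, so sub-second precision is lost in this sketch.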

$ fq 'torepr.SomeTimeStamp | todescription' Info.plist
"2022-09-17T19:22:08Z"

Get JSON representation

$ fq torepr com.apple.UIAutomation.plist
{
  "UIAutomationEnabled": true
}

Authors

David McDonald @dgmcdona

References

bson

Convert represented value to JSON

$ fq -d bson torepr file.bson

Filter represented value

$ fq -d bson 'torepr | select(.name=="bob")' file.bson

References

https://bsonspec.org/spec.html

bytes

Decode to a slice and indexable binary of bytes.

Slice out byte ranges

$ echo -n 'hello' | fq -d bytes '.[-3:]' > last_3_bytes
$ echo -n 'hello' | fq -d bytes '[.[-2:], .[0:2]] | tobytes' > first_last_2_bytes_swapped

Slice and decode byte range

$ echo 'some {"a":1} json' | fq -d bytes '.[5:-6] | fromjson'
{
  "a": 1
}

Index bytes

$ echo 'hello' | fq -d bytes '.[1]'
101
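
Byte slicing here follows the familiar half-open, negative-index-from-the-end convention. The same ranges in plain Python, as a cross-check of the offsets used above (this only mirrors the arithmetic, not fq itself):

```python
import json

data = b"hello"                       # echo -n: no trailing newline
last_three = data[-3:]
swapped = data[-2:] + data[0:2]
print(last_three, swapped)            # b'llo' b'lohe'

line = b'some {"a":1} json\n'         # echo without -n appends a newline
embedded = line[5:-6]                 # skip "some " and " json\n"
print(json.loads(embedded))           # {'a': 1}

print(b"hello\n"[1])                  # 101, the byte value of "e"
```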

cbor

Convert represented value to JSON

$ fq -d cbor torepr file.cbor

References

csv

Options

Name	Default	Description
`comma`	,	Separator character
`comment`	#	Comment line character

Examples

Decode file using csv options

$ fq -d csv -o comma="," -o comment="#" . file

Decode value as csv

... | csv({comma:",",comment:"#"})

TSV to CSV

$ fq -d csv -o comma="\t" tocsv file.tsv

Convert rows to objects based on header row

$ fq -d csv '.[0] as $t | .[1:] | map(with_entries(.key = $t[.key]))' file.csv
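
The header-row conversion takes the first row as the list of keys and zips it with every following row. A short Python sketch of the same transformation, using made-up sample data and a hypothetical `rows_to_objects` helper:

```python
import csv
import io


def rows_to_objects(rows):
    """Use the first row as keys for every following row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample data standing in for file.csv.
text = "name,age\nalice,30\nbob,25\n"
rows = list(csv.reader(io.StringIO(text)))
objects = rows_to_objects(rows)
print(objects)
```

All values stay strings, since CSV itself carries no type information.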

flac_frame

Options

Name	Default	Description
`bits_per_sample`	16	Bits per sample

Examples

Decode file using flac_frame options

$ fq -d flac_frame -o bits_per_sample=16 . file

Decode value as flac_frame

... | flac_frame({bits_per_sample:16})

hevc_au

Options

Name	Default	Description
`length_size`	4	Length value size

Examples

Decode file using hevc_au options

$ fq -d hevc_au -o length_size=4 . file

Decode value as hevc_au

... | hevc_au({length_size:4})

html

Options

Name	Default	Description
`array`	false	Decode as nested arrays
`attribute_prefix`	@	Prefix for attribute keys
`seq`	false	Use seq attribute to preserve element order

Examples

Decode file using html options

$ fq -d html -o array=false -o attribute_prefix="@" -o seq=false . file

Decode value as html

... | html({array:false,attribute_prefix:"@",seq:false})

HTML is decoded in HTML5 mode and will always include <html>, <body> and <head> elements.

See xml format for more examples and how to preserve element order and how to encode to xml.

There is no tohtml function, see toxml instead.

Element as object

# decode as object is the default
$ echo '<a href="url">text</a>' | fq -d html
{
  "html": {
    "body": {
      "a": {
        "#text": "text",
        "@href": "url"
      }
    },
    "head": ""
  }
}

Element as array

$ echo '<a href="url">text</a>' | fq -d html -o array=true
[
  "html",
  null,
  [
    [
      "head",
      null,
      []
    ],
    [
      "body",
      null,
      [
        [
          "a",
          {
            "#text": "text",
            "href": "url"
          },
          []
        ]
      ]
    ]
  ]
]

# decode html files to a {file: "title", ...} object
$ fq -n -d html '[inputs | {key: input_filename, value: .html.head.title?}] | from_entries' *.html

# <a> href:s in file
$ fq -r -o array=true -d html '.. | select(.[0] == "a" and .[1].href)?.[1].href' file.html

macho

Supports decoding vanilla and FAT Mach-O binaries.

Select 64bit load segments

$ fq '.load_commands[] | select(.cmd=="segment_64")' file

References

https://github.com/aidansteele/osx-abi-macho-file-format-reference

Authors

Sıddık AÇIL acils@itu.edu.tr @Akaame

markdown

Array with all level 1 and 2 headers

$ fq -d markdown '[.. | select(.type=="heading" and .level<=2)?.children[0]]' file.md

matroska

Lookup element using path

$ fq 'matroska_path(".Segment.Tracks[0]")' file.mkv

Get path to element

$ fq 'grep_by(.id == "Tracks") | matroska_path' file.mkv

References

mp3

Options

Name	Default	Description
`max_sync_seek`	32768	Max byte distance to next sync
`max_unique_header_configs`	5	Max number of unique frame header configs allowed

Examples

Decode file using mp3 options

$ fq -d mp3 -o max_sync_seek=32768 -o max_unique_header_configs=5 . file

Decode value as mp3

... | mp3({max_sync_seek:32768,max_unique_header_configs:5})

mp4

Options

Name	Default	Description
`allow_truncated`	false	Allow box to be truncated
`decode_samples`	true	Decode supported media samples

Examples

Decode file using mp4 options

$ fq -d mp4 -o allow_truncated=false -o decode_samples=true . file

Decode value as mp4

... | mp4({allow_truncated:false,decode_samples:true})

Lookup mp4 box using a mp4 box path.

# <decode value box> | mp4_path($path) -> <decode value box>
$ fq 'mp4_path(".moov.trak[1]")' file.mp4

Get mp4 box path for a decode value box.

# <decode value box> | mp4_path -> string
$ fq 'grep_by(.type == "trak") | mp4_path' file.mp4

Force decode a single box

$ fq -n '"AAAAHGVsc3QAAAAAAAAAAQAAADIAAAQAAAEAAA==" | frombase64 | mp4({force:true}) | d'
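
For orientation, the base64 string above is a single 28-byte elst (edit list) box rather than a whole file. The Python sketch below unpacks it by hand, assuming the standard ISO BMFF layout of a 32-bit size, a four-character type, and a version 0 edit list; it is an illustration and not how fq decodes:

```python
import base64
import struct

raw = base64.b64decode("AAAAHGVsc3QAAAAAAAAAAQAAADIAAAQAAAEAAA==")

# Box header: 32-bit big-endian size followed by a four-character type.
size, box_type = struct.unpack_from(">I4s", raw, 0)
print(size, box_type)                 # 28 b'elst'

# Full box fields: 1-byte version, 3 bytes of flags, then the entry count.
version = raw[8]
(entry_count,) = struct.unpack_from(">I", raw, 12)

# Version 0 entry: segment_duration, media_time, media_rate integer/fraction.
entries = [
    struct.unpack_from(">Iihh", raw, 16 + 12 * i) for i in range(entry_count)
]
print(version, entries)               # 0 [(50, 1024, 1, 0)]
```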

Speed up decoding by not decoding samples

# manually decode first sample as an aac_frame
$ fq -o decode_samples=false '.tracks[0].samples[0] | aac_frame | d' file.mp4

Entries for first edit list as values

$ fq 'first(grep_by(.type=="elst").entries) | tovalue' file.mp4

References

msgpack

Convert represented value to JSON

$ fq -d msgpack torepr file.msgpack

References

https://github.com/msgpack/msgpack/blob/master/spec.md

pcap

Build object with number of (reassembled) TCP bytes sent to/from client IP

# for a pcapng file you would use .[0].tcp_connections for first section
$ fq '.tcp_connections | group_by(.client.ip) | map({key: .[0].client.ip, value: map(.client.stream, .server.stream | tobytes.size) | add}) | from_entries' file.pcap
{
  "10.1.0.22": 15116,
  "10.99.12.136": 234,
  "10.99.12.150": 218
}
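
The query groups connections by client IP and sums the sizes of both directions of each reassembled stream. A Python sketch of that aggregation, using hypothetical connection records shaped loosely like the fields the query touches (the data is made up and `bytes_per_client` is not an fq function):

```python
from collections import defaultdict


def bytes_per_client(connections):
    """Sum client and server stream sizes per client IP."""
    totals = defaultdict(int)
    for conn in connections:
        sent = len(conn["client"]["stream"])
        received = len(conn["server"]["stream"])
        totals[conn["client"]["ip"]] += sent + received
    return dict(totals)


# Made-up connections; real streams would be the reassembled TCP payloads.
connections = [
    {"client": {"ip": "10.1.0.22", "stream": b"GET /"}, "server": {"stream": b"200 OK"}},
    {"client": {"ip": "10.1.0.22", "stream": b"PING"}, "server": {"stream": b"PONG"}},
    {"client": {"ip": "10.99.12.136", "stream": b"hi"}, "server": {"stream": b""}},
]
totals = bytes_per_client(connections)
print(totals)
```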

protobuf

Can decode sub messages

$ fq -d protobuf '.fields[6].wire_value | protobuf | d' file

References

https://developers.google.com/protocol-buffers/docs/encoding

rtmp

Currently only supports plain RTMP (not RTMPT or encrypted variants etc) with AMF0 (not AMF3).

Show rtmp streams in PCAP file

$ fq '.tcp_connections[] | select(.server.port=="rtmp") | d' file.cap

References

tzif

Get last transition time

$ fq '.v2plusdatablock.transition_times[-1] | tovalue' tziffile

Count leap second records

$ fq '.v2plusdatablock.leap_second_records | length' tziffile

Authors

Takashi Oguma @bitbears-dev @0xb17bea125

References

https://datatracker.ietf.org/doc/html/rfc8536

wasm

Count opcode usage

$ fq '.sections[] | select(.id == "code_section") | [.. | .opcode? // empty] | count | map({key: .[0], value: .[1]}) | from_entries' file.wasm

List exports and imports

$ fq '.sections | {import: map(select(.id == "import_section").content.im.x[].nm.b), export: map(select(.id == "export_section").content.ex.x[].nm.b)}' file.wasm

Authors

Takashi Oguma @bitbears-dev @0xb17bea125

References

https://webassembly.github.io/spec/core/

xml

Options

Name	Default	Description
`array`	false	Decode as nested arrays
`attribute_prefix`	@	Prefix for attribute keys
`seq`	false	Use seq attribute to preserve element order

Examples

Decode file using xml options

$ fq -d xml -o array=false -o attribute_prefix="@" -o seq=false . file

Decode value as xml

... | xml({array:false,attribute_prefix:"@",seq:false})

XML can be decoded and encoded into jq values in two ways, elements as object or array. Which variant to use depends a bit on what you want to do. The object variant might be easier to query for a specific value, but array might be easier to use to generate xml or to query for all elements of some kind.

Encoding is done using the toxml function, which figures out which variant is used based on the input value. It has two optional options, indent and attribute_prefix.

Elements as object

Element can have different shapes depending on body text, attributes and children:

<a key="value">text</a> is {"a":{"#text":"text","@key":"value"}}, has text (#text) and attributes (@key)
<a>text</a> is {"a":"text"}
<a><b>text</b></a> is {"a":{"b":"text"}} one child with only text and no attributes
<a><b/><b>text</b></a> is {"a":{"b":["","text"]}} two children with same name end up in an array
<a><b/><b key="value">text</b></a> is {"a":{"b":["",{"#text":"text","@key":"value"}]}}
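
These shapes follow a small set of rules: an element with only text becomes a string, attributes get a prefix, and repeated child names collapse into an array. The Python sketch below applies those rules with the standard library parser. It is an approximation for illustration, not fq's implementation; it ignores #seq and mixed content, and `to_object` and `decode_as_object` are hypothetical names:

```python
import xml.etree.ElementTree as ET


def to_object(elem, attribute_prefix="@"):
    """Convert an element to a string (text only) or a dict (anything else)."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text
    obj = {}
    if text:
        obj["#text"] = text
    for name, value in elem.attrib.items():
        obj[attribute_prefix + name] = value
    for child in children:
        value = to_object(child, attribute_prefix)
        if child.tag in obj:
            # A repeated child name turns the entry into an array.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    return obj


def decode_as_object(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: to_object(root)}


print(decode_as_object('<a><b/><b key="value">text</b></a>'))
```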

If there is a #seq attribute it encodes the child element order. Use -o seq=true to include sequence numbers when decoding, otherwise order might be lost.

# decode as object is the default
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o seq=true
{
  "a": {
    "b": [
      {
        "#seq": 0
      },
      {
        "#seq": 1,
        "#text": "bbb"
      }
    ],
    "c": {
      "#seq": 2,
      "#text": "ccc",
      "@attr": "value"
    }
  }
}

# access text of the <c> element
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq '.a.c["#text"]'
"ccc"

# decode to object and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o seq=true 'toxml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

Elements as array

Elements are arrays of the shape ["name", {"#text": "body text", "attr_name": "attr value"}|null, [<child element>, ...]].

# decode as array
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o array=true
[
  "a",
  null,
  [
    [
      "b",
      null,
      []
    ],
    [
      "b",
      {
        "#text": "bbb"
      },
      []
    ],
    [
      "c",
      {
        "#text": "ccc",
        "attr": "value"
      },
      []
    ]
  ]
]
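
The array variant keeps every element as a name, attributes-or-null, children triple, which preserves order without a #seq key. A Python sketch that builds the same shape from the standard library parser, assuming body text is stored under "#text" next to the attributes as in the output above (`to_array` is a hypothetical helper, not fq code):

```python
import xml.etree.ElementTree as ET


def to_array(elem):
    """Convert an element to [name, attributes-or-None, [children]]."""
    attrs = {}
    text = (elem.text or "").strip()
    if text:
        attrs["#text"] = text
    attrs.update(elem.attrib)
    return [elem.tag, attrs or None, [to_array(child) for child in elem]]


root = ET.fromstring('<a><b/><b>bbb</b><c attr="value">ccc</c></a>')
tree = to_array(root)
print(tree[2][2][1]["#text"])  # ccc
```

Python's None plays the role of JSON null here.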

# decode to array and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o array=true -o seq=true 'toxml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

# access text of the <c> element, the object variant above is probably easier to use
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -o array=true '.[2][2][1]["#text"]'
"ccc"

References

xml.com's Converting Between XML and JSON

zip

Options

Name	Default	Description
`uncompress`	true	Uncompress and probe files

Examples

Decode file using zip options

$ fq -d zip -o uncompress=true . file

Decode value as zip

... | zip({uncompress:true})

Supports ZIP64.
