mirror of https://github.com/wader/fq.git synced 2024-12-23 13:22:58 +03:00
fq/doc/formats.md


Supported formats

| Name | Description | Dependencies |
| --- | --- | --- |
| aac_frame | Advanced Audio Coding frame | |
| adts | Audio Data Transport Stream | adts_frame |
| adts_frame | Audio Data Transport Stream frame | aac_frame |
| amf0 | Action Message Format 0 | |
| apev2 | APEv2 metadata tag | image |
| ar | Unix archive | probe |
| asn1_ber | ASN1 BER (basic encoding rules, also CER and DER) | |
| av1_ccr | AV1 Codec Configuration Record | |
| av1_frame | AV1 frame | av1_obu |
| av1_obu | AV1 Open Bitstream Unit | |
| avc_annexb | H.264/AVC Annex B | avc_nalu |
| avc_au | H.264/AVC Access Unit | avc_nalu |
| avc_dcr | H.264/AVC Decoder Configuration Record | avc_nalu |
| avc_nalu | H.264/AVC Network Access Layer Unit | avc_sps avc_pps avc_sei |
| avc_pps | H.264/AVC Picture Parameter Set | |
| avc_sei | H.264/AVC Supplemental Enhancement Information | |
| avc_sps | H.264/AVC Sequence Parameter Set | |
| avi | Audio Video Interleaved | avc_au hevc_au mp3_frame flac_frame |
| avro_ocf | Avro object container file | |
| bencode | BitTorrent bencoding | |
| bitcoin_blkdat | Bitcoin blk.dat | bitcoin_block |
| bitcoin_block | Bitcoin block | bitcoin_transaction |
| bitcoin_script | Bitcoin script | |
| bitcoin_transaction | Bitcoin transaction | bitcoin_script |
| bits | Raw bits | |
| bplist | Apple Binary Property List | |
| bsd_loopback_frame | BSD loopback frame | inet_packet |
| bson | Binary JSON | |
| bytes | Raw bytes | |
| bzip2 | bzip2 compression | probe |
| cbor | Concise Binary Object Representation | |
| csv | Comma separated values | |
| dns | DNS packet | |
| dns_tcp | DNS packet (TCP) | |
| elf | Executable and Linkable Format | |
| ether8023_frame | Ethernet 802.3 frame | inet_packet |
| exif | Exchangeable Image File Format | |
| fairplay_spc | FairPlay Server Playback Context | |
| flac | Free Lossless Audio Codec file | flac_metadatablocks flac_frame |
| flac_frame | FLAC frame | |
| flac_metadatablock | FLAC metadatablock | flac_streaminfo flac_picture vorbis_comment |
| flac_metadatablocks | FLAC metadatablocks | flac_metadatablock |
| flac_picture | FLAC metadatablock picture | image |
| flac_streaminfo | FLAC streaminfo | |
| gif | Graphics Interchange Format | |
| gzip | gzip compression | probe |
| hevc_annexb | H.265/HEVC Annex B | hevc_nalu |
| hevc_au | H.265/HEVC Access Unit | hevc_nalu |
| hevc_dcr | H.265/HEVC Decoder Configuration Record | hevc_nalu |
| hevc_nalu | H.265/HEVC Network Access Layer Unit | hevc_vps hevc_pps hevc_sps |
| hevc_pps | H.265/HEVC Picture Parameter Set | |
| hevc_sps | H.265/HEVC Sequence Parameter Set | |
| hevc_vps | H.265/HEVC Video Parameter Set | |
| html | HyperText Markup Language | |
| icc_profile | International Color Consortium profile | |
| icmp | Internet Control Message Protocol | |
| icmpv6 | Internet Control Message Protocol v6 | |
| id3v1 | ID3v1 metadata | |
| id3v11 | ID3v1.1 metadata | |
| id3v2 | ID3v2 metadata | image |
| ipv4_packet | Internet protocol v4 packet | ip_packet |
| ipv6_packet | Internet protocol v6 packet | ip_packet |
| jpeg | Joint Photographic Experts Group file | exif icc_profile |
| json | JavaScript Object Notation | |
| jsonl | JavaScript Object Notation Lines | |
| macho | Mach-O macOS executable | |
| macho_fat | Fat Mach-O macOS executable (multi-architecture) | macho |
| markdown | Markdown | |
| matroska | Matroska file | aac_frame av1_ccr av1_frame avc_au avc_dcr flac_frame flac_metadatablocks hevc_au hevc_dcr image mp3_frame mpeg_asc mpeg_pes_packet mpeg_spu opus_packet vorbis_packet vp8_frame vp9_cfm vp9_frame |
| mp3 | MP3 file | id3v2 id3v1 id3v11 apev2 mp3_frame |
| mp3_frame | MPEG audio layer 3 frame | mp3_frame_tags |
| mp3_frame_tags | MP3 frame info/xing tags | |
| mp4 | ISOBMFF, QuickTime and similar | aac_frame av1_ccr av1_frame avc_au avc_dcr flac_frame flac_metadatablocks hevc_au hevc_dcr icc_profile id3v2 image jpeg mp3_frame mpeg_es mpeg_pes_packet opus_packet png prores_frame protobuf_widevine pssh_playready vorbis_packet vp9_frame vpx_ccr |
| mpeg_asc | MPEG-4 Audio Specific Config | |
| mpeg_es | MPEG Elementary Stream | mpeg_asc vorbis_packet |
| mpeg_pes | MPEG Packetized elementary stream | mpeg_pes_packet mpeg_spu |
| mpeg_pes_packet | MPEG Packetized elementary stream packet | |
| mpeg_spu | Sub Picture Unit (DVD subtitle) | |
| mpeg_ts | MPEG Transport Stream | |
| msgpack | MessagePack | |
| ogg | OGG file | ogg_page vorbis_packet opus_packet flac_metadatablock flac_frame |
| ogg_page | OGG page | |
| opus_packet | Opus packet | vorbis_comment |
| pcap | PCAP packet capture | link_frame tcp_stream ipv4_packet |
| pcapng | PCAPNG packet capture | link_frame tcp_stream ipv4_packet |
| png | Portable Network Graphics file | icc_profile exif |
| prores_frame | Apple ProRes frame | |
| protobuf | Protobuf | |
| protobuf_widevine | Widevine protobuf | protobuf |
| pssh_playready | PlayReady PSSH | |
| rtmp | Real-Time Messaging Protocol | amf0 mpeg_asc |
| sll2_packet | Linux cooked capture encapsulation v2 | inet_packet |
| sll_packet | Linux cooked capture encapsulation | inet_packet |
| tar | Tar archive | probe |
| tcp_segment | Transmission control protocol segment | |
| tiff | Tag Image File Format | icc_profile |
| toml | Tom's Obvious, Minimal Language | |
| tzif | Time Zone Information Format | |
| udp_datagram | User datagram protocol | udp_payload |
| vorbis_comment | Vorbis comment | flac_picture |
| vorbis_packet | Vorbis packet | vorbis_comment |
| vp8_frame | VP8 frame | |
| vp9_cfm | VP9 Codec Feature Metadata | |
| vp9_frame | VP9 frame | |
| vpx_ccr | VPX Codec Configuration Record | |
| wasm | WebAssembly Binary Format | |
| wav | WAV file | id3v2 id3v1 id3v11 |
| webp | WebP image | vp8_frame |
| xml | Extensible Markup Language | |
| yaml | YAML Ain't Markup Language | |
| zip | ZIP archive | probe |
| image | Group | gif jpeg mp4 png tiff webp |
| inet_packet | Group | ipv4_packet ipv6_packet |
| ip_packet | Group | icmp icmpv6 tcp_segment udp_datagram |
| link_frame | Group | bsd_loopback_frame ether8023_frame sll2_packet sll_packet |
| probe | Group | adts ar avi avro_ocf bitcoin_blkdat bplist bzip2 elf flac gif gzip jpeg json jsonl macho macho_fat matroska mp3 mp4 mpeg_ts ogg pcap pcapng png tar tiff toml tzif wasm wav webp xml yaml zip |
| tcp_stream | Group | dns_tcp rtmp |
| udp_payload | Group | dns |

Global format options

Currently the only global option is force, which makes the decoder ignore some format assertion errors. It can be set as a decode option or with the CLI -o option:

fq -d mp4 -o force=true file.mp4
fq -d bytes 'mp4({force: true})' file.mp4

Format details

aac_frame

Options

| Name | Default | Description |
| --- | --- | --- |
| object_type | 1 | Audio object type |

Examples

Decode file using aac_frame options

$ fq -d aac_frame -o object_type=1 . file

Decode value as aac_frame

... | aac_frame({object_type:1})

asn1_ber

Supports decoding BER, CER and DER (X.690).

  • Currently no extra validation is done for CER and DER.
  • Does not support specifying a schema.
  • Supports torepr but without schema all sequences and sets will be arrays.

Can be used to decode certificates etc

$ fq -d bytes 'frompem | asn1_ber | d' cert.pem

Can decode nested values

$ fq -d asn1_ber '.constructed[1].value | asn1_ber' file.ber

Manual schema

$ fq -d asn1_ber 'torepr as $r | ["version", "modulus", "public_exponent", "private_exponent", "prime1", "prime2", "exponent1", "exponent2", "coefficient"] | with_entries({key: .value, value: $r[.key]})' pkcs1.der
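
The manual schema above pairs a list of field names with the positional values of the decoded sequence. As a rough illustration only (this is not fq code), the same pairing can be sketched in Python. The list `toy_values` is a made-up stand-in for what `torepr` would return for a PKCS#1 RSAPrivateKey, and the snake_case names are an assumption that follows the field order of RFC 8017.

```python
# Field names in RFC 8017 RSAPrivateKey order (snake_case naming is an assumption).
FIELD_NAMES = [
    "version", "modulus", "public_exponent", "private_exponent",
    "prime1", "prime2", "exponent1", "exponent2", "coefficient",
]


def apply_manual_schema(repr_values, names=FIELD_NAMES):
    """Pair each positional value of a decoded SEQUENCE with a field name."""
    if len(repr_values) != len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(repr_values)))
    return dict(zip(names, repr_values))


# Hypothetical toy key (p=61, q=53), not taken from a real file.
toy_values = [0, 3233, 17, 2753, 61, 53, 53, 49, 38]
key = apply_manual_schema(toy_values)
print(key["public_exponent"], key["private_exponent"])
```

A length check is included because a positional schema silently mislabels fields when a name is missing or duplicated.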

References

avc_au

Options

| Name | Default | Description |
| --- | --- | --- |
| length_size | 0 | Length value size |

Examples

Decode file using avc_au options

$ fq -d avc_au -o length_size=0 . file

Decode value as avc_au

... | avc_au({length_size:0})

avi

Options

| Name | Default | Description |
| --- | --- | --- |
| decode_samples | true | Decode supported media samples |

Examples

Decode file using avi options

$ fq -d avi -o decode_samples=true . file

Decode value as avi

... | avi({decode_samples:true})

Samples

AVI has several redundant ways to index samples, so .streams[].samples currently includes only the samples found through the most "modern" index present in the file. The order of preference is the stream super index, then the movi ix index, then the idx1 index.
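
The preference order described above amounts to a first-match selection. A minimal Python sketch of that idea follows. The index names and the `available` dictionary are placeholders invented for this illustration and do not correspond to fq field names or to how fq implements it.

```python
# Index kinds in the preference order described above. The string names are
# placeholders chosen for this sketch, not fq field names.
INDEX_PRIORITY = ("super_index", "movi_ix", "idx1")


def pick_sample_index(available):
    """Return (kind, entries) for the first index kind that has entries."""
    for kind in INDEX_PRIORITY:
        entries = available.get(kind)
        if entries:
            return kind, entries
    return None, []


# Hypothetical file that has both a movi ix index and a legacy idx1 index.
print(pick_sample_index({"idx1": [1, 2], "movi_ix": [3]}))
```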

Extract samples for stream 1

$ fq '.streams[1].samples[] | tobytes' file.avi > stream01.mp3

Show stream summary

$ fq -o decode_samples=false '[.chunks[0] | grep_by(.id=="LIST" and .type=="strl") | grep_by(.id=="strh") as {$type} | grep_by(.id=="strf") as {$format_tag, $compression} | {$type,$format_tag,$compression}]' *.avi

References

avro_ocf

Supports reading Avro Object Container Format (OCF) files based on the 1.11.0 specification.

Capable of handling null, deflate, and snappy codecs for data compression.

Limitations:

  • Schema does not support self-referential types, only built-in types.
  • Decimal logical types are not supported for decoding; they are decoded as their primitive type.

References

Authors

bencode

Convert represented value to JSON

$ fq -d bencode torepr file.torrent

References

bitcoin_block

Options

| Name | Default | Description |
| --- | --- | --- |
| has_header | false | Has blkdat header |

Examples

Decode file using bitcoin_block options

$ fq -d bitcoin_block -o has_header=false . file

Decode value as bitcoin_block

... | bitcoin_block({has_header:false})

bits

Decode to a sliceable and indexable binary of bits.

Slice and decode bit range

$ echo 'some {"a":1} json' | fq -d bits '.[40:-48] | fromjson'
{
  "a": 1
}

Index bits

$ echo 'hello' | fq -d bits '.[4]'
1
$ echo 'hello' | fq -c -d bits '[.[range(8)]]'
[0,1,1,0,1,0,0,0]
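
To make the bit offsets in the examples above easier to follow, here is a small Python sketch that reproduces the same indexing and slicing on plain bytes. It assumes most-significant-bit-first ordering, which matches the output shown above. The helper names `to_bits` and `from_bits` are made up for this illustration.

```python
import json


def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack a list of bits (length divisible by 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


hello = to_bits(b"hello\n")  # echo appends a newline
print(hello[4])              # 1
print(hello[:8])             # [0, 1, 1, 0, 1, 0, 0, 0]

# "some " is 5 bytes (40 bits) and " json\n" is 6 bytes (48 bits).
bits = to_bits(b'some {"a":1} json\n')
print(json.loads(from_bits(bits[40:-48])))  # {'a': 1}
```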

bplist

Show full decoding

$ fq d Info.plist

Timestamps

Timestamps in Apple Binary Property Lists are encoded as Cocoa Core Data timestamps, where the raw value is the floating point number of seconds since January 1, 2001. By default, fq will render the raw floating point value. To get the raw value or the string description, use the tovalue and todescription functions:

$ fq 'torepr.SomeTimeStamp | tovalue' Info.plist
685135328

$ fq 'torepr.SomeTimeStamp | todescription' Info.plist
"2022-09-17T19:22:08Z"

Get JSON representation

$ fq torepr com.apple.UIAutomation.plist
{
  "UIAutomationEnabled": true
}
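
For the timestamp encoding described under Timestamps above, the conversion from the raw value to the string description can be checked by hand. A minimal Python sketch, assuming the epoch is 2001-01-01 00:00:00 UTC and that fractional seconds are dropped from the formatted string. The function name `cocoa_to_iso` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Cocoa Core Data timestamps count seconds from this instant (UTC assumed).
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def cocoa_to_iso(seconds):
    """Convert seconds since 2001-01-01 UTC to an ISO 8601 string."""
    moment = COCOA_EPOCH + timedelta(seconds=seconds)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(cocoa_to_iso(685135328))  # 2022-09-17T19:22:08Z
```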

Authors

References

bson

Convert represented value to JSON

$ fq -d bson torepr file.bson

Filter represented value

$ fq -d bson 'torepr | select(.name=="bob")' file.bson

References

bytes

Decode to a sliceable and indexable binary of bytes.

Slice out byte ranges

$ echo -n 'hello' | fq -d bytes '.[-3:]' > last_3_bytes
$ echo -n 'hello' | fq -d bytes '[.[-2:], .[0:2]] | tobytes' > first_last_2_bytes_swapped

Slice and decode byte range

$ echo 'some {"a":1} json' | fq -d bytes '.[5:-6] | fromjson'
{
  "a": 1
}

Index bytes

$ echo 'hello' | fq -d bytes '.[1]'
101

cbor

Convert represented value to JSON

$ fq -d cbor torepr file.cbor

References

csv

Options

| Name | Default | Description |
| --- | --- | --- |
| comma | , | Separator character |
| comment | # | Comment line character |

Examples

Decode file using csv options

$ fq -d csv -o comma="," -o comment="#" . file

Decode value as csv

... | csv({comma:",",comment:"#"})

TSV to CSV

$ fq -d csv -o comma="\t" tocsv file.tsv

Convert rows to objects based on header row

$ fq -d csv '.[0] as $t | .[1:] | map(with_entries(.key = $t[.key]))' file.csv
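
The header-row conversion above takes the first row as keys and turns every following row into an object. For comparison, a sketch of the same transformation in Python using the standard csv module. The sample data and the function name `rows_to_objects` are invented for this illustration, and values stay strings.

```python
import csv
import io


def rows_to_objects(text, comma=","):
    """Use the first CSV row as keys for every following row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=comma))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample input.
sample = "name,age\nbob,42\nalice,37\n"
print(rows_to_objects(sample))
```

Passing `comma="\t"` handles tab separated input in the same way.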

flac_frame

Options

| Name | Default | Description |
| --- | --- | --- |
| bits_per_sample | 16 | Bits per sample |

Examples

Decode file using flac_frame options

$ fq -d flac_frame -o bits_per_sample=16 . file

Decode value as flac_frame

... | flac_frame({bits_per_sample:16})

hevc_au

Options

| Name | Default | Description |
| --- | --- | --- |
| length_size | 4 | Length value size |

Examples

Decode file using hevc_au options

$ fq -d hevc_au -o length_size=4 . file

Decode value as hevc_au

... | hevc_au({length_size:4})

html

Options

| Name | Default | Description |
| --- | --- | --- |
| array | false | Decode as nested arrays |
| attribute_prefix | @ | Prefix for attribute keys |
| seq | false | Use seq attribute to preserve element order |

Examples

Decode file using html options

$ fq -d html -o array=false -o attribute_prefix="@" -o seq=false . file

Decode value as html

... | html({array:false,attribute_prefix:"@",seq:false})

HTML is decoded in HTML5 mode and will always include <html>, <body> and <head> elements.

See xml format for more examples and how to preserve element order and how to encode to xml.

There is no tohtml function, see toxml instead.

Element as object

# decode as object is the default
$ echo '<a href="url">text</a>' | fq -d html
{
  "html": {
    "body": {
      "a": {
        "#text": "text",
        "@href": "url"
      }
    },
    "head": ""
  }
}

Element as array

$ echo '<a href="url">text</a>' | fq -d html -o array=true
[
  "html",
  null,
  [
    [
      "head",
      null,
      []
    ],
    [
      "body",
      null,
      [
        [
          "a",
          {
            "#text": "text",
            "href": "url"
          },
          []
        ]
      ]
    ]
  ]
]

# decode html files to a {file: "title", ...} object
$ fq -n -d html '[inputs | {key: input_filename, value: .html.head.title?}] | from_entries' *.html

# href values of all <a> elements in file
$ fq -r -o array=true -d html '.. | select(.[0] == "a" and .[1].href)?.[1].href' file.html

macho

Supports decoding vanilla and FAT Mach-O binaries.

Select 64bit load segments

$ fq '.load_commands[] | select(.cmd=="segment_64")' file

References

Authors

markdown

Array with all level 1 and 2 headers

$ fq -d markdown '[.. | select(.type=="heading" and .level<=2)?.children[0]]' file.md

matroska

Lookup element using path

$ fq 'matroska_path(".Segment.Tracks[0]")' file.mkv

Get path to element

$ fq 'grep_by(.id == "Tracks") | matroska_path' file.mkv

References

mp3

Options

| Name | Default | Description |
| --- | --- | --- |
| max_sync_seek | 32768 | Max byte distance to next sync |
| max_unique_header_configs | 5 | Max number of unique frame header configs allowed |

Examples

Decode file using mp3 options

$ fq -d mp3 -o max_sync_seek=32768 -o max_unique_header_configs=5 . file

Decode value as mp3

... | mp3({max_sync_seek:32768,max_unique_header_configs:5})

mp4

Options

| Name | Default | Description |
| --- | --- | --- |
| allow_truncated | false | Allow box to be truncated |
| decode_samples | true | Decode supported media samples |

Examples

Decode file using mp4 options

$ fq -d mp4 -o allow_truncated=false -o decode_samples=true . file

Decode value as mp4

... | mp4({allow_truncated:false,decode_samples:true})

Lookup mp4 box using a mp4 box path.

# <decode value box> | mp4_path($path) -> <decode value box>
$ fq 'mp4_path(".moov.trak[1]")' file.mp4

Get mp4 box path for a decode value box.

# <decode value box> | mp4_path -> string
$ fq 'grep_by(.type == "trak") | mp4_path' file.mp4

Force decode a single box

$ fq -n '"AAAAHGVsc3QAAAAAAAAAAQAAADIAAAQAAAEAAA==" | frombase64 | mp4({force:true}) | d'

Speed up decoding by not decoding samples

# manually decode first sample as an aac_frame
$ fq -o decode_samples=false '.tracks[0].samples[0] | aac_frame | d' file.mp4

Entries for first edit list as values

$ fq 'first(grep_by(.type=="elst").entries) | tovalue' file.mp4

References

msgpack

Convert represented value to JSON

$ fq -d msgpack torepr file.msgpack

References

pcap

Build object with number of (reassembled) TCP bytes sent to/from client IP

# for a pcapng file you would use .[0].tcp_connections for first section
$ fq '.tcp_connections | group_by(.client.ip) | map({key: .[0].client.ip, value: map(.client.stream, .server.stream | tobytes.size) | add}) | from_entries' file.pcap
{
  "10.1.0.22": 15116,
  "10.99.12.136": 234,
  "10.99.12.150": 218
}
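
The query above groups connections by client IP and sums the sizes of both reassembled streams. The same aggregation is sketched below in Python to show what is being added up. The `connections` list is hypothetical data shaped loosely like the fields used in the query, not real fq output.

```python
from collections import defaultdict


def bytes_per_client(connections):
    """Sum client and server stream sizes for each client IP."""
    totals = defaultdict(int)
    for conn in connections:
        ip = conn["client"]["ip"]
        totals[ip] += len(conn["client"]["stream"]) + len(conn["server"]["stream"])
    return dict(totals)


# Hypothetical connections. Real values would come from a decoded capture.
connections = [
    {"client": {"ip": "10.0.0.1", "stream": b"GET /"}, "server": {"stream": b"200 OK"}},
    {"client": {"ip": "10.0.0.1", "stream": b"ping"}, "server": {"stream": b"pong!"}},
    {"client": {"ip": "10.0.0.2", "stream": b""}, "server": {"stream": b"hi"}},
]
print(bytes_per_client(connections))
```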

protobuf

Can decode sub messages

$ fq -d protobuf '.fields[6].wire_value | protobuf | d' file

References

rtmp

Currently only supports plain RTMP (not RTMPT, encrypted variants, etc.) with AMF0 (not AMF3).

Show rtmp streams in PCAP file

$ fq '.tcp_connections[] | select(.server.port=="rtmp") | d' file.cap

References

tzif

Get last transition time

$ fq '.v2plusdatablock.transition_times[-1] | tovalue' tziffile

Count leap second records

$ fq '.v2plusdatablock.leap_second_records | length' tziffile

Authors

References

wasm

Count opcode usage

$ fq '.sections[] | select(.id == "code_section") | [.. | .opcode? // empty] | count | map({key: .[0], value: .[1]}) | from_entries' file.wasm

List exports and imports

$ fq '.sections | {import: map(select(.id == "import_section").content.im.x[].nm.b), export: map(select(.id == "export_section").content.ex.x[].nm.b)}' file.wasm

Authors

References

xml

Options

| Name | Default | Description |
| --- | --- | --- |
| array | false | Decode as nested arrays |
| attribute_prefix | @ | Prefix for attribute keys |
| seq | false | Use seq attribute to preserve element order |

Examples

Decode file using xml options

$ fq -d xml -o array=false -o attribute_prefix="@" -o seq=false . file

Decode value as xml

... | xml({array:false,attribute_prefix:"@",seq:false})

XML can be decoded to and encoded from jq values in two ways: elements as objects or elements as arrays. Which variant to use depends on what you want to do. The object variant may be easier to query for a specific value, while the array variant may be easier to use for generating XML or for querying all elements of some kind.

Encoding is done using the toxml function, which determines the variant from the input value. It has two optional options: indent and attribute_prefix.

Elements as object

An element can have different shapes depending on its body text, attributes and children:

  • <a key="value">text</a> is {"a":{"#text":"text","@key":"value"}}, has text (#text) and attributes (@key)
  • <a>text</a> is {"a":"text"}
  • <a><b>text</b></a> is {"a":{"b":"text"}} one child with only text and no attributes
  • <a><b/><b>text</b></a> is {"a":{"b":["","text"]}} two children with same name end up in an array
  • <a><b/><b key="value">text</b></a> is {"a":{"b":["",{"#text":"text","@key":"value"}]}}
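
The shapes listed above can be reproduced with a short Python sketch built on xml.etree.ElementTree. This is an illustration of the mapping rules only, not fq's implementation. It assumes surrounding whitespace in text is trimmed, and it ignores #seq, tail text and namespaces. The function names are made up.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem, attribute_prefix="@"):
    """Map one element to a string (text only) or an object."""
    obj = {}
    for name, value in elem.attrib.items():
        obj[attribute_prefix + name] = value
    for child in elem:
        value = element_to_value(child, attribute_prefix)
        if child.tag in obj:
            # Children with the same name end up in an array.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    text = (elem.text or "").strip()
    if not obj:
        return text  # no attributes and no children: just the text
    if text:
        obj["#text"] = text
    return obj


def xml_to_object(source):
    root = ET.fromstring(source)
    return {root.tag: element_to_value(root)}


print(xml_to_object('<a><b/><b key="value">text</b></a>'))
```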

If there is a #seq attribute, it encodes the child element order. Use -o seq=true to include sequence numbers when decoding; otherwise the order might be lost.

# decode as object is the default
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o seq=true
{
  "a": {
    "b": [
      {
        "#seq": 0
      },
      {
        "#seq": 1,
        "#text": "bbb"
      }
    ],
    "c": {
      "#seq": 2,
      "#text": "ccc",
      "@attr": "value"
    }
  }
}

# access text of the <c> element
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq '.a.c["#text"]'
"ccc"

# decode to object and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o seq=true 'toxml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

Elements as array

Elements are arrays of the shape ["name", {"#text": "body text", "attr_name": "attr value"}|null, [<child element>, ...]].

# decode as array
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o array=true
[
  "a",
  null,
  [
    [
      "b",
      null,
      []
    ],
    [
      "b",
      {
        "#text": "bbb"
      },
      []
    ],
    [
      "c",
      {
        "#text": "ccc",
        "attr": "value"
      },
      []
    ]
  ]
]
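
The array output above follows a simple recursive rule: element name, then an attributes object (or null), then a list of children. A Python sketch of that rule, again using xml.etree.ElementTree purely as an illustration and not as a description of how fq implements it. Whitespace trimming and the function name are assumptions.

```python
import xml.etree.ElementTree as ET


def element_to_array(elem):
    """Map an element to [name, attributes-or-None, [children...]]."""
    attrs = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        attrs["#text"] = text  # body text is stored next to the attributes
    return [elem.tag, attrs or None, [element_to_array(child) for child in elem]]


root = ET.fromstring('<a><b/><b>bbb</b><c attr="value">ccc</c></a>')
tree = element_to_array(root)
print(tree)
print(tree[2][2][1]["#text"])  # ccc
```

Child order is preserved by the list itself, so this variant needs no sequence numbers.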

# decode to array and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o array=true -o seq=true 'toxml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

# access text of the <c> element, the object variant above is probably easier to use
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -o array=true '.[2][2][1]["#text"]'
"ccc"

References

zip

Options

| Name | Default | Description |
| --- | --- | --- |
| uncompress | true | Uncompress and probe files |

Examples

Decode file using zip options

$ fq -d zip -o uncompress=true . file

Decode value as zip

... | zip({uncompress:true})

Supports ZIP64.

References

Dependency graph

[Figure: format dependency graph]