1
1
mirror of https://github.com/wader/fq.git synced 2024-09-19 07:47:14 +03:00
fq/format/xml/xml.md
Mattias Wadman e3ae1440c9 interp: Rename to/from<format> functions to to_/from_<format>
Feels less cluttered, easier to read and more consistent.

Still keep tovalue, tobytes etc that are more basic functions this
only renamed format related functions.
Also there is an exceptin for to/fromjson as it comes from jq.

Also fixes lots of spelling errors while reading thru.
2022-12-21 17:48:39 +01:00

2.8 KiB

XML can be decoded and encoded into jq values in two ways, elements as object or array. Which variant to use depends a bit what you want to do. The object variant might be easier to query for a specific value but array might be easier to use to generate xml or to query after all elements of some kind etc.

Encoding is done using the to_xml function and it will figure what variant that is used based on the input value. Is has two optional options indent and attribute_prefix.

Elements as object

Element can have different shapes depending on body text, attributes and children:

  • <a key="value">text</a> is {"a":{"#text":"text","@key":"value"}}, has text (#text) and attributes (@key)
  • <a>text</a> is {"a":"text"}
  • <a><b>text</b></a> is {"a":{"b":"text"}} one child with only text and no attributes
  • <a><b/><b>text</b></a> is {"a":{"b":["","text"]}} two children with same name end up in an array
  • <a><b/><b key="value">text</b></a> is {"a":{"b":["",{"#text":"text","@key":"value"}]}}

If there is #seq attribute it encodes the child element order. Use -o seq=true to include sequence number when decoding, otherwise order might be lost.

# decode as object is the default
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o seq=true
{
  "a": {
    "b": [
      {
        "#seq": 0
      },
      {
        "#seq": 1,
        "#text": "bbb"
      }
    ],
    "c": {
      "#seq": 2,
      "#text": "ccc",
      "@attr": "value"
    }
  }
}

# access text of the <c> element
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq '.a.c["#text"]'
"ccc"

# decode to object and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o seq=true 'to_xml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

Elements as array

Elements are arrays of the shape ["#text": "body text", "attr_name", {key: "attr value"}|null, [<child element>, ...]].

# decode as array
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -d xml -o array=true
[
  "a",
  null,
  [
    [
      "b",
      null,
      []
    ],
    [
      "b",
      {
        "#text": "bbb"
      },
      []
    ],
    [
      "c",
      {
        "#text": "ccc",
        "attr": "value"
      },
      []
    ]
  ]
]

# decode to array and encode to xml
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -r -d xml -o array=true -o seq=true 'to_xml({indent:2})'
<a>
  <b></b>
  <b>bbb</b>
  <c attr="value">ccc</c>
</a>

# access text of the <c> element, the object variant above is probably easier to use
$ echo '<a><b/><b>bbb</b><c attr="value">ccc</c></a>' | fq -o array=true '.[2][2][1]["#text"]'
"ccc"

References