mirror of https://github.com/wader/fq.git synced 2024-11-27 06:04:47 +03:00
fq/format/xml/html.md
Mattias Wadman e3ae1440c9 interp: Rename to/from<format> functions to to_/from_<format>
Feels less cluttered, easier to read and more consistent.

Still keep tovalue, tobytes etc that are more basic functions this
only renamed format related functions.
Also there is an exception for to/fromjson as it comes from jq.

Also fixes lots of spelling errors while reading thru.
2022-12-21 17:48:39 +01:00


HTML is decoded in HTML5 mode, so the result will always include <html>, <body> and <head> elements, even if the input does not contain them.

See the xml format for more examples, for how to preserve element order, and for how to encode to xml.

There is no to_html function; use to_xml instead.

Element as object

# decode as object is the default
$ echo '<a href="url">text</a>' | fq -d html
{
  "html": {
    "body": {
      "a": {
        "#text": "text",
        "@href": "url"
      }
    },
    "head": ""
  }
}
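If the object form is post-processed outside of fq, the key convention seen in the output above (attributes prefixed with `@`, text under `#text`, everything else a child element) can be unpacked with a few lines of code. The following is only a minimal sketch in Python: `split_element` is a hypothetical helper, not part of fq, and the JSON is pasted from the example above rather than produced by running fq. It also assumes every key starting with `#` is a special key and not a child element.

```python
import json

# Object-mode output copied from the example above
# (assumption: in practice this would come from running fq).
decoded = json.loads("""
{
  "html": {
    "body": {
      "a": {
        "#text": "text",
        "@href": "url"
      }
    },
    "head": ""
  }
}
""")


def split_element(element):
    """Hypothetical helper: split an object-mode element into
    (attributes, text, children)."""
    if not isinstance(element, dict):
        # An element with no attributes or children is a plain string,
        # for example "head": "" above.
        return {}, element, {}
    # Keys starting with "@" are attributes; strip the prefix.
    attrs = {k[1:]: v for k, v in element.items() if k.startswith("@")}
    text = element.get("#text", "")
    # Assumption: keys starting with "#" are special, the rest are children.
    children = {k: v for k, v in element.items() if not k.startswith(("@", "#"))}
    return attrs, text, children


attrs, text, children = split_element(decoded["html"]["body"]["a"])
print(attrs, text, children)
```

For the `<a>` element this separates the `href` attribute from the element text and reports no child elements.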

Element as array

$ echo '<a href="url">text</a>' | fq -d html -o array=true
[
  "html",
  null,
  [
    [
      "head",
      null,
      []
    ],
    [
      "body",
      null,
      [
        [
          "a",
          {
            "#text": "text",
            "href": "url"
          },
          []
        ]
      ]
    ]
  ]
]

# decode html files to a {file: "title", ...} object
$ fq -n -d html '[inputs | {key: input_filename, value: .html.head.title?}] | from_entries' *.html

# href values of all <a> elements in file
$ fq -r -o array=true -d html '.. | select(.[0] == "a" and .[1].href)?.[1].href' file.html
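The array form is convenient for generic tree walks because every element has the same shape. As an illustration only, the sketch below walks the array-mode output shown above in Python and collects `href` values, roughly mirroring the last jq query. It assumes each element is a three-item list `[name, attributes, children]` with `null` standing in for "no attributes", as in the example output. `iter_elements` is a hypothetical helper, not something fq provides.

```python
import json

# Array-mode output copied from the example above
# (assumption: in practice this would come from running fq).
array_form = json.loads("""
["html", null, [
  ["head", null, []],
  ["body", null, [
    ["a", {"#text": "text", "href": "url"}, []]
  ]]
]]
""")


def iter_elements(node):
    """Hypothetical helper: yield (name, attributes, children) for the
    node and all of its descendants, depth first, in document order."""
    name, attrs, children = node
    # Treat null attributes as an empty mapping to simplify lookups.
    yield name, attrs or {}, children
    for child in children:
        yield from iter_elements(child)


# Collect the href of every <a> element that has one.
hrefs = [
    attrs["href"]
    for name, attrs, _ in iter_elements(array_form)
    if name == "a" and "href" in attrs
]
print(hrefs)
```

One difference from the jq query is that this sketch checks for the presence of `href`, not for a truthy value.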