taggy
=====

An attoparsec-based HTML parser. [![Build Status](https://secure.travis-ci.org/alpmestan/taggy.png?branch=master)](http://travis-ci.org/alpmestan/taggy)

Still very much a work in progress, but it already handles a fairly decent range of common websites: I haven't yet found a site that the current parser chokes on. Performance is quite promising.

Using `taggy`
=============

_taggy_ has a `taggyWith` function to work on HTML à la _tagsoup_.

``` haskell
taggyWith :: Bool -> LT.Text -> [Tag]
```

The `Bool` lets you specify whether to convert special HTML entities to their corresponding Unicode characters; `True` means "yes, convert them please". The function takes lazy `Text` as input.
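
To get a feel for what the `Bool` changes, here is a minimal sketch that prints the tags for the same input with both settings. The `sample` string is made up for illustration, and the sketch assumes that `taggyWith` is exported from `Text.Taggy` alongside `run`; adjust the import if it lives elsewhere in your version.

``` haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as LT
import Text.Taggy (taggyWith) -- assumption: exported here, like `run`

-- Hypothetical sample input containing a single HTML entity.
sample :: LT.Text
sample = "<p>fish &amp; chips</p>"

main :: IO ()
main = do
  putStrLn "entities converted:"
  mapM_ print (taggyWith True sample)   -- one tag per line
  putStrLn "entities left as-is:"
  mapM_ print (taggyWith False sample)
```

Comparing the two listings shows how the entity in the text is treated in each mode.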

Or you can use the raw `run` function, which returns a good old `Result` from _attoparsec_.

``` haskell
run :: Bool -> LT.Text -> AttoLT.Result [Tag]
```

For example, if you want to read HTML from a file and print one tag per line, you could do:

``` haskell
import Data.Attoparsec.Text.Lazy (eitherResult)
import qualified Data.Text.Lazy.IO as T
import Text.Taggy (run)

taggy :: FilePath -> IO ()
taggy fp = do
  content <- T.readFile fp
  either (\s -> putStrLn $ "couldn't parse: " ++ s) 
         (mapM_ print) 
         (eitherResult $ run True content)
```

_taggy_ also provides support for DOM-style documents. The DOM is computed by the `domify` function from the list of tags produced by `taggyWith`.

If you fire up ghci with _taggy_ loaded:

``` bash
$ cabal repl # if working with a copy of this repo
```

You can see this `domify` in action.

``` haskell
λ> :set -XOverloadedStrings
λ> head . domify . taggyWith False $ "<html><head></head><body>yo</body></html>"
NodeElement (Element {eltName = "html", eltAttrs = fromList [], eltChildren = [NodeElement (Element {eltName = "head", eltAttrs = fromList [], eltChildren = []}),NodeElement (Element {eltName = "body", eltAttrs = fromList [], eltChildren = [NodeContent "yo"]})]})
```

Note that the `Text.Taggy.DOM` module contains a function
that composes `domify` and `taggyWith` for you: `parseDOM`.
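
As a rough sketch of how you might walk the resulting tree, the following collects all the text content of a document. It assumes that `parseDOM` takes the same `Bool` and lazy `Text` arguments as `taggyWith`, that `Node(..)` and `Element(..)` are exported from `Text.Taggy.DOM` with the constructors and fields shown in the output above, and that node contents are strict `Text`. `nodeText` and `allText` are helper names invented for this example.

``` haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy as LT
import Text.Taggy.DOM (Element(..), Node(..), parseDOM)

-- Hypothetical helper: every piece of text below a node, in document order.
nodeText :: Node -> [T.Text]
nodeText (NodeContent t) = [t]
nodeText (NodeElement e) = concatMap nodeText (eltChildren e)

-- Hypothetical helper: parse without entity conversion, then gather the text.
-- Assumes parseDOM :: Bool -> LT.Text -> [Node].
allText :: LT.Text -> [T.Text]
allText = concatMap nodeText . parseDOM False

main :: IO ()
main = mapM_ TIO.putStrLn
             (allText "<html><head></head><body>yo</body></html>")
```

The same recursion over `eltChildren` works for other traversals, such as collecting element names or looking up attributes in `eltAttrs`.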

Lenses for taggy
================

We (well, mostly Vikram Virma to be honest) have
put up a companion [taggy-lens](http://github.com/alpmestan/taggy-lens)
library.

Haddocks
========

Up-to-date docs are available on Hackage:

- [taggy](https://hackage.haskell.org/package/taggy)
- [taggy-lens](https://hackage.haskell.org/package/taggy-lens)