![Duckling Logo](https://github.com/facebook/duckling/raw/master/logo.png)
# Duckling [![Build Status](https://travis-ci.org/facebook/duckling.svg?branch=master)](https://travis-ci.org/facebook/duckling)
Duckling is a Haskell library that parses text into structured data.
```
"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}
```
## Requirements
A Haskell environment is required. We recommend using
[stack](https://haskell-lang.org/get-started).

On macOS you'll need to install PCRE development headers.
The easiest way to do that is with [Homebrew](https://brew.sh/):
```
brew install pcre
```
If that doesn't help, try running `brew doctor` and fix
the issues it finds.
## Quickstart
To compile and run the binary:
```
$ stack build
$ stack exec duckling-example-exe
```
The first time you run it, it will download all required packages.

The binary runs a basic HTTP server. Example request:
```
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'
```
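
The same request can also be issued from client code. The sketch below uses only Python's standard library and assumes the example server is listening on the address used in the curl command. The helper names `build_parse_request` and `parse_text` are hypothetical, and the response is decoded as generic JSON without assuming its exact shape.

```python
import json
import urllib.parse
import urllib.request

# Address taken from the curl example; adjust it to your deployment.
DUCKLING_URL = "http://0.0.0.0:8000/parse"


def build_parse_request(text, locale="en_GB", url=DUCKLING_URL):
    """Build a POST request carrying the form fields `locale` and `text`."""
    body = urllib.parse.urlencode({"locale": locale, "text": text})
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def parse_text(text, locale="en_GB", url=DUCKLING_URL, timeout=5.0):
    """Send the request and decode the JSON response body."""
    request = build_parse_request(text, locale, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `parse_text("tomorrow at eight")` sends the same form fields as the curl command and returns the decoded response.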
See `exe/ExampleMain.hs` for an example of how to integrate Duckling into your
project.

If your backend doesn't run Haskell or if you don't want to spin up your own Duckling server, you can directly use [wit.ai](https://wit.ai)'s built-in entities.
## Supported dimensions
Duckling supports many languages, but most don't support all dimensions yet
(**we need your help!**).
Please look into [this directory](https://github.com/facebook/duckling/blob/master/Duckling/Dimensions) for language-specific support.

| Dimension | Example input | Example value output
| --------- | ------------- | --------------------
| `AmountOfMoney` | "42€" | `{"value":42,"type":"value","unit":"EUR"}`
| `Distance` | "6 miles" | `{"value":6,"type":"value","unit":"mile"}`
| `Duration` | "3 mins" | `{"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}`
| `Email` | "duckling-team@fb.com" | `{"value":"duckling-team@fb.com"}`
| `Numeral` | "eighty eight" | `{"value":88,"type":"value"}`
| `Ordinal` | "33rd" | `{"value":33,"type":"value"}`
| `PhoneNumber` | "+1 (650) 123-4567" | `{"value":"(+1) 6501234567"}`
| `Quantity` | "3 cups of sugar" | `{"value":3,"type":"value","product":"sugar","unit":"cup"}`
| `Temperature` | "80F" | `{"value":80,"type":"value","unit":"fahrenheit"}`
| `Time` | "today at 9am" | `{"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}`
| `Url` | "https://api.wit.ai/message?q=hi" | `{"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}`
| `Volume` | "4 gallons" | `{"value":4,"type":"value","unit":"gallon"}`
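
Each output value is a JSON object, so a consumer can decode it with any JSON library. As a small illustration, the sketch below decodes the `Duration` value from the table and reads its `normalized` field, which in this example expresses the same duration in seconds. The helper name `duration_in_seconds` is hypothetical.

```python
import json

# Value copied from the `Duration` row of the table.
DURATION_VALUE = (
    '{"value":3,"minute":3,"unit":"minute",'
    '"normalized":{"value":180,"unit":"second"}}'
)


def duration_in_seconds(raw_value):
    """Return the normalized length of a Duration value, in seconds."""
    value = json.loads(raw_value)
    normalized = value["normalized"]
    # Fail loudly rather than guess if the unit is not what the table shows.
    if normalized["unit"] != "second":
        raise ValueError("unexpected normalized unit: " + normalized["unit"])
    return normalized["value"]


print(duration_in_seconds(DURATION_VALUE))  # prints 180
```
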
## Extending Duckling
To regenerate the classifiers and run the test suite:
```
$ stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test
```
It's important to regenerate the classifiers after updating the code and before
running the test suite.
To extend Duckling's support for a dimension in a given language, typically two
files need to be updated:
* `Duckling/<dimension>/<language>/Rules.hs`
* `Duckling/<dimension>/<language>/Corpus.hs`

To add a new language:
* Make sure that the language code follows the [ISO 639-1 standard](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).
* The first dimension to implement is `Numeral`.
* Follow [this example](https://github.com/facebook/duckling/commit/24d3f199768be970149412c95b1c1bf5d76f8240).

To add a new locale:
* There should be a need for diverging rules between the locale and the language.
* Make sure that the locale code is a valid [ISO 3166-1 alpha-2 country code](https://www.iso.org/obp/ui/#search/code/).
* Follow [this example](https://github.com/facebook/duckling/commit/1ab5f447d2635fe6d48887a501d333a52adff5b9).

Rules have a name, a pattern and a production.
Patterns are used to perform character-level matching (regexes on input) and
concept-level matching (predicates on tokens).
Productions are arbitrary functions that take a list of tokens and return a new
token.

The corpus (resp. negative corpus) is a list of examples that should (resp.
shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at
4:30am.
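
To make these terms concrete, here is a toy model of a rule and a corpus check. It is written in Python for brevity and is not Duckling's implementation: the `Token` and `Rule` classes and the `apply_rule` helper are hypothetical, and Duckling's real rules are Haskell values defined in the `Rules.hs` files. The rule names are borrowed from the debug output shown later in this section.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Token:
    dim: str    # kind of token, e.g. "regex", "integer", "grain", "duration"
    value: Any


@dataclass
class Rule:
    name: str
    pattern: list  # regexes and/or predicates, one per input segment
    production: Callable[[List[Token]], Optional[Token]]


def apply_rule(rule, segments):
    """Match segments (raw strings or Tokens) against the rule's pattern.

    Returns the token built by the production, or None if the rule does not apply.
    """
    if len(segments) != len(rule.pattern):
        return None
    matched = []
    for item, segment in zip(rule.pattern, segments):
        if isinstance(item, re.Pattern):
            # Character-level matching: the regex must match the whole raw string.
            if not isinstance(segment, str):
                return None
            match = item.fullmatch(segment)
            if match is None:
                return None
            matched.append(Token("regex", match.group(0)))
        else:
            # Concept-level matching: the predicate must accept an existing token.
            if not isinstance(segment, Token) or not item(segment):
                return None
            matched.append(segment)
    return rule.production(matched)


# Abbreviated to three words to keep the example short.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}

integer_rule = Rule(
    name="integer (0..19)",
    pattern=[re.compile("|".join(NUMBER_WORDS))],
    production=lambda tokens: Token("integer", NUMBER_WORDS[tokens[0].value]),
)
minute_rule = Rule(
    name="minute (grain)",
    pattern=[re.compile(r"minutes?")],
    production=lambda tokens: Token("grain", "minute"),
)
duration_rule = Rule(
    name="<integer> <unit-of-duration>",
    pattern=[lambda t: t.dim == "integer", lambda t: t.dim == "grain"],
    production=lambda tokens: Token("duration", (tokens[0].value, tokens[1].value)),
)

two = apply_rule(integer_rule, ["two"])
minutes = apply_rule(minute_rule, ["minutes"])
print(apply_rule(duration_rule, [two, minutes]))

# A corpus lists inputs that should parse; a negative corpus lists inputs that should not.
corpus = ["two", "three"]
negative_corpus = ["twelve apples"]
assert all(apply_rule(integer_rule, [text]) is not None for text in corpus)
assert all(apply_rule(integer_rule, [text]) is None for text in negative_corpus)
```

The two regex rules work at the character level, while `<integer> <unit-of-duration>` works at the concept level by combining the tokens they produced.
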
`Duckling.Debug` provides a few debugging tools:
```
$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [This Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = "{\"values\":[{\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}],\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}", start = 0, end = 14}]
```
## License
Duckling is BSD-licensed. We also provide an additional patent grant.