![Duckling Logo](https://github.com/facebook/duckling/raw/main/logo.png)

# Duckling [![Support Ukraine](https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat&labelColor=005BBB)](https://opensource.fb.com/support-ukraine) [![Build Status](https://travis-ci.org/facebook/duckling.svg?branch=master)](https://travis-ci.org/facebook/duckling)

Duckling is a Haskell library that parses text into structured data.

```bash
"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}
```

## Requirements

A Haskell environment is required. We recommend using
[stack](https://haskell-lang.org/get-started).

On Linux and macOS you'll need to install PCRE development headers.
On Linux, use your package manager to install them.
On macOS, the easiest way to install them is with [Homebrew](https://brew.sh/):

```bash
brew install pcre
```

If that doesn't help, try running `brew doctor` and fix
the issues it finds.

## Quickstart

To compile and run the binary:

```bash
stack build
stack exec duckling-example-exe
```

The first run downloads all required packages.

The executable runs a basic HTTP server. Example request:

```bash
curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'
```

In the example application, all dimensions are enabled by default. Provide the parameter `dims` to specify which ones you want. Examples:

```bash
# Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="[\"credit-card-number\"]"'
# If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="[\"quantity\",\"numeral\"]"'
```
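The same request body can also be built programmatically. The following is a minimal Python sketch using only the standard library. The helper name `build_parse_body` is hypothetical; the field names `locale`, `text`, and `dims` come from the curl examples. Unlike those examples, the sketch sends `text` without literal double quotes and `dims` as a plain JSON array, on the assumption that the example server accepts both forms.

```python
import json
from urllib.parse import urlencode


def build_parse_body(text, locale="en_US", dims=None):
    """Build the form-encoded body for a POST to the example server's /parse.

    `dims` is an optional list of dimension names, for example
    ["quantity", "numeral"]. It is serialized as a JSON array (assumption:
    the server accepts a plain JSON array in this field).
    """
    fields = {"locale": locale, "text": text}
    if dims is not None:
        # Compact separators keep the array free of extra spaces.
        fields["dims"] = json.dumps(dims, separators=(",", ":"))
    # urlencode percent-escapes the values so they are safe as form data.
    return urlencode(fields)


body = build_parse_body("3 cups of sugar", dims=["quantity", "numeral"])
print(body)
```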
See `exe/ExampleMain.hs` for an example of how to integrate Duckling into your
project.
If your backend doesn't run Haskell or if you don't want to spin up your own Duckling server, you can directly use [wit.ai](https://wit.ai)'s built-in entities.

## Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet
(**we need your help!**).
Please look into [this directory](https://github.com/facebook/duckling/blob/master/Duckling/Dimensions) for language-specific support.

| Dimension | Example input | Example value output
| --------- | ------------- | --------------------
| `AmountOfMoney` | "42€" | `{"value":42,"type":"value","unit":"EUR"}`
| `CreditCardNumber` | "4111-1111-1111-1111" | `{"value":"4111111111111111","issuer":"visa"}`
| `Distance` | "6 miles" | `{"value":6,"type":"value","unit":"mile"}`
| `Duration` | "3 mins" | `{"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}`
| `Email` | "duckling-team@fb.com" | `{"value":"duckling-team@fb.com"}`
| `Numeral` | "eighty eight" | `{"value":88,"type":"value"}`
| `Ordinal` | "33rd" | `{"value":33,"type":"value"}`
| `PhoneNumber` | "+1 (650) 123-4567" | `{"value":"(+1) 6501234567"}`
| `Quantity` | "3 cups of sugar" | `{"value":3,"type":"value","product":"sugar","unit":"cup"}`
| `Temperature` | "80F" | `{"value":80,"type":"value","unit":"fahrenheit"}`
| `Time` | "today at 9am" | `{"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}`
| `Url` | "https://api.wit.ai/message?q=hi" | `{"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}`
| `Volume` | "4 gallons" | `{"value":4,"type":"value","unit":"gallon"}`

[Custom dimensions](https://github.com/facebook/duckling/blob/master/exe/CustomDimensionExample.hs) are also supported.
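A client typically reads a few fields from these value objects. The sketch below, in Python with only the standard library, uses the `Duration` and `Time` example values from the table. The helper names `duration_seconds` and `time_instant` are hypothetical, and the sketch assumes that the `normalized` field of a duration is always expressed in seconds, as it is in the table.

```python
import json
from datetime import datetime

# Example value objects copied from the table of supported dimensions.
duration_value = json.loads(
    '{"value":3,"minute":3,"unit":"minute",'
    '"normalized":{"value":180,"unit":"second"}}'
)
time_value = json.loads(
    '{"values":[{"value":"2016-12-14T09:00:00.000-08:00",'
    '"grain":"hour","type":"value"}],'
    '"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}'
)


def duration_seconds(value):
    """Return the normalized length of a Duration value in seconds."""
    normalized = value["normalized"]
    if normalized["unit"] != "second":
        # Assumption: the normalized unit is "second"; fail loudly otherwise.
        raise ValueError("unexpected unit: " + normalized["unit"])
    return normalized["value"]


def time_instant(value):
    """Parse the top-level ISO 8601 timestamp of a Time value."""
    return datetime.fromisoformat(value["value"])


print(duration_seconds(duration_value))
print(time_instant(time_value), time_value["grain"])
```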
## Extending Duckling

To regenerate the classifiers and run the test suite:

```bash
stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test
```

It's important to regenerate the classifiers after updating the code and before
running the test suite.

To extend Duckling's support for a dimension in a given language, typically 4
files need to be updated:

* `Duckling/<Dimension>/<Lang>/Rules.hs`
* `Duckling/<Dimension>/<Lang>/Corpus.hs`
* `Duckling/Dimensions/<Lang>.hs` (if not already present in `Duckling/Dimensions/Common.hs`)
* `Duckling/Rules/<Lang>.hs`

To add a new language:

* Make sure that the language code used follows the [ISO-639-1 standard](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).
* The first dimension to implement is `Numeral`.
* Follow [this example](https://github.com/facebook/duckling/commit/24d3f199768be970149412c95b1c1bf5d76f8240).

To add a new locale:

* Only add a locale when its rules need to diverge from those of the language.
* Make sure that the locale code is a valid [ISO 3166-1 alpha-2 country code](https://www.iso.org/obp/ui/#search/code/).
* Follow [this example](https://github.com/facebook/duckling/commit/1ab5f447d2635fe6d48887a501d333a52adff5b9).

Rules have a name, a pattern and a production.
Patterns are used to perform character-level matching (regexes on input) and
concept-level matching (predicates on tokens).
Productions are arbitrary functions that take a list of tokens and return a new
token.

The corpus (resp. negative corpus) is a list of examples that should (resp.
shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at
4:30am.

`Duckling.Debug` provides a few debugging tools:

```bash
$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]
```

## License

Duckling is [BSD-licensed](LICENSE).