Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
Bartosz Nitka 26b1327bcd Make Document type abstract
Summary:
This will let me do smarter things on document construction,
like precomputing where all the whitespace is so that
I can answer `isAdjacent` in O(1) time.

If I'm measuring things right my next diff will cut down
allocations 4x on problematic inputs.

Reviewed By: patapizza

Differential Revision: D4742664

fbshipit-source-id: 7e14e25
2017-03-20 20:49:24 -07:00
Name | Last commit | Date
Duckling | Make Document type abstract | 2017-03-20 20:49:24 -07:00
.gitignore | .gitignore .stack-work | 2017-03-15 10:04:30 -07:00
CONTRIBUTING.md | CONTRIBUTING.md | 2017-03-10 14:49:18 -08:00
duckling.cabal | Rename Finance to AmountOfMoney | 2017-03-16 14:49:44 -07:00
ExampleMain.hs | Move onto dependent-sum instead of custom local data Some | 2017-03-15 10:34:17 -07:00
LICENSE | Initial commit | 2017-03-08 10:33:56 -08:00
logo.png | Adding logo | 2017-03-15 08:04:31 -07:00
PATENTS | Initial commit | 2017-03-08 10:33:56 -08:00
README.md | Rename Finance to AmountOfMoney | 2017-03-16 14:49:44 -07:00
RegenMain.hs | Initial commit | 2017-03-08 10:33:56 -08:00
stack.yaml | Initial commit | 2017-03-08 10:33:56 -08:00
TestMain.hs | Initial commit | 2017-03-08 10:33:56 -08:00


Duckling

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

Quickstart

To compile and run the binary:

$ stack build
$ stack exec duckling-example-exec

The first time you run it, it will download all required packages.

To run a source file directly (after compiling once):

$ stack ExampleMain.hs

See ExampleMain.hs for an example of how to integrate Duckling into your project.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!).

Dimension | Example input | Example value output
Distance | "6 miles" | {"value":6,"type":"value","unit":"mile"}
Duration | "3 mins" | {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email | "duckling@wit.ai" | {"value":"duckling@wit.ai"}
AmountOfMoney | "42€" | {"value":42,"type":"value","unit":"EUR"}
Numeral | "eighty eight" | {"value":88,"type":"value"}
Ordinal | "33rd" | {"value":33,"type":"value"}
PhoneNumber | "+1 (650) 123-4567" | {"value":"(+1) 6501234567"}
Quantity | "3 cups of sugar" | {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature | "80F" | {"value":80,"type":"value","unit":"fahrenheit"}
Time | "today at 9am" | {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url | "https://api.wit.ai/message?q=hi" | {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume | "4 gallons" | {"value":4,"type":"value","unit":"gallon"}
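
The Duration row carries both the parsed value (3 minutes) and a "normalized" value in seconds (180). The arithmetic behind that field can be sketched as follows. This is an illustration only: the `Grain` type, `secondsPer`, and `normalizeToSeconds` are hypothetical names, not Duckling's actual definitions.

```haskell
-- Hypothetical grain type for this sketch; not Duckling's own definition.
data Grain = Second | Minute | Hour | Day
  deriving (Eq, Show)

-- Number of seconds in one unit of the given grain.
secondsPer :: Grain -> Int
secondsPer Second = 1
secondsPer Minute = 60
secondsPer Hour   = 60 * 60
secondsPer Day    = 24 * 60 * 60

-- Convert a (value, grain) pair to a number of seconds,
-- mirroring the "normalized" field in the Duration output.
normalizeToSeconds :: Int -> Grain -> Int
normalizeToSeconds n g = n * secondsPer g
```

Loaded in GHCi, `normalizeToSeconds 3 Minute` evaluates to 180, the normalized value shown for "3 mins".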

Extending Duckling

To regenerate the classifiers and run the test suite:

$ stack RegenMain.hs && stack test

It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, two files typically need to be updated:

  • Duckling/<dimension>/<language>/Rules.hs
  • Duckling/<dimension>/<language>/Corpus.hs

Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.
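
The name/pattern/production structure can be pictured with a small self-contained model. Everything here (`Token`, `Rule`, `applyRule`, `saturate`, `parseToy`) is hypothetical and far simpler than Duckling's real types. In particular, plain predicates on raw words stand in for regex matching, and the engine greedily rewrites the leftmost match rather than exploring all parses.

```haskell
import Data.Char (toLower)
import Data.Maybe (mapMaybe)

-- Hypothetical token type for this sketch.
data Token
  = Word String   -- raw lowercased word that no rule has interpreted yet
  | Numeral Int   -- an integer value
  | Minutes Int   -- a duration in minutes
  deriving (Eq, Show)

-- A rule has a name, a pattern (one predicate per token), and a production.
data Rule = Rule
  { ruleName    :: String
  , rulePattern :: [Token -> Bool]
  , ruleProd    :: [Token] -> Maybe Token
  }

numberWords :: [(String, Int)]
numberWords = [("one", 1), ("two", 2), ("three", 3)]

-- Character-level matching: the predicate inspects the raw word text.
ruleInteger :: Rule
ruleInteger = Rule
  { ruleName    = "integer (1..3)"
  , rulePattern = [isNumberWord]
  , ruleProd    = \ts -> case ts of
      [Word w] -> Numeral <$> lookup w numberWords
      _        -> Nothing
  }
  where
    isNumberWord (Word w) = w `elem` map fst numberWords
    isNumberWord _        = False

-- Concept-level matching: the first predicate requires an already-built Numeral.
ruleMinutes :: Rule
ruleMinutes = Rule
  { ruleName    = "<integer> minutes"
  , rulePattern = [isNumeral, isMinuteWord]
  , ruleProd    = \ts -> case ts of
      [Numeral n, _] -> Just (Minutes n)
      _              -> Nothing
  }
  where
    isNumeral (Numeral _) = True
    isNumeral _           = False
    isMinuteWord (Word w) = w `elem` ["minute", "minutes", "mins"]
    isMinuteWord _        = False

-- Apply a rule at the leftmost position where its pattern matches,
-- replacing the matched tokens with the token the production returns.
applyRule :: Rule -> [Token] -> Maybe [Token]
applyRule rule = go
  where
    n = length (rulePattern rule)
    go [] = Nothing
    go ts@(t : rest)
      | length window == n
      , and (zipWith ($) (rulePattern rule) window)
      , Just new <- ruleProd rule window
      = Just (new : drop n ts)
      | otherwise = (t :) <$> go rest
      where
        window = take n ts

-- Keep applying rules until none of them matches any more.
saturate :: [Rule] -> [Token] -> [Token]
saturate rules ts =
  case mapMaybe (`applyRule` ts) rules of
    (ts' : _) -> saturate rules ts'
    []        -> ts

-- Split the input into lowercased words and run both rules to a fixpoint.
parseToy :: String -> [Token]
parseToy = saturate [ruleInteger, ruleMinutes] . map (Word . map toLower) . words
```

In GHCi, `parseToy "in two minutes"` yields `[Word "in",Minutes 2]`: `ruleInteger` first turns "two" into `Numeral 2`, and `ruleMinutes` then combines that token with "minutes".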

The corpus is a list of examples that should parse; the negative corpus is a list of examples that should not. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.
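
As a rough picture of how a corpus and a negative corpus can be checked against a parser, here is a minimal sketch. It assumes a toy parser (`parseNumeral`, which accepts only digit strings) and made-up example lists; none of these names or examples come from Duckling's Corpus.hs files.

```haskell
import Data.Char (isDigit)

-- Hypothetical stand-in for a parser: accepts non-empty strings of digits only.
parseNumeral :: String -> Maybe Int
parseNumeral s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise                     = Nothing

-- Examples that should parse, paired with the expected value.
corpus :: [(String, Int)]
corpus = [("42", 42), ("007", 7)]

-- Examples that should not parse at all.
negativeCorpus :: [String]
negativeCorpus = ["", "4x2", "forty"]

-- Corpus examples that fail to parse to their expected value.
corpusFailures :: [(String, Int)]
corpusFailures =
  [ex | ex@(s, expected) <- corpus, parseNumeral s /= Just expected]

-- Negative examples that the parser wrongly accepts.
negativeFailures :: [String]
negativeFailures = [s | s <- negativeCorpus, parseNumeral s /= Nothing]
```

Both failure lists are empty for this toy parser; a non-empty list would point at the examples a rule change broke.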

Duckling.Debug provides a few debugging tools:

> :l Duckling.Debug
> debug EN "in two minutes" [This Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = "{\"values\":[{\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}],\"value\":\"2013-02-12T04:32:00.000-02:00\",\"grain\":\"second\",\"type\":\"value\"}", start = 0, end = 14}]

License

Duckling is BSD-licensed. We also provide an additional patent grant.