Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
Go to file
Maíra Bello 328e59ebc4 Quantity/PT: Extend quantity to include grams in portuguese (#631)
Summary:
I'm using Duckling in my project but I noticed that quantities in kg weren't being detected correctly, though other entities such as numeral/volume were all working as expected. Investigating more I noticed that this was just because in Portuguese the Quantity entity was only configured to detect cups and pounds, never grams. And even for cups/pounds, products weren't being detected correctly.

So I've just adapted the rules from English Quantity to work in Portuguese as well, while keeping the cups/pounds too. It's all working as expected now and it's backwards compatible.

Pull Request resolved: https://github.com/facebook/duckling/pull/631

Reviewed By: stroxler

Differential Revision: D29701339

Pulled By: chessai

fbshipit-source-id: fca08a14c50844d418f101b885ca54554d993f58
2021-07-19 15:48:04 -07:00
.github/workflows update ci (#594) 2021-04-12 18:32:23 -07:00
Duckling Quantity/PT: Extend quantity to include grams in portuguese (#631) 2021-07-19 15:48:04 -07:00
exe restrict dimensions to only those specified (#625) 2021-06-03 10:33:42 -07:00
tests Add isArabic rule (#577) 2021-07-12 13:37:23 -07:00
.dockerignore Improve Docker build (#341) 2020-04-17 08:22:43 -07:00
.gitignore Update dependencies/CI 2020-10-29 11:02:49 -07:00
.hlint.yaml Ignore the no-"-1" linter error for duckling 2021-03-17 10:47:28 -07:00
CHANGELOG.md Add custom isRangeValid implementation for ZH 2021-06-04 12:48:32 -07:00
CODE_OF_CONDUCT.md add FB code of conduct 2018-01-02 08:15:29 -08:00
CONTRIBUTING.md Documentation: Coding style 2018-05-14 14:30:36 -07:00
Dockerfile Dockerfile: debugs the build and uses Debian Buster everywhere (#539) 2020-11-02 13:33:00 -08:00
duckling.cabal Init 2021-05-18 13:50:19 -07:00
LICENSE Initial commit 2017-03-08 10:33:56 -08:00
logo.png Adding logo 2017-03-15 08:04:31 -07:00
README.md restrict dimensions to only those specified (#625) 2021-06-03 10:33:42 -07:00
stack.yaml Update dependencies/CI 2020-10-29 11:02:49 -07:00

Duckling Logo

Duckling Build Status

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

On macOS you'll need to install PCRE development headers. The easiest way to do that is with Homebrew:

brew install pcre

If that doesn't help, try running brew doctor and fix the issues it finds.

Quickstart

To compile and run the binary:

$ stack build
$ stack exec duckling-example-exe

The first time you run it, it will download all required packages.

This runs a basic HTTP server. Example request:

$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'

In the example application, all dimensions are enabled by default. Provide the parameter dims to specify which ones you want. Examples:

Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'

See exe/ExampleMain.hs for an example on how to integrate Duckling in your project. If your backend doesn't run Haskell or if you don't want to spin your own Duckling server, you can directly use wit.ai's built-in entities.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!). Please look into this directory for language-specific support.

Dimension Example input Example value output
AmountOfMoney "42€" {"value":42,"type":"value","unit":"EUR"}
CreditCardNumber "4111-1111-1111-1111" {"value":"4111111111111111","issuer":"visa"}
Distance "6 miles" {"value":6,"type":"value","unit":"mile"}
Duration "3 mins" {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email "duckling-team@fb.com" {"value":"duckling-team@fb.com"}
Numeral "eighty eight" {"value":88,"type":"value"}
Ordinal "33rd" {"value":33,"type":"value"}
PhoneNumber "+1 (650) 123-4567" {"value":"(+1) 6501234567"}
Quantity "3 cups of sugar" {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature "80F" {"value":80,"type":"value","unit":"fahrenheit"}
Time "today at 9am" {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url "https://api.wit.ai/message?q=hi" {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume "4 gallons" {"value":4,"type":"value","unit":"gallon"}

Custom dimensions are also supported.

Extending Duckling

To regenerate the classifiers and run the test suite:

$ stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs
  • Duckling/<Dimension>/<Lang>/Corpus.hs
  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
  • Duckling/Rules/<Lang>.hs

To add a new language:

To add a new locale:

Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.

The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.

Duckling.Debug provides a few debugging tools:

$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]

License

Duckling is BSD-licensed.