Commit Graph

73 Commits

Author SHA1 Message Date
Gerben Janssen van Doorn
f70f991b38 Added support Lao numerals
Summary: Duckling didn't support Lao numerals; this diff adds them.

Reviewed By: patapizza

Differential Revision: D9323242

fbshipit-source-id: 5dad0d4dfb4843281a327947690e664c91ab8f1a
2018-08-17 10:31:17 -07:00
PhalPheaktra Chhaya
b541354c31 Add Numeral dimension for new language KM. (#234)
Summary:
Hello,
I am new to Haskell, but I would like to add Khmer language (KM) to Duckling.
I have tried to extend Duckling by adding the Numeral dimension for a new language, KM.
Please have a look at it and see what we can improve.

Thanks!
Pull Request resolved: https://github.com/facebook/duckling/pull/234

Reviewed By: blandinw

Differential Revision: D9032639

Pulled By: chinmay87

fbshipit-source-id: 7db19edf732fe6500629cc89e18e0655d7bbc48b
2018-08-03 14:45:55 -07:00
jfulse
e1088c1856 Norwegian improvements (#232)
Summary:
Some small improvements to the Norwegian implementation:

- The written number 8 had a typo: "otte" -> "åtte"
- Add support for half an hour before as e.g. "halv to"
- Add support for alternative clock denotation "klokka"
- Add support for alternative tomorrow denotation "i morra"
Pull Request resolved: https://github.com/facebook/duckling/pull/232

Reviewed By: girifb

Differential Revision: D8986441

Pulled By: chinmay87

fbshipit-source-id: 286617d30415febe1f0eda4bc7475ca5c9610734
2018-07-25 13:15:50 -07:00
Julien Odent
9c367ab6cd Don't accept dashes (-) as token separators
Summary:
This is causing some issues, e.g. `20-30` resolving to 8:30pm (latent).
Updating `Numeral` rules to account for that (`EN`, `FR`, `NL` following tests).
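
A minimal Python sketch of the problem described above. It is an illustration only, not Duckling's Haskell tokenizer; `tokenize` and its `dash_is_separator` flag are hypothetical names. With `-` treated as a separator, `20-30` looks like two adjacent numbers, which a time rule could then read as an hour and a minute.

```python
import re

def tokenize(text, dash_is_separator):
    """Split text into tokens; optionally treat '-' like whitespace."""
    separators = r"[\s\-]+" if dash_is_separator else r"\s+"
    return [tok for tok in re.split(separators, text) if tok]

# Dash as separator: "20-30" becomes the two tokens "20" and "30".
print(tokenize("20-30", dash_is_separator=True))
# Dash kept: "20-30" stays one token, so no hour/minute pair is formed.
print(tokenize("20-30", dash_is_separator=False))
```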

Differential Revision: D8854891

fbshipit-source-id: ba17099b014d9cf2f48a7d85147cc890b02578f5
2018-07-16 06:00:40 -07:00
Arunavha Chanda
d5555d0149 Numeral/BN: Adding Bengali numeral support to Duckling
Summary: Added support for Bengali numerals

Reviewed By: patapizza

Differential Revision: D8730468

fbshipit-source-id: dc36017e24d796f35abc477a0b8b317218c64a6a
2018-07-09 12:30:30 -07:00
Cuong Dinh Tri Nguyen
4e11613d39 Supported AmountOfMoney intervals and improved Numeral
Summary:
Support amount of money intervals for VI

Modify ruleNg, ruleDollar, and ruleVND to better capture the usage of VI

In Numeral, add "ngàn" - a common synonym of "nghìn", and "chục" - colloquially used to count tens. Remove "?" in regex of other words as it does not make sense.

Reviewed By: haoxuany

Differential Revision: D8734066

fbshipit-source-id: 15f879ab796025882c85f0ce9f1677c501b364a0
2018-07-09 10:00:42 -07:00
Tero Laxström
4ed1ed83ed Basics for Finnish (#210)
Summary:
Adds Locale and Numeral for Finnish
Closes https://github.com/facebook/duckling/pull/210

Reviewed By: JonCoens

Differential Revision: D8430386

Pulled By: patapizza

fbshipit-source-id: a3c8b3b3419b7f43e2ef332cdb1fb8fc07da3bec
2018-06-19 10:45:27 -07:00
David Magaltadze
713e5db9d6 Added support of Numeral for ka_GE (#211)
Summary:
* `ka_GE` locale
* full support for Numeral
Closes https://github.com/facebook/duckling/pull/211

Reviewed By: chinmay87

Differential Revision: D8456241

Pulled By: patapizza

fbshipit-source-id: 35fb432191bd20a4965503efde94c5ea29d0c50d
2018-06-18 13:31:14 -07:00
Julien Odent
e8286f762c Numeral/RO: Fix multipliers with values above 20
Summary: In Romanian, for numerals above 20, we say "20 de milioane", not "20 milioane".

Reviewed By: haoxuany

Differential Revision: D8334109

fbshipit-source-id: a7fc83440334ab9b1f0511f315029e28449f9771
2018-06-11 11:00:38 -07:00
Andreea Danielescu
bee838e1c5 AmountOfMoney/RO Support intervals
Summary: Added support for intervals (between X and Y), min, max and approximates for currency in RO.

Reviewed By: patapizza

Differential Revision: D8322948

fbshipit-source-id: 462cfd5575c87e757d2c35c8078f539af3e8150f
2018-06-08 09:00:31 -07:00
Julien Odent
567450beea Distance/RO: Fix for values above 20, refactoring
Summary:
* One rule to rule them all, to easily duplicate with "de" suffix.
* This would actually allow things like "4 de metri", though it's fine as it doesn't alter meaning.

Reviewed By: haoxuany

Differential Revision: D8325328

fbshipit-source-id: 3ec0f30431f3cb00152cc9509f6052b1ae29cd08
2018-06-07 16:45:43 -07:00
Abdallatif
a00a0d7bdf fix dual big numbers in Arabic
Summary:
for 200, 2000, 2000000
Closes https://github.com/facebook/duckling/pull/202

Reviewed By: adelnobel

Differential Revision: D8257813

Pulled By: patapizza

fbshipit-source-id: d83ea31b9fdf4d28a61b75e84583ef0d7e7bea30
2018-06-04 14:45:37 -07:00
Abdallatif
9f2fffec23 fix million in ar
Summary:
Fixes https://github.com/wit-ai/wit/issues/1080

Closes https://github.com/facebook/duckling/pull/195

Reviewed By: adelnobel

Differential Revision: D8030001

Pulled By: patapizza

fbshipit-source-id: 0dc0253ff8527c94c8e81a891f33898764336d6d
2018-05-16 14:45:42 -07:00
Ziyang Liu
5460d8df0e Support custom dimensions
Summary:
Support custom dimensions

Had to move the definition of `Dimension` from `Duckling.Dimensions.Types` to `Duckling.Types` to avoid cyclic imports between these two modules.

A sample custom dimension is in `exe/CustomDimensionExample.hs`.

Limitations of custom dimensions:

- All rules for a custom dimension must be in the same module as the definition of the custom dimension. Otherwise there will be cyclic imports, because the definition of the dimension and the rules refer to each other.
- The custom dimension must be specified when using `parse`, since there's no way to get all the existing custom dimensions.

Reviewed By: patapizza

Differential Revision: D7630360

fbshipit-source-id: 30e12dcb33611f5692c4f5949de377bf61b75e1e
2018-04-19 15:30:51 -07:00
Quan
c166179d89 Use comma "," as decimal separator for VI
Summary:
In Vietnam, we use comma "," as decimal separator instead of dot "."
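
As a rough illustration of the behaviour (a Python sketch, not the Duckling rule itself; `parse_decimal` and its `decimal_separator` parameter are hypothetical), a decimal numeral can be read by splitting on the locale's separator:

```python
def parse_decimal(text, decimal_separator=","):
    """Parse a decimal string whose fraction follows `decimal_separator`."""
    whole, sep, fraction = text.partition(decimal_separator)
    if not whole.isdigit() or (sep and not fraction.isdigit()):
        raise ValueError(f"not a decimal numeral: {text!r}")
    return float(f"{whole}.{fraction}") if sep else float(whole)

print(parse_decimal("3,5"))  # comma is the decimal separator here
```

Under this sketch `"3.5"` is rejected when the separator is a comma, which mirrors the idea that the dot no longer marks decimals for VI.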
Closes https://github.com/facebook/duckling/pull/176

Reviewed By: patapizza

Differential Revision: D7601328

Pulled By: panagosg7

fbshipit-source-id: 7f5725006f9054fe45a24757ec2abc36b7aa6605
2018-04-18 23:15:30 -07:00
Aaron Yue
babe317723 fix intersect rule to work with negative numbers
Summary:
Numerals that require intersection with negative numbers don't work, since the negative sign gets parsed
before the intersect rules apply. This fixes it by adding a guard for positive numbers in the intersection rule.

Reviewed By: patapizza

Differential Revision: D7592225

fbshipit-source-id: 2bc9c708cadeea4012c1f3ef487c61a144325f2a
2018-04-13 19:45:40 -07:00
Aaron Yue
fe0807dced Use lambda-case for Rule production
Summary: change Rule production functions to lambda-case

Reviewed By: patapizza

Differential Revision: D7424413

fbshipit-source-id: edac0290d310578f633ff0208434d1eca038ad9c
2018-04-12 10:15:57 -07:00
Aaron Yue
4ef255e577 Generalize and expand digit specifier usage for hanzi
Summary:
Generalize Chinese digit specifier (十百千万亿) parsing, and add hanzi tests.
These digit specifiers can be parsed as (<num><speci>)<num>,
by using the multiplier value <num><speci> and a connect function that adds the two parts together
(two cases: skipped digits, which require a 零 in between, and digits in consecutive positions).
Note that 个 is technically a digit specifier,
but in Chinese it is never used directly as a numeral specifier, only as a counter.
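
The multiply-then-add idea can be sketched in Python as follows. This is a textbook-style accumulator, not the rules from this commit; `parse_hanzi` and the table names are hypothetical, and the sketch does not enforce the 零 requirement for skipped digits.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_SPECIFIERS = {"十": 10, "百": 100, "千": 1000}
BIG_SPECIFIERS = {"万": 10**4, "亿": 10**8}

def parse_hanzi(text):
    """Combine <num><specifier> products by addition."""
    total = 0    # value of completed 万 / 亿 groups
    section = 0  # value accumulated since the last big specifier
    digit = 0    # most recent digit, still waiting for a specifier
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_SPECIFIERS:
            # A bare specifier (十 in 十五) implies a leading 一.
            section += (digit or 1) * SMALL_SPECIFIERS[ch]
            digit = 0
        elif ch in BIG_SPECIFIERS:
            unit = BIG_SPECIFIERS[ch]
            group = section + digit
            if total < unit:
                # The big specifier scales everything read so far.
                total = (total + group) * unit
            else:
                # A smaller group following a larger one is added.
                total += group * unit
            section, digit = 0, 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + digit

print(parse_hanzi("三百二十一"))  # 321
print(parse_hanzi("一千零五"))    # 1005
```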

Reviewed By: zliu41

Differential Revision: D7424249

fbshipit-source-id: 20a85a7df1f908ee9879e92b904178fa26a9a5e5
2018-04-12 10:00:33 -07:00
Giri Anantharaman
519c9519a3 Support Tamil numerals
Summary:
* Setup Tamil (TA) language
* Added Numeral Dimension

Reviewed By: patapizza

Differential Revision: D7323636

fbshipit-source-id: 4b1a42197ff4799880cded9ce86b8d7fae1507bc
2018-03-19 16:45:36 -07:00
Chinmay Deshmukh
5ac990bbe2 Return latent entities
Summary: Add an option to return latent time entities. This can be used when one is pretty certain that the input contains a datetime.

Reviewed By: patapizza

Differential Revision: D7254245

fbshipit-source-id: e9e0503cace2691804056fcebdc18fd9090fb181
2018-03-19 14:45:27 -07:00
Panagiotis Vekris
169df6b46b ruleSkipHundreds only accepts alphabetic numerals
Summary:
Before, duckling would parse `"1 2 2"` as the three digit number 122 through
`ruleSkipHundreds`. This, however, allowed the string `"Pay Kiran1 10eur"` to be
parsed as "110 EUR", which was reported in https://github.com/facebook/duckling/issues/159.

This change only accepts alphabetic numerals to pass through `ruleSkipHundreds`
(for example `"one twenty two"`), which presumably was the original intention of
this rule. This fixes the above issue, without any change in Corpus.
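
A simplified Python sketch of the restriction (hypothetical names and word tables; Duckling's rule works on parsed tokens in Haskell, and teens are omitted here for brevity): the hundreds shortcut fires only when every part is spelled out in letters.

```python
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def skip_hundreds(text):
    """Read '<unit> <tens> [<unit>]' as a three-digit number, words only."""
    words = text.lower().split()
    if len(words) not in (2, 3):
        return None
    if words[0] not in UNITS or words[1] not in TENS:
        return None  # digits such as "1" or "10" never match
    ones = 0
    if len(words) == 3:
        if words[2] not in UNITS:
            return None
        ones = UNITS[words[2]]
    return UNITS[words[0]] * 100 + TENS[words[1]] + ones

print(skip_hundreds("one twenty two"))  # 122
print(skip_hundreds("1 10"))            # None
```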

Reviewed By: patapizza

Differential Revision: D7151934

fbshipit-source-id: 024a7a0b6a53beb3a0d42d4bb7f542ce3b05726b
2018-03-06 17:00:29 -08:00
Julien Odent
aed5b8f779 Numeral: don't compose negative numbers, fix double negatives
Summary:
* added `isMultipliable` helper and used that in patterns along with `isPositive`
* fixed double negatives in most languages

Reviewed By: niteria

Differential Revision: D7034982

fbshipit-source-id: a0bb67056d3107167830ece0c34d761c5563c5a7
2018-02-21 12:02:45 -08:00
bidhan-a
43079e7113 Setup Nepali (NE) and add Numeral dimension
Summary:
- Setup Nepali (NE) language
- Add basic Numeral dimension
Closes https://github.com/facebook/duckling/pull/156

Reviewed By: JonCoens

Differential Revision: D6965558

Pulled By: patapizza

fbshipit-source-id: f46c9b104d4345f20bd0cf53f8c9c8754855f314
2018-02-13 07:45:31 -08:00
Julien Odent
b00fa512af AmountOfMoney: fixes
Summary:
* don't recursively compose cents
* don't allow decreasing ranges

Reviewed By: blandinw

Differential Revision: D6849132

fbshipit-source-id: ed6ca30388642c21e677a628971747a4fb3dfbef
2018-01-30 14:45:36 -08:00
Julien Odent
bef7a44fa8 Remove redundant brackets and language pragmas
Summary: .

Reviewed By: JonCoens

Differential Revision: D6838082

fbshipit-source-id: 94757bdb80c6d3c29a7a6554429940a1b7403108
2018-01-29 16:45:28 -08:00
Abdallatif Sulaiman
ce2f48fbb5 Quickfix for Arabic Numeral
Summary: Closes https://github.com/facebook/duckling/pull/137

Reviewed By: adelnobel

Differential Revision: D6660959

Pulled By: patapizza

fbshipit-source-id: 51169c63558b5a35f16bdaaada9649c88a4d2994
2018-01-05 11:45:40 -08:00
Abdallatif Sulaiman
1393098bcc Added Time Dimension to Arabic
Summary:
Hi, in this pr:
* Added time dimension to Arabic language, thanks to Hussein-Dahir & Yazeed-Obaid for writing time corpus.
* Fixed some bugs in numeral & ordinals and added more test cases for them.

Also, I don't really understand why we use classifiers in the time dimension.
Closes https://github.com/facebook/duckling/pull/123

Reviewed By: blandinw

Differential Revision: D6583313

Pulled By: patapizza

fbshipit-source-id: f7acdef0c032d7b7fd7d224832fdaf484d2df825
2017-12-19 14:30:42 -08:00
Newinfinite007
c133bad24a Hindi Language Numeral Dimension(minimalistic model). Tests passed.
Summary: Closes https://github.com/facebook/duckling/pull/119

Reviewed By: JonCoens

Differential Revision: D6597628

Pulled By: patapizza

fbshipit-source-id: 8bac0f686d6cecc38d9998e37042fe48f73530dc
2017-12-19 13:15:30 -08:00
Henri Dwyer
15b3988ec5 update MY pattern match to HashMap
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.
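
The general pattern can be sketched in Python like this. It is an illustration with a made-up English table, not the Burmese rules from this commit; `ZERO_TO_NINE` and `parse_numeral` are hypothetical names. The regex only recognizes the word, and the value comes from a single table lookup instead of a chain of pattern-match cases.

```python
import re

# Hypothetical table; real rule files hold language-specific words.
ZERO_TO_NINE = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
                "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

# Build the regex from the table keys so the two cannot drift apart.
NUMERAL_RE = re.compile("|".join(map(re.escape, ZERO_TO_NINE)), re.IGNORECASE)

def parse_numeral(word):
    """Return the value for a numeral word, or None if it is unknown."""
    match = NUMERAL_RE.fullmatch(word)
    if match is None:
        return None
    return ZERO_TO_NINE.get(match.group(0).lower())

print(parse_numeral("Seven"))   # 7
print(parse_numeral("eleven"))  # None
```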

Reviewed By: patapizza

Differential Revision: D6592508

fbshipit-source-id: 0f4dd9bc548f71d0e6e52c938bd2e39bfd909c01
2017-12-18 13:00:35 -08:00
Henri Dwyer
10db72fd05 update KO pattern match to HashMap
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.

Reviewed By: panagosg7

Differential Revision: D6536790

fbshipit-source-id: ecb0bf28cf22b82ef10c3c2408eda15409189431
2017-12-15 11:45:28 -08:00
Julien Odent
fb3979d257 Numeral: move ruleFractions to common + cleanup numeral flag for Time
Summary:
* Closes PRs #115, #116.
* `Numeral`: Moves `ruleFractions` to common
* `Numeral/PT`: fixes dot in regex
* `Time`: Moves `isNumeralSafeToUse` to `isIntegerBetween` (more robust and cleaner)
* `Time/EN`: refactored `ruleCycleLastNextN` to `ruleDurationLastNext` (so it also accounts for non-safe Numerals, like "in a few days")

Reviewed By: blandinw

Differential Revision: D6502958

fbshipit-source-id: 6d9c08a23dab88f441aadade08d663c8e0da415d
2017-12-15 10:15:31 -08:00
Julien Odent
6df3b26707 Numeral: common rule + supporting hindu-arabic numerals for Burmese
Summary:
* `ruleIntegerNumeric` was used in all languages but Burmese.
* Hindu-Arabic numerals seem to be slowly making their way into Burmese (e.g. recent car plates)
* Moving the rule in `Duckling/Numeral/Common.hs`

Reviewed By: blandinw

Differential Revision: D6498349

fbshipit-source-id: e868dc9960f18f0781e4aa98a0dfcd14969537c9
2017-12-06 16:00:28 -08:00
Genki Furumi
56b57df153 HashMap lookups for large regexes
Summary: Replaced pattern matching with Hashmap.

Reviewed By: patapizza

Differential Revision: D6493427

fbshipit-source-id: 5ae9370387a738423724cabbe7c0d44c4889e185
2017-12-06 11:00:38 -08:00
Alex Torres
498e8b16e6 fix pt rules for numeral
Summary:
When I write "dois mil e duzentos" the result should be 2200, but Duckling recognizes the numbers separately and gives this result:

`[{"dim":"number","body":"dois","value":{"value":2,"type":"value"},"start":0,"end":4},{"dim":"number","body":"mil","value":{"value":1000,"type":"value"},"start":5,"end":8},{"dim":"time","body":"mil","value":{"values":[],"value":"1000-01-01T00:00:00.000-07:53","grain":"year","type":"value"},"start":5,"end":8},{"dim":"number","body":"duzentos","value":{"value":200,"type":"value"},"start":11,"end":19}]`

Now with this commit, duckling gives the correct result:

`[{"dim":"number","body":"dois mil e duzentos","value":{"value":2200,"type":"value"},"start":0,"end":19}]`
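
For illustration only, the intended composition can be sketched in Python (this is not the Duckling rule; `parse_pt` and the small `VALUES` table are hypothetical): "mil" multiplies what precedes it, "e" is a connector, and the remaining words are added.

```python
VALUES = {"um": 1, "dois": 2, "três": 3,
          "cem": 100, "duzentos": 200, "trezentos": 300}

def parse_pt(text):
    """Compose a Portuguese numeral such as 'dois mil e duzentos'."""
    total, current = 0, 0
    for word in text.lower().split():
        if word == "e":
            continue  # connector between groups, carries no value
        if word == "mil":
            total += (current or 1) * 1000  # bare "mil" means 1000
            current = 0
        elif word in VALUES:
            current += VALUES[word]
        else:
            return None  # word outside this small sketch table
    return total + current

print(parse_pt("dois mil e duzentos"))  # 2200
```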
Closes https://github.com/facebook/duckling/pull/117

Reviewed By: blandinw

Differential Revision: D6477925

Pulled By: patapizza

fbshipit-source-id: 26ab503cc8def739c51ceb5bae7546016ba65ad6
2017-12-04 18:00:36 -08:00
Panagiotis Vekris
12a726aee7 Support for Greek times and dates
Summary:
This adds support for Greek times.

There are still some issues with expressions of the form:
```
9:30 - 11:00 την πέμπτη
```
Where `11:00 την πέμπτη` is parsed first (as 11:30 on Thu), instead of prioritizing `9:30 - 11:00` as the training data suggests. These tests are for the moment excluded from the corpus.

Reviewed By: patapizza

Differential Revision: D6376271

fbshipit-source-id: 2f31e058fb88386429070e3b51cd33f93b9c5936
2017-12-04 16:45:40 -08:00
Julien Odent
12000c2f86 Numeral/GA: fix old vigesimal + refactoring
Summary: Merging rules in two, fixing production rule.

Reviewed By: JonCoens, blandinw

Differential Revision: D6466793

fbshipit-source-id: 0fa143b049e9ccce7f4946e14bbd15fc37c324e1
2017-12-01 18:15:31 -08:00
Jana Šefčíková
7fe748ffec Numeral/GA: HashMap lookups for large regexes
Summary: Use hashMap oneToTenMap instead of guard.

Reviewed By: patapizza

Differential Revision: D6414352

fbshipit-source-id: a1788df0d3521a3a870e453294708d9fe2c10908
2017-12-01 10:15:24 -08:00
Julien Odent
54ccbf81df Numeral/DE: fix keiner + cleanup
Summary:
* fixed keine[rn]
* removed redundant `ruleInteger`
* replaced pattern matching with hashmap lookup
* renamed `dozensMap` to `tensMap`

Reviewed By: blandinw

Differential Revision: D6336252

fbshipit-source-id: 740734ab7b0b289adc4f466f966c4c5e59af75ad
2017-11-15 13:15:29 -08:00
igor-drozdov
29d776dee5 Added TimeGrain and Duration Dimensions to Russian language
Summary:
- Added Duration dimension to Russian language
- Added TimeGrain dimension to Russian language
- Refactored isNatural and isNaturalWith out of Duration helpers into Numeral helpers
- Implemented <integer> and a half rule for Russian Numeral
- Changed the type of inSeconds to polymorphic one
Closes https://github.com/facebook/duckling/pull/105

Reviewed By: blandinw

Differential Revision: D6312604

Pulled By: patapizza

fbshipit-source-id: 9ae237b4beb6915ff8da013230457937d8e56733
2017-11-15 10:45:24 -08:00
Panagiotis Vekris
e8937e1cd6 Support for Greek durations
Summary: Adding support for Greek time grains and durations.

Reviewed By: patapizza

Differential Revision: D6249955

fbshipit-source-id: 1c69e26
2017-11-06 18:49:36 -08:00
Julien Odent
ba46d592cd Prevent double negatives + cleanup
Summary:
* Prevent double negatives (resulting from `ruleNegative` applying twice and from engine tokenizer)
* Hashmap lookup for tens
* cleanup
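
A tiny Python sketch of the guard idea (hypothetical function name; the actual rules are Haskell and operate on tokens): the negation rule matches only positive numerals, so applying it to an already negative value yields no match rather than flipping the sign back.

```python
def rule_negative(value):
    """Rule 'minus <numeral>': fires only when the numeral is positive."""
    if value <= 0:
        return None  # guard: no match, so no double negative
    return -value

once = rule_negative(5)
print(once)                 # -5
print(rule_negative(once))  # None
```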

Reviewed By: blandinw

Differential Revision: D6221107

fbshipit-source-id: 42e401d
2017-11-03 00:31:22 -07:00
Panagiotis Vekris
fda8c7c759 Support Greek numerals
Summary:
- Setup Greek language (EL)
- Added Greek Numerals

Reviewed By: patapizza

Differential Revision: D6217873

fbshipit-source-id: 379170f
2017-11-02 17:16:18 -07:00
Jeffrey Karres
67190c4238 HashMap lookups for large regexes
Summary: Replacing guards with hashmap

Reviewed By: patapizza

Differential Revision: D6192748

fbshipit-source-id: 644825b
2017-10-30 18:04:29 -07:00
Aleksey Suslov
df6484945a Few improvements for Numeral/RU
Summary:
- fix typos in 11, 300 and 400 forms
- add more 0 rules
- introduce 1.5 rules
Closes https://github.com/facebookincubator/duckling/pull/96

Differential Revision: D6136073

Pulled By: patapizza

fbshipit-source-id: b1252f2
2017-10-24 10:49:27 -07:00
Julien Odent
ed58115caf Numeral: refactor 'Text.singleton' usages
Summary:
* refactored `Text.singleton` usages into `Text` literals
* removed redundant `join` imports with `NoRebindableSyntax` language pragma
* ET: merged 2 rules into one

Reviewed By: blandinw

Differential Revision: D6080231

fbshipit-source-id: 47c18df
2017-10-17 13:19:39 -07:00
Julien Odent
b2de97800f Numeral/FR: allow space as a thousand separator
Summary: Fixes #91.

Reviewed By: blandinw

Differential Revision: D6079821

fbshipit-source-id: f3160c1
2017-10-17 12:20:32 -07:00
Matthijs Mullender
33a08bb76b Support Dutch Durations
Summary:
This change adds support for durations in Dutch/Netherlands (NL)
Implemented: TimeGrain/NL, Durations/NL

Reviewed By: patapizza

Differential Revision: D6049404

fbshipit-source-id: 3621cdb
2017-10-13 12:49:30 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
*  `Rules/<Lang>.hs` exposes
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate
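
The smart-constructor idea and the MM/DD vs DD/MM example can be sketched in Python as below. This is an illustration, not Duckling's API: `VALID_REGIONS`, `make_locale`, `parse_slash_date` and the `year` parameter are hypothetical, and both the fallback to a region-less locale and the month-first default are assumptions.

```python
from datetime import date

# Hypothetical table of valid (language, region) pairs.
VALID_REGIONS = {"en": {"US", "GB"}}

def make_locale(lang, region=None):
    """Return (lang, region), dropping a region that is invalid for lang."""
    if region is not None and region not in VALID_REGIONS.get(lang, set()):
        region = None  # assumption: fall back to the language-only locale
    return (lang, region)

def parse_slash_date(text, locale, year):
    """Read 'a/b' as month/day or day/month depending on the region."""
    first, second = (int(part) for part in text.split("/"))
    _, region = locale
    if region == "GB":
        day, month = first, second
    else:
        # assumption: US and the region-less default read month first
        month, day = first, second
    return date(year, month, day)

print(parse_slash_date("3/4", make_locale("en", "US"), 2017))  # 2017-03-04
print(parse_slash_date("3/4", make_locale("en", "GB"), 2017))  # 2017-04-03
```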

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
Ian Stewart-Binks
2b566eeac0 Numeral/JA: HashMap lookups for large regexes
Summary: Replaced pattern matching with Hashmap. Also, removed ruleInteger17 and moved its regex to ruleInteger.

Reviewed By: patapizza

Differential Revision: D5812629

fbshipit-source-id: f0c1a06
2017-10-06 10:34:33 -07:00
Julien Odent
83ea150d94 Convert back escaped characters in rules
Summary:
We noticed that using UTF-8 characters directly in regexes works.
Hence converting back the escaped characters for readability and maintenance.

Reviewed By: blandinw

Differential Revision: D5787146

fbshipit-source-id: e5a4b9a
2017-09-07 12:49:33 -07:00