59 Commits

Author SHA1 Message Date
Quan
c166179d89 Use comma "," as decimal separator for VI
Summary:
In Vietnam, we use comma "," as decimal separator instead of dot "."
Closes https://github.com/facebook/duckling/pull/176

Reviewed By: patapizza

Differential Revision: D7601328

Pulled By: panagosg7

fbshipit-source-id: 7f5725006f9054fe45a24757ec2abc36b7aa6605
2018-04-18 23:15:30 -07:00
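The comma-as-decimal-separator behavior from commit c166179d89 can be illustrated with a minimal sketch. This is Python for illustration only, not Duckling's Haskell rule; the pattern and the function name `parse_decimal_comma` are hypothetical.

```python
import re

# Hypothetical pattern: digits, a comma, then digits (e.g. "1,5").
DECIMAL_COMMA = re.compile(r"^(\d+),(\d+)$")


def parse_decimal_comma(text):
    """Return the float value of a comma-decimal string, or None if it does not match."""
    match = DECIMAL_COMMA.match(text.strip())
    if match is None:
        return None
    integer_part, fraction_part = match.groups()
    # Rebuild the number with a dot so float() can read it.
    return float(f"{integer_part}.{fraction_part}")
```

Under this sketch, "1,5" is read as one and a half, while "1.5" is not matched by the comma rule at all.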
Aaron Yue
babe317723 fix intersect rule to work with negative numbers
Summary:
Numerals that require intersection with negative numbers don't work, because the negative sign is parsed
before the intersect rules apply. This fixes it by adding a positivity guard to the intersection rule.

Reviewed By: patapizza

Differential Revision: D7592225

fbshipit-source-id: 2bc9c708cadeea4012c1f3ef487c61a144325f2a
2018-04-13 19:45:40 -07:00
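The guard described in commit babe317723 can be sketched as follows. This is a Python illustration under assumptions, not the actual Haskell rule: the `Numeral` record, its `grain` field (the power of ten of a round number), and the functions `intersect` and `negate` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Numeral:
    value: float
    grain: Optional[int] = None  # power of ten for round numbers, e.g. 3 for 1000


def intersect(left, right):
    """Compose e.g. 1000 and 200 into 1200; refuse when either side is negative."""
    if left.value < 0 or right.value < 0:
        return None  # the positivity guard
    if left.grain is None or right.value >= 10 ** left.grain:
        return None  # right side must fit below the left side's grain
    return Numeral(left.value + right.value)


def negate(numeral):
    """Apply a leading minus sign; refuse to negate an already negative numeral."""
    if numeral.value < 0:
        return None
    return Numeral(-numeral.value, numeral.grain)
```

With the guard in place, the only way to build "minus one thousand two hundred" is to intersect the positive parts first and negate the result, which gives -1200 rather than -1000 + 200.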
Aaron Yue
fe0807dced Use lambda-case for Rule production
Summary: Change Rule production functions to use lambda-case.

Reviewed By: patapizza

Differential Revision: D7424413

fbshipit-source-id: edac0290d310578f633ff0208434d1eca038ad9c
2018-04-12 10:15:57 -07:00
Aaron Yue
4ef255e577 Generalize and expand digit specifier usage for hanzi
Summary:
Generalize parsing of Chinese digit specifiers (十百千万亿) and add hanzi tests.
These digit specifiers can be parsed as (<num><speci>)<num>,
using the multiplier value <num><speci> and a connect function that adds the parts together
(two cases: skipped digit positions [which require a 零 in between], and digits in consecutive positions).
Note that 个 is technically a digit specifier,
but in Chinese it is never used directly as a numeral specifier, only as a counter.

Reviewed By: zliu41

Differential Revision: D7424249

fbshipit-source-id: 20a85a7df1f908ee9879e92b904178fa26a9a5e5
2018-04-12 10:00:33 -07:00
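The multiply-then-add idea from commit 4ef255e577 can be sketched in a few lines. This is a simplified left-to-right scan in Python, not the rule-based composition Duckling uses; the function name `parse_hanzi` and the lookup tables are hypothetical, and the sketch is more lenient than the commit describes because it does not check that a 零 is present where digit positions are skipped.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_SPECIFIERS = {"十": 10, "百": 100, "千": 1000}
LARGE_SPECIFIERS = {"万": 10 ** 4, "亿": 10 ** 8}


def parse_hanzi(text):
    """Convert a hanzi numeral to an int by multiplying digits with specifiers and adding."""
    total = 0    # value of completed 万/亿 sections
    section = 0  # value accumulated below the next large specifier
    digit = 0    # digit waiting for its specifier
    for char in text:
        if char in DIGITS:
            digit = DIGITS[char]
        elif char in SMALL_SPECIFIERS:
            # A bare specifier such as the 十 in 十五 counts as one of itself.
            section += (digit or 1) * SMALL_SPECIFIERS[char]
            digit = 0
        elif char in LARGE_SPECIFIERS:
            total += (section + digit) * LARGE_SPECIFIERS[char]
            section, digit = 0, 0
        else:
            raise ValueError(f"unexpected character: {char!r}")
    return total + section + digit
```

For example, 一百零五 skips the tens position and is read as 105, while 一百二十 uses consecutive positions and is read as 120.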
Giri Anantharaman
519c9519a3 Support Tamil numerals
Summary:
* Setup Tamil (TA) language
* Added Numeral Dimension

Reviewed By: patapizza

Differential Revision: D7323636

fbshipit-source-id: 4b1a42197ff4799880cded9ce86b8d7fae1507bc
2018-03-19 16:45:36 -07:00
Chinmay Deshmukh
5ac990bbe2 Return latent entities
Summary: Add an option to return latent time entities. This can be used when one is pretty certain that the input contains a datetime.

Reviewed By: patapizza

Differential Revision: D7254245

fbshipit-source-id: e9e0503cace2691804056fcebdc18fd9090fb181
2018-03-19 14:45:27 -07:00
Panagiotis Vekris
169df6b46b ruleSkipHundreds only accepts alphabetic numerals
Summary:
Before, duckling would parse `"1 2 2"` as the three digit number 122 through
`ruleSkipHundreds`. This, however, allowed the string `"Pay Kiran1 10eur"` to be
parsed as "110 EUR", which was reported in https://github.com/facebook/duckling/issues/159.

This change only accepts alphabetic numerals to pass through `ruleSkipHundreds`
(for example `"one twenty two"`), which presumably was the original intention of
this rule. This fixes the above issue, without any change in Corpus.

Reviewed By: patapizza

Differential Revision: D7151934

fbshipit-source-id: 024a7a0b6a53beb3a0d42d4bb7f542ce3b05726b
2018-03-06 17:00:29 -08:00
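The restriction introduced in commit 169df6b46b can be sketched as below. This is an illustrative Python model, not the Haskell rule itself: the `Token` record and the `skip_hundreds` function are hypothetical, and "alphabetic" is approximated as "the matched text contains no digit characters".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str   # the matched input text
    value: int  # the numeric value already resolved for that text


def is_alphabetic_numeral(token):
    """True when the numeral was spelled out rather than written with digits."""
    return not any(char.isdigit() for char in token.text)


def skip_hundreds(hundreds, rest):
    """Read e.g. "one" + "twenty two" as 122, but only for spelled-out numerals."""
    if not (is_alphabetic_numeral(hundreds) and is_alphabetic_numeral(rest)):
        return None
    if not (1 <= hundreds.value <= 9 and 10 <= rest.value <= 99):
        return None
    return hundreds.value * 100 + rest.value
```

Under this sketch the tokens "1" and "10" from `"Pay Kiran1 10eur"` no longer combine into 110, while `"one twenty two"` still yields 122.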
Julien Odent
aed5b8f779 Numeral: don't compose negative numbers, fix double negatives
Summary:
* added `isMultipliable` helper and used that in patterns along with `isPositive`
* fixed double negatives in most languages

Reviewed By: niteria

Differential Revision: D7034982

fbshipit-source-id: a0bb67056d3107167830ece0c34d761c5563c5a7
2018-02-21 12:02:45 -08:00
bidhan-a
43079e7113 Setup Nepali (NE) and add Numeral dimension
Summary:
- Setup Nepali (NE) language
- Add basic Numeral dimension
Closes https://github.com/facebook/duckling/pull/156

Reviewed By: JonCoens

Differential Revision: D6965558

Pulled By: patapizza

fbshipit-source-id: f46c9b104d4345f20bd0cf53f8c9c8754855f314
2018-02-13 07:45:31 -08:00
Julien Odent
b00fa512af AmountOfMoney: fixes
Summary:
* don't recursively compose cents
* don't allow decreasing ranges

Reviewed By: blandinw

Differential Revision: D6849132

fbshipit-source-id: ed6ca30388642c21e677a628971747a4fb3dfbef
2018-01-30 14:45:36 -08:00
Julien Odent
bef7a44fa8 Remove redundant brackets and language pragmas
Summary: .

Reviewed By: JonCoens

Differential Revision: D6838082

fbshipit-source-id: 94757bdb80c6d3c29a7a6554429940a1b7403108
2018-01-29 16:45:28 -08:00
Abdallatif Sulaiman
ce2f48fbb5 Quickfix for Arabic Numeral
Summary: Closes https://github.com/facebook/duckling/pull/137

Reviewed By: adelnobel

Differential Revision: D6660959

Pulled By: patapizza

fbshipit-source-id: 51169c63558b5a35f16bdaaada9649c88a4d2994
2018-01-05 11:45:40 -08:00
Abdallatif Sulaiman
1393098bcc Added Time Dimension to Arabic
Summary:
Hi, in this pr:
* Added the time dimension to the Arabic language, thanks to Hussein-Dahir & Yazeed-Obaid for writing the time corpus.
* Fixed some bugs in numerals & ordinals and added more test cases for them.

Also, I don't really understand: why do we use classifiers in the time dimension?
Closes https://github.com/facebook/duckling/pull/123

Reviewed By: blandinw

Differential Revision: D6583313

Pulled By: patapizza

fbshipit-source-id: f7acdef0c032d7b7fd7d224832fdaf484d2df825
2017-12-19 14:30:42 -08:00
Newinfinite007
c133bad24a Hindi Language Numeral Dimension(minimalistic model). Tests passed.
Summary: Closes https://github.com/facebook/duckling/pull/119

Reviewed By: JonCoens

Differential Revision: D6597628

Pulled By: patapizza

fbshipit-source-id: 8bac0f686d6cecc38d9998e37042fe48f73530dc
2017-12-19 13:15:30 -08:00
Henri Dwyer
15b3988ec5 update MY pattern match to HashMap
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.

Reviewed By: patapizza

Differential Revision: D6592508

fbshipit-source-id: 0f4dd9bc548f71d0e6e52c938bd2e39bfd909c01
2017-12-18 13:00:35 -08:00
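The "HashMap lookups" refactoring that commit 15b3988ec5 and several other commits in this log describe can be sketched as follows. This is a Python illustration with English stand-in words, not the Burmese or Korean rules; `SMALL_INTEGERS` and `parse_small_integer` are hypothetical names. The point is that the regex only finds the word, and the value comes from a single table lookup instead of one pattern-match branch per word.

```python
import re

# Hypothetical lookup table; English words stand in for the per-language vocabulary.
SMALL_INTEGERS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Build the alternation from the table keys so the regex and the table cannot drift apart.
SMALL_INTEGER_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in SMALL_INTEGERS) + r")\b",
    re.IGNORECASE,
)


def parse_small_integer(text):
    """Return the value of the first number word in the text, or None."""
    match = SMALL_INTEGER_PATTERN.search(text)
    if match is None:
        return None
    return SMALL_INTEGERS.get(match.group(1).lower())
```

Adding a new word then means adding one table entry rather than a new regex alternative plus a new case branch.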
Henri Dwyer
10db72fd05 update KO pattern match to HashMap
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.

Reviewed By: panagosg7

Differential Revision: D6536790

fbshipit-source-id: ecb0bf28cf22b82ef10c3c2408eda15409189431
2017-12-15 11:45:28 -08:00
Julien Odent
fb3979d257 Numeral: move ruleFractions to common + cleanup numeral flag for Time
Summary:
* Closes PRs #115, #116.
* `Numeral`: Moves `ruleFractions` to common
* `Numeral/PT`: fixes dot in regex
* `Time`: Moves `isNumeralSafeToUse` to `isIntegerBetween` (more robust and cleaner)
* `Time/EN`: refactored `ruleCycleLastNextN` to `ruleDurationLastNext` (so it also accounts for non-safe Numerals, like "in a few days")

Reviewed By: blandinw

Differential Revision: D6502958

fbshipit-source-id: 6d9c08a23dab88f441aadade08d663c8e0da415d
2017-12-15 10:15:31 -08:00
Julien Odent
6df3b26707 Numeral: common rule + supporting hindu-arabic numerals for Burmese
Summary:
* `ruleIntegerNumeric` was used in all languages but Burmese.
* Hindu-Arabic numerals seem to be slowly making their way into Burmese (e.g. recent car plates)
* Moved the rule to `Duckling/Numeral/Common.hs`

Reviewed By: blandinw

Differential Revision: D6498349

fbshipit-source-id: e868dc9960f18f0781e4aa98a0dfcd14969537c9
2017-12-06 16:00:28 -08:00
Genki Furumi
56b57df153 HashMap lookups for large regexes
Summary: Replaced pattern matching with Hashmap.

Reviewed By: patapizza

Differential Revision: D6493427

fbshipit-source-id: 5ae9370387a738423724cabbe7c0d44c4889e185
2017-12-06 11:00:38 -08:00
Alex Torres
498e8b16e6 fix pt rules for numeral
Summary:
When I write "dois mil e duzentos" the result should be 2200, but duckling recognizes the numbers separately and gives the result:

`[{"dim":"number","body":"dois","value":{"value":2,"type":"value"},"start":0,"end":4},{"dim":"number","body":"mil","value":{"value":1000,"type":"value"},"start":5,"end":8},{"dim":"time","body":"mil","value":{"values":[],"value":"1000-01-01T00:00:00.000-07:53","grain":"year","type":"value"},"start":5,"end":8},{"dim":"number","body":"duzentos","value":{"value":200,"type":"value"},"start":11,"end":19}]`

Now with this commit, duckling gives the correct result:

`[{"dim":"number","body":"dois mil e duzentos","value":{"value":2200,"type":"value"},"start":0,"end":19}]`
Closes https://github.com/facebook/duckling/pull/117

Reviewed By: blandinw

Differential Revision: D6477925

Pulled By: patapizza

fbshipit-source-id: 26ab503cc8def739c51ceb5bae7546016ba65ad6
2017-12-04 18:00:36 -08:00
Panagiotis Vekris
12a726aee7 Support for Greek times and dates
Summary:
This adds support for greek times.

There are still some issues with expressions of the form:
```
9:30 - 11:00 την πέμπτη
```
where `11:00 την πέμπτη` is parsed first (as 11:00 on Thu), instead of prioritizing `9:30 - 11:00` as the training data suggests. These tests are excluded from the corpus for the moment.

Reviewed By: patapizza

Differential Revision: D6376271

fbshipit-source-id: 2f31e058fb88386429070e3b51cd33f93b9c5936
2017-12-04 16:45:40 -08:00
Julien Odent
12000c2f86 Numeral/GA: fix old vigesimal + refactoring
Summary: Merging rules in two, fixing production rule.

Reviewed By: JonCoens, blandinw

Differential Revision: D6466793

fbshipit-source-id: 0fa143b049e9ccce7f4946e14bbd15fc37c324e1
2017-12-01 18:15:31 -08:00
Jana Šefčíková
7fe748ffec Numeral/GA: HashMap lookups for large regexes
Summary: Use hashMap oneToTenMap instead of guard.

Reviewed By: patapizza

Differential Revision: D6414352

fbshipit-source-id: a1788df0d3521a3a870e453294708d9fe2c10908
2017-12-01 10:15:24 -08:00
Julien Odent
54ccbf81df Numeral/DE: fix keiner + cleanup
Summary:
* fixed keine[rn]
* removed redundant `ruleInteger`
* replaced pattern matching with hashmap lookup
* renamed `dozensMap` to `tensMap`

Reviewed By: blandinw

Differential Revision: D6336252

fbshipit-source-id: 740734ab7b0b289adc4f466f966c4c5e59af75ad
2017-11-15 13:15:29 -08:00
igor-drozdov
29d776dee5 Added TimeGrain and Duration Dimensions to Russian language
Summary:
- Added Duration dimension to Russian language
- Added TimeGrain dimension to Russian language
- Refactored isNatural and isNaturalWith out of Duration helpers into Numeral helpers
- Implemented <integer> and a half rule for Russian Numeral
- Changed the type of inSeconds to polymorphic one
Closes https://github.com/facebook/duckling/pull/105

Reviewed By: blandinw

Differential Revision: D6312604

Pulled By: patapizza

fbshipit-source-id: 9ae237b4beb6915ff8da013230457937d8e56733
2017-11-15 10:45:24 -08:00
Panagiotis Vekris
e8937e1cd6 Support for Greek durations
Summary: Adding support for Greek time grains and durations.

Reviewed By: patapizza

Differential Revision: D6249955

fbshipit-source-id: 1c69e26
2017-11-06 18:49:36 -08:00
Julien Odent
ba46d592cd Prevent double negatives + cleanup
Summary:
* Prevent double negatives (resulting from `ruleNegative` applying twice and from engine tokenizer)
* Hashmap lookup for tens
* cleanup

Reviewed By: blandinw

Differential Revision: D6221107

fbshipit-source-id: 42e401d
2017-11-03 00:31:22 -07:00
Panagiotis Vekris
fda8c7c759 Support Greek numerals
Summary:
- Setup Greek language (EL)
- Added Greek Numerals

Reviewed By: patapizza

Differential Revision: D6217873

fbshipit-source-id: 379170f
2017-11-02 17:16:18 -07:00
Jeffrey Karres
67190c4238 HashMap lookups for large regexes
Summary: Replacing guards with hashmap

Reviewed By: patapizza

Differential Revision: D6192748

fbshipit-source-id: 644825b
2017-10-30 18:04:29 -07:00
Aleksey Suslov
df6484945a Few improvements for Numeral/RU
Summary:
- fix typos in 11, 300 and 400 forms
- add more 0 rules
- introduce 1.5 rules
Closes https://github.com/facebookincubator/duckling/pull/96

Differential Revision: D6136073

Pulled By: patapizza

fbshipit-source-id: b1252f2
2017-10-24 10:49:27 -07:00
Julien Odent
ed58115caf Numeral: refactor 'Text.singleton' usages
Summary:
* refactored `Text.singleton` usages into `Text` literals
* removed redundant `join` imports with `NoRebindableSyntax` language pragma
* ET: merged 2 rules into one

Reviewed By: blandinw

Differential Revision: D6080231

fbshipit-source-id: 47c18df
2017-10-17 13:19:39 -07:00
Julien Odent
b2de97800f Numeral/FR: allow space as a thousand separator
Summary: Fixes #91.

Reviewed By: blandinw

Differential Revision: D6079821

fbshipit-source-id: f3160c1
2017-10-17 12:20:32 -07:00
Matthijs Mullender
33a08bb76b Support Dutch Durations
Summary:
This change adds support for durations in Dutch/Netherlands (NL)
Implemented: TimeGrain/NL, Durations/NL

Reviewed By: patapizza

Differential Revision: D6049404

fbshipit-source-id: 3621cdb
2017-10-13 12:49:30 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
*  `Rules/<Lang>.hs` exposes
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
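The smart-constructor idea and the MM/DD versus DD/MM example from commit ab0ad0256e can be sketched as below. This is Python for illustration, not Duckling's `Locale` type: the `SUPPORTED_REGIONS` table, `make_locale`, and `parse_month_day` are hypothetical, and two behaviors are assumptions of the sketch rather than facts from the commit message: an unsupported region falls back to the bare language, and a locale without a region reads dates as MM/DD.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of valid regions per language; the real list lives in Duckling.
SUPPORTED_REGIONS = {"EN": {"US", "GB"}, "FR": set()}


@dataclass(frozen=True)
class Locale:
    lang: str
    region: Optional[str] = None


def make_locale(lang, region=None):
    """Keep the region only when it is valid for the language (assumed fallback: no region)."""
    if region in SUPPORTED_REGIONS.get(lang, set()):
        return Locale(lang, region)
    return Locale(lang)


def parse_month_day(text, locale):
    """Read "a/b" as (month, day): DD/MM for GB, MM/DD otherwise (assumed default)."""
    first, second = (int(part) for part in text.split("/"))
    if locale.region == "GB":
        return (second, first)
    return (first, second)
```

Routing every locale through one constructor is what keeps invalid `(Lang, Region)` pairs from reaching the locale-specific rules.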
Ian Stewart-Binks
2b566eeac0 Numeral/JA: HashMap lookups for large regexes
Summary: Replaced pattern matching with Hashmap. Also, removed ruleInteger17 and moved its regex to ruleInteger.

Reviewed By: patapizza

Differential Revision: D5812629

fbshipit-source-id: f0c1a06
2017-10-06 10:34:33 -07:00
Julien Odent
83ea150d94 Convert back escaped characters in rules
Summary:
We noticed that using UTF-8 characters directly in regexes works.
Hence we are converting the escaped characters back for readability and maintainability.

Reviewed By: blandinw

Differential Revision: D5787146

fbshipit-source-id: e5a4b9a
2017-09-07 12:49:33 -07:00
Stepan Parunashvili
6f774abe38 georgian numeral support
Summary: Introducing Georgian (KA), and the very beginnings of numeral support

Reviewed By: patapizza

Differential Revision: D5757952

fbshipit-source-id: 89d05f8
2017-09-05 12:19:29 -07:00
dubovinszky
24d3f19976 HU Setup + Numeral
Summary:
- Setup Hungarian (HU) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/79

Reviewed By: blandinw

Differential Revision: D5595812

Pulled By: patapizza

fbshipit-source-id: 5959938
2017-08-09 17:49:56 -07:00
Veselin Stoyanov
5d03b45af9 Setup Bulgarian language and Numeral Dimension
Summary:
- Setup Bulgarian (BG) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/78

Reviewed By: niteria

Differential Revision: D5575513

Pulled By: patapizza

fbshipit-source-id: e566155
2017-08-09 08:19:24 -07:00
Julien Odent
bfb6ba0387 Numeral flag for Time patterns
Summary:
Today, inputs like `at single`, `at a few`, `at a couple of` return a `Time`.
After discussion with blandinw, this adds a very explicit hack for now, until other use cases show up.

Reviewed By: niteria

Differential Revision: D5325369

fbshipit-source-id: aec0402
2017-06-27 07:34:21 -07:00
Andrew Farmer
068e23db13 Prepare for DuplicateRecordFields
Summary:
The one restriction on using DuplicateRecordFields is that record
selectors have to be imported under their constructor, instead of as
top-level functions. Do this for si_sigma so D5242707 passes the compat
check.

Reviewed By: watashi

Differential Revision: D5326634

fbshipit-source-id: 74ec0dd
2017-06-26 22:34:21 -07:00
André
3ec5390ac2 numerals between 100 and 999 in Portuguese fixed
Summary:
I fixed some bugs I found in Portuguese. This is my first attempt to contribute, so let me know if there's anything I could do better next time! Thanks! Awesome project!
Closes https://github.com/facebookincubator/duckling/pull/56

Differential Revision: D5318968

Pulled By: patapizza

fbshipit-source-id: 94ff30f
2017-06-26 08:19:28 -07:00
Daniel Rodríguez
36808e6086 HashMap lookups for large regexes.
Summary:
Transform large case matches into HashMap lookups.

Add an extra example for a rule set that wasn't tested before.

Reviewed By: patapizza

Differential Revision: D5253349

fbshipit-source-id: 303dbca
2017-06-19 11:34:18 -07:00
Anand Bhaskar
b8277411e7 Refactor rule 'number.number hours'
Summary: Created a helper for the rule to reuse across languages.

Reviewed By: patapizza

Differential Revision: D5189741

fbshipit-source-id: 7b4dcd4
2017-06-06 09:34:22 -07:00
Mohankumar Dhayalan
21c9b8ed7a HashMap lookups for large regexes
Summary: Added Hashmap lookups for Regex for Numeral/ID

Reviewed By: patapizza

Differential Revision: D5128492

fbshipit-source-id: 5ab928b
2017-05-25 11:04:18 -07:00
Şeref R.Ayar
69ce841710 Comma as decimal mark for Numeral TR
Summary: Closes https://github.com/facebookincubator/duckling/pull/28

Differential Revision: D5120967

Pulled By: patapizza

fbshipit-source-id: 41a5e4b
2017-05-24 09:04:17 -07:00
Julien Odent
3b64603d81 Don't allow 0 as fraction denominator
Summary:
"1/0" was returning "null", but this is not a valid fraction.
Now "1/0" returns two Numerals.

Reviewed By: niteria

Differential Revision: D5037579

fbshipit-source-id: 70fa4c9
2017-05-11 12:04:20 -07:00
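The zero-denominator behavior from commit 3b64603d81 can be sketched as follows. This is a Python illustration, not the Haskell rule; `parse_fraction` and `parse_numerals` are hypothetical names. When the fraction rule declines to fire, the input falls back to its standalone integers.

```python
import re

FRACTION = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*$")


def parse_fraction(text):
    """Return numerator / denominator, or None when the rule should not fire."""
    match = FRACTION.match(text)
    if match is None:
        return None
    numerator, denominator = int(match.group(1)), int(match.group(2))
    if denominator == 0:
        return None  # "1/0" is not a valid fraction
    return numerator / denominator


def parse_numerals(text):
    """Prefer a single fraction; otherwise return each integer found on its own."""
    fraction = parse_fraction(text)
    if fraction is not None:
        return [fraction]
    return [int(digits) for digits in re.findall(r"\d+", text)]
```

In this sketch "1/2" produces one numeral, while "1/0" produces the two separate numerals 1 and 0.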
Julien Odent
37829902b7 CS: Setup + basic Numeral
Summary:
* Setup for Czech
* Basic `Numeral` (0-10 integers + digits) from http://www.omniglot.com/language/numbers/czech.htm

Reviewed By: JonCoens

Differential Revision: D5044775

fbshipit-source-id: b5cd9d2
2017-05-11 09:49:27 -07:00
Matt Schultz
ff9b54ad43 Added English fractional Numeral rule (ex: "3/4", "1/2", "5/7")
Summary:
Also added real-world test to English `Quantity` corpus ("3/4 cup", as a culinary example)
Closes https://github.com/facebookincubator/duckling/pull/14

Reviewed By: patapizza

Differential Revision: D5035990

Pulled By: niteria

fbshipit-source-id: c1b8f65
2017-05-10 07:04:16 -07:00
Julien Odent
5ba2c9e9a1 NB: Bringing latest changes
Summary:
* Numeral: fixed "hundre" (not "hundred")
* Numeral: added "tretti", "søtti"
* Time: updated last times to support "sist"
* Time: christmas days

Reviewed By: niteria

Differential Revision: D4958919

fbshipit-source-id: e4eecf5
2017-04-28 08:04:22 -07:00