Summary:
Hello,
I am new to Haskell, but I would like to add Khmer language (KM) to Duckling.
I have tried to extended Duckling by adding Numeral dimension for new language KM.
Please have a look at it and see what we can improve.
Thanks!
Pull Request resolved: https://github.com/facebook/duckling/pull/234
Reviewed By: blandinw
Differential Revision: D9032639
Pulled By: chinmay87
fbshipit-source-id: 7db19edf732fe6500629cc89e18e0655d7bbc48b
Summary:
Some small improvements to the Norwegian implementation:
- The written number 8 had a typo: "otte" -> "åtte"
- Add support for half an hour before as e.g. "halv to"
- Add support for alternative clock denotation "klokka"
- Add support for alternative tomorrow denotation "i morra"
Pull Request resolved: https://github.com/facebook/duckling/pull/232
Reviewed By: girifb
Differential Revision: D8986441
Pulled By: chinmay87
fbshipit-source-id: 286617d30415febe1f0eda4bc7475ca5c9610734
Summary:
This is causing some issues, e.g. `20-30` resolving to 8:30pm (latent).
Updating `Numeral` rules to account for that (`EN`, `FR`, `NL` following tests).
Differential Revision: D8854891
fbshipit-source-id: ba17099b014d9cf2f48a7d85147cc890b02578f5
Summary:
Support amount of money intervals for VI
Modify ruleNg, ruleDollar, and ruleVND to better capture the usage of VI
In Numeral, add "ngàn" - a common synonym of "nghìn", and "chục" - colloquially used to count tens. Remove "?" in regex of other words as it does not make sense.
Reviewed By: haoxuany
Differential Revision: D8734066
fbshipit-source-id: 15f879ab796025882c85f0ce9f1677c501b364a0
Summary: In Romanian, for numerals above 20, we say "20 de milioane", not "20 milioane".
Reviewed By: haoxuany
Differential Revision: D8334109
fbshipit-source-id: a7fc83440334ab9b1f0511f315029e28449f9771
Summary: Added support for intervals (between X and Y), min, max and approximates for currency in RO.
Reviewed By: patapizza
Differential Revision: D8322948
fbshipit-source-id: 462cfd5575c87e757d2c35c8078f539af3e8150f
Summary:
* One rule to rule them all, to easily duplicate with "de" suffix.
* This would actually allow things like "4 de metri", though it's fine as it doesn't alter meaning.
Reviewed By: haoxuany
Differential Revision: D8325328
fbshipit-source-id: 3ec0f30431f3cb00152cc9509f6052b1ae29cd08
Summary:
Support custom dimensions
Had to move the definition of `Dimension` from `Duckling.Dimensions.Types` to `Duckling.Types` to avoid cyclic imports between these two modules.
A sample custom dimension is in `exe/CustomDimensionExample.hs`.
Limitations of custom dimensions:
- All rules for a custom dimension must be in the same module with the definition of the custom dimension. Otherwise there will be cyclic imports, because the definition of the dimension and the rules refer to each other.
- The custom dimension must be specified when using `parse`, since there's no way to get all the existing custom dimensions.
Reviewed By: patapizza
Differential Revision: D7630360
fbshipit-source-id: 30e12dcb33611f5692c4f5949de377bf61b75e1e
Summary:
Numerals that require intersection with negative numbers don't work, since the negative sign gets parsed
before the intersect rules happen. This fixes it by adding a guard in for positive in the intersection rule.
Reviewed By: patapizza
Differential Revision: D7592225
fbshipit-source-id: 2bc9c708cadeea4012c1f3ef487c61a144325f2a
Summary:
generalize chinese digit specifier (十百千万亿) parsing, and add hanzi tests
These digits specifiers can be parsed as (<num><speci>)<num>,
by using the multiplicater value <num><speci>, and a connect function that adds them together
(two cases, skipping digits [which requires a 零 in between], and digits in consecutive locations).
Note that 个 is technically a digit specifier,
but in Chinese it is never used directly as a numeral specifier, and always as a counter.
Reviewed By: zliu41
Differential Revision: D7424249
fbshipit-source-id: 20a85a7df1f908ee9879e92b904178fa26a9a5e5
Summary: Add an option to return latent time entities. This can be used when one is pretty certain that the input contains a datetime.
Reviewed By: patapizza
Differential Revision: D7254245
fbshipit-source-id: e9e0503cace2691804056fcebdc18fd9090fb181
Summary:
Before, duckling would parse `"1 2 2"` as the three digit number 122 through
`ruleSkipHundreds`. This, however, allowed the string `"Pay Kiran1 10eur"` to be
parsed as "110 EUR", which was reported in https://github.com/facebook/duckling/issues/159.
This change only accepts alphabetic numerals to pass through `ruleSkipHundreds`
(for example `"one twenty two"`), which presumably was the original intention of
this rule. This fixes the above issue, without any change in Corpus.
Reviewed By: patapizza
Differential Revision: D7151934
fbshipit-source-id: 024a7a0b6a53beb3a0d42d4bb7f542ce3b05726b
Summary:
* added `isMultipliable` helper and used that in patterns along with `isPositive`
* fixed double negatives in most languages
Reviewed By: niteria
Differential Revision: D7034982
fbshipit-source-id: a0bb67056d3107167830ece0c34d761c5563c5a7
Summary:
Hi, in this pr:
* Added time dimension to Arabic language, thanks to Hussein-Dahir & Yazeed-Obaid for writing time corpus.
* Fixed some bugs in numeral & ordinals and added more test cases for them.
Also, I don't really understand why do we use classifiers in time dimension?
Closes https://github.com/facebook/duckling/pull/123
Reviewed By: blandinw
Differential Revision: D6583313
Pulled By: patapizza
fbshipit-source-id: f7acdef0c032d7b7fd7d224832fdaf484d2df825
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.
Reviewed By: patapizza
Differential Revision: D6592508
fbshipit-source-id: 0f4dd9bc548f71d0e6e52c938bd2e39bfd909c01
Summary: Create hashmaps for the pattern matching rules, and make the regex matching read the value from the hashmap.
Reviewed By: panagosg7
Differential Revision: D6536790
fbshipit-source-id: ecb0bf28cf22b82ef10c3c2408eda15409189431
Summary:
* Closes PRs #115, #116.
* `Numeral`: Moves `ruleFractions` to common
* `Numeral/PT`: fixes dot in regex
* `Time`: Moves `isNumeralSafeToUse` to `isIntegerBetween` (more robust and cleaner)
* `Time/EN`: refactored `ruleCycleLastNextN` to `ruleDurationLastNext` (so it also accounts for non-safe Numerals, like "in a few days")
Reviewed By: blandinw
Differential Revision: D6502958
fbshipit-source-id: 6d9c08a23dab88f441aadade08d663c8e0da415d
Summary:
* `ruleIntegerNumeric` was used in all languages but Burmese.
* it seems like the hindu-arabic numerals are slowly getting in Burmese (e.g. recent car plates)
* Moving the rule in `Duckling/Numeral/Common.hs`
Reviewed By: blandinw
Differential Revision: D6498349
fbshipit-source-id: e868dc9960f18f0781e4aa98a0dfcd14969537c9
Summary:
When I write "dois mil e duzentos" the result should be 2200, but duckling recognize the numbers separated and give the result:
`[{"dim":"number","body":"dois","value":{"value":2,"type":"value"},"start":0,"end":4},{"dim":"number","body":"mil","value":{"value":1000,"type":"value"},"start":5,"end":8},{"dim":"time","body":"mil","value":{"values":[],"value":"1000-01-01T00:00:00.000-07:53","grain":"year","type":"value"},"start":5,"end":8},{"dim":"number","body":"duzentos","value":{"value":200,"type":"value"},"start":11,"end":19}]`
Now with this commit, duckling gives the correct result:
`[{"dim":"number","body":"dois mil e duzentos","value":{"value":2200,"type":"value"},"start":0,"end":19}]`
Closes https://github.com/facebook/duckling/pull/117
Reviewed By: blandinw
Differential Revision: D6477925
Pulled By: patapizza
fbshipit-source-id: 26ab503cc8def739c51ceb5bae7546016ba65ad6
Summary:
This adds support for greek times.
There are still some issues with expressions of the form:
```
9:30 - 11:00 την πέμπτη
```
Where `11:00 την πέμπτη` is parsed first (as 11:30 on Thu), instead of prioritizing `9:30 - 11:00` as the training data suggests. These tests are for the moment excluded from the corpus.
Reviewed By: patapizza
Differential Revision: D6376271
fbshipit-source-id: 2f31e058fb88386429070e3b51cd33f93b9c5936
Summary:
- Added Duration dimension to Russian language
- Added TimeGrain dimension to Russian language
- Refactored isNatural and isNaturalWith out of Duration helpers into Numeral helpers
- Implemented <integer> and a half rule for Russian Numeral
- Changed the type of inSeconds to polymorphic one
Closes https://github.com/facebook/duckling/pull/105
Reviewed By: blandinw
Differential Revision: D6312604
Pulled By: patapizza
fbshipit-source-id: 9ae237b4beb6915ff8da013230457937d8e56733
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
* `Rules/<Lang>.hs` exposes
- `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
- `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
- `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
- 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
- Default classifiers are built on existing corpus
- Locale classifiers are built on
- `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
- `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
- Locale classifiers use the language corpus extended with the locale examples as training set.
- Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
- For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
- Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate
Reviewed By: JonCoens, blandinw
Differential Revision: D6038096
fbshipit-source-id: f29c28d
Summary: Replaced pattern matching with Hashmap. Also, removed ruleInteger17 and moved its regex to ruleInteger.
Reviewed By: patapizza
Differential Revision: D5812629
fbshipit-source-id: f0c1a06
Summary:
We noticed that using UTF-8 characters directly in regexes work.
Hence converting back the escaped characters for readability and maintenance.
Reviewed By: blandinw
Differential Revision: D5787146
fbshipit-source-id: e5a4b9a