Commit Graph

89 Commits

Author SHA1 Message Date
Newinfinite007
c133bad24a Hindi Language Numeral Dimension(minimalistic model). Tests passed.
Summary: Closes https://github.com/facebook/duckling/pull/119

Reviewed By: JonCoens

Differential Revision: D6597628

Pulled By: patapizza

fbshipit-source-id: 8bac0f686d6cecc38d9998e37042fe48f73530dc
2017-12-19 13:15:30 -08:00
Julien Odent
fb3979d257 Numeral: move ruleFractions to common + cleanup numeral flag for Time
Summary:
* Closes PRs #115, #116.
* `Numeral`: Moves `ruleFractions` to common
* `Numeral/PT`: fixes dot in regex
* `Time`: Moves `isNumeralSafeToUse` to `isIntegerBetween` (more robust and cleaner)
* `Time/EN`: refactored `ruleCycleLastNextN` to `ruleDurationLastNext` (so it also accounts for non-safe Numerals, like "in a few days")

Reviewed By: blandinw

Differential Revision: D6502958

fbshipit-source-id: 6d9c08a23dab88f441aadade08d663c8e0da415d
2017-12-15 10:15:31 -08:00
Julien Odent
6df3b26707 Numeral: common rule + supporting hindu-arabic numerals for Burmese
Summary:
* `ruleIntegerNumeric` was used in all languages but Burmese.
* it seems like the hindu-arabic numerals are slowly getting in Burmese (e.g. recent car plates)
* Moving the rule in `Duckling/Numeral/Common.hs`

Reviewed By: blandinw

Differential Revision: D6498349

fbshipit-source-id: e868dc9960f18f0781e4aa98a0dfcd14969537c9
2017-12-06 16:00:28 -08:00
Alex Torres
498e8b16e6 fix pt rules for numeral
Summary:
When I write "dois mil e duzentos" the result should be 2200, but duckling recognize the numbers separated and give the result:

`[{"dim":"number","body":"dois","value":{"value":2,"type":"value"},"start":0,"end":4},{"dim":"number","body":"mil","value":{"value":1000,"type":"value"},"start":5,"end":8},{"dim":"time","body":"mil","value":{"values":[],"value":"1000-01-01T00:00:00.000-07:53","grain":"year","type":"value"},"start":5,"end":8},{"dim":"number","body":"duzentos","value":{"value":200,"type":"value"},"start":11,"end":19}]`

Now with this commit, duckling gives the correct result:

`[{"dim":"number","body":"dois mil e duzentos","value":{"value":2200,"type":"value"},"start":0,"end":19}]`
Closes https://github.com/facebook/duckling/pull/117

Reviewed By: blandinw

Differential Revision: D6477925

Pulled By: patapizza

fbshipit-source-id: 26ab503cc8def739c51ceb5bae7546016ba65ad6
2017-12-04 18:00:36 -08:00
Panagiotis Vekris
12a726aee7 Support for Greek times and dates
Summary:
This adds support for greek times.

There are still some issues with expressions of the form:
```
9:30 - 11:00 την πέμπτη
```
Where `11:00 την πέμπτη` is parsed first (as 11:30 on Thu), instead of prioritizing `9:30 - 11:00` as the training data suggests. These tests are for the moment excluded from the corpus.

Reviewed By: patapizza

Differential Revision: D6376271

fbshipit-source-id: 2f31e058fb88386429070e3b51cd33f93b9c5936
2017-12-04 16:45:40 -08:00
Julien Odent
fb10a6e6ba Time/EN: Parse spelled out times + AM/PM
Summary:
When using speech recognition, we might see things like "six thirty six a m" or
"ten thirty p m".
Also fixed the argument order of `timeOfDayAMPM` to be more idiomatic.

Reviewed By: blandinw

Differential Revision: D6316542

fbshipit-source-id: 0008c049040219b3a1dd80d9e4661ba8a246fa7f
2017-11-14 13:30:26 -08:00
Julien Odent
436e4662d9 Time/NL: don't be too eager on days of week
Summary:
"d" would parse as "dinsdag" (Tuesday), "zo" would parse as "zondag" (Sunday, also means "so").
Made dots mandatory, to prevent further issues (e.g. "zon" means "sun").

Reviewed By: mullender

Differential Revision: D6312693

fbshipit-source-id: 58c5824e3ff174fc9c293c3f2d13e152c60e51de
2017-11-13 09:00:40 -08:00
Panagiotis Vekris
fda8c7c759 Support Greek numerals
Summary:
- Setup Greek language (EL)
- Added Greek Numerals

Reviewed By: patapizza

Differential Revision: D6217873

fbshipit-source-id: 379170f
2017-11-02 17:16:18 -07:00
Julien Odent
f0a0c1e6b8 Time/EN: don't parse "this in 2 minutes" + fix thanksgiving in EN locales
Summary:
* add flag for this/next/last time
* fix thanskgiving in EN locales
* `analyzedRangeTest` helper with `rangeTests` for `Time/EN`

Reviewed By: blandinw

Differential Revision: D6191209

fbshipit-source-id: 6eaa117
2017-10-31 12:34:21 -07:00
Sam Pepose
251028e691 Uses correct date format for all EN locales
Summary: The date format changes between EN locales (https://en.wikipedia.org/wiki/Date_format_by_country). This diff fixes how dates are handled in each locale.

Reviewed By: patapizza

Differential Revision: D6156147

fbshipit-source-id: 22f296c
2017-10-26 08:49:21 -07:00
Matthijs Mullender
1ade1935b2 Support Dutch dates and times
Summary: [Duckling][Time][NL] Support Dutch dates and times

Reviewed By: patapizza

Differential Revision: D6090294

fbshipit-source-id: 54b8729
2017-10-19 14:04:38 -07:00
Julien Odent
3fa0988fcd Time/GB: fix parsing Oct-Dec months
Summary: Regex was happy with first option.

Reviewed By: blandinw

Differential Revision: D6092130

fbshipit-source-id: 0f89f22
2017-10-18 15:34:43 -07:00
Fredrik Wallén
1b176c3665 Added support for understanding Swedish last <day-of-the-week> expression
Summary:
One of the most common ways of saying "last \<day-of-the-week\>" in Swedish is saying "i \<day-of-the-week\>s", for example "i tisdags" (last Tuesday) or "i lördags" (last Saturday). This pull request adds support for this.
Closes https://github.com/facebookincubator/duckling/pull/92

Reviewed By: blandinw, kodafb

Differential Revision: D6064814

Pulled By: patapizza

fbshipit-source-id: 6ea5466
2017-10-16 12:34:20 -07:00
Julien Odent
305358b2f7 Time/PT: don't parse 'ter'
Summary:
`ter` also means `to have`, which is very common.
`ter` is a very rare form for `Tuesday`, only used in calendar-like contexts

Reviewed By: blandinw

Differential Revision: D6066506

fbshipit-source-id: c8cd231
2017-10-16 12:04:25 -07:00
Julien Odent
1ab5f447d2 en_CA + fix Canadian Thanksgiving
Summary:
* `en_CA` locale
* In Canada, Thanksgiving Day is the second Monday of October.
* Black Friday is the same as the US.
* However Canada observes both DDMM and MMDD formats. Defer to later, falling back to US.

Reviewed By: blandinw

Differential Revision: D6058909

fbshipit-source-id: 3d4e05e
2017-10-16 10:04:43 -07:00
Julien Odent
fb1dcaa138 Chinese locales + fix TW National Day
Summary:
* Moving `ruleNationalDay` from `ZH` rules to specific locales: `zh_CN`, `zh_HK`, `zh_MO`
* Fixed National Day for `zh_TW`.

Reviewed By: blandinw

Differential Revision: D6057565

fbshipit-source-id: 8f9f2ab
2017-10-13 17:04:43 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
*  `Rules/<Lang>.hs` exposes
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
Aleka Cheung
ddefc949e2 Consolidate days of weeks and months
Summary: added ruleDaysofWeek and ruleMonths in the Norwegian rules

Reviewed By: patapizza

Differential Revision: D6035360

fbshipit-source-id: 90bd67f
2017-10-12 18:34:33 -07:00
Julien Odent
b3fb913a23 Time: don't parse subsequent numbers
Summary:
- Fixes #89. "<number> <number>" would parse as Time if in the right range.
- Applied same rule for all languages. Note that for Italian and Polish, I updated "<hour> <minute>" tests to be in the form "at <hour> <minute>".
- Replaced `liftM2` with more generic `and|or . sequence [f1, f2, ...]`.

Reviewed By: blandinw

Differential Revision: D5992879

fbshipit-source-id: 5409ffb
2017-10-06 12:19:29 -07:00
Julien Odent
fd77036a72 Regen classifiers
Summary: Regenerating classifiers with latest code.

Reviewed By: blandinw

Differential Revision: D5987906

fbshipit-source-id: 16f2c41
2017-10-05 12:19:44 -07:00
Brock Overcash
8d4b103b10 Consolidate HE named months
Summary: Consolidate HE language named-months in the same style of other language, particularly EN. Translations were taken from previous namedMonth# rules and merged into the combined rule. Hebrew-speaking verification would be helpful to verify the integrity of the translations, although they were already taken from the existing code.

Reviewed By: blandinw

Differential Revision: D5839625

fbshipit-source-id: d53f9c7
2017-09-26 16:19:50 -07:00
Julien Odent
1eea25049f Time/RO: don't parse 'sa' as Saturday
Summary: In Romanian, `sa` is fairly common: hai sa ne vedem (let's see), hai sa mergem (let's go).

Reviewed By: blandinw

Differential Revision: D5801345

fbshipit-source-id: db677e4
2017-09-09 11:04:57 -07:00
Julien Odent
b954380937 ES/Ordinal: Fixes + tests
Summary:
* fixes '1st' variants (e.g. primeros, primera)
* fixes accents

Reviewed By: JonCoens

Differential Revision: D5772079

fbshipit-source-id: 6a09d79
2017-09-06 10:19:31 -07:00
Stepan Parunashvili
6f774abe38 georgian numeral support
Summary: Introducing Georgian (KA), and the very beginnings of numeral support

Reviewed By: patapizza

Differential Revision: D5757952

fbshipit-source-id: 89d05f8
2017-09-05 12:19:29 -07:00
Yao Xiao
434e88b511 Consolidate days of week and months
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5742209

fbshipit-source-id: 339fc0a
2017-09-01 13:49:36 -07:00
Kevin Doherty
c41b71c665 Consolidate days of the week and months for GA
Summary:
The Galeic ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.

Reviewed By: patapizza

Differential Revision: D5756222

fbshipit-source-id: ac4bc42
2017-09-01 12:34:29 -07:00
Henry Swanson
b62be42077 Consolidate other times in DE ruleset
Summary:
Combined each of seasons, instants, and holidays into a data list and a
function to generate the list of Rules.

*Instants = today, tomorrow, now, end of year, etc.

Reviewed By: patapizza

Differential Revision: D5730896

fbshipit-source-id: 23170e7
2017-08-29 16:34:30 -07:00
Henry Swanson
96432e5b7d Consolidate days of the week and months for DE
Summary:
The German ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.

Reviewed By: patapizza

Differential Revision: D5728656

fbshipit-source-id: 8590f4a
2017-08-29 14:49:39 -07:00
Caren Thomas
5d9b774b9d Consolidate days of week and months for Swedish rules
Summary: Consolidated all the days of week rules into one rule, and did the same for all the month rules.

Reviewed By: patapizza

Differential Revision: D5721202

fbshipit-source-id: 2b4a56f
2017-08-28 16:04:35 -07:00
Dana Thomas
0d387c0775 Consolidating days of week and names of months into separate rules for French Duckling
Summary: Consolidated the previous days of weeks and month names in french duckling file to become only 2 rules. Allows for more concise, updated code.

Reviewed By: patapizza

Differential Revision: D5710056

fbshipit-source-id: 816ef88
2017-08-28 11:34:27 -07:00
Andrew Shields
7df488fbe2 reuse ruleDaysOfWeek and ruleMonths in Danish rules
Summary: Changed Danish time rules to use ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5709782

fbshipit-source-id: aa03065
2017-08-25 17:04:29 -07:00
Fredrik Wallén
9b5b7bc6ce Corrected ordinals for Swedish and updated tests accordingly
Summary:
There are problems in the ordinal recognition for Swedish. The most severe one is that all the numbers above 15 are actually Danish, not Swedish. Apart from that digits and digits followed by a dot are considered ordinals.

This pull request fixes this and also adds support for ordinals up to 100. The structure of the code is similar to in the ordinal recognition in English. Tests are also updated, both the ordinal tests and the time tests where incorrect ordinals were used.
Closes https://github.com/facebookincubator/duckling/pull/86

Reviewed By: JonCoens

Differential Revision: D5698145

Pulled By: patapizza

fbshipit-source-id: c31d7bc
2017-08-25 17:04:29 -07:00
Margaret Li
b3d10dbf05 consolidated rules for days of week/months in duckling ZH
Reviewed By: patapizza

Differential Revision: D5702961

fbshipit-source-id: 49906d2
2017-08-25 16:34:48 -07:00
Julien Odent
9f856cec48 Time/PT: don't parse 'um' alone
Summary: In Portuguese, "um" means the numeral "one" and the article "a".

Reviewed By: bfiss

Differential Revision: D5703396

fbshipit-source-id: 92ed04f
2017-08-24 21:49:36 -07:00
Matt Lim
faa91d026b Consolidate days of week and months, for Vietnamese rules
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5695475

fbshipit-source-id: d30557f
2017-08-24 10:06:18 -07:00
Julien Odent
2f28e4e33d Time/PL: Don't parse ordinals without context
Summary: "pierwszy" and "drugiej" shouldn't parse as hours without context (e.g. at/until x).

Reviewed By: blandinw

Differential Revision: D5694804

fbshipit-source-id: 40e3eb7
2017-08-24 08:34:33 -07:00
dubovinszky
60565c15aa HU Time, TimeGrain
Summary: Closes https://github.com/facebookincubator/duckling/pull/83

Reviewed By: blandinw

Differential Revision: D5681515

Pulled By: patapizza

fbshipit-source-id: 918d0a4
2017-08-22 19:34:33 -07:00
Joseph Button
ff76927956 Consolidate days of week and months
Summary: Consolidating the rules for months and days of the week in Italian following the pattern seen in English.

Reviewed By: patapizza

Differential Revision: D5665259

fbshipit-source-id: 45d6c3b
2017-08-22 15:04:27 -07:00
Julien Odent
004995b595 Don't allow matches in the middle of words
Summary:
We don't allow matches adjacent to a character of the same class.
We were treating uppercase and lowercase characters differently.
"jon Friday" wouldn't match "on" but "Jon Friday" would.

Reviewed By: blandinw

Differential Revision: D5653681

fbshipit-source-id: be67358
2017-08-17 15:49:26 -07:00
Maury Turay
4a7aacae2f Consolidate days of week and months for Romanian rules
Reviewed By: patapizza

Differential Revision: D5645993

fbshipit-source-id: d2b69a1
2017-08-17 15:34:19 -07:00
Satya Bodduluri
da41db3766 Added ruleIntervalDDDDMonth to EN
Summary: Added ruleIntervalDDDDMonth to EN to handle cases such as "23rd to 26th Oct" and "1-8 september"

Reviewed By: patapizza

Differential Revision: D5637280

fbshipit-source-id: a1fdcd2
2017-08-16 14:34:26 -07:00
Jesse Hellemn
98b58647b1 Adapting Spanish rules to handles names days and months in the same rules
Summary: Moved all named days to the same rule, moved all named months to the same rule. Kept same regexes, just consolidated them.

Reviewed By: patapizza

Differential Revision: D5637061

fbshipit-source-id: e08ecf9
2017-08-16 09:34:24 -07:00
Atyansh Jaiswal
e7431739ec Fixed Ordinal parsing for format "August 27th-30th"
Summary: Changed ruleIntervalMonthDDDD to use the ordinal predicate instead of ugly regex

Reviewed By: patapizza

Differential Revision: D5628188

fbshipit-source-id: 1dbe195
2017-08-15 10:34:37 -07:00
Hiten Parmar
be113689ac Add support for parsing day intervals beginning with from "from 10 to 16 August"
Summary: Added EN rule "ruleIntervalFromDDDDMonth" to support "from 10 to 16 August". Used "isDOMValue" helper rather than regex.

Reviewed By: patapizza

Differential Revision: D5610623

fbshipit-source-id: 00a5208
2017-08-12 02:19:23 -07:00
dubovinszky
24d3f19976 HU Setup + Numeral
Summary:
- Setup Hungarian (HU) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/79

Reviewed By: blandinw

Differential Revision: D5595812

Pulled By: patapizza

fbshipit-source-id: 5959938
2017-08-09 17:49:56 -07:00
Veselin Stoyanov
5d03b45af9 Setup Bulgarian language and Numeral Dimension
Summary:
- Setup Bulgarian (BG) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/78

Reviewed By: niteria

Differential Revision: D5575513

Pulled By: patapizza

fbshipit-source-id: e566155
2017-08-09 08:19:24 -07:00
Julien Odent
9037126937 [Duckling] Time/DE: don't parse 'nächste 5'
Summary: Fixes https://github.com/wit-ai/wit/issues/694.

Reviewed By: niteria

Differential Revision: D5590023

fbshipit-source-id: 6356615
2017-08-09 06:49:26 -07:00
Julien Odent
61800297c8 Time/PL: don't parse 'nie' as Time
Summary: 'nie' means 'no' in Polish, and isn't a common abbreviation for 'niedziela' (Sunday).

Reviewed By: blandinw

Differential Revision: D5587036

fbshipit-source-id: bfda7fc
2017-08-08 16:49:58 -07:00
Julien Odent
a84eb62180 Time: Fix nthDOWOfMonth
Summary:
Fixes #65.
* fixes US holidays
* Black Friday is actually the first day after Thanksgiving day (not necessary the fourth Friday of November)

Reviewed By: JonCoens

Differential Revision: D5533906

fbshipit-source-id: 1824cba
2017-08-01 09:04:31 -07:00
Julien Odent
ef461c3133 Time/FR: don't parse 'a un'
Summary:
In French, the form "at hh" is not valid (it requires an hour indicator).
This fixes false positives such as in "John a un rendez-vous."

Fixes https://github.com/wit-ai/wit/issues/666.

Reviewed By: JonCoens

Differential Revision: D5530713

fbshipit-source-id: ecee1e5
2017-08-01 08:49:41 -07:00