754 Commits

Author SHA1 Message Date
Julien Odent
ed58115caf Numeral: refactor 'Text.singleton' usages
Summary:
* refactored `Text.singleton` usages into `Text` literals
* removed redundant `join` imports with `NoRebindableSyntax` language pragma
* ET: merged 2 rules into one

Reviewed By: blandinw

Differential Revision: D6080231

fbshipit-source-id: 47c18df
2017-10-17 13:19:39 -07:00
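
The refactoring described in this commit can be illustrated with a minimal sketch. It assumes the `text` package and the `OverloadedStrings` extension; `sepBefore`, `sepAfter` and `stripSeparators` are hypothetical names for illustration, not identifiers from the Duckling source.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- Before: a one-character Text built from a Char.
sepBefore :: Text
sepBefore = Text.singleton ','

-- After: the same value written as a Text literal
-- (relies on OverloadedStrings).
sepAfter :: Text
sepAfter = ","

-- Hypothetical helper: drop thousands separators from a numeral string.
stripSeparators :: Text -> Text
stripSeparators = Text.replace sepAfter ""
```

Both definitions denote the same `Text` value; the literal form is simply shorter to read.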
Julien Odent
b2de97800f Numeral/FR: allow space as a thousand separator
Summary: Fixes #91.

Reviewed By: blandinw

Differential Revision: D6079821

fbshipit-source-id: f3160c1
2017-10-17 12:20:32 -07:00
Fredrik Wallén
1b176c3665 Added support for understanding Swedish last <day-of-the-week> expression
Summary:
One of the most common ways of saying "last <day-of-the-week>" in Swedish is saying "i <day-of-the-week>s", for example "i tisdags" (last Tuesday) or "i lördags" (last Saturday). This pull request adds support for this.
Closes https://github.com/facebookincubator/duckling/pull/92

Reviewed By: blandinw, kodafb

Differential Revision: D6064814

Pulled By: patapizza

fbshipit-source-id: 6ea5466
2017-10-16 12:34:20 -07:00
Julien Odent
0e950620b8 Email: restrict domain extensions to letters when spelling out
Summary: We would parse things like "tonight at 6.40".

Reviewed By: blandinw

Differential Revision: D6066926

fbshipit-source-id: d18a8c6
2017-10-16 12:19:22 -07:00
Julien Odent
305358b2f7 Time/PT: don't parse 'ter'
Summary:
`ter` also means `to have`, which is very common.
`ter` is a very rare form for `Tuesday`, only used in calendar-like contexts

Reviewed By: blandinw

Differential Revision: D6066506

fbshipit-source-id: c8cd231
2017-10-16 12:04:25 -07:00
Ilya Murzinov
5c390c6cc7 Fixed example in readme
Summary: Closes https://github.com/facebookincubator/duckling/pull/93

Differential Revision: D6065706

Pulled By: patapizza

fbshipit-source-id: e9ef61b
2017-10-16 11:04:25 -07:00
Julien Odent
1ab5f447d2 en_CA + fix Canadian Thanksgiving
Summary:
* `en_CA` locale
* In Canada, Thanksgiving Day is the second Monday of October.
* Black Friday is the same as the US.
* However Canada observes both DDMM and MMDD formats. Defer to later, falling back to US.

Reviewed By: blandinw

Differential Revision: D6058909

fbshipit-source-id: 3d4e05e
2017-10-16 10:04:43 -07:00
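
The date rule stated in this commit (second Monday of October) can be sketched with the `time` package. This is an illustration only, not Duckling's own implementation; `nthWeekdayOfMonth` and `canadianThanksgiving` are hypothetical names.

```haskell
import Data.Time.Calendar (Day, addDays, fromGregorian)
import Data.Time.Calendar.WeekDate (toWeekDate)

-- | The n-th occurrence of a weekday (1 = Monday .. 7 = Sunday)
-- in the given year and month. Assumes n >= 1 and that the
-- requested occurrence exists in that month.
nthWeekdayOfMonth :: Int -> Int -> Integer -> Int -> Day
nthWeekdayOfMonth n wd year month = addDays offset firstOfMonth
  where
    firstOfMonth = fromGregorian year month 1
    (_, _, firstWd) = toWeekDate firstOfMonth
    -- Days to the first matching weekday, plus whole weeks after it.
    offset = fromIntegral ((wd - firstWd) `mod` 7 + 7 * (n - 1))

-- | Canadian Thanksgiving: second Monday of October.
canadianThanksgiving :: Integer -> Day
canadianThanksgiving year = nthWeekdayOfMonth 2 1 year 10
```

For 2017, October 1 falls on a Sunday, so the first Monday is October 2 and the function returns October 9.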
Julien Odent
fb1dcaa138 Chinese locales + fix TW National Day
Summary:
* Moving `ruleNationalDay` from `ZH` rules to specific locales: `zh_CN`, `zh_HK`, `zh_MO`
* Fixed National Day for `zh_TW`.

Reviewed By: blandinw

Differential Revision: D6057565

fbshipit-source-id: 8f9f2ab
2017-10-13 17:04:43 -07:00
Matthijs Mullender
33a08bb76b Support Dutch Durations
Summary:
This change adds support for durations in Dutch/Netherlands (NL)
Implemented: TimeGrain/NL, Durations/NL

Reviewed By: patapizza

Differential Revision: D6049404

fbshipit-source-id: 3621cdb
2017-10-13 12:49:30 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
* `Rules/<Lang>.hs` exposes
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
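
A smart constructor of the kind this commit describes might look like the sketch below. Only the names `Locale`, `Lang`, `Region` and `makeLocale` come from the commit message; the constructor lists, the `validRegions` table, the fallback to a region-less locale for invalid combinations, and `renderLocale` are assumptions made for illustration. It uses the `containers` package.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Lang = EN | FR | ZH deriving (Eq, Ord, Show)
data Region = CA | GB | US | CN | HK | MO | TW deriving (Eq, Ord, Show)

-- In a real module the constructor would not be exported,
-- so callers could only build a Locale through makeLocale.
data Locale = Locale Lang (Maybe Region) deriving (Eq, Show)

-- Assumed table of valid (Lang, Region) combinations.
validRegions :: Map.Map Lang (Set.Set Region)
validRegions = Map.fromList
  [ (EN, Set.fromList [CA, GB, US])
  , (ZH, Set.fromList [CN, HK, MO, TW])
  ]

-- | Keep the region only if it is valid for the language;
-- otherwise fall back to the language default (an assumption).
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang Nothing = Locale lang Nothing
makeLocale lang (Just region)
  | Set.member region regions = Locale lang (Just region)
  | otherwise = Locale lang Nothing
  where
    regions = Map.findWithDefault Set.empty lang validRegions

-- | Render as <Lang>_<Region>, with XX when no region is set.
renderLocale :: Locale -> String
renderLocale (Locale lang region) = show lang ++ "_" ++ maybe "XX" show region
```

With this sketch, `makeLocale EN (Just GB)` renders as `EN_GB`, while `makeLocale FR (Just TW)` drops the invalid region and renders as `FR_XX`.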
Aleka Cheung
ddefc949e2 Consolidate days of weeks and months
Summary: added ruleDaysofWeek and ruleMonths in the Norwegian rules

Reviewed By: patapizza

Differential Revision: D6035360

fbshipit-source-id: 90bd67f
2017-10-12 18:34:33 -07:00
Julien Odent
b3fb913a23 Time: don't parse subsequent numbers
Summary:
- Fixes #89. "<number> <number>" would parse as Time if in the right range.
- Applied same rule for all languages. Note that for Italian and Polish, I updated "<hour> <minute>" tests to be in the form "at <hour> <minute>".
- Replaced `liftM2` with more generic `and|or . sequence [f1, f2, ...]`.

Reviewed By: blandinw

Differential Revision: D5992879

fbshipit-source-id: 5409ffb
2017-10-06 12:19:29 -07:00
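
The `liftM2` to `and|or . sequence [f1, f2, ...]` change mentioned in this commit relies on the `Monad` instance for functions: `sequence` applied to a list of predicates yields a function that feeds one input to all of them. A minimal sketch, with hypothetical predicate names that are not taken from the Duckling source:

```haskell
import Control.Monad (liftM2)

type Predicate = Int -> Bool

-- Before: liftM2 combines exactly two predicates.
isHourBefore :: Predicate
isHourBefore = liftM2 (&&) (>= 0) (< 24)

-- After: sequence applies every predicate in the list to the same
-- input, and `and` requires all of them to hold.
isHourAfter :: Predicate
isHourAfter = and . sequence [(>= 0), (< 24)]

-- The `or` variant holds when at least one predicate does.
isEdgeHour :: Predicate
isEdgeHour = or . sequence [(== 0), (== 23)]
```

The list form scales to any number of predicates without nesting further `liftM2` calls.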
Ian Stewart-Binks
2b566eeac0 Numeral/JA: HashMap lookups for large regexes
Summary: Replaced pattern matching with a HashMap lookup. Also removed ruleInteger17 and moved its regex to ruleInteger.

Reviewed By: patapizza

Differential Revision: D5812629

fbshipit-source-id: f0c1a06
2017-10-06 10:34:33 -07:00
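
The idea of replacing per-token pattern matching with a table lookup can be sketched as follows. It assumes the `unordered-containers` and `text` packages; `integerMap` and `lookupInteger` are hypothetical names, and the table lists only the basic kanji digits rather than the contents of the actual rule.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import Data.Text (Text)

-- One table instead of one pattern-match clause per token.
integerMap :: HashMap Text Integer
integerMap = HashMap.fromList
  [ ("〇", 0), ("一", 1), ("二", 2), ("三", 3), ("四", 4)
  , ("五", 5), ("六", 6), ("七", 7), ("八", 8), ("九", 9)
  , ("十", 10)
  ]

-- | Look up the value of a matched token; Nothing if it is unknown.
lookupInteger :: Text -> Maybe Integer
lookupInteger token = HashMap.lookup token integerMap
```

Adding a new token then means adding one table entry rather than another clause.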
Julien Odent
fd77036a72 Regen classifiers
Summary: Regenerating classifiers with latest code.

Reviewed By: blandinw

Differential Revision: D5987906

fbshipit-source-id: 16f2c41
2017-10-05 12:19:44 -07:00
Laurent Landowski
2f48eaf371 misspelling "tonights" and "weekends"
Summary:
Adding the misspellings "tonights" (for "tonight's") and "weekends" (for "weekend's").
Closes https://github.com/facebookincubator/duckling/pull/90

Reviewed By: patapizza

Differential Revision: D5987512

Pulled By: l5t

fbshipit-source-id: ac15251
2017-10-05 11:19:25 -07:00
Brock Overcash
8d4b103b10 Consolidate HE named months
Summary: Consolidate HE language named-months in the same style as other languages, particularly EN. Translations were taken from the previous namedMonth# rules and merged into the combined rule. Verification by a Hebrew speaker would help confirm the integrity of the translations, although they come from the existing code.

Reviewed By: blandinw

Differential Revision: D5839625

fbshipit-source-id: d53f9c7
2017-09-26 16:19:50 -07:00
Kent White
c65b0a7c56 Move Polish daysOfWeek and months rules to be handled all in one place
Summary: Changed the Duckling Polish days and months rules to follow the new style, with all-in-one rules.

Reviewed By: blandinw

Differential Revision: D5903171

fbshipit-source-id: 7f2ba60
2017-09-26 16:04:37 -07:00
Julien Odent
1eea25049f Time/RO: don't parse 'sa' as Saturday
Summary: In Romanian, `sa` is fairly common: hai sa ne vedem (let's see), hai sa mergem (let's go).

Reviewed By: blandinw

Differential Revision: D5801345

fbshipit-source-id: db677e4
2017-09-09 11:04:57 -07:00
Julien Odent
83ea150d94 Convert back escaped characters in rules
Summary:
We noticed that using UTF-8 characters directly in regexes works.
Hence converting back the escaped characters for readability and maintenance.

Reviewed By: blandinw

Differential Revision: D5787146

fbshipit-source-id: e5a4b9a
2017-09-07 12:49:33 -07:00
Julien Odent
b954380937 ES/Ordinal: Fixes + tests
Summary:
* fixes '1st' variants (e.g. primeros, primera)
* fixes accents

Reviewed By: JonCoens

Differential Revision: D5772079

fbshipit-source-id: 6a09d79
2017-09-06 10:19:31 -07:00
Stepan Parunashvili
6f774abe38 georgian numeral support
Summary: Introducing Georgian (KA), and the very beginnings of numeral support

Reviewed By: patapizza

Differential Revision: D5757952

fbshipit-source-id: 89d05f8
2017-09-05 12:19:29 -07:00
Yao Xiao
434e88b511 Consolidate days of week and months
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5742209

fbshipit-source-id: 339fc0a
2017-09-01 13:49:36 -07:00
Kevin Doherty
c41b71c665 Consolidate days of the week and months for GA
Summary:
The Gaelic ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.

Reviewed By: patapizza

Differential Revision: D5756222

fbshipit-source-id: ac4bc42
2017-09-01 12:34:29 -07:00
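
The consolidation pattern used in this and the neighbouring "days of week and months" commits (a data list plus a single function that generates the rules) can be sketched as below. The `Rule` record is a simplified stand-in, not Duckling's real `Rule` type, and the English regexes are illustrative assumptions.

```haskell
-- Simplified stand-in for a rule: a name, a regex, and the
-- day-of-week number it resolves to (1 = Monday).
data Rule = Rule
  { ruleName  :: String
  , ruleRegex :: String
  , ruleDay   :: Int
  } deriving (Eq, Show)

-- One data list instead of seven hand-written rules.
daysOfWeek :: [(String, String)]
daysOfWeek =
  [ ("Monday"   , "monday|mon\\.?")
  , ("Tuesday"  , "tuesday|tues?\\.?")
  , ("Wednesday", "wednesday|wed\\.?")
  , ("Thursday" , "thursday|thu(rs?)?\\.?")
  , ("Friday"   , "friday|fri\\.?")
  , ("Saturday" , "saturday|sat\\.?")
  , ("Sunday"   , "sunday|sun\\.?")
  ]

-- | A single function builds every rule from the list,
-- numbering the days by their position.
ruleDaysOfWeek :: [Rule]
ruleDaysOfWeek = zipWith mkRule daysOfWeek [1 ..]
  where
    mkRule (name, regex) n = Rule name regex n
```

The same shape works for months: a list of twelve `(name, regex)` pairs and the same `zipWith`.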
Henry Swanson
b62be42077 Consolidate other times in DE ruleset
Summary:
Combined each of seasons, instants, and holidays into a data list and a
function to generate the list of Rules.

*Instants = today, tomorrow, now, end of year, etc.

Reviewed By: patapizza

Differential Revision: D5730896

fbshipit-source-id: 23170e7
2017-08-29 16:34:30 -07:00
Henry Swanson
96432e5b7d Consolidate days of the week and months for DE
Summary:
The German ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.

Reviewed By: patapizza

Differential Revision: D5728656

fbshipit-source-id: 8590f4a
2017-08-29 14:49:39 -07:00
Caren Thomas
5d9b774b9d Consolidate days of week and months for Swedish rules
Summary: Consolidated all the days of week rules into one rule, and did the same for all the month rules.

Reviewed By: patapizza

Differential Revision: D5721202

fbshipit-source-id: 2b4a56f
2017-08-28 16:04:35 -07:00
Dana Thomas
0d387c0775 Consolidating days of week and names of months into separate rules for French Duckling
Summary: Consolidated the previous days-of-week and month-name rules in the French Duckling file into only 2 rules. This makes the code more concise and up to date.

Reviewed By: patapizza

Differential Revision: D5710056

fbshipit-source-id: 816ef88
2017-08-28 11:34:27 -07:00
Julien Odent
b5e646d8f6 New language instructions
Summary: Updated `README.md` with instructions on how to add a new language in Duckling.

Reviewed By: JonCoens

Differential Revision: D5712523

fbshipit-source-id: 6f5eda0
2017-08-25 17:34:26 -07:00
Andrew Shields
7df488fbe2 reuse ruleDaysOfWeek and ruleMonths in Danish rules
Summary: Changed Danish time rules to use ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5709782

fbshipit-source-id: aa03065
2017-08-25 17:04:29 -07:00
Fredrik Wallén
9b5b7bc6ce Corrected ordinals for Swedish and updated tests accordingly
Summary:
There are problems in the ordinal recognition for Swedish. The most severe one is that all the numbers above 15 are actually Danish, not Swedish. Apart from that, bare digits and digits followed by a dot are considered ordinals.

This pull request fixes this and also adds support for ordinals up to 100. The structure of the code is similar to that of the ordinal recognition in English. Tests are also updated, both the ordinal tests and the time tests where incorrect ordinals were used.
Closes https://github.com/facebookincubator/duckling/pull/86

Reviewed By: JonCoens

Differential Revision: D5698145

Pulled By: patapizza

fbshipit-source-id: c31d7bc
2017-08-25 17:04:29 -07:00
Margaret Li
b3d10dbf05 consolidated rules for days of week/months in duckling ZH
Reviewed By: patapizza

Differential Revision: D5702961

fbshipit-source-id: 49906d2
2017-08-25 16:34:48 -07:00
Julien Odent
9f856cec48 Time/PT: don't parse 'um' alone
Summary: In Portuguese, "um" means the numeral "one" and the article "a".

Reviewed By: bfiss

Differential Revision: D5703396

fbshipit-source-id: 92ed04f
2017-08-24 21:49:36 -07:00
Matt Lim
faa91d026b Consolidate days of week and months, for Vietnamese rules
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.

Reviewed By: patapizza

Differential Revision: D5695475

fbshipit-source-id: d30557f
2017-08-24 10:06:18 -07:00
Julien Odent
2f28e4e33d Time/PL: Don't parse ordinals without context
Summary: "pierwszy" and "drugiej" shouldn't parse as hours without context (e.g. at/until x).

Reviewed By: blandinw

Differential Revision: D5694804

fbshipit-source-id: 40e3eb7
2017-08-24 08:34:33 -07:00
dubovinszky
60565c15aa HU Time, TimeGrain
Summary: Closes https://github.com/facebookincubator/duckling/pull/83

Reviewed By: blandinw

Differential Revision: D5681515

Pulled By: patapizza

fbshipit-source-id: 918d0a4
2017-08-22 19:34:33 -07:00
Joseph Button
ff76927956 Consolidate days of week and months
Summary: Consolidating the rules for months and days of the week in Italian following the pattern seen in English.

Reviewed By: patapizza

Differential Revision: D5665259

fbshipit-source-id: 45d6c3b
2017-08-22 15:04:27 -07:00
Atyansh Jaiswal
4e96a15c15 Refactored weekend rules to use the weekend helper for all languages
Summary: This is a simple refactor that uses the weekend helper for all languages

Reviewed By: patapizza

Differential Revision: D5677330

fbshipit-source-id: 9984539
2017-08-22 10:34:24 -07:00
Julien Odent
004995b595 Don't allow matches in the middle of words
Summary:
We don't allow matches adjacent to a character of the same class.
We were treating uppercase and lowercase characters differently.
"jon Friday" wouldn't match "on" but "Jon Friday" would.

Reviewed By: blandinw

Differential Revision: D5653681

fbshipit-source-id: be67358
2017-08-17 15:49:26 -07:00
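
The boundary check described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the Duckling source: `CharClass`, `charClass` and `isRangeValid` are names chosen here, and the function assumes `0 <= start < end <= length s`.

```haskell
import Data.Char (isAlpha, isDigit)

-- Upper-case and lower-case letters share a single class, which is
-- the point of the fix; any other character is its own class.
data CharClass = Letter | Digit | Other Char deriving (Eq, Show)

charClass :: Char -> CharClass
charClass c
  | isAlpha c = Letter
  | isDigit c = Digit
  | otherwise = Other c

-- | A match covering [start, end) is valid only if the characters on
-- each side of both boundaries belong to different classes.
-- Precondition: 0 <= start < end <= length s.
isRangeValid :: String -> Int -> Int -> Bool
isRangeValid s start end =
  (start == 0 || differ (s !! (start - 1)) (s !! start))
    && (end == length s || differ (s !! (end - 1)) (s !! end))
  where
    differ a b = charClass a /= charClass b
```

With letters in one class, the match "on" inside both "jon Friday" and "Jon Friday" is rejected, while "on" at the start of "on Friday" is accepted.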
Maury Turay
4a7aacae2f Consolidate days of week and months for Romanian rules
Reviewed By: patapizza

Differential Revision: D5645993

fbshipit-source-id: d2b69a1
2017-08-17 15:34:19 -07:00
Satya Bodduluri
da41db3766 Added ruleIntervalDDDDMonth to EN
Summary: Added ruleIntervalDDDDMonth to EN to handle cases such as "23rd to 26th Oct" and "1-8 september"

Reviewed By: patapizza

Differential Revision: D5637280

fbshipit-source-id: a1fdcd2
2017-08-16 14:34:26 -07:00
Daniel Kantor
5cad4359e2 Added HU Ordinals
Summary: Closes https://github.com/facebookincubator/duckling/pull/82

Reviewed By: JonCoens

Differential Revision: D5631927

Pulled By: patapizza

fbshipit-source-id: d68b238
2017-08-16 11:19:24 -07:00
Jesse Hellemn
98b58647b1 Adapting Spanish rules to handles names days and months in the same rules
Summary: Moved all named days to the same rule, moved all named months to the same rule. Kept same regexes, just consolidated them.

Reviewed By: patapizza

Differential Revision: D5637061

fbshipit-source-id: e08ecf9
2017-08-16 09:34:24 -07:00
Atyansh Jaiswal
e7431739ec Fixed Ordinal parsing for format "August 27th-30th"
Summary: Changed ruleIntervalMonthDDDD to use the ordinal predicate instead of ugly regex

Reviewed By: patapizza

Differential Revision: D5628188

fbshipit-source-id: 1dbe195
2017-08-15 10:34:37 -07:00
Veselin Stoyanov
e9b1c8932a Added AmountOfMoney dimension to Bulgarian language
Summary:
- Added AmountOfMoney dimension to Bulgarian language
Closes https://github.com/facebookincubator/duckling/pull/80

Reviewed By: JonCoens

Differential Revision: D5606699

Pulled By: patapizza

fbshipit-source-id: c18f5d4
2017-08-14 09:34:36 -07:00
Hiten Parmar
be113689ac Add support for parsing day intervals beginning with from "from 10 to 16 August"
Summary: Added EN rule "ruleIntervalFromDDDDMonth" to support "from 10 to 16 August". Used "isDOMValue" helper rather than regex.

Reviewed By: patapizza

Differential Revision: D5610623

fbshipit-source-id: 00a5208
2017-08-12 02:19:23 -07:00
Julien Odent
4c348b1b9d Try -j1 to fix Travis
Summary:
The Travis build fails.
Trying https://github.com/haskell/cabal/issues/2546 to see if that helps.

Reviewed By: niteria

Differential Revision: D5612089

fbshipit-source-id: d4df127
2017-08-11 10:49:23 -07:00
dubovinszky
24d3f19976 HU Setup + Numeral
Summary:
- Setup Hungarian (HU) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/79

Reviewed By: blandinw

Differential Revision: D5595812

Pulled By: patapizza

fbshipit-source-id: 5959938
2017-08-09 17:49:56 -07:00
Veselin Stoyanov
5d03b45af9 Setup Bulgarian language and Numeral Dimension
Summary:
- Setup Bulgarian (BG) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/78

Reviewed By: niteria

Differential Revision: D5575513

Pulled By: patapizza

fbshipit-source-id: e566155
2017-08-09 08:19:24 -07:00
Julien Odent
9037126937 [Duckling] Time/DE: don't parse 'nächste 5'
Summary: Fixes https://github.com/wit-ai/wit/issues/694.

Reviewed By: niteria

Differential Revision: D5590023

fbshipit-source-id: 6356615
2017-08-09 06:49:26 -07:00
Julien Odent
61800297c8 Time/PL: don't parse 'nie' as Time
Summary: 'nie' means 'no' in Polish, and isn't a common abbreviation for 'niedziela' (Sunday).

Reviewed By: blandinw

Differential Revision: D5587036

fbshipit-source-id: bfda7fc
2017-08-08 16:49:58 -07:00