Commit Graph

240 Commits

Author SHA1 Message Date
Panagiotis Vekris
12a726aee7 Support for Greek times and dates
Summary:
This adds support for Greek times.

There are still some issues with expressions of the form:
```
9:30 - 11:00 την πέμπτη
```
The expression means "9:30 - 11:00 on Thursday", but `11:00 την πέμπτη` is parsed first (as 11:00 on Thursday), whereas the training data suggests `9:30 - 11:00` should take priority. These tests are excluded from the corpus for now.

Reviewed By: patapizza

Differential Revision: D6376271

fbshipit-source-id: 2f31e058fb88386429070e3b51cd33f93b9c5936
2017-12-04 16:45:40 -08:00
Lukas Koebis
0d7939b246 Email/EN: Allow 'dot' in emails.
Summary: Duckling supports emails like "hey at things.com" but not "hey at things dot com". Let's add that.

Reviewed By: patapizza

Differential Revision: D6387471

fbshipit-source-id: 349a3a1ee55983daf5fda7579623f0c7bb32357f
2017-12-04 14:01:10 -08:00
Julien Odent
12000c2f86 Numeral/GA: fix old vigesimal + refactoring
Summary: Merged the rules into two and fixed the production rule.

Reviewed By: JonCoens, blandinw

Differential Revision: D6466793

fbshipit-source-id: 0fa143b049e9ccce7f4946e14bbd15fc37c324e1
2017-12-01 18:15:31 -08:00
Jana Šefčíková
7fe748ffec Numeral/GA: HashMap lookups for large regexes
Summary: Use the `oneToTenMap` HashMap instead of guards.

Reviewed By: patapizza

Differential Revision: D6414352

fbshipit-source-id: a1788df0d3521a3a870e453294708d9fe2c10908
2017-12-01 10:15:24 -08:00
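The guard-to-HashMap change above can be illustrated with a minimal sketch. It is not Duckling's code: the English placeholder words, the helper names `oneToTenGuard` and `lookupOneToTen`, and the use of `Data.Map.Strict` (standing in for `Data.HashMap.Strict`, which offers the same `fromList`/`lookup` shape) are all assumptions for illustration.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Before (hypothetical): one guard per word, growing with every entry.
oneToTenGuard :: String -> Maybe Integer
oneToTenGuard w
  | w == "one"   = Just 1
  | w == "two"   = Just 2
  | w == "three" = Just 3
  | otherwise    = Nothing -- ...and so on up to "ten"

-- After (hypothetical): a single table built once, with placeholder English words.
oneToTenMap :: Map.Map String Integer
oneToTenMap = Map.fromList (zip
  ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
  [1 ..])

-- Normalise case, then do one lookup instead of walking a chain of guards.
lookupOneToTen :: String -> Maybe Integer
lookupOneToTen w = Map.lookup (map toLower w) oneToTenMap
```

Keeping the words in one table also means the same list could, in principle, feed the rule's regex alternation, so adding a word becomes a one-line change.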
Michael Azzam
8fbd6a0414 Consolidate days of week and months
Summary: Refactor days of week and months rules to avoid mindless copy-pasta.

Reviewed By: patapizza

Differential Revision: D6438622

fbshipit-source-id: 489261b53ba50f6624996ad4581c2bf1206a8cc1
2017-11-29 17:00:37 -08:00
Julien Odent
d8c8533ed3 Time/EN: Fix remaining case sensitive rules
Summary: Fixes #112.

Reviewed By: blandinw

Differential Revision: D6408648

fbshipit-source-id: 944d8d19417b73d1e8a4996b5ea523f732f8dab7
2017-11-24 10:30:29 -08:00
Julien Odent
cc7cf74ff4 New locale instruction
Summary: to be more specific

Reviewed By: blandinw

Differential Revision: D6391421

fbshipit-source-id: 5c8e34b7a072ad2820a064c66a2364e866147630
2017-11-22 17:02:55 -08:00
Michal Nanasi
93e7010844 Backed out changeset 3b7abaab879c
Summary: Sigma is crashing in PCRE.

Reviewed By: codemiller, simonmar

Differential Revision: D6385023

fbshipit-source-id: cdd7e394606fb38df976d1bb81da414329ab7adf
2017-11-21 07:30:28 -08:00
Lukas Koebis
e361c9b694 Allow 'dot' in emails.
Summary: Duckling supports emails like "hey at things.com" but not "hey at things dot com". Let's add that.

Reviewed By: patapizza

Differential Revision: D6364483

fbshipit-source-id: 3b7abaab879c44c30c80c928de72c202068894a8
2017-11-20 12:15:29 -08:00
Julien Odent
54ccbf81df Numeral/DE: fix keiner + cleanup
Summary:
* fixed keine[rn]
* removed redundant `ruleInteger`
* replaced pattern matching with hashmap lookup
* renamed `dozensMap` to `tensMap`

Reviewed By: blandinw

Differential Revision: D6336252

fbshipit-source-id: 740734ab7b0b289adc4f466f966c4c5e59af75ad
2017-11-15 13:15:29 -08:00
igor-drozdov
29d776dee5 Added TimeGrain and Duration Dimensions to Russian language
Summary:
- Added Duration dimension to Russian language
- Added TimeGrain dimension to Russian language
- Refactored `isNatural` and `isNaturalWith` out of Duration helpers into Numeral helpers
- Implemented the `<integer> and a half` rule for Russian Numeral
- Changed the type of `inSeconds` to a polymorphic one
Closes https://github.com/facebook/duckling/pull/105

Reviewed By: blandinw

Differential Revision: D6312604

Pulled By: patapizza

fbshipit-source-id: 9ae237b4beb6915ff8da013230457937d8e56733
2017-11-15 10:45:24 -08:00
Igor Drozdov
f6492b5da0 Added Quantity dimension to Russian language
Summary: Closes https://github.com/facebook/duckling/pull/106

Reviewed By: blandinw

Differential Revision: D6312605

Pulled By: patapizza

fbshipit-source-id: 69ec673f95ec8a2d86ec207a6d75cd8ebfcdb4f6
2017-11-14 21:00:28 -08:00
Julien Odent
fb10a6e6ba Time/EN: Parse spelled out times + AM/PM
Summary:
When using speech recognition, we might see things like "six thirty six a m" or
"ten thirty p m".
Also fixed the argument order of `timeOfDayAMPM` to be more idiomatic.

Reviewed By: blandinw

Differential Revision: D6316542

fbshipit-source-id: 0008c049040219b3a1dd80d9e4661ba8a246fa7f
2017-11-14 13:30:26 -08:00
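A minimal sketch of the AM/PM handling described above, assuming speech recognition splits the suffix into separate tokens ("a m", "p m") and the hour is already on a 1-12 clock. `AMPM`, `parseAMPM` and `to24Hour` are hypothetical names; the sketch does not reproduce the real `timeOfDayAMPM` signature.

```haskell
-- Hypothetical meridiem marker.
data AMPM = AM | PM deriving (Eq, Show)

-- Accept both the joined form ("am") and the split form ("a m") that
-- speech recognition may produce.
parseAMPM :: [String] -> Maybe AMPM
parseAMPM ["a", "m"] = Just AM
parseAMPM ["am"]     = Just AM
parseAMPM ["p", "m"] = Just PM
parseAMPM ["pm"]     = Just PM
parseAMPM _          = Nothing

-- Convert an hour on a 1-12 clock to a 24-hour clock.
to24Hour :: AMPM -> Int -> Int
to24Hour AM 12 = 0
to24Hour AM h  = h
to24Hour PM 12 = 12
to24Hour PM h  = h + 12
```

For "ten thirty p m", `parseAMPM (words "p m")` yields `Just PM`, and `to24Hour PM 10` gives 22, i.e. 22:30.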
Julien Odent
98ff268918 Point to language-specific support in README
Summary: Highlights opportunities.

Reviewed By: JonCoens

Differential Revision: D6312814

fbshipit-source-id: dfa1f9a6bf6d83d7dfc69b88209a7242ad2f4aef
2017-11-13 10:00:37 -08:00
Julien Odent
277895193f Link to wit.ai in README
Summary:
Suggest using wit.ai's built-in entities if running Duckling is not
an option.

Reviewed By: JonCoens

Differential Revision: D6312789

fbshipit-source-id: 08f1c5759cdd7761b456914cd5c42e1c111c6408
2017-11-13 10:00:37 -08:00
Julien Odent
436e4662d9 Time/NL: don't be too eager on days of week
Summary:
"d" would parse as "dinsdag" (Tuesday), "zo" would parse as "zondag" (Sunday, also means "so").
Made dots mandatory, to prevent further issues (e.g. "zon" means "sun").

Reviewed By: mullender

Differential Revision: D6312693

fbshipit-source-id: 58c5824e3ff174fc9c293c3f2d13e152c60e51de
2017-11-13 09:00:40 -08:00
Panagiotis Vekris
c6a7fedb7b Extract useful time-rule constructors to Time.Helpers
Summary: Some constructors for Time-related rules are reused across different languages, so we can extract them into `Duckling/Time/Helpers.hs`.

Reviewed By: patapizza

Differential Revision: D6269106

fbshipit-source-id: 7d9969ce425ee27a2e1a32ea48932e16c7e6b1f1
2017-11-08 14:45:35 -08:00
Panagiotis Vekris
536f2844e3 Support Greek ordinals
Summary: Adding support for Greek ordinals

Reviewed By: patapizza

Differential Revision: D6263781

fbshipit-source-id: ff339ee51e4e8ad6b0c8f3fa75f5652391dbe48e
2017-11-08 11:00:31 -08:00
Igor Drozdov
5d2c5c78ba Added Distance and Volume Dimensions for Russian language
Summary:
- Added Distance Dimension for Russian language (RU)
- Added Volume Dimension for Russian language (RU)
- Extended `Duckling.Distance.Types.Unit` type definition by adding `Millimetre` representation
Closes https://github.com/facebook/duckling/pull/101

Reviewed By: JonCoens

Differential Revision: D6254070

Pulled By: patapizza

fbshipit-source-id: 579f7a259f76ff1c23ccfe2371afea385eb56aa1
2017-11-08 11:00:31 -08:00
Abdallatif Sulaiman
8cbdabef09 More rules to AR Ordinal dimension
Summary:
- Add more tests to AR ordinal corpus
- Add HashMap lookups and more rules for composite numbers
Closes https://github.com/facebook/duckling/pull/100

Reviewed By: JonCoens

Differential Revision: D6254264

Pulled By: patapizza

fbshipit-source-id: 6d50286f6bd3a60e34b0eb855fb38e044c3c39dc
2017-11-08 10:45:29 -08:00
Julien Odent
87c875f1e5 Fix Travis links in README
Summary: We've graduated from `facebookincubator` to `facebook`.

Reviewed By: JonCoens

Differential Revision: D6272667

fbshipit-source-id: d433c8f1417d94f703fc455a466be64253ed85b8
2017-11-08 10:00:36 -08:00
Julien Odent
bc3cd146cd fbshipit-source-id: f5a450fa3c872707fffeb9c241db77e44acd0f26
2017-11-08 09:33:19 -08:00
Panagiotis Vekris
e8937e1cd6 Support for Greek durations
Summary: Adding support for Greek time grains and durations.

Reviewed By: patapizza

Differential Revision: D6249955

fbshipit-source-id: 1c69e26
2017-11-06 18:49:36 -08:00
Julien Odent
ba46d592cd Prevent double negatives + cleanup
Summary:
* Prevent double negatives (resulting from `ruleNegative` applying twice and from engine tokenizer)
* Hashmap lookup for tens
* cleanup

Reviewed By: blandinw

Differential Revision: D6221107

fbshipit-source-id: 42e401d
2017-11-03 00:31:22 -07:00
Panagiotis Vekris
fda8c7c759 Support Greek numerals
Summary:
- Setup Greek language (EL)
- Added Greek Numerals

Reviewed By: patapizza

Differential Revision: D6217873

fbshipit-source-id: 379170f
2017-11-02 17:16:18 -07:00
Julien Odent
f0a0c1e6b8 Time/EN: don't parse "this in 2 minutes" + fix thanksgiving in EN locales
Summary:
* add flag for this/next/last time
* fix Thanksgiving in EN locales
* `analyzedRangeTest` helper with `rangeTests` for `Time/EN`

Reviewed By: blandinw

Differential Revision: D6191209

fbshipit-source-id: 6eaa117
2017-10-31 12:34:21 -07:00
Jeffrey Karres
67190c4238 HashMap lookups for large regexes
Summary: Replace guards with HashMap lookups.

Reviewed By: patapizza

Differential Revision: D6192748

fbshipit-source-id: 644825b
2017-10-30 18:04:29 -07:00
Julien Odent
63073fb3b6 Remove Time from JA targets
Summary: Duckling doesn't know `Time` in Japanese yet.

Reviewed By: blandinw

Differential Revision: D6191243

fbshipit-source-id: d2c426a
2017-10-30 16:34:24 -07:00
Hollin Wilkins
b7fc2c239d Update Dockerfile to build with correct version of Haskell.
Summary:
With the current Dockerfile, I get errors when trying to build:

```
Downloading lts-8.8 build plan ...
Downloaded lts-8.8 build plan.
Updating package index Hackage (mirrored at https://s3.amazonaws.com/hackage.fpcomplete.com/) ...
Selected mirror https://s3.amazonaws.com/hackage.fpcomplete.com/
Downloading root
Selected mirror https://s3.amazonaws.com/hackage.fpcomplete.com/
Downloading timestamp
Downloading snapshot
Downloading mirrors
Cannot update index (no local copy)
Downloading index
Updated package list downloaded
Populating index cache ...
Populated index cache.
Compiler version mismatched, found ghc-8.2.1 (x86_64), but expected minor version match with ghc-8.0.2 (x86_64) (based on resolver setting in /duckling/stack.yaml).
To install the correct GHC into /root/.stack/programs/x86_64-linux/, try running "stack setup" or use the "--install-ghc" flag.
The command '/bin/sh -c stack build' returned a non-zero code: 1
```

With these fixes, I run `stack setup` inside the Docker image in case it was not set up previously on the host machine.
Closes https://github.com/facebookincubator/duckling/pull/98

Differential Revision: D6174261

Pulled By: patapizza

fbshipit-source-id: 4719d94
2017-10-27 11:34:48 -07:00
Jonathan Coens
349771dbde Update to lts-9.10
Summary: I also did a check against nightly (which uses GHC 8.2.1), and we were fine there too.

Reviewed By: blandinw

Differential Revision: D6169348

fbshipit-source-id: dd0727d
2017-10-26 18:34:27 -07:00
Sam Pepose
251028e691 Uses correct date format for all EN locales
Summary: The date format changes between EN locales (https://en.wikipedia.org/wiki/Date_format_by_country). This diff fixes how dates are handled in each locale.

Reviewed By: patapizza

Differential Revision: D6156147

fbshipit-source-id: 22f296c
2017-10-26 08:49:21 -07:00
Aleksey Suslov
df6484945a Few improvements for Numeral/RU
Summary:
- fix typos in 11, 300 and 400 forms
- add more 0 rules
- introduce 1.5 rules
Closes https://github.com/facebookincubator/duckling/pull/96

Differential Revision: D6136073

Pulled By: patapizza

fbshipit-source-id: b1252f2
2017-10-24 10:49:27 -07:00
Julien Odent
980c0f279e Time: don't parse at + phone number
Summary: Fixes #95.

Reviewed By: blandinw

Differential Revision: D6129893

fbshipit-source-id: e863021
2017-10-23 16:34:34 -07:00
Matthijs Mullender
1ade1935b2 Support Dutch dates and times
Summary: [Duckling][Time][NL] Support Dutch dates and times

Reviewed By: patapizza

Differential Revision: D6090294

fbshipit-source-id: 54b8729
2017-10-19 14:04:38 -07:00
Julien Odent
3fa0988fcd Time/GB: fix parsing Oct-Dec months
Summary: The regex was satisfied by the first alternative, so Oct-Dec months were not parsed correctly.

Reviewed By: blandinw

Differential Revision: D6092130

fbshipit-source-id: 0f89f22
2017-10-18 15:34:43 -07:00
Abdallatif Sulaiman
18cd2210ac Add Duration Dimension to Arabic Language
Summary: Closes https://github.com/facebookincubator/duckling/pull/94

Reviewed By: blandinw

Differential Revision: D6078221

Pulled By: patapizza

fbshipit-source-id: b653b24
2017-10-17 16:04:28 -07:00
Julien Odent
ed58115caf Numeral: refactor 'Text.singleton' usages
Summary:
* refactored `Text.singleton` usages into `Text` literals
* removed redundant `join` imports with `NoRebindableSyntax` language pragma
* ET: merged 2 rules into one

Reviewed By: blandinw

Differential Revision: D6080231

fbshipit-source-id: 47c18df
2017-10-17 13:19:39 -07:00
Julien Odent
b2de97800f Numeral/FR: allow space as a thousand separator
Summary: Fixes #91.

Reviewed By: blandinw

Differential Revision: D6079821

fbshipit-source-id: f3160c1
2017-10-17 12:20:32 -07:00
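The space-separated form can be sketched as follows, assuming the first group has one to three digits and every later group exactly three. `parseSpacedThousands` is a hypothetical helper, not Duckling's regex; it splits with `words`, so it also tolerates repeated spaces, and it only covers the spaced form (plain "1000" is left to other rules).

```haskell
import Data.Char (isDigit)

-- Hypothetical: parse "1 000 000"-style numbers with a space as thousand separator.
parseSpacedThousands :: String -> Maybe Integer
parseSpacedThousands s = case words s of
  (g : gs)
    | validLead g && all validGroup gs -> Just (read (concat (g : gs)))
  _ -> Nothing
  where
    allDigits = all isDigit
    -- Leading group: 1 to 3 digits.
    validLead g = not (null g) && length g <= 3 && allDigits g
    -- Every following group: exactly 3 digits.
    validGroup g = length g == 3 && allDigits g
```

Requiring exactly three digits after the first group keeps two unrelated numbers such as "12 34" from being glued into one.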
Fredrik Wallén
1b176c3665 Added support for understanding Swedish last <day-of-the-week> expression
Summary:
One of the most common ways of saying "last <day-of-the-week>" in Swedish is "i <day-of-the-week>s", for example "i tisdags" (last Tuesday) or "i lördags" (last Saturday). This pull request adds support for this form.
Closes https://github.com/facebookincubator/duckling/pull/92

Reviewed By: blandinw, kodafb

Differential Revision: D6064814

Pulled By: patapizza

fbshipit-source-id: 6ea5466
2017-10-16 12:34:20 -07:00
Julien Odent
0e950620b8 Email: restrict domain extensions to letters when spelling out
Summary: Without this restriction, things like "tonight at 6.40" would parse as an email.

Reviewed By: blandinw

Differential Revision: D6066926

fbshipit-source-id: d18a8c6
2017-10-16 12:19:22 -07:00
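A minimal sketch of spelled-out email handling combining this restriction with the "Allow 'dot' in emails" change: "at" becomes `@`, "dot" becomes `.`, and the last domain label must be letters only, so "tonight at 6.40" is rejected. `spelledEmail`, `dotWords` and `splitOnDot` are hypothetical helpers working on whitespace-separated words, not Duckling's regex-based rules.

```haskell
import Data.Char (isAlpha)
import Data.List (intercalate)

-- Hypothetical: "hey at things dot com" or "hey at things.com" -> "hey@things.com".
spelledEmail :: String -> Maybe String
spelledEmail input = case words input of
  [local, "at", domain] -> build local (Just (splitOnDot domain))
  (local : "at" : rest) -> build local (dotWords rest)
  _                     -> Nothing
  where
    build local (Just labels)
      | length labels >= 2
      , all (not . null) labels
      , all isAlpha (last labels) -- extension must be letters only
      = Just (local ++ "@" ++ intercalate "." labels)
    build _ _ = Nothing

-- ["things", "dot", "com"] -> Just ["things", "com"]
dotWords :: [String] -> Maybe [String]
dotWords [w]              = Just [w]
dotWords (w : "dot" : ws) = (w :) <$> dotWords ws
dotWords _                = Nothing

-- "things.com" -> ["things", "com"]
splitOnDot :: String -> [String]
splitOnDot s = case break (== '.') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOnDot rest
```

With the letters-only check, "6.40" yields the labels `["6", "40"]`, and since "40" is not alphabetic the whole phrase is left for the Time rules.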
Julien Odent
305358b2f7 Time/PT: don't parse 'ter'
Summary:
`ter` also means `to have`, which is very common.
`ter` is a very rare form of `Tuesday`, used only in calendar-like contexts.

Reviewed By: blandinw

Differential Revision: D6066506

fbshipit-source-id: c8cd231
2017-10-16 12:04:25 -07:00
Ilya Murzinov
5c390c6cc7 Fixed example in readme
Summary: Closes https://github.com/facebookincubator/duckling/pull/93

Differential Revision: D6065706

Pulled By: patapizza

fbshipit-source-id: e9ef61b
2017-10-16 11:04:25 -07:00
Julien Odent
1ab5f447d2 en_CA + fix Canadian Thanksgiving
Summary:
* `en_CA` locale
* In Canada, Thanksgiving Day is the second Monday of October.
* Black Friday is the same as in the US.
* However, Canada observes both DD/MM and MM/DD formats. Handling this is deferred for now; dates fall back to the US format.

Reviewed By: blandinw

Differential Revision: D6058909

fbshipit-source-id: 3d4e05e
2017-10-16 10:04:43 -07:00
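The "n-th weekday of a month" rule behind both Thanksgivings can be checked with a small date computation. This sketch uses Sakamoto's day-of-week method on plain integers rather than Duckling's time predicates; `dayOfWeek`, `nthWeekdayOfMonth`, `canadianThanksgiving` and `usThanksgiving` are hypothetical names. For 2017 it places Canadian Thanksgiving on October 9 and US Thanksgiving (fourth Thursday of November) on November 23.

```haskell
-- Day of week via Sakamoto's method: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
dayOfWeek :: Int -> Int -> Int -> Int
dayOfWeek y m d =
  (y' + y' `div` 4 - y' `div` 100 + y' `div` 400 + offsets !! (m - 1) + d) `mod` 7
  where
    offsets = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    y' = if m < 3 then y - 1 else y

-- Day of the month of the n-th given weekday (0 = Sunday) in month m of year y.
nthWeekdayOfMonth :: Int -> Int -> Int -> Int -> Int
nthWeekdayOfMonth n weekday y m =
  1 + (weekday - dayOfWeek y m 1) `mod` 7 + 7 * (n - 1)

-- Canada: second Monday of October.
canadianThanksgiving :: Int -> Int
canadianThanksgiving y = nthWeekdayOfMonth 2 1 y 10

-- US: fourth Thursday of November.
usThanksgiving :: Int -> Int
usThanksgiving y = nthWeekdayOfMonth 4 4 y 11
```

Haskell's `mod` returns a non-negative result for a positive divisor, so the offset from the first of the month to the target weekday is always between 0 and 6.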
Julien Odent
fb1dcaa138 Chinese locales + fix TW National Day
Summary:
* Moving `ruleNationalDay` from `ZH` rules to specific locales: `zh_CN`, `zh_HK`, `zh_MO`
* Fixed National Day for `zh_TW`.

Reviewed By: blandinw

Differential Revision: D6057565

fbshipit-source-id: 8f9f2ab
2017-10-13 17:04:43 -07:00
Matthijs Mullender
33a08bb76b Support Dutch Durations
Summary:
This change adds support for durations in Dutch/Netherlands (NL)
Implemented: TimeGrain/NL, Durations/NL

Reviewed By: patapizza

Differential Revision: D6049404

fbshipit-source-id: 3621cdb
2017-10-13 12:49:30 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
* `Rules/<Lang>.hs` exposes:
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can also expose `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
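The opaque `Locale` with a `makeLocale` smart constructor can be sketched as follows. The `Lang`/`Region` constructors, the `supportedRegions` table, the `localeTag` helper, and the choice to fall back to a region-less locale for unsupported combinations are all assumptions for illustration, not Duckling's exact definitions. In a real module, `Locale` would be exported without its constructor so callers can only build values through `makeLocale`.

```haskell
-- Hypothetical subsets of ISO 639-1 languages and ISO 3166-1 alpha-2 regions.
data Lang = EN | ZH | NL deriving (Eq, Show)
data Region = US | GB | CA | CN | HK | MO | TW deriving (Eq, Show)

-- Would be exported abstractly, e.g. `module Locale (Locale, makeLocale) where`.
data Locale = Locale Lang (Maybe Region) deriving (Eq, Show)

-- Hypothetical table of valid (Lang, Region) combinations.
supportedRegions :: Lang -> [Region]
supportedRegions EN = [US, GB, CA]
supportedRegions ZH = [CN, HK, MO, TW]
supportedRegions NL = []

-- Smart constructor: keep the region only if the combination is valid,
-- otherwise fall back to the language-only locale (assumed behaviour).
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang (Just region)
  | region `elem` supportedRegions lang = Locale lang (Just region)
makeLocale lang _ = Locale lang Nothing

-- Render as <Lang>_<Region>, using XX when no region is set.
localeTag :: Locale -> String
localeTag (Locale lang region) = show lang ++ "_" ++ maybe "XX" show region
```

The `XX` fallback mirrors the default `<Lang>_XX` classifier naming described above.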
Aleka Cheung
ddefc949e2 Consolidate days of weeks and months
Summary: Added `ruleDaysofWeek` and `ruleMonths` to the Norwegian rules.

Reviewed By: patapizza

Differential Revision: D6035360

fbshipit-source-id: 90bd67f
2017-10-12 18:34:33 -07:00
Julien Odent
b3fb913a23 Time: don't parse subsequent numbers
Summary:
- Fixes #89. "<number> <number>" would parse as Time if in the right range.
- Applied same rule for all languages. Note that for Italian and Polish, I updated "<hour> <minute>" tests to be in the form "at <hour> <minute>".
- Replaced `liftM2` with more generic `and|or . sequence [f1, f2, ...]`.

Reviewed By: blandinw

Differential Revision: D5992879

fbshipit-source-id: 5409ffb
2017-10-06 12:19:29 -07:00
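The `liftM2` to `and|or . sequence [f1, f2, ...]` refactoring relies on the function ("reader") monad: `sequence` turns a list of predicates `[a -> Bool]` into a single function `a -> [Bool]`. A minimal sketch, where `both`, `allOf`, `anyOf`, `isHour` and `isMinute` are hypothetical names rather than Duckling helpers:

```haskell
import Control.Monad (liftM2)

-- Before: liftM2 in the function monad combines exactly two predicates.
both :: (a -> Bool) -> (a -> Bool) -> a -> Bool
both = liftM2 (&&)

-- After: any number of predicates; `and` requires all to hold, `or` any.
allOf :: [a -> Bool] -> a -> Bool
allOf ps = and . sequence ps

anyOf :: [a -> Bool] -> a -> Bool
anyOf ps = or . sequence ps

-- Hypothetical range checks on a parsed number.
isHour, isMinute :: Integer -> Bool
isHour n = n >= 0 && n <= 23
isMinute n = n >= 0 && n <= 59
```

For example, `allOf [isHour, even] 12` is `True`, while `both isHour isMinute 45` is `False` because 45 is not a valid hour.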
Ian Stewart-Binks
2b566eeac0 Numeral/JA: HashMap lookups for large regexes
Summary: Replaced pattern matching with HashMap lookups. Also removed `ruleInteger17` and moved its regex to `ruleInteger`.

Reviewed By: patapizza

Differential Revision: D5812629

fbshipit-source-id: f0c1a06
2017-10-06 10:34:33 -07:00
Julien Odent
fd77036a72 Regen classifiers
Summary: Regenerating classifiers with latest code.

Reviewed By: blandinw

Differential Revision: D5987906

fbshipit-source-id: 16f2c41
2017-10-05 12:19:44 -07:00