Summary: Introducing Georgian (KA), and the very beginnings of numeral support
Reviewed By: patapizza
Differential Revision: D5757952
fbshipit-source-id: 89d05f8
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.
Reviewed By: patapizza
Differential Revision: D5742209
fbshipit-source-id: 339fc0a
Summary:
The Gaelic ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.
Reviewed By: patapizza
Differential Revision: D5756222
fbshipit-source-id: ac4bc42
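
To illustrate the "data list plus one generator function" approach described in the change above, here is a minimal sketch in Python. It is not the project's actual code: the `Rule` type, the `MONTHS` table, `make_rules`, and `parse` are hypothetical names chosen for illustration, and only three English months are shown.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    name: str
    pattern: re.Pattern
    value: int


# Hypothetical table: (rule name, regex, month number). Only three rows shown.
MONTHS: List[Tuple[str, str, int]] = [
    ("January", r"jan(uary)?\.?", 1),
    ("February", r"feb(ruary)?\.?", 2),
    ("March", r"mar(ch)?\.?", 3),
]


def make_rules(table: List[Tuple[str, str, int]]) -> List[Rule]:
    """Build one Rule per table row instead of writing each rule by hand."""
    return [Rule(name, re.compile(rx, re.IGNORECASE), value)
            for name, rx, value in table]


MONTH_RULES = make_rules(MONTHS)


def parse(rules: List[Rule], text: str) -> Optional[int]:
    """Return the value of the first rule whose pattern matches the whole text."""
    for rule in rules:
        if rule.pattern.fullmatch(text):
            return rule.value
    return None
```

Adding a month or a day then means adding one row to the table rather than a new hand-written rule.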
Summary:
Combined each of seasons, instants, and holidays into a data list and a
function to generate the list of Rules.
*Instants = today, tomorrow, now, end of year, etc.
Reviewed By: patapizza
Differential Revision: D5730896
fbshipit-source-id: 23170e7
Summary:
The German ruleset has 12 separate rules for months, and 7 for days. This
change replaces those with a list of months/days and a single function
to create a list of rules from those. This is the same approach as is currently in the English ruleset.
Reviewed By: patapizza
Differential Revision: D5728656
fbshipit-source-id: 8590f4a
Summary: Consolidated all the days of week rules into one rule, and did the same for all the month rules.
Reviewed By: patapizza
Differential Revision: D5721202
fbshipit-source-id: 2b4a56f
Summary: Consolidated the previous day-of-week and month-name rules in the French Duckling file into only 2 rules. This allows for more concise code.
Reviewed By: patapizza
Differential Revision: D5710056
fbshipit-source-id: 816ef88
Summary: Changed Danish time rules to use ruleDaysOfWeek and ruleMonths.
Reviewed By: patapizza
Differential Revision: D5709782
fbshipit-source-id: aa03065
Summary:
There are problems in the ordinal recognition for Swedish. The most severe one is that all the numbers above 15 are actually Danish, not Swedish. Apart from that, digits and digits followed by a dot are considered ordinals.
This pull request fixes this and also adds support for ordinals up to 100. The structure of the code is similar to the ordinal recognition in English. Tests are also updated, both the ordinal tests and the time tests where incorrect ordinals were used.
Closes https://github.com/facebookincubator/duckling/pull/86
Reviewed By: JonCoens
Differential Revision: D5698145
Pulled By: patapizza
fbshipit-source-id: c31d7bc
Summary: In Portuguese, "um" means the numeral "one" and the article "a".
Reviewed By: bfiss
Differential Revision: D5703396
fbshipit-source-id: 92ed04f
Summary:
Remove hardcoded named-days and named-months, and
replace them with ruleDaysOfWeek and ruleMonths.
Reviewed By: patapizza
Differential Revision: D5695475
fbshipit-source-id: d30557f
Summary: Consolidating the rules for months and days of the week in Italian following the pattern seen in English.
Reviewed By: patapizza
Differential Revision: D5665259
fbshipit-source-id: 45d6c3b
Summary:
We don't allow matches adjacent to a character of the same class.
We were treating uppercase and lowercase characters differently.
"jon Friday" wouldn't match "on" but "Jon Friday" would.
Reviewed By: blandinw
Differential Revision: D5653681
fbshipit-source-id: be67358
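
The boundary check described in the change above can be sketched as follows. This is a Python illustration under assumptions, not the project's code: `char_class` and `is_range_valid` are hypothetical names, and mapping every non-letter, non-digit character to itself is an assumption of this sketch. The key point is that uppercase and lowercase letters share one class.

```python
def char_class(c: str) -> str:
    """Classify a character; upper- and lowercase letters share one class."""
    if c.isalpha():
        return "alpha"
    if c.isdigit():
        return "digit"
    return c  # assumption: any other character forms its own class


def is_range_valid(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is not glued to a neighbour of the same class."""
    left_ok = start == 0 or char_class(text[start - 1]) != char_class(text[start])
    right_ok = end == len(text) or char_class(text[end - 1]) != char_class(text[end])
    return left_ok and right_ok
```

With this classification, the span covering "on" is rejected in both "jon Friday" and "Jon Friday", and accepted in "on Friday".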
Summary: Added ruleIntervalDDDDMonth to EN to handle cases such as "23rd to 26th Oct" and "1-8 september"
Reviewed By: patapizza
Differential Revision: D5637280
fbshipit-source-id: a1fdcd2
Summary: Moved all named days to the same rule, moved all named months to the same rule. Kept same regexes, just consolidated them.
Reviewed By: patapizza
Differential Revision: D5637061
fbshipit-source-id: e08ecf9
Summary: Changed ruleIntervalMonthDDDD to use the ordinal predicate instead of ugly regex
Reviewed By: patapizza
Differential Revision: D5628188
fbshipit-source-id: 1dbe195
Summary: Added EN rule "ruleIntervalFromDDDDMonth" to support "from 10 to 16 August". Used "isDOMValue" helper rather than regex.
Reviewed By: patapizza
Differential Revision: D5610623
fbshipit-source-id: 00a5208
Summary: 'nie' means 'no' in Polish, and isn't a common abbreviation for 'niedziela' (Sunday).
Reviewed By: blandinw
Differential Revision: D5587036
fbshipit-source-id: bfda7fc
Summary:
Fixes #65.
* fixes US holidays
* Black Friday is actually the first day after Thanksgiving Day (not necessarily the fourth Friday of November)
Reviewed By: JonCoens
Differential Revision: D5533906
fbshipit-source-id: 1824cba
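
To see why the Black Friday fix above matters, here is a small Python sketch, assuming Thanksgiving is the fourth Thursday of November. The helper names (`nth_weekday`, `thanksgiving`, `black_friday`) are hypothetical and not taken from the project.

```python
from datetime import date, timedelta

THURSDAY, FRIDAY = 3, 4  # datetime.date.weekday() numbering (Monday is 0)


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday of a month (n starts at 1)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def thanksgiving(year: int) -> date:
    """US Thanksgiving: the fourth Thursday of November."""
    return nth_weekday(year, 11, THURSDAY, 4)


def black_friday(year: int) -> date:
    """The day after Thanksgiving, which is not always the fourth Friday."""
    return thanksgiving(year) + timedelta(days=1)
```

In 2019, November started on a Friday, so the fourth Friday was November 22 while the day after Thanksgiving was November 29. In 2017 the two definitions coincide on November 24.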
Summary:
In French, the form "at hh" is not valid (it requires an hour indicator).
This fixes false positives such as in "John a un rendez-vous."
Fixes https://github.com/wit-ai/wit/issues/666.
Reviewed By: JonCoens
Differential Revision: D5530713
fbshipit-source-id: ecee1e5
Summary:
* `Duration` before/after `Time` now resolves with the lowest grain
* "now" has an undefined grain `NoGrain`, as depending on the context it might mean different things, as opposed to "right now"
Before:
`day after tomorrow` -> `day` grain
`1 day after tomorrow` -> `hour` grain
Given that the reference date/time is `2013-02-12T04:30:00`.
`one year from now` -> `2014-02-01T00:00:00` with `month` grain.
`one year from today` -> `2014-02-01T00:00:00` with `month` grain.
After:
`day after tomorrow` -> `day` grain
`1 day after tomorrow` -> `day` grain
`one year from now` -> `2014-02-12T04:30:00` with `month` grain (remains the same).
`one year from today` -> `2014-02-12T00:00:00` with `day` grain.
For other `Time` entities involving `Duration`, such as "in + `Duration`", the behavior remains the same: shift to the lower grain (the intent is not precise).
Reviewed By: l5t, blandinw
Differential Revision: D5467164
fbshipit-source-id: b63b6a4
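
The "lowest grain" rule from the first bullet of the change above can be pictured with a short Python sketch. It is an assumption-laden illustration: the `GRAINS` ordering and the `finest_grain` helper are hypothetical, and the special `NoGrain` handling for "now" is deliberately not modeled.

```python
# Grains ordered from finest to coarsest (ordering assumed for this sketch).
GRAINS = ["second", "minute", "hour", "day", "week", "month", "quarter", "year"]


def finest_grain(duration_grain: str, time_grain: str) -> str:
    """Pick the finer of the two grains for a Duration before/after a Time."""
    return min(duration_grain, time_grain, key=GRAINS.index)
```

Under this sketch, "one year from today" combines a `year` duration with a `day` time and resolves with `day` grain, matching the "After" example in the change description.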
Summary:
MM/YY is a common format for dates in India, the UK and other parts of the world. Added test cases in `Time/EN/corpus.hs`; however, this conflicts with one of the original cases (2/15 is now output as Feb. 2015 and not the 15th of February).
Closes https://github.com/facebookincubator/duckling/pull/59
Reviewed By: niteria
Differential Revision: D5455881
Pulled By: patapizza
fbshipit-source-id: 23b73a5
Summary:
Today things like `at single`, `at a few`, `at a couple of` would return a `Time`.
Discussed with blandinw to do this very explicit hack right now until other use cases show up.
Reviewed By: niteria
Differential Revision: D5325369
fbshipit-source-id: aec0402
Summary:
This PR contains various smaller but - at least on my data - important performance improvements for matching of German time and time range expressions.
I evaluated this on approx 11.000 time and time range expressions taken from emails (rather formal business travel requests) that have been manually annotated with the "true" time. Comparing this branch to the current master (`d6f8dd`) I get e.g. approx. 80% of the duckling results within +/- 1h of the true value (hours are the smallest grain in my data), vs. only 70% in the master. Other indicators I checked (time/range confusion, other thresholds, failures to find anything in the first place, etc.) were all improved as well.
**Changes**:
* [significant performance plus] added a rule `ruleDateDateInterval` that handles variations of "13.-15.10." correctly. Here the common case is that "13." refers to "13.10." and not "13.CURRENTMONTH". I didn't see an obvious way to fix that in the `<datetime> - <datetime>` rule.
* [significant performance plus] In `ruleMmdd` (which matches expressions like "13.03." in German), I made the last dot optional. At least in less formal text this is quite common to be forgotten. Also here and in `ruleDateDateInterval` I changed the order of the terms in the regular expression matching the month to prefer matching e.g. "10" over matching "1"+"no dot".
* [minor] treat "14/15Uhr" the same as "14-15Uhr"
* [minor] Extended "bis" to also match "bis zum" and "auf den" (e.g. in "von Montag bis zum Freitag" or "von Dienstag auf Mittwoch")
* [minor] Changed `hh:mm` matching to also get the rather esoteric expression "17h00" - should do no harm.
Closes https://github.com/facebookincubator/duckling/pull/54
Reviewed By: blandinw
Differential Revision: D5301815
Pulled By: patapizza
fbshipit-source-id: 8766caf
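
The alternation-order point in the `ruleMmdd` change above can be demonstrated with a small Python sketch. The patterns below are illustrative assumptions, not the regexes from the project: `DDMM` tries two-digit months first and makes the trailing dot optional, while `NAIVE` tries the one-digit alternative first.

```python
import re

# Day, dot, month, optional trailing dot. Two-digit months are tried first.
DDMM = re.compile(r"(3[01]|[12]\d|0?[1-9])\.(1[0-2]|0?[1-9])\.?")

# Same pattern with the month alternatives in the opposite order.
NAIVE = re.compile(r"(3[01]|[12]\d|0?[1-9])\.(0?[1-9]|1[0-2])\.?")


def parse_ddmm(text: str):
    """Return (day, month) for inputs like '13.10.' or '13.10', else None."""
    m = DDMM.match(text)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Once the last dot is optional, the naive ordering accepts "13.1" as a prefix of "13.10" and stops there; trying `1[0-2]` first makes the month resolve to 10.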
Summary:
e.g. "New York from 10-6 to 10-22" currently extracts HH-MM. Instead, it should extract mm-dd, i.e. October 6th to October 22nd.
Closes https://github.com/facebookincubator/duckling/pull/48
Reviewed By: niteria
Differential Revision: D5292473
Pulled By: patapizza
fbshipit-source-id: 04f1a4b
Summary:
'one' is a latent time of day.
Restricting a couple of rules to accept non-latent time tokens.
Reviewed By: blandinw
Differential Revision: D5293972
fbshipit-source-id: 07cdb9b
Summary:
The existing "mm/dd" rule only accepts formats like "05/27". However, in practice there may be extra spaces, as in "05 / 27" or "05/ 27". This pull request tweaks the regex to accept the extra spaces.
Closes https://github.com/facebookincubator/duckling/pull/31
Reviewed By: niteria
Differential Revision: D5147118
Pulled By: patapizza
fbshipit-source-id: f6a5069
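
A minimal Python sketch of the whitespace-tolerant "mm/dd" matching described in the change above. The pattern and the `parse_mmdd` helper are assumptions for illustration and are not copied from the project.

```python
import re

# Month, optional spaces, slash, optional spaces, day.
MMDD = re.compile(r"(0?[1-9]|1[0-2])\s*/\s*(3[01]|[12]\d|0?[1-9])")


def parse_mmdd(text: str):
    """Return (month, day) for inputs like '05/27' or '05 / 27', else None."""
    m = MMDD.fullmatch(text.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Allowing `\s*` on both sides of the slash covers "05 / 27" and "05/ 27" without changing what is accepted for the compact form.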
Summary:
Add [Cantonese](https://en.wikipedia.org/wiki/Cantonese) (the official spoken language used in Hong Kong) support for date and time
- updated Duration ZH corpus
- updated Time ZH rules and corpus
- updated TimeGrain ZH rules
Closes https://github.com/facebookincubator/duckling/pull/24
Reviewed By: patapizza
Differential Revision: D5143947
Pulled By: niteria
fbshipit-source-id: 9107d05
Summary:
* In DE `frühestens` and `spätestens` act implicitly as `nach` and `vor` (after and before) on times and may also appear after the time
* The rule `ruleTimeofdayTimeofdayInterval` does match `9Uhr-10` but not the
way more common expression `9-10Uhr`; added the same rule with the
second time as non-latent; actually I am not sure whether the original
rule makes sense at all
* Simple extension of `intersect by ,` to THE formal way in DE to express
a date (i.e. `Freitag, der 13.03.2013`)
General remark: I used UTF-8 characters, although I saw that the other rules and examples use escaped hex encoding for e.g. German umlauts. If there is any reason to do that (it is not very readable), I will of course change that.
Closes https://github.com/facebookincubator/duckling/pull/19
Reviewed By: niteria
Differential Revision: D5070052
Pulled By: patapizza
fbshipit-source-id: 990ad08
Summary:
The numerical ordinal matching rule in DE is too broad. An ordinal like "1." may not be preceded or followed by numbers.
* Added negative lookbehind - avoids matching the first "1." in "1.1" as an ordinal.
* Added negative lookahead - avoids matching the second "1." in "1.1." as an ordinal
Closes https://github.com/facebookincubator/duckling/pull/18
Reviewed By: patapizza
Differential Revision: D5069200
Pulled By: niteria
fbshipit-source-id: 0583076
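
The lookaround idea from the DE ordinal change above can be sketched in Python as follows. The exact pattern is an assumption of this sketch (in particular, excluding a preceding dot in the lookbehind), and `find_ordinals` is a hypothetical helper, not the project's rule.

```python
import re

# Digits followed by a dot, not preceded by a digit or dot, not followed by a digit.
ORDINAL_DE = re.compile(r"(?<![\d.])(\d+)\.(?!\d)")


def find_ordinals(text: str):
    """Return the numeric ordinals found in text, e.g. [1] for 'am 1. Mai'."""
    return [int(m.group(1)) for m in ORDINAL_DE.finditer(text)]
```

Neither "1." in "1.1" or "1.1." is reported as an ordinal here, while a standalone "13." still is.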
Summary:
We already disallowed shallowly-nested intervals.
Interval of an intersection of an interval also seems
unlikely to produce anything useful.
For an input like:
"2016-Jul-29 07:00 - 2016-Jul-29 09:00 UTC"
it goes from:
```
(1.77 secs, 1,095,200,736 bytes)
```
to:
```
(1.33 secs, 857,167,480 bytes)
```
That's -25% time and -22% allocations.
Reviewed By: patapizza
Differential Revision: D5037492
fbshipit-source-id: 481dcdd
Summary:
Also added real-world test to English `Quantity` corpus ("3/4 cup", as a culinary example)
Closes https://github.com/facebookincubator/duckling/pull/14
Reviewed By: patapizza
Differential Revision: D5035990
Pulled By: niteria
fbshipit-source-id: c1b8f65
Summary:
Time dimension for Hebrew.
Commented out the failing tests that actually also fail in Clojure.
Reviewed By: JonCoens
Differential Revision: D4970308
fbshipit-source-id: b455142