Commit Graph

99 Commits

Author SHA1 Message Date
Satya Bodduluri
da41db3766 Added ruleIntervalDDDDMonth to EN
Summary: Added ruleIntervalDDDDMonth to EN to handle cases such as "23rd to 26th Oct" and "1-8 september"

Reviewed By: patapizza

Differential Revision: D5637280

fbshipit-source-id: a1fdcd2
2017-08-16 14:34:26 -07:00
Jesse Hellemn
98b58647b1 Adapting Spanish rules to handles names days and months in the same rules
Summary: Moved all named days to the same rule, moved all named months to the same rule. Kept same regexes, just consolidated them.

Reviewed By: patapizza

Differential Revision: D5637061

fbshipit-source-id: e08ecf9
2017-08-16 09:34:24 -07:00
Atyansh Jaiswal
e7431739ec Fixed Ordinal parsing for format "August 27th-30th"
Summary: Changed ruleIntervalMonthDDDD to use the ordinal predicate instead of ugly regex

Reviewed By: patapizza

Differential Revision: D5628188

fbshipit-source-id: 1dbe195
2017-08-15 10:34:37 -07:00
Hiten Parmar
be113689ac Add support for parsing day intervals beginning with from "from 10 to 16 August"
Summary: Added EN rule "ruleIntervalFromDDDDMonth" to support "from 10 to 16 August". Used "isDOMValue" helper rather than regex.

Reviewed By: patapizza

Differential Revision: D5610623

fbshipit-source-id: 00a5208
2017-08-12 02:19:23 -07:00
dubovinszky
24d3f19976 HU Setup + Numeral
Summary:
- Setup Hungarian (HU) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/79

Reviewed By: blandinw

Differential Revision: D5595812

Pulled By: patapizza

fbshipit-source-id: 5959938
2017-08-09 17:49:56 -07:00
Veselin Stoyanov
5d03b45af9 Setup Bulgarian language and Numeral Dimension
Summary:
- Setup Bulgarian (BG) language
- Added Numeral Dimension
Closes https://github.com/facebookincubator/duckling/pull/78

Reviewed By: niteria

Differential Revision: D5575513

Pulled By: patapizza

fbshipit-source-id: e566155
2017-08-09 08:19:24 -07:00
Julien Odent
9037126937 [Duckling] Time/DE: don't parse 'nächste 5'
Summary: Fixes https://github.com/wit-ai/wit/issues/694.

Reviewed By: niteria

Differential Revision: D5590023

fbshipit-source-id: 6356615
2017-08-09 06:49:26 -07:00
Julien Odent
61800297c8 Time/PL: don't parse 'nie' as Time
Summary: 'nie' means 'no' in Polish, and isn't a common abbreviation for 'niedziela' (Sunday).

Reviewed By: blandinw

Differential Revision: D5587036

fbshipit-source-id: bfda7fc
2017-08-08 16:49:58 -07:00
Julien Odent
a84eb62180 Time: Fix nthDOWOfMonth
Summary:
Fixes #65.
* fixes US holidays
* Black Friday is actually the first day after Thanksgiving day (not necessary the fourth Friday of November)

Reviewed By: JonCoens

Differential Revision: D5533906

fbshipit-source-id: 1824cba
2017-08-01 09:04:31 -07:00
Julien Odent
ef461c3133 Time/FR: don't parse 'a un'
Summary:
In French, the form "at hh" is not valid (it requires an hour indicator).
This fixes false positives such as in "John a un rendez-vous."

Fixes https://github.com/wit-ai/wit/issues/666.

Reviewed By: JonCoens

Differential Revision: D5530713

fbshipit-source-id: ecee1e5
2017-08-01 08:49:41 -07:00
Julien Odent
8710b9d59b Time/EN: for duration from time
Summary: Fixes #57.

Reviewed By: blandinw

Differential Revision: D5495075

fbshipit-source-id: 018fd29
2017-07-26 11:49:44 -07:00
Julien Odent
7a6c2597af Time: don't shift duration on <duration> before/after <time> (but now)
Summary:
* `Duration` before/after `Time` now resolves with the lowest grain
* "now" has an undefined grain `NoGrain`, as depending on the context it might mean different things, as opposed to "right now"

Before:
`day after tomorrow` -> `day` grain
`1 day after tomorrow` -> `hour` grain

Given that the reference date/time is `2013-02-12T04:30:00`.
`one year from now` -> `2014-02-01T00:00:00` with `month` grain.
`one year from today` -> `2014-02-01T00:00:00` with `month` grain.

After:
`day after tomorrow` -> `day` grain
`1 day after tomorrow` -> `day` grain
`one year from now` -> `2014-02-12T04:30:00` with `month` grain (remains the same).
`one year from today` -> `2014-02-12T00:00:00` with `day` grain.

For other `Time` entities involving `Duration`, such as "in + `Duration`", the behavior remains the same: shift to the lower grain (the intent is not precise).

Reviewed By: l5t, blandinw

Differential Revision: D5467164

fbshipit-source-id: b63b6a4
2017-07-26 11:49:44 -07:00
chiralcarbon
ce0e9e4f50 Add MM/YYYY and MM/YY
Summary:
MM/YY is a common format for dates in India,UK and other parts of the world.Have added testcases in `Time/EN/corpus.hs` ,however it conflicts with one of the original(2/15 is output now as Feb. 2015 and not the 15th of February).
Closes https://github.com/facebookincubator/duckling/pull/59

Reviewed By: niteria

Differential Revision: D5455881

Pulled By: patapizza

fbshipit-source-id: 23b73a5
2017-07-20 11:20:02 -07:00
Julien Odent
a55629d541 Fix #53
Summary: Added a rule to handle "from <month> dd-dd" to fix #53.

Reviewed By: blandinw

Differential Revision: D5329214

fbshipit-source-id: a5f746d
2017-06-27 12:49:23 -07:00
Julien Odent
bfb6ba0387 Numeral flag for Time patterns
Summary:
Today things like `at single`, `at a few`, `at a couple of` would return a `Time`.
Discussed with blandinw to do this very explicit hack right now until other use cases show up.

Reviewed By: niteria

Differential Revision: D5325369

fbshipit-source-id: aec0402
2017-06-27 07:34:21 -07:00
chao pan
efc1f36494 fix issue #50
Summary: Closes https://github.com/facebookincubator/duckling/pull/52

Reviewed By: blandinw

Differential Revision: D5296543

Pulled By: patapizza

fbshipit-source-id: 041844a
2017-06-23 07:04:27 -07:00
Sebastian Mika
291bd28873 Various smaller DE time improvements
Summary:
This PR contains various smaller but - at least on my data - important performance improvements for matching of German time and time range expressions.

I evaluated this on approx 11.000 time and time range expressions taken from emails (rather formal business travel requests) that have been manually annotated with the "true" time. Comparing this branch to the current master (`d6f8dd`) I get e.g. approx. 80% of the duckling results within +/- 1h of the true value (hours are the smallest grain in my data), vs. only 70% in the master. Other indicators I checked (time/range confusion, other thresholds, failures to find anything in the first place, etc.) were all improved as well.

**Changes**:

* [significant performance plus] added a rule `ruleDateDateInterval` that handles variations of "13.-15.10." correctly. Here the common case is that "13." refers to "13.10." and not "13.CURRENTMONTH". I didn't see an obvious way to fix that in the `<datetime> - <datetime>` rule.
* [significant performance plus] In `ruleMmdd` (which matches expressions like "13.03." in German), I made the last dot optional. At least in less formal text this is quite common to be forgotten. Also here and in `ruleDateDateInterval` I changed the order of the terms in the regular expression matching the month to prefer matching e.g. "10" over matching "1"+"no dot".
* [minor] treat "14/15Uhr" the same as "14-15Uhr"
*  [minor] Extended "bis" to also match "bis zum" and "auf den" (e.g. in "von Montag bis zum Freitag" or "von Dienstag auf Mittwoch")
* [minor] Changed `hh:mm` matching to also get the rather esoteric expression "17h00" - should do no harm.
Closes https://github.com/facebookincubator/duckling/pull/54

Reviewed By: blandinw

Differential Revision: D5301815

Pulled By: patapizza

fbshipit-source-id: 8766caf
2017-06-23 07:04:27 -07:00
tarung-ml
d6f8ddc064 Support dates of type mm-dd
Summary:
e.g. "New York from 10-6 to 10-22" currently extracts: HH-MM. Instead, it should extract mm-dd i.e. October 10th to October 22nd.
Closes https://github.com/facebookincubator/duckling/pull/48

Reviewed By: niteria

Differential Revision: D5292473

Pulled By: patapizza

fbshipit-source-id: 04f1a4b
2017-06-22 04:04:26 -07:00
Adrien Menella
3e37bd0f71 Add rules for FR Time
Summary:
Add / modif rules to support:
  * `début|milieu|fin de matinée` (`early|mid|late morning`)
  * `début|milieu|fin de après-midi` (`early|mid|late afternoon`)
  * `début|milieu|fin de journée` (`early|mid|late day`)
  * `début|fin de soirée` (`early|late evening`)
  * `début|fin de mois` (`early|late month`)
  * `début|fin d'année` (`early|late year`)
  * `plus tard` (`later`)
  * `plus tard que ...` (`later than ...`)
  * `plus tard <time>` (`later than <time>`)
  * `plus tard <part-of-day>` (`later than <part-of-day>`)

And add additional corpus' examples for FR Time
Closes https://github.com/facebookincubator/duckling/pull/49

Reviewed By: blandinw

Differential Revision: D5292472

Pulled By: patapizza

fbshipit-source-id: 2f40d29
2017-06-21 15:04:27 -07:00
Julien Odent
213f94dda7 Time: don't parse 'this (past )?one'
Summary:
'one' is a latent time of day.
Restricting a couple of rules to accept non-latent time tokens.

Reviewed By: blandinw

Differential Revision: D5293972

fbshipit-source-id: 07cdb9b
2017-06-21 15:04:27 -07:00
Şeref R.Ayar
b0574191ac Support early/mid/late <month> #35
Summary: Closes https://github.com/facebookincubator/duckling/pull/43

Reviewed By: blandinw

Differential Revision: D5264974

Pulled By: patapizza

fbshipit-source-id: 0330f2b
2017-06-16 12:04:29 -07:00
Julien Odent
cd3f3dd2f4 Time/EN: afternoonish
Summary: Apparently it's a thing.

Reviewed By: blandinw

Differential Revision: D5199823

fbshipit-source-id: d2ed2aa
2017-06-09 09:34:20 -07:00
Heejin (macbook)
9a49b7652b KO/Time: Fix typo
Summary:
Found a common mistake.
Closes https://github.com/facebookincubator/duckling/pull/38

Reviewed By: niteria

Differential Revision: D5182189

Pulled By: patapizza

fbshipit-source-id: 182a325
2017-06-05 08:49:21 -07:00
chao pan
292a94128f Generalize mm/dd rules to accept spaces like '05 / 27', '05/ 27'
Summary:
The existing "mm/dd" rules only accepts format like "05/27"; However, in practice there might be extra spaces like "05 / 27", "05/ 27". The pull requests tweaks the regex to accept extra space.
Closes https://github.com/facebookincubator/duckling/pull/31

Reviewed By: niteria

Differential Revision: D5147118

Pulled By: patapizza

fbshipit-source-id: f6a5069
2017-05-30 09:49:55 -07:00
kkpoon
001ff291fc Add Cantonese support to date time into ZH
Summary:
Add the [Cantonese](https://en.wikipedia.org/wiki/Cantonese) (the official spoken language used in Hong Kong) support to date time

- updated Duration ZH corpus
- updated Time ZH rules and corpus
- updated TimeGrain ZH rules
Closes https://github.com/facebookincubator/duckling/pull/24

Reviewed By: patapizza

Differential Revision: D5143947

Pulled By: niteria

fbshipit-source-id: 9107d05
2017-05-30 07:49:19 -07:00
Julien Odent
4ed89d29de DE/Time: Convert unicode char to hexadecimal
Summary: I missed that one.

Reviewed By: niteria

Differential Revision: D5079134

fbshipit-source-id: 2eacbea
2017-05-18 08:04:45 -07:00
Sebastian Mika
39cb76024b Smaller improvements to DE about/before/after
Summary:
* In DE `frühestens` and `spätestens` act implicitly as `nach` and `vor` (after and before) on times and may also appear after the time

* The rule `ruleTimeofdayTimeofdayInterval` does match `9Uhr-10` but not the
way more common expression `9-10Uhr`; added the same rule with the
second time as non-latent; actually I am not sure whether the original
rule makes sense at all

* Simple extension of `intersect by ,` to THE formal way in DE to express
a date (i.e. `Freitag, der 13.03.2013`)

General remark: I used UTF-8 characters albeit I saw that the other rules and examples use escaped hex encoding for e.g. German umlaute. If there is any reason to do that (it is not very readable), I will of course change that.
Closes https://github.com/facebookincubator/duckling/pull/19

Reviewed By: niteria

Differential Revision: D5070052

Pulled By: patapizza

fbshipit-source-id: 990ad08
2017-05-17 10:19:44 -07:00
Sebastian Mika
b00e5faeac Fix DE numerical ordinal matching
Summary:
The numerical ordinal matching rule in DE is too broad. An ordinal like "1." may not be proceeded or followed by numbers.

* Added negative lookbehind - avoids matching the first "1." in "1.1" as an ordinal.
* Added negative lookahead - avoids matching the second "1." in "1.1. as an ordinal
Closes https://github.com/facebookincubator/duckling/pull/18

Reviewed By: patapizza

Differential Revision: D5069200

Pulled By: niteria

fbshipit-source-id: 0583076
2017-05-16 09:49:21 -07:00
Julien Odent
37829902b7 CS: Setup + basic Numeral
Summary:
* Setup for Czech
* Basic `Numeral` (0-10 integers + digits) from http://www.omniglot.com/language/numbers/czech.htm

Reviewed By: JonCoens

Differential Revision: D5044775

fbshipit-source-id: b5cd9d2
2017-05-11 09:49:27 -07:00
Bartosz Nitka
3650a5b5d1 Disallow nested intervals
Summary:
We already disallowed shallowly-nested intervals.
Interval of an intersection of an interval also seems
unlikely to produce anything useful.

For an input like:
"2016-Jul-29 07:00 - 2016-Jul-29 09:00 UTC"
it goes from:
```
(1.77 secs, 1,095,200,736 bytes)
```
to:
```
(1.33 secs, 857,167,480 bytes)
```
That's -25% time and -22% allocations.

Reviewed By: patapizza

Differential Revision: D5037492

fbshipit-source-id: 481dcdd
2017-05-10 12:04:18 -07:00
Matt Schultz
ff9b54ad43 Added English fractional Numeral rule (ex: "3/4", "1/2", "5/7")
Summary:
Also added real-world test to English `Quantity` corpus ("3/4 cup", as a culinary example)
Closes https://github.com/facebookincubator/duckling/pull/14

Reviewed By: patapizza

Differential Revision: D5035990

Pulled By: niteria

fbshipit-source-id: c1b8f65
2017-05-10 07:04:16 -07:00
Julien Odent
d3d3703015 HE: Time
Summary:
Time dimension for Hebrew.
Commented out the failing tests that actually also fail in Clojure.

Reviewed By: JonCoens

Differential Revision: D4970308

fbshipit-source-id: b455142
2017-04-28 10:04:35 -07:00
Julien Odent
5ba2c9e9a1 NB: Bringing latest changes
Summary:
* Numeral: fixed "hundre" (not "hundred")
* Numeral: added "tretti", "søtti"
* Time: updated last times to support "sist"
* Time: christmas days

Reviewed By: niteria

Differential Revision: D4958919

fbshipit-source-id: e4eecf5
2017-04-28 08:04:22 -07:00
Julien Odent
35b9101c48 VI: Time
Summary:
* Time dimension for Vietnamese.
* Expose `debugContext`.

Reviewed By: niteria

Differential Revision: D4963594

fbshipit-source-id: 2373735
2017-04-28 08:04:21 -07:00
Julien Odent
0370c452f1 Time
Summary: Time dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4954399

fbshipit-source-id: 906c4a6
2017-04-26 09:19:27 -07:00
Julien Odent
840deda7dd Setup + Numeral
Summary: Setup + Numeral dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4946964

fbshipit-source-id: 204429b
2017-04-26 09:19:26 -07:00
Bartosz Nitka
924516103b Revert Duckling part of 'clean up unused imports'
Summary: it doesn't take .cabal into account

Reviewed By: patapizza

Differential Revision: D4938400

fbshipit-source-id: 8bc99a5
2017-04-24 07:34:27 -07:00
Julien Odent
bd96d3dd95 Setup + Numeral
Summary: Setup for Hebrew + Numeral dimension

Reviewed By: niteria

Differential Revision: D4930041

fbshipit-source-id: 965132b
2017-04-24 06:49:40 -07:00
Bartosz Nitka
b26aa7d84d clean up unused imports
Summary:
This diff was generated by running `hsclimps`

PLEASE TAKE ONE OF THE FOLLOWING ACTIONS AS SOON AS POSSIBLE:
  1) Select Accept and Ship to land this change
  2) If you have issues with this diff, request changes
  3) If you are no longer the owner, add reviewers and update the `.context` file with the appropriate owner

NOTE: If the diff is unable to land because of a merge conflict I will automatically update it for you.

#accept2ship

Reviewed By: niteria

Differential Revision: D4937839

fbshipit-source-id: bb3d330
2017-04-24 05:19:24 -07:00
Bartosz Nitka
f7b3f2ed73 Detect interval contradictions sooner
Summary:
So far contradictions from intersection only
propagated through intersection. This change
makes it so that it also propagates through intervals
and lets intervals also generate contradictions.

Reviewed By: patapizza

Differential Revision: D4864160

fbshipit-source-id: 8348267
2017-04-10 16:35:27 -07:00
Bartosz Nitka
290ca48e25 Fix 4:23am returning 5:23am
Summary:
This is the easiest way to fix it, but talking offline
with Julien, we may need to revisit.
It basically gets rid of time series where we were
producing intervals that are not a multiply of the grain.

Reviewed By: patapizza

Differential Revision: D4841759

fbshipit-source-id: 1c4742a
2017-04-06 11:04:16 -07:00
Bartosz Nitka
bd94622f64 Move tests to tests and exes to exe
Summary:
This works around https://github.com/haskell/cabal/issues/4350
If we don't do this files get compiled multiple times
and cabal is unhappy.

Reviewed By: patapizza

Differential Revision: D4782749

fbshipit-source-id: 5bbe425
2017-03-27 16:04:24 -07:00
Julien Odent
33fa98734a Fix 'no dia 20'
Summary:
* 'no dia 20' (on the 20)
* Unifying two rules into one, with a day grain

See https://github.com/wit-ai/wit/issues/388

Reviewed By: blandinw

Differential Revision: D4715780

fbshipit-source-id: e990954
2017-03-15 13:49:17 -07:00
Jonathan Coens
41800a3171 Move onto dependent-sum instead of custom local data Some
Summary:
No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat.
Instead of a `Read` instance I created a `fromName` function.

Reviewed By: zilberstein

Differential Revision: D4710014

fbshipit-source-id: 1d4e86d
2017-03-15 10:34:17 -07:00
Bartosz Nitka
28d53fce30 Remove ruleIntersect2
Summary:
It is no longer necessary after D4676812 and D4698788.
`"I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday"` now works in
less than a second, it used to be 10s.

The test suite also got 3s faster.

Reviewed By: patapizza

Differential Revision: D4701890

fbshipit-source-id: 107a55f
2017-03-14 05:04:12 -07:00
Zejun Wu
3001604548 Clean redudant parentheses to test landcastle
Summary:
Clean redudant parentheses to test landcastle
opt-out-review

Differential Revision:
D4703203

verified-sandcastle

fbshipit-source-id: def175d
2017-03-13 18:19:24 -07:00
Bartosz Nitka
003604dce7 Optimize simple time predicates
Summary:
This is the next step for:
https://fb.facebook.com/groups/527352907463243/permalink/600056483526218/

This:
* changes the time language to be able to track contradictions (`EmptyPredicate`)
* changes the time language to be able to collect non-contradicting pieces, like month and hour and unify them
* provides an efficient way to convert those pieces into (past,future) time series
* adds AMPM predicate runner - there's a bit of overlap with is12H, but it basically works
* changes a test case that was wrong before
* regenerates classifiers, I'm not sure why they changed exactly

Before:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(15.50 secs, 6,171,188,928 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(110.82 secs, 44,031,569,512 bytes)
```

After:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(1.24 secs, 703,020,912 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(9.51 secs, 5,891,109,592 bytes)
```

Reviewed By: JonCoens

Differential Revision: D4676812

fbshipit-source-id: 9810203
2017-03-13 17:04:10 -07:00
Julien Odent
2e50aa5ea0 Fix 'tomorrow July' + IT fixes
Summary:
* we weren't checking the right reference time in `takeNth` and `takeN`
* fixing resulting failing tests for `IT`
* `analyzedNTest` to check that input results in `n` parsed tokens

Reviewed By: niteria

Differential Revision: D4698788

fbshipit-source-id: 2cd4762
2017-03-13 12:04:17 -07:00
FBShipIt
3f8e52e70a Initial commit
fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192
2017-03-08 10:33:56 -08:00