Summary:
Current:
"seis dos de la tarde" -> "dos de la tarde" or 2pm; note
that the term "seis" is dropped.
Expected:
"seis dos de la tarde" -> "seis dos de la tarde"
or 6:02pm
Pull Request resolved: https://github.com/facebook/duckling/pull/496
Test Plan: H.io $ debug (makeLocale ES Nothing) "seis dos de la tarde" [This Time]
Reviewed By: chinmay87
Differential Revision: D22054328
Pulled By: yuanbing
fbshipit-source-id: 1ecb05885fc506176cc04768aa158279c7e7fd4f
Summary:
There are two types of ES timestamp phrases to support:
1. "para las seis cero dos pm"
2. "para las 6 0 2 pm"
The solution is to:
1. add a new rule to parse a two-digit number between 1 and 9 (inclusive), i.e. one written with a leading zero such as "0 2";
2. modify the regex pattern to accept an optional "para" in front of "las".
Reviewed By: chinmay87
Differential Revision: D22218800
fbshipit-source-id: 58f692beb6f10834c0ab639b31bf239bf4a1970e
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/498
Test Plan:
In haxlsh:
H.io $ debug (makeLocale ES Nothing) "dos hora y treinta y cinco minutos" [This Duration]
Reviewed By: chinmay87
Differential Revision: D22054695
Pulled By: yuanbing
fbshipit-source-id: b4486141bf7ccb0e538e40ce40fadd7daef374a8
Summary:
This fix adds support for parsing an alternative ES phrase for "noon".
Currently the only supported ES phrase for "noon" is "mediodia"; the alternative form is "medio<whitespace*>dia".
Reviewed By: chinmay87
Differential Revision: D22188049
fbshipit-source-id: 798b83be75798f3b0d695a0f01a65dc84af98e22
Summary:
The rule is updated to conform to the natural expression of "ordinal day of month".
Pull Request resolved: https://github.com/facebook/duckling/pull/495
Differential Revision: D22054297
Pulled By: yuanbing
fbshipit-source-id: d9d8e00311d4d3121685ab5b09f6c1f52f3077c9
Summary:
Please note that the main difference from the
existing rule for next week is that the new
phrase doesn't have the leading "la" or anything with
similar meaning.
Pull Request resolved: https://github.com/facebook/duckling/pull/493
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: patapizza
Differential Revision: D21981169
Pulled By: yuanbing
fbshipit-source-id: 7478d1262c3a4599d359b485b28a547ad5f44b76
Summary:
The root cause was an error in parsing ES numeral values [1-9] that are spelled with two words instead of one.
For example, "cero dos" should be parsed as "dos". Currently it is parsed as two numeral values: 0 and 2.
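The intended behaviour can be sketched as follows. This is a minimal illustration, not Duckling's rule code: `wordValue` and `parseNumerals` are hypothetical names, and the sketch only covers the digit words 0-9.

```haskell
-- Hypothetical sketch; not taken from Duckling's ES Numeral rules.

-- | Map a Spanish digit word (0-9) to its value.
wordValue :: String -> Maybe Int
wordValue w = lookup w table
  where
    table =
      zip
        ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"]
        [0 ..]

-- | Parse space-separated digit words. A "cero" directly followed by a
-- word for 1-9 is merged into a single value, so "cero dos" yields [2]
-- instead of [0, 2].
parseNumerals :: String -> Maybe [Int]
parseNumerals = go . words
  where
    go [] = Just []
    go ("cero" : w : rest)
      | Just n <- wordValue w, n >= 1 = (n :) <$> go rest
    go (w : rest) = (:) <$> wordValue w <*> go rest
```

With these definitions, `parseNumerals "seis cero dos"` gives `Just [6,2]`, which is the shape a time rule would need to read the phrase as 6:02.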
Reviewed By: chinmay87
Differential Revision: D22162804
fbshipit-source-id: 949956935a21e742f6788e7afa788ff728dd9a8d
Summary:
The new rules can parse phrases of the form:
xxx upcoming weeks
upcoming xxx weeks
Pull Request resolved: https://github.com/facebook/duckling/pull/491
Test Plan: Imported from GitHub, without a Test Plan: line.
Differential Revision: D21959647
Pulled By: chinmay87
fbshipit-source-id: a062a8c7a6c2e23b921b1099b886fa589c69c454
Summary:
While computing the score used for ranking, Duckling currently sums up the log likelihoods learned during training. While ranking, the goal is to find the (same span) parse candidate which is _more_ likely to lead to a *correct* parse. However, the old logic was summing up the log likelihood of the "more confident of the two classes". From what I understand, this is the part which feels wrong.
I created an example of two rules:
#1. a rule where the classifier learns that the rule is very confidently NOT the correct parse.
- okdata (positive class) is very low confidence (high negative number prior)
- kodata (negative class) is very high confidence (low negative number prior)
#2. a rule where the classifier is confident that it is the correct parse, but not Very Confident.
- okdata (positive class) is high confidence (nonzero, but low negative number prior)
- kodata (negative class) is very low confidence (high negative number prior)
These two rules match the same regex, thus the same span. When Duckling parses the input, it turns out that rule #1 ranks higher than rule #2. The reason is that #1 is MORE confident that it is the INCORRECT parse (does not contribute to it) than rule #2 is confident that it is the correct one. Does this make sense?
To solve this problem, I changed the ranking score estimation to use only the positive class scores (okdata). In the example above, this makes rule #2 rank higher, because its positive class confidence is higher than #1's positive class confidence.
Would really love some deeper input from Duckling experts. I re-learned Haskell and learned Haxl to craft a small example here, and I am very new to Duckling (just started reading the ranking code on Friday). I know Duckling is battle-tested but I also don't believe that means a bug can't exist. And further, this specific bug may not happen a whole lot for 2 reasons:
- there are not a lot of rules which end up higher negative confidence than positive (requires enough negative corpus examples over positive ones)
- ranking uses span width first, and only when the spans are equivalent does the score based ranking come into play. So it requires that 2 rules match the same span before any actual score calculation even matters.
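To make the two-rule example concrete, here is a minimal sketch. The type `RuleScore`, the functions `oldScore` and `newScore`, and all numeric values are made up for illustration; Duckling's real score also sums over every node of a parse, which this sketch leaves out.

```haskell
-- Hypothetical sketch of the per-rule scoring decision described above.

-- | Log likelihoods learned for one rule (values closer to 0 mean more confident).
data RuleScore = RuleScore
  { okLogLik :: Double -- ^ positive class ("okdata")
  , koLogLik :: Double -- ^ negative class ("kodata")
  }

-- | Old behaviour as described: use whichever class is more confident.
oldScore :: RuleScore -> Double
oldScore r = max (okLogLik r) (koLogLik r)

-- | New behaviour: use only the positive class.
newScore :: RuleScore -> Double
newScore = okLogLik

-- | Rule #1: very confidently NOT the correct parse (illustrative numbers).
rule1 :: RuleScore
rule1 = RuleScore {okLogLik = -6.0, koLogLik = -0.1}

-- | Rule #2: confident, but not very confident, that it is correct.
rule2 :: RuleScore
rule2 = RuleScore {okLogLik = -0.7, koLogLik = -5.0}
```

Under `oldScore`, rule #1 gets -0.1 and outranks rule #2 at -0.7. Under `newScore`, rule #2 at -0.7 outranks rule #1 at -6.0.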
Reviewed By: patapizza
Differential Revision: D22009276
fbshipit-source-id: 13491689d39d810da526fa4bb8b6e526d4cafd35
Summary:
Current:
If a fractional hour expression describes the fraction with a term like "quarter" or "quarters", Duckling cannot recognize it correctly.
Expected:
Duckling should be able to identify this kind of expression and parse it correctly.
Fix:
Add a new rule to parse fractional hour patterns that contain a keyword like "quarter" or "quarters".
Pull Request resolved: https://github.com/facebook/duckling/pull/485
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: haoxuany
Differential Revision: D21850804
Pulled By: chinmay87
fbshipit-source-id: 818b7b3f37e3f8a6d1a7d579db19fb2cfb2763f4
Summary:
Added a new EN rule to parse phrases that contain "midday".
Pull Request resolved: https://github.com/facebook/duckling/pull/490
Differential Revision: D21959562
Pulled By: chinmay87
fbshipit-source-id: f9ab45aecd551e8959d00b0025ed38b616ed6b14
Summary:
Current:
"el dia nueve" -> "9pm" of current day
Expected:
"el dia nueve" -> 9th of current or next month
Fix:
Added a new ES rule to handle patterns like "el dia <day of month>".
Pull Request resolved: https://github.com/facebook/duckling/pull/487
Reviewed By: girifb
Differential Revision: D21850807
Pulled By: chinmay87
fbshipit-source-id: d8edd81273c7e5f700b440ccc8c7e7bded679051
Summary:
Current behavior:
"an hour and 45 minutes" -> parsed as "1 hour" [dimension: "Duration"]
"a minute and 30 seconds" -> parsed as "1 minute" [dimension: "Duration"]
Expected behavior:
"an hour and 45 minutes" -> "105 minutes" with dimension as "Duration"
"a minute and 30 seconds" -> "90 seconds" with dimension as "Duration"
The fix:
Add a new rule to handle this duration composition
pattern (<some duration> and <some other duration>).
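The arithmetic behind the expected values can be sketched like this. It is a standalone illustration under assumed types: `Grain`, `Duration`, `withGrain` and `compose` are hypothetical and do not mirror Duckling's own duration types.

```haskell
-- Hypothetical sketch of composing two durations in the finer grain.

data Grain = Second | Minute | Hour
  deriving (Eq, Ord, Show)

data Duration = Duration {value :: Int, grain :: Grain}
  deriving (Eq, Show)

secondsPer :: Grain -> Int
secondsPer Second = 1
secondsPer Minute = 60
secondsPer Hour = 3600

-- | Re-express a duration in a finer (or equal) grain.
withGrain :: Grain -> Duration -> Duration
withGrain g (Duration v g') = Duration (v * secondsPer g' `div` secondsPer g) g

-- | "<some duration> and <some other duration>": convert both to the
-- finer of the two grains, then add the values.
compose :: Duration -> Duration -> Duration
compose a b = Duration (value a' + value b') g
  where
    g = min (grain a) (grain b)
    a' = withGrain g a
    b' = withGrain g b
```

For example, `compose (Duration 1 Hour) (Duration 45 Minute)` evaluates to `Duration 105 Minute`, matching the expected "105 minutes".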
Pull Request resolved: https://github.com/facebook/duckling/pull/483
Reviewed By: haoxuany
Differential Revision: D21850773
Pulled By: chinmay87
fbshipit-source-id: 62eb6859e0ce2b88cf8ae48d836a1a6a1ac8705d
Summary:
* Reduces size of final image from 5GB to 130MB
* Builds any checkout (not locked to the master)
* Doesn't run stack on CMD (executes static build of Duckling instead)
Pull Request resolved: https://github.com/facebook/duckling/pull/341
Reviewed By: chinmay87
Differential Revision: D21083018
Pulled By: patapizza
fbshipit-source-id: d909158f20f5b8da5b0248a25103b850797bc3a3
Summary:
When I was working on some related diffs, I noticed that there were some
asymmetries between the regexes for ruleIntervalMax and ruleIntervalMin:
- we had no support for "at most", even though we did have "at least"
- we had no support for "not? less than"
- the ordering of the different constructions didn't match
This is a minor tweak to make things match better.
Reviewed By: patapizza
Differential Revision: D20484594
fbshipit-source-id: c3c54a9cc1b83402e42634b7a98a1a3b8cc5e09c
Summary: Fix `ruleYearLatent` to be the same as the one in `en`. We don't want to match numerals that could have been hours.
Reviewed By: patapizza
Differential Revision: D20683975
fbshipit-source-id: cdef9b1b5f8a21dc5e207ed2a7afcad84c56a596
Summary:
When I first skimmed our rules for "half an hour" vs "an hour and a half"
I actually thought there might be a bug, because `timesOneAndAHalf`
sounds like it's actually multiplying by `1.5`.
There's no bug, the implementation is entirely correct, but it does
not multiply by 1.5, it adds .5 to any integer value at the given grain.
This diff renames the function to be more descriptive.
Handy trick for doing this kind of refactor without IDE tooling:
```
find duckling/Duckling/Duration/ -name 'Rules.hs'| xargs sed -i 's/timesOneAndAHalf/nPlusOneHalf/g'
```
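The difference between the two readings of the old name can be sketched as follows. This is an assumption-laden illustration, not the function from the Duration rules: the result type and the `timesOneAndAHalfNaive` helper are invented here to show the contrast.

```haskell
-- Hypothetical sketch; signatures are illustrative, not Duckling's.

data Grain = Second | Minute | Hour
  deriving (Eq, Show)

-- | "n and a half" units of a grain, expressed in the next finer grain.
-- This adds one half to n; it does not multiply n by 1.5.
nPlusOneHalf :: Grain -> Int -> Maybe (Int, Grain)
nPlusOneHalf Hour n = Just (60 * n + 30, Minute)
nPlusOneHalf Minute n = Just (60 * n + 30, Second)
nPlusOneHalf Second _ = Nothing

-- | What the old name seemed to suggest: n multiplied by 1.5.
timesOneAndAHalfNaive :: Grain -> Int -> Maybe (Int, Grain)
timesOneAndAHalfNaive Hour n = Just (90 * n, Minute)
timesOneAndAHalfNaive Minute n = Just (90 * n, Second)
timesOneAndAHalfNaive Second _ = Nothing
```

The two agree for n = 1 (90 minutes), which is why the old name was easy to misread. For "two hours and a half" they differ: 150 minutes versus 180 minutes.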
Reviewed By: haoxuany
Differential Revision: D20456966
fbshipit-source-id: 35020685f091a41618b30b7e5f95dbfa48509b88
Summary:
This change applies roughly the same rules for supporting intervals
in Spanish AmountOfMoney that we support in English: intervals using
`entre _ e _` / `de _ a _` / `_ - _` with either money in both slots
or a number in the first slot and money in the second.
My Spanish is okay but not great - I'm confident these rules are good and
cover the most likely phrases, but there's probably room to add more coverage.
Reviewed By: patapizza
Differential Revision: D20425979
fbshipit-source-id: deb17fc331e1aa192d91dd47bc7f3864a246f0be
Summary:
Leveraging `predNthClosest` helper in English rules.
"the second closest monday to february 6"
"the closest tax day to boss day 2018"
Reviewed By: haoxuany
Differential Revision: D20214444
fbshipit-source-id: b6be32f63097d221aa7ccc6df4e3639e4deee4a9
Summary:
Adding locale rules for ES Numeral because Spain uses "," as the decimal separator but South American countries use ".".
Wiki: https://en.wikipedia.org/wiki/Decimal_separator
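A minimal sketch of why the separator has to depend on the locale. `parseDecimal` is a hypothetical helper that takes the decimal separator as a parameter and treats the other of "." and "," as a thousands separator; it is not how the locale rules are written in Duckling.

```haskell
import Data.Char (isDigit)

-- Hypothetical sketch; the separator is passed in rather than derived
-- from a Duckling locale.

-- | Parse a number such as "1.234,5" given the decimal separator.
-- The other of '.' and ',' is treated as a thousands separator and dropped.
parseDecimal :: Char -> String -> Maybe Double
parseDecimal sep s
  | null intPart || not (all isDigit (intPart ++ fracPart)) = Nothing
  | otherwise = Just (read (intPart ++ "." ++ fracPart'))
  where
    (rawInt, rest) = break (== sep) s
    thousands = if sep == ',' then '.' else ','
    intPart = filter (/= thousands) rawInt
    fracPart = drop 1 rest
    fracPart' = if null fracPart then "0" else fracPart
```

The same input reads differently per separator: `parseDecimal ',' "3,14"` gives `Just 3.14`, while `parseDecimal '.' "3,14"` treats the comma as a thousands separator and gives `Just 314.0`.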
Reviewed By: haoxuany
Differential Revision: D20040111
fbshipit-source-id: e2a4bfc2928df19976ef98e90ee82e7d21b52313
Summary: Supporting "orthodox good friday" in addition to "orthodox great friday" in the regex
Reviewed By: chinmay87
Differential Revision: D19604033
fbshipit-source-id: c6ca68fc34e284304ca2ba07a8f1bf81378c3558
Summary:
- Setup Afrikaans (AF) language
- Added Numeral Dimension
Some of the paths have changed, and some extra files were necessary, after
basing initial work off 24d3f19976
I followed some of the Numeral examples from Dutch as well as Hungarian,
since Afrikaans and Dutch have some similarities.
One thing I noticed was that other languages list the number itself among the
examples for each number. I didn't do that here, because I'm not sure it's necessary.
Pull Request resolved: https://github.com/facebook/duckling/pull/422
Reviewed By: awalterschulze
Differential Revision: D18348617
Pulled By: patapizza
fbshipit-source-id: b8c4218629c264b48d6f2cecc4c23e2e281a64da
Summary: apparently this is breaking the external build, fix this
Reviewed By: patapizza
Differential Revision: D19104360
fbshipit-source-id: bc75f698b483a7f4f5b2905e11cf52fd36c1f0a9
Summary: Modified the regex pattern for minutes to accept "m" alone, and the regex pattern for ruleDurationDotNumeralHours to accept "h", "hr", and "hrs".
Reviewed By: patapizza
Differential Revision: D18799727
fbshipit-source-id: df4d0bd53407b427254169454e647e43e073795e
Summary:
Hello,
I am new to Haskell, but I would like to add Thai language (TH) to Duckling.
I have tried to extend Duckling by adding the Numeral dimension for the new language TH.
Please have a look at it and see what we can improve.
Thanks!
Pull Request resolved: https://github.com/facebook/duckling/pull/399
Reviewed By: patapizza
Differential Revision: D17651508
Pulled By: haoxuany
fbshipit-source-id: 4b3ee1352f239eee637958f5e9dce68430352a0a
Summary: Make test failure outputs readable by proper printing of `Data.Text`, using the `unpack` function rather than relying on the implementation of the `Show` typeclass for `Text`
Reviewed By: patapizza
Differential Revision: D18367058
fbshipit-source-id: b5aece3c8818f16dfe4c55235f6b9a183ba6f70f
Summary: We weren't capturing cases like "the second of february" as it was matching with the "the <cycle> of <time>" rule
Differential Revision: D18249651
fbshipit-source-id: 09e214f585b96d07af4d5043de61445f4e156c54
Summary: We weren't capturing cases like "the first Saturday of the month", due to "the month" not being properly parsed.
Reviewed By: haoxuany
Differential Revision: D18193355
fbshipit-source-id: 2c4e83a3f22b0fe306ce7662ade85434a0016784
Summary: This got removed in a previous commit; re-add it to confirm this functionality is still working.
Reviewed By: haoxuany
Differential Revision: D18175640
fbshipit-source-id: 3d06efe3537e1a517f412ed739f3cc34a9b3105b
Summary:
Parts of day are time ranges, e.g. "tonight" is a range from 6:00pm to midnight. We have intersect logic in place to resolve a string like "tonight at 7pm" to one time, at 7pm. But if the time is outside of the part of day's range (e.g. "tonight at 5pm"), the string is resolved to 2 separate times ("tonight" and "at 5pm").
These changes resolve e.g. "tonight at xx" to "xx" irrespective of the range of tonight, as long as the am/pm makes sense (so "tonight at 5am" would still resolve to 2 separate times - "tonight" and "at 5am").
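The "tonight at xx" case can be sketched as below. This is a simplified model under stated assumptions: `Meridiem` and `tonightAt` are hypothetical names, hours are limited to 1-11, and `Nothing` stands for "not merged, stays as two separate times".

```haskell
-- Hypothetical sketch of the "tonight at <hour>" resolution only.

data Meridiem = AM | PM
  deriving (Eq, Show)

-- | Resolve "tonight at <hour> [am|pm]" to an hour on the 24h clock.
-- The result ignores the 6:00pm-midnight range of "tonight", but an
-- explicit "am" contradicts "tonight" and is not merged.
tonightAt :: Int -> Maybe Meridiem -> Maybe Int
tonightAt _ (Just AM) = Nothing
tonightAt h _
  | h >= 1 && h <= 11 = Just (h + 12)
  | otherwise = Nothing
```

Here `tonightAt 5 (Just PM)` yields `Just 17` even though 5pm falls outside the range of "tonight", while `tonightAt 5 (Just AM)` yields `Nothing`.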
"this/early morning at xx" gets resolved to "xx am". All other parts of day get resolved to "xx pm", with one exception: all parts of day resolve "... at 12" to midnight.
Differential Revision: D17694898
fbshipit-source-id: 1e24023759bb942659285d18a6a4d0b09f77c9da
Summary: Added support for Rama Navami holiday from 2000 to 2030
Reviewed By: chinmay87
Differential Revision: D17881237
fbshipit-source-id: f3f17d67d178fa8fbcb8ae640c3bfc17bc3e21d3
Summary: Resolves durations such as "2 hours and ten" to 130 minutes or "1 hour and 15" to 75 minutes.
Reviewed By: zhpzuo
Differential Revision: D17822118
fbshipit-source-id: 7da5c0e43ced91cb924046f764c133a66af8ee4d
Summary: Added support for Ganesh/Vinayaka Chaturthi Hindu holiday from 2000 to 2030
Reviewed By: haoxuany
Differential Revision: D17675368
fbshipit-source-id: 2d53ad2592fc8d234bd7a3cbac2bddeaa45b220b
Summary:
Refactor Time/PL code by reusing mkRuleHolidays and mkRuleSeasons, and guarding against isOkWithThisNext for ruleLastTime, ruleNextTime, ruleThisTime,
and mkOkForThisNext for ruleWeekend.
Add more Polish holidays.
Reviewed By: chinmay87
Differential Revision: D17395534
fbshipit-source-id: d4ec591b0aad71f8f5e144ff5274491d55dc97f6
Summary:
Use of ANN pragma may slow down compilation time because of TemplateHaskell.
Because of that, using comment style ignore would be preferable.
For more information on ways to ignore hints with hlint, please see
https://github.com/ndmitchell/hlint#ignoring-hints
Reviewed By: patapizza
Differential Revision: D17365266
fbshipit-source-id: 71e4952738bba17b4d2ec2a18b31b4b7e3f509db