Summary:
Added a commonly used German duration expression that was missing, and fixed a small typo in the existing one.
Pull Request resolved: https://github.com/facebook/duckling/pull/509
Reviewed By: patapizza
Differential Revision: D24690104
Pulled By: chessai
fbshipit-source-id: b49a7a636abf5b92f2fe7c0d5b2ca2fe64acbaa2
Summary:
Spanish (ES) will now have all the same quantity rules as English (EN) (which I think is the most-supported language), plus more.
This includes the following:
* bowls - (bol(es)?|tazón(es)?|cuencos?|platos? (soperos?)|(hondos?)) (EN does not currently have this)
* cups - (tazas?)
* dishes - (platos?|fuentes?) (EN does not currently have this)
* grams - (((m(ili)?)|(k(ilo)?))?g(ramo)?s?)
* ounces - ((onzas?)|oz)
* pints - (pintas?) (EN does not currently have this)
* pounds - ((lb|libra)s?)
* quarts - (cuartos? de galón) (EN does not currently have this)
* tablespoons - (cucharadas? (grande)?) (EN does not currently have this)
* teaspoons - (cucharaditas?) (EN does not currently have this)
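The grams pattern above accepts an optional metric prefix. This is a minimal sketch that assumes matched amounts are normalized to grams. The helper `gramFactor` is hypothetical and is not Duckling's actual rule code. It shows what each accepted unit spelling would contribute:

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- Hypothetical helper: the factor that converts an amount in the given unit
-- to grams, for unit tokens matching (((m(ili)?)|(k(ilo)?))?g(ramo)?s?).
-- Lower-casing assumes the pattern is matched case-insensitively.
gramFactor :: String -> Maybe Double
gramFactor unit =
  case splitPrefix (map toLower unit) of
    (factor, body) | body `elem` ["g", "gs", "gramo", "gramos"] -> Just factor
    _ -> Nothing
  where
    -- Try the longer prefixes first so that "kilo" is not read as "k" + "ilo".
    splitPrefix s
      | Just rest <- stripPrefix "mili" s = (1 / 1000, rest)
      | Just rest <- stripPrefix "kilo" s = (1000, rest)
      | Just rest <- stripPrefix "m" s = (1 / 1000, rest)
      | Just rest <- stripPrefix "k" s = (1000, rest)
      | otherwise = (1, s)

main :: IO ()
main = mapM_ (print . gramFactor) ["g", "kg", "miligramos", "kilogramo", "libras"]
```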
Reviewed By: patapizza
Differential Revision: D24628214
fbshipit-source-id: 2e8d500661f30fa0928cb7d3f21470afc01e2285
Summary:
The build stage of the Dockerfile did not copy the Duckling implementation into the container, which made the build fail.
I also aligned the target Debian release on Buster, which is the release currently used by the `haskell:8` image.
Pull Request resolved: https://github.com/facebook/duckling/pull/539
Reviewed By: patapizza
Differential Revision: D24688839
Pulled By: chessai
fbshipit-source-id: 0ffcc4d28a599b7edad668730117828d26e116ad
Summary:
This PR accomplishes several things:
- removes dist-newstyle (local build artifacts should not be checked in)
- extends the .gitignore to include many common build artifacts/editor artifacts
- allows more modern dependencies (the upper bounds of many were out of date by one or two years' worth of releases)
- upgrades stack lts (9.2 -> 14.2), moving to GHC 8.6.5
- regenerates .travis.yml using the now-standard haskell-ci (which many Haskell core libraries use) instead of the outdated script that was maintained by hvr; as a precursor to this, the tested-with versions were updated
Reviewed By: patapizza
Differential Revision: D24623967
fbshipit-source-id: 838fe571df0b8d44106349659ce8ce8ab82f0bc6
Summary:
Pull Request resolved: https://github.com/facebook/duckling/pull/533
In recent versions of `Data.Some`, the constructor `This` has been renamed to `Some`. Migrating to the new name has become rather problematic for us, so we are simply removing the dependency. The core of this diff adds the type `Seal` to `Duckling.Types`, which replaces `Some`.
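For readers unfamiliar with the pattern, here is a minimal sketch of a `Seal`-style existential wrapper. The `Dim` GADT and the exact shape of `Seal` are illustrative assumptions. They are not the definitions that land in `Duckling.Types`:

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical GADT of dimensions: the index records each dimension's value type.
data Dim a where
  Numeral :: Dim Double
  Time :: Dim String

-- Sketch of a Seal-style wrapper: it hides the index `a`, so dimensions
-- with different value types can be stored in the same list.
data Seal s where
  Seal :: s a -> Seal s

dimName :: Dim a -> String
dimName Numeral = "numeral"
dimName Time = "time"

-- Matching on Seal recovers an `s a` for some unknown `a`; only functions
-- that work for every `a` (like dimName) can be applied to it.
sealedName :: Seal Dim -> String
sealedName (Seal d) = dimName d

main :: IO ()
main = print (map sealedName [Seal Numeral, Seal Time])
```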
Reviewed By: pepeiborra
Differential Revision: D23929459
fbshipit-source-id: 8ff4146ecba4f1119a17899961b2d877547f6e4f
Summary:
"so" is an adverb in German: https://github.com/wit-ai/wit/issues/1860
It is also an abbreviation of "Sonntag" (Sunday); this change makes the trailing dot mandatory for that abbreviation.
Reviewed By: haoxuany
Differential Revision: D22900791
fbshipit-source-id: 8dc873f79a21ca2add074f9c664e84fae56f1e67
Summary:
**Summary**
**Current**
`stack test` fails with an error "output was redirected with -o, but no output will be generated because there is no Main module"
**Expected**
`stack test` should run tests to completion
The cause here seems to be that the [`main-is` flag](a88e0669f7/duckling.cabal (L851)) supplies the *filename* in which to begin tests, but expects to find a *module* named `Main` there by default.
There are two possible fixes, either:
- [Add a ghc-options flag](https://github.com/facebook/duckling/issues/505#issue-650474748) to specify a module name; confusingly the flag name is also `main-is`
- Use the default `Main` module name within TestMain.hs
(the approach taken here is the latter, since this avoids duplicating use of flags named `main-is` in slightly different contexts)
**References**
- https://github.com/facebook/duckling/issues/505
- https://github.com/haskell/cabal/issues/4315
**Version Info**
```sh
$ stack --version
1.9.3.1 x86_64
Compiled with:
- Cabal-2.4.0.1
# <remainder of output omitted>
```
Resolves https://github.com/facebook/duckling/issues/505
Pull Request resolved: https://github.com/facebook/duckling/pull/512
Reviewed By: girifb
Differential Revision: D22799888
Pulled By: patapizza
fbshipit-source-id: 2c0808790e6671e6bc3c9b1f322e57b8dc32a8cc
Summary: Currently the term "coming" is treated the same way as "this" or "current". It should instead be treated the same way as "next".
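Here is a minimal sketch of the intended behavior. It assumes modifiers resolve to a cycle offset relative to the current grain. The `cycleOffset` table is hypothetical and is not Duckling's rule code:

```haskell
-- Hypothetical sketch: the cycle offset a modifier applies to a grain,
-- e.g. "next week" = current week + 1. The word list is illustrative only.
cycleOffset :: String -> Maybe Int
cycleOffset w = lookup w offsets
  where
    offsets =
      [ ("this", 0)
      , ("current", 0)
      , ("next", 1)
      , ("coming", 1) -- previously behaved like "this"; now like "next"
      , ("last", -1)
      ]

main :: IO ()
main = print (map cycleOffset ["this", "coming", "next"])
```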
Reviewed By: chinmay87
Differential Revision: D22435156
fbshipit-source-id: b0b20d8a38014267fb7d037b685ce126f602bda7
Summary:
Current:
"seis cero cinco pm" [dimension Time] -> "cero cinco pm" or "5 pm"
Here the term "seis" was dropped because it was treated as the number 6 in the Numeral dimension.
Expected:
"seis cero cinco pm" -> "6:05 pm"
The root cause was that the rule "<hour-of-day> <integer> (as relative minutes)" dropped the first term "hour-of-day" if it was parsed as a latent token.
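The fixed behavior can be sketched as follows, assuming a 12-hour clock with an explicit pm flag. `ClockTime` and `hourWithMinutes` are hypothetical names and are not the rule's actual implementation:

```haskell
-- Hypothetical sketch of "<hour-of-day> <integer> (as relative minutes)":
-- both tokens are kept, so the hour ("seis") is never dropped.
data ClockTime = ClockTime {hour :: Int, minute :: Int}
  deriving (Eq, Show)

-- `isPm` stands in for a trailing "pm" marker; hours are on a 12-hour clock.
hourWithMinutes :: Bool -> Int -> Int -> Maybe ClockTime
hourWithMinutes isPm h m
  | h < 1 || h > 12 || m < 0 || m > 59 = Nothing
  | otherwise = Just (ClockTime (to24 h) m)
  where
    to24 12 = if isPm then 12 else 0
    to24 x = if isPm then x + 12 else x

main :: IO ()
main = print (hourWithMinutes True 6 5) -- "seis cero cinco pm"
```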
Reviewed By: chinmay87
Differential Revision: D22553028
fbshipit-source-id: abc92bb369c23d2b3084641eab2a2dabb87dbc66
Summary:
There are two rules for parsing "manana" (dimension: Time): one resolves to "morning" and the other to "tomorrow". The first ("morning") rule produces a LATENT result, while the second ("tomorrow") rule produces a NON-LATENT result.
If Duckling is called with the "latent" option turned off, the "tomorrow" rule prevails. However, if Duckling is invoked with the "latent" option turned on, the "morning" rule is preferred.
The solution (for now) is to steer the classifier towards the "tomorrow" rule by adding a large number of (identical) examples for it.
Reviewed By: chinmay87
Differential Revision: D22425277
fbshipit-source-id: 2f139eec0c38b9b5227f27d9f09f6264e7cf86cd
Summary:
The root cause is the lack of support for composing numerals in ES.
For example, "mil novecientos noventa" is parsed as 3 individual numbers: 1000, 900 and 90, respectively. The expected result is instead a single numeral whose value is the sum of those three numbers (1990). The same expectation extends to compositions of an arbitrary number of numerals.
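One way to decide when adjacent numerals should be summed is sketched below. The condition is an assumption for illustration: each term must be smaller than the place value of the term before it. The actual ES rule may use a different check:

```haskell
-- Smallest place value of a positive number: 900 -> 100, 1000 -> 1000, 90 -> 10, 7 -> 1.
placeValue :: Integer -> Integer
placeValue n
  | n > 0 && n `mod` 10 == 0 = 10 * placeValue (n `div` 10)
  | otherwise = 1

-- Hypothetical rule: a run of numerals is summed when each term is smaller
-- than the place value of the term before it.
composeNumerals :: [Integer] -> Maybe Integer
composeNumerals xs
  | not (null xs) && all (> 0) xs && and (zipWith fits xs (drop 1 xs)) = Just (sum xs)
  | otherwise = Nothing
  where
    fits prev next = next < placeValue prev

main :: IO ()
main = do
  print (composeNumerals [1000, 900, 90]) -- "mil novecientos noventa": Just 1990
  print (composeNumerals [900, 800]) -- not a valid composition: Nothing
```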
Reviewed By: chinmay87
Differential Revision: D22192034
fbshipit-source-id: 476489145b83297b82d88f3451020c867e2d08aa
Summary:
Current:
"first monday of last month" -> the date of the first Monday from the current time; note that the term "last month" is dropped.
Expected:
"first monday of last month" -> the date of the first Monday of the previous month.
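Here is a minimal sketch of the expected resolution using the `time` package. The helper names are hypothetical, and the reference date is passed in explicitly rather than taken from Duckling's context:

```haskell
import Data.Time.Calendar (Day, addDays, addGregorianMonthsClip, fromGregorian, toGregorian)
import Data.Time.Calendar.WeekDate (toWeekDate)

-- ISO weekday numbers as returned by toWeekDate: 1 = Monday ... 7 = Sunday.
monday :: Int
monday = 1

-- First occurrence of ISO weekday `wd` in month `m` of year `y`.
firstWeekdayOf :: Int -> Integer -> Int -> Day
firstWeekdayOf wd y m = addDays offset start
  where
    start = fromGregorian y m 1
    (_, _, startWd) = toWeekDate start
    -- Days from the 1st to the first `wd` (0 if the 1st already is one).
    offset = fromIntegral ((wd - startWd) `mod` 7)

-- "first monday of last month", resolved relative to `today`.
firstMondayOfLastMonth :: Day -> Day
firstMondayOfLastMonth today = firstWeekdayOf monday y m
  where
    (y, m, _) = toGregorian (addGregorianMonthsClip (-1) today)

main :: IO ()
main = print (firstMondayOfLastMonth (fromGregorian 2020 7 15))
```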
Reviewed By: chinmay87
Differential Revision: D22300243
fbshipit-source-id: 16622860c52ec2ce9c7a7bcd6094192255aa5a0b
Summary:
Current:
"twelve zero three" -> 12:00pm
Expected:
"twelve zero three" -> 12:03pm
The root cause was that Duckling did not support this timestamp pattern: the minute "three" was spelled as "zero three", which Duckling failed to understand.
Reviewed By: chinmay87
Differential Revision: D22313140
fbshipit-source-id: 9e481a142a16b94c61b1770e7f8be036497419f8
Summary:
Current:
"last friday in october" -> the date of the Friday of the previous week
Expected:
"last friday in october" -> the date of the last Friday of October
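Here is a minimal sketch of the expected resolution using the `time` package. The helper `lastWeekdayOf` is hypothetical, and the year is passed explicitly instead of being chosen from the reference time:

```haskell
import Data.Time.Calendar (Day, addDays, fromGregorian, gregorianMonthLength)
import Data.Time.Calendar.WeekDate (toWeekDate)

friday :: Int
friday = 5 -- ISO weekday number, as returned by toWeekDate

-- Last occurrence of ISO weekday `wd` in month `m` of year `y`.
lastWeekdayOf :: Int -> Integer -> Int -> Day
lastWeekdayOf wd y m = addDays (negate offset) end
  where
    end = fromGregorian y m (gregorianMonthLength y m)
    (_, _, endWd) = toWeekDate end
    -- Days to step back from the month's last day to reach `wd`.
    offset = fromIntegral ((endWd - wd) `mod` 7)

main :: IO ()
main = print (lastWeekdayOf friday 2020 10) -- "last friday in october" (2020)
```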
Reviewed By: chinmay87
Differential Revision: D22201326
fbshipit-source-id: 1983c1b9c24aa356977af7def42d5ba07c7f08be
Summary:
Current:
"seis dos de la tarde" -> "dos de la tarde" or 2pm; note that the term "seis" is dropped.
Expected:
"seis dos de la tarde" -> "seis dos de la tarde" or 6:02pm
Pull Request resolved: https://github.com/facebook/duckling/pull/496
Test Plan: H.io $ debug (makeLocale ES Nothing) "seis dos de la tarde" [This Time]
Reviewed By: chinmay87
Differential Revision: D22054328
Pulled By: yuanbing
fbshipit-source-id: 1ecb05885fc506176cc04768aa158279c7e7fd4f
Summary:
There are two types of ES timestamp phrases to support:
1. "para las seis cero dos pm"
2. "para las 6 0 2 pm"
The solution is to:
1. add a new rule to parse a two-digit number between 1 and 9 (inclusive), such as "cero dos" or "0 2";
2. modify the regex pattern to support an optional "para" in front of "las".
Reviewed By: chinmay87
Differential Revision: D22218800
fbshipit-source-id: 58f692beb6f10834c0ab639b31bf239bf4a1970e
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/498
Test Plan:
In haxlsh:
H.io $ debug (makeLocale ES Nothing) "dos hora y treinta y cinco minutos" [This Duration]
Reviewed By: chinmay87
Differential Revision: D22054695
Pulled By: yuanbing
fbshipit-source-id: b4486141bf7ccb0e538e40ce40fadd7daef374a8
Summary:
This fix adds support for an alternative ES phrase for "noon".
Currently the only supported ES phrase for "noon" is "mediodia"; the alternative form is "medio<whitespace*>dia".
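Here is a minimal sketch of the accepted forms, assuming case-insensitive matching. `isNoonEs` is a hypothetical stand-in for the regex, written with base functions only:

```haskell
import Data.Char (isSpace, toLower)
import Data.List (stripPrefix)

-- Hypothetical matcher for "medio<whitespace*>dia": "mediodia", "medio dia"
-- and "medio   dia" all match.
isNoonEs :: String -> Bool
isNoonEs s = case stripPrefix "medio" (map toLower s) of
  Just rest -> dropWhile isSpace rest == "dia"
  Nothing -> False

main :: IO ()
main = print (map isNoonEs ["mediodia", "medio dia", "Medio   Dia", "medio"])
```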
Reviewed By: chinmay87
Differential Revision: D22188049
fbshipit-source-id: 798b83be75798f3b0d695a0f01a65dc84af98e22
Summary:
The rule is updated to conform to the natural expression of "ordinal day of month".
Pull Request resolved: https://github.com/facebook/duckling/pull/495
Differential Revision: D22054297
Pulled By: yuanbing
fbshipit-source-id: d9d8e00311d4d3121685ab5b09f6c1f52f3077c9
Summary:
Please note that the main difference from the existing rule for "next week" is that the new phrase has no leading "la" (or anything with a similar meaning).
Pull Request resolved: https://github.com/facebook/duckling/pull/493
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: patapizza
Differential Revision: D21981169
Pulled By: yuanbing
fbshipit-source-id: 7478d1262c3a4599d359b485b28a547ad5f44b76
Summary:
The root cause was an error in parsing ES numeral values in [1-9] that are spelled with two words instead of one.
For example, "cero dos" should be parsed as "dos" (2). Currently it is parsed as two numeral values: 0 and 2.
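Here is a minimal sketch of the intended reading. `ceroDigit` is a hypothetical helper that operates on already-tokenized words. It is not the actual ES rule:

```haskell
-- Hypothetical sketch: "cero <digit>" read as a single numeral in [1, 9].
digitWords :: [(String, Int)]
digitWords =
  zip ["uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"] [1 ..]

ceroDigit :: [String] -> Maybe Int
ceroDigit ["cero", w] = lookup w digitWords
ceroDigit _ = Nothing

main :: IO ()
main = print (ceroDigit ["cero", "dos"], ceroDigit ["cero", "diez"])
```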
Reviewed By: chinmay87
Differential Revision: D22162804
fbshipit-source-id: 949956935a21e742f6788e7afa788ff728dd9a8d
Summary:
The new rules can parse phrases of the form:
- xxx upcoming weeks
- upcoming xxx weeks
Pull Request resolved: https://github.com/facebook/duckling/pull/491
Test Plan: Imported from GitHub, without a Test Plan: line.
Differential Revision: D21959647
Pulled By: chinmay87
fbshipit-source-id: a062a8c7a6c2e23b921b1099b886fa589c69c454
Summary:
When computing the score used for ranking, Duckling currently sums the log likelihoods learned during training. While ranking, the goal is to find the parse candidate (among those covering the same span) that is _more_ likely to lead to a *correct* parse. However, the old logic summed the log likelihood of whichever of the two classes was more confident. From what I understand, this is the part that feels wrong.
I created an example of two rules:
#1. a rule where the classifier learns, with high confidence, that the rule is NOT the correct parse.
- okdata (positive class) has very low confidence (a large negative log prior)
- kodata (negative class) has very high confidence (a log prior close to zero)
#2. a rule where the classifier is confident that it is the correct parse, but not Very Confident.
- okdata (positive class) has high confidence (a small but nonzero negative log prior)
- kodata (negative class) has very low confidence (a large negative log prior)
These two rules match the same regex, and thus the same span. When Duckling parses the input, it turns out that rule #1 ranks higher than rule #2. This is because #1 is MORE confident that it is the INCORRECT parse (i.e. that it does not contribute to the correct parse) than #2 is that it is the correct one. Does this make sense?
To solve this problem, I changed the ranking score estimation to use only the positive-class scores (okdata). In the example above, this makes rule #2 rank higher, because its positive-class confidence is higher than #1's.
I would really love some deeper input from Duckling experts. I re-learned Haskell and learned Haxl to craft a small example here, and I am very new to Duckling (I just started reading the ranking code on Friday). I know Duckling is battle-tested, but I don't believe that means a bug can't exist. Furthermore, this specific bug may not occur very often, for 2 reasons:
- there are not many rules that end up with higher negative confidence than positive confidence (this requires more negative corpus examples than positive ones)
- ranking uses span width first, and score-based ranking only comes into play when the spans are equivalent. So 2 rules must match the same span before the score calculation matters at all.
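The two example rules can be made concrete with a small sketch. `RuleScore`, its field names and the numbers are illustrative assumptions. They are not Duckling's classifier types:

```haskell
-- Hypothetical per-rule log-probabilities for the two classifier classes.
data RuleScore = RuleScore
  { okLogProb :: Double -- positive class (okdata)
  , koLogProb :: Double -- negative class (kodata)
  }

-- Old behavior as described: per rule, take the more confident class.
oldScore :: [RuleScore] -> Double
oldScore = sum . map (\r -> max (okLogProb r) (koLogProb r))

-- New behavior: only the positive class contributes.
newScore :: [RuleScore] -> Double
newScore = sum . map okLogProb

-- Illustrative numbers for rules #1 and #2.
rule1, rule2 :: RuleScore
rule1 = RuleScore {okLogProb = -5.0, koLogProb = -0.01} -- confidently NOT correct
rule2 = RuleScore {okLogProb = -0.5, koLogProb = -4.0} -- fairly confidently correct

main :: IO ()
main = do
  print (oldScore [rule1] > oldScore [rule2]) -- True: #1 wins under the old score
  print (newScore [rule2] > newScore [rule1]) -- True: #2 wins under the new score
```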
Reviewed By: patapizza
Differential Revision: D22009276
fbshipit-source-id: 13491689d39d810da526fa4bb8b6e526d4cafd35
Summary:
Current:
If a fractional-hour expression describes the fraction of the hour with a term like "quarter" or "quarters", Duckling cannot recognize it correctly.
Expected:
Duckling should be able to identify this kind of expression and parse it correctly.
Fix:
Add a new rule to parse fractional-hour patterns that contain a keyword such as "quarter" or "quarters".
Pull Request resolved: https://github.com/facebook/duckling/pull/485
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: haoxuany
Differential Revision: D21850804
Pulled By: chinmay87
fbshipit-source-id: 818b7b3f37e3f8a6d1a7d579db19fb2cfb2763f4
Summary:
Added a new EN rule to parse phrases that contain "midday".
Pull Request resolved: https://github.com/facebook/duckling/pull/490
Differential Revision: D21959562
Pulled By: chinmay87
fbshipit-source-id: f9ab45aecd551e8959d00b0025ed38b616ed6b14
Summary:
Current:
"el dia nueve" -> 9pm of the current day
Expected:
"el dia nueve" -> the 9th of the current or next month
Fix:
Added a new ES rule to handle patterns like "el dia <day of month>".
Pull Request resolved: https://github.com/facebook/duckling/pull/487
Reviewed By: girifb
Differential Revision: D21850807
Pulled By: chinmay87
fbshipit-source-id: d8edd81273c7e5f700b440ccc8c7e7bded679051
Summary:
Current behavior:
"an hour and 45 minutes" -> parsed as "1 hour" [dimension: "Duration"]
"a minute and 30 seconds" -> parsed as "1 minute" [dimension: "Duration"]
Expected behavior:
"an hour and 45 minutes" -> "105 minutes" with dimension as "Duration"
"a minute and 30 seconds" -> "90 seconds" with dimension as "Duration"
The fix:
Add a new rule to handle this duration composition pattern (<some duration> and <some other duration>).
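Here is a minimal sketch of the composition. It assumes both durations are converted to the finer of the two grains before being added. The types are hypothetical simplifications of Duckling's own:

```haskell
-- Hypothetical grains, ordered from finest to coarsest.
data Grain = Second | Minute | Hour
  deriving (Eq, Ord, Show)

data Duration = Duration {value :: Integer, grain :: Grain}
  deriving (Eq, Show)

secondsIn :: Grain -> Integer
secondsIn Second = 1
secondsIn Minute = 60
secondsIn Hour = 3600

-- "<d1> and <d2>": convert both to the finer grain and add them.
composeDurations :: Duration -> Duration -> Duration
composeDurations (Duration v1 g1) (Duration v2 g2) =
  Duration (toFine v1 g1 + toFine v2 g2) fine
  where
    fine = min g1 g2
    -- Exact, since each coarser grain is a whole multiple of each finer one.
    toFine v g = v * secondsIn g `div` secondsIn fine

main :: IO ()
main = do
  print (composeDurations (Duration 1 Hour) (Duration 45 Minute)) -- "an hour and 45 minutes"
  print (composeDurations (Duration 1 Minute) (Duration 30 Second)) -- "a minute and 30 seconds"
```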
Pull Request resolved: https://github.com/facebook/duckling/pull/483
Reviewed By: haoxuany
Differential Revision: D21850773
Pulled By: chinmay87
fbshipit-source-id: 62eb6859e0ce2b88cf8ae48d836a1a6a1ac8705d
Summary:
* Reduces size of final image from 5GB to 130MB
* Builds any checkout (not locked to master)
* Doesn't run stack in CMD (runs a static build of Duckling instead)
Pull Request resolved: https://github.com/facebook/duckling/pull/341
Reviewed By: chinmay87
Differential Revision: D21083018
Pulled By: patapizza
fbshipit-source-id: d909158f20f5b8da5b0248a25103b850797bc3a3
Summary:
When I was working on some related diffs, I noticed that there were some
asymmetries between the regexes for ruleIntervalMax and ruleIntervalMin:
- we had no support for "at most", even though we did have "at least"
- we had no support for "not? less than"
- the ordering of the different constructions didn't match
This is a minor tweak to make things match better.
Reviewed By: patapizza
Differential Revision: D20484594
fbshipit-source-id: c3c54a9cc1b83402e42634b7a98a1a3b8cc5e09c
Summary: Fix `ruleYearLatent` to be the same as the one in `en`. We don't want to match numerals that could have been hours.
Reviewed By: patapizza
Differential Revision: D20683975
fbshipit-source-id: cdef9b1b5f8a21dc5e207ed2a7afcad84c56a596
Summary:
When I first skimmed our rules for "half an hour" vs "an hour and a half"
I actually thought there might be a bug, because `timesOneAndAHalf`
sounds like it's actually multiplying by `1.5`.
There's no bug, the implementation is entirely correct, but it does
not multiply by 1.5, it adds .5 to any integer value at the given grain.
This diff renames the function to be more descriptive.
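To make the distinction concrete, here is a sketch of the n + 1/2 behavior. The `Grain` type and this signature are hypothetical simplifications. They are not the actual `nPlusOneHalf` in Duckling:

```haskell
data Grain = Second | Minute | Hour | Day
  deriving (Eq, Show)

-- Hypothetical sketch: "<n> and a half <grain>" is n + 1/2 at that grain,
-- expressed in the next finer grain so the value stays an integer.
nPlusOneHalf :: Grain -> Integer -> Maybe (Integer, Grain)
nPlusOneHalf Day n = Just (24 * n + 12, Hour)
nPlusOneHalf Hour n = Just (60 * n + 30, Minute)
nPlusOneHalf Minute n = Just (60 * n + 30, Second)
nPlusOneHalf Second _ = Nothing -- no finer grain in this sketch

main :: IO ()
main = print (nPlusOneHalf Hour 2)
```

For n = 2 hours this gives 150 minutes (2.5 hours). Multiplying by 1.5 would give 3 hours instead. The two readings only coincide for n = 1.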
Handy trick for doing this kind of refactor without IDE tooling:
```sh
find duckling/Duckling/Duration/ -name 'Rules.hs' | xargs sed -i 's/timesOneAndAHalf/nPlusOneHalf/g'
```
Reviewed By: haoxuany
Differential Revision: D20456966
fbshipit-source-id: 35020685f091a41618b30b7e5f95dbfa48509b88
Summary:
This change applies roughly the same rules for supporting intervals
in Spanish AmountOfMoney that we support in English: intervals using
`entre _ e _` / `de _ a _` / `_ - _` with either money in both slots
or a number in the first slot and money in the second.
My Spanish is okay but not great - I'm confident these rules are good and
cover the most likely phrases, but there's probably room to add more coverage.
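Here is a minimal sketch of the two accepted slot combinations. `Money`, `Endpoint` and the two checks are assumptions for illustration. The checks require both currencies to agree and the interval to be increasing. They are not necessarily what the ES rules enforce:

```haskell
data Money = Money {currency :: String, amount :: Double}
  deriving (Eq, Show)

-- First slot of the interval: either a bare number or an amount of money.
data Endpoint = Bare Double | WithCurrency Money

-- Hypothetical resolution for the interval constructions listed above.
-- Assumptions: a bare first number inherits the second slot's currency,
-- both currencies must agree, and the interval must be increasing.
moneyInterval :: Endpoint -> Money -> Maybe (Money, Money)
moneyInterval (Bare x) to@(Money c y)
  | x < y = Just (Money c x, to)
moneyInterval (WithCurrency from@(Money c1 x)) to@(Money c2 y)
  | c1 == c2 && x < y = Just (from, to)
moneyInterval _ _ = Nothing

main :: IO ()
main = print (moneyInterval (Bare 10) (Money "EUR" 20))
```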
Reviewed By: patapizza
Differential Revision: D20425979
fbshipit-source-id: deb17fc331e1aa192d91dd47bc7f3864a246f0be