Summary:
Support numerals like "forty-five (45)". Commonly seen in legal documents.
There are no classifiers to regenerate.
Resolves#216
Reviewed By: stroxler
Differential Revision: D28305725
fbshipit-source-id: b9b4e160f630ce3cf462fcf9f2e575738463c313
Summary:
I think they all fell into one of two categories:
- names colliding with field names, but where there was already an existing
pattern (e.g. d for dimension, v for value) and we just had to be consistent
- cases that were best fixed by turning NamedFieldPuns on, which I did
Reviewed By: chessai
Differential Revision: D28213245
fbshipit-source-id: 18fbd61771e12da11ce03b98b74af51d1e837787
Summary:
When tracing the code from Debug downward, the unnecessary rename
of an argument from `sentence` to `input` creates a context switch. Let's
use the same name throughout.
Reviewed By: chessai
Differential Revision: D28213244
fbshipit-source-id: 22476d958312e5c60cd32ff1e3d0d460cf0c8c79
Summary:
In both `Api.hs` and `Debug.hs` I noticed that I was staring at the code
longer than necessary to figure out what a lambda with a destructure and
pattern match were doing. Moving it to a function in `where` named
`isRelevantDimension` makes skimming easier.
Reviewed By: chessai
Differential Revision: D28213243
fbshipit-source-id: 344f464dcac7297009c35b19373eef67e0eb9540
Summary:
Easy style fixes for ExampleMain.hs, Debug.hs, Api.hs, Core.hs
Most of these are just lint fixes, but I also made a few not-just-lint changes
to conform to some elements of our style guide that I agree with:
- if the type signature doesn't fit on one line, then put one type per line
with nothing on the first line, so that all types are vertically aligned - makes
for a quick skim
- try to avoid mixing same-line function args with hanging function args: hang
all arguments or none at all to get a more outline-like feel, again better for
skimming
I was actually able to eliminate all errors for most of these modules - the name
collisions I usually give up on were manageable by hiding + easy variable renames
Reviewed By: chessai
Differential Revision: D28213246
fbshipit-source-id: 1f77d56f2ff8dccfd5f3b534f087c07047b92885
Summary:
While I was working on fixing #604, I came across the rules
`ruleMilitarySpelledOutAMPM(2)`, which were actually capturing
some of my test phrases and confusing me.
This commit removes them because
- they aren't needed: the existing latent spelled-out hour + minute rules plus
the "(in the )?(am/pm)" rules together give the same behavior
- they are confusingly named - these aren't military times at all, they are
spelled-out civilian times
Reviewed By: haoxuany
Differential Revision: D27848485
fbshipit-source-id: ba1ed16ec22b5139b0b500b44dc91adb1b5e3d82
Summary:
This commit extend Spanish-language support for concatenations
of the form "<higher-order-of-magnitude> <lower>", e.g.
"doscientos tres" (203) or "cuatro mil ventiuno" (4022) to work
not just for hundreds but also for thousands and millions.
Reviewed By: chessai
Differential Revision: D27858133
fbshipit-source-id: 5c6b227ae7dad9009cd636e7ea49c209480c931a
Summary:
This commit adds two things to Spanish numeral support:
- support for millions
- support, via hooking into the `isMultipliable` logic used by EN, for
composing counts of 2-999 with either "mil" or "millones", which is
the standard way to say things like "tres mil" = 3000
Reviewed By: chessai
Differential Revision: D27858135
fbshipit-source-id: 980e95bd989f818c5ceaa2bb6c87fe81d3e08366
Summary:
This diff refactors our handling of "<hundreds> 0..99" numbers
to be more flexible by replacing `ruleNumeralthreePartHundreds`
with
- a rule for two-part hundreds like "dos cientos" (which is technically
incorrect grammar - doscientos is correct - but probably worth keeping) based
on a notion of multipliability like that used in EN rules
- a rule stating that we can compose hundreds with 0..99 additively
The resulting rules are more flexible, and they correctly parse not only
gramatically iffy phrases like "dos cientos tres", but also grammatically
correct phrases like "doscientos tres". This fixes#380.
Reviewed By: chessai
Differential Revision: D27858136
fbshipit-source-id: 4a918d84d93ac074f83f6947a8f80cfd11145115
Summary:
Whle looking into fixing https://github.com/facebook/duckling/issues/380
I was having a bit of trouble navigating the existing rules and guessing
what is / is not supported.
This diff refactors the Numeral/ES code to be easier to navigate:
- rename all the `ruleNumeral{1,2,3,4,5,6}` rules to be descriptive
- changes the order to be themed from small to large numbers, and
make sure the order of defines matches the order of rules at the end
of the module
- use [20 .. 90] instead of manually specifying the same list out-of-order
Reviewed By: chessai
Differential Revision: D27858134
fbshipit-source-id: b13983d75b36bb4e2b387ef06fe61066d81ae19a
Summary:
It's common to use dashes when spelling out times longhand,
e.g. "five-thirty am", but Duckling wasn't handling this at all.
This commit adds rules for times spelled out with dashes. The
rules explicitly forbid the second of the two times from including
digits via a negative match. This is because
- it wouldn't be at all idomatic to write five-26 or five-oh-6
- allowing that pattern clashes with time range parsing, e.g.
"9-10 am" should parse as a time range, not as "9:10 am"
Reviewed By: chessai
Differential Revision: D27848428
fbshipit-source-id: dfe8b98cb38119a16db2a19db47fd3128783e617
Summary:
This commit fixes#111, which was an open issue that any non-integer
multiple of any unit of time was being converted to seconds.
My solution is to write a recursive function `Duration.Helpers.inCoarsestGrain`
which, given a grain `g` and double value `v` finds the coarses grain `g'` such that
`v * g` - rounded to the nearest seconds - has integral units.
We call this function only in the case of non-integer multiples, and we start our
search from the given grain because nothing coarser would make sense. The code could
actually be slightly more efficient if we started at the next-smallest grain, but
in the interest of clarity I think this is probably better.
Reviewed By: chessai
Differential Revision: D27891439
fbshipit-source-id: b048310963eb71337fd91ab4ef3c840134a76e73
Summary:
While debugging an attempt to extend our handling of spelled-out
times, I realized that we are being too aggressive in our parsing of
times like "five ten", because we'll parse "five nine" as possibly
meaning "5:09", which isn't something an English speaker would say
(or rather if they did, it's more likely they mean "five (to) six"
or something similar.
Reviewed By: chessai
Differential Revision: D27848429
fbshipit-source-id: 34d783332fd60359ad9b6e7862367453bc93a1d1
Summary:
This commit gets rid of all the easy-to-fix lint warnings on time helper modules:
- replacing unnecessary `.` with `$`
- Flipping a lambda in a map to an infix operation
- Use `ts` for a list of times, not `series` which produces a pretty confusing naming collision
There are still quite a lot of lint errors related to name masking, which would be challenging to fix without us coming to an agreement about naming conventions.
But at least in my editor, name-masking errors are a lot less visually noisy than other errors (they only highlight the one name) so I don't mind them as much when skimming the code.
Reviewed By: chessai
Differential Revision: D27842198
fbshipit-source-id: 9091e5349657243b61d7ee169d0d06dd2122ac17
Summary:
The Dutch numeral and ordinal rules contained a spelling mistake: [‘achttien’](https://en.wiktionary.org/wiki/achttien) (18) was spelt as ‘achtien’ (note the single *t*).
This PR fixes this spelling mistake everywhere in the code base.
Pull Request resolved: https://github.com/facebook/duckling/pull/602
Reviewed By: patapizza
Differential Revision: D27859001
Pulled By: chessai
fbshipit-source-id: 8da0d8b099d49cf6207d4066cee1fc7da68a418e
Summary:
While looking at bugfixes for some ES and IT numeral parsing, I was
looking at the Helpers.hs module and figured I'd silence some lint
errors. There are still some name-masking lint issues, but all others
have been fixed.
Reviewed By: chessai
Differential Revision: D27854164
fbshipit-source-id: 5be8d924b033a55608c455074df1c80c8c0019be
Summary:
This fixes#592 in a very conservative way: the reason why `ruleIntersect` does
not detect "tonight 815" and "tonight eight fifteen" as it does "tonight 8:15"
is because it explicitly forbids the second part of the intersection from being
latent, unless it is a year.
I don't think it's a good idea to remove the restriction on latent inputs in
`ruleIntersect`, so instead I just made a new rule specifically for the
intersection of `<part-of-day> <time-of-day>`.
It also seems to me that there's a lot of room for this to be too aggressive,
for example if I say "tonight 500 people will laugh" the "tonight" and "500"
aren't really linked. So, I set the rule to be latent; this may be too conservative
to be useful though (do client libraries usually allow latent results?).
Reviewed By: chessai
Differential Revision: D27842596
fbshipit-source-id: 36ac59e31c632d4864241bce291147a46d52f780
Summary: Adds Catalan language and Numeral rules for it
Reviewed By: haoxuany
Differential Revision: D26518604
Pulled By: chessai
fbshipit-source-id: e6b4b0ceb9b7931d086c732dd03fb5cbbe062d5b
Summary:
These are popular variants/abbreviations of Egyptian pounds.
All these forms are documented on wikipedia (https://en.wikipedia.org/wiki/Egyptian_pound)
Pull Request resolved: https://github.com/facebook/duckling/pull/590
Reviewed By: haoxuany
Differential Revision: D27598249
Pulled By: chessai
fbshipit-source-id: 42ae9115b1def48c58e50a6deb624c3407c029f3
Summary:
The facebook internal linters prefer us to avoid
excessive point-free style and extra $ where we could
instead move existing brackets.
Making those style tweaks for Time/EN/Rules.hs because
I was looking at the file as part of
Reviewed By: chessai
Differential Revision: D27108042
fbshipit-source-id: 7c8e76578476ea14d655131943e693c5159b12d2
Summary:
I was testing an unrelated change (which doesn't change
classifier scores) and reran classifiers just to be safe, I noticed
that the scores changed.
This diff updates them.
Reviewed By: chessai
Differential Revision: D26892970
fbshipit-source-id: c7da3e3b7d01955f98b287a3ff4e7c1ff2837c7f
Summary:
I was looking at adding support for "next week" constructions in Spanish to
close https://github.com/facebook/duckling/issues/553 (which it appears has
already been handled), when I noticed that the equivalent logic for English
has been split into two separate examples: "coming week" isn't in the same
example as other equivalent constructs like "upcoming week" and "next week".
This diff combines them, which I think is clearer and fewer lines of code
Reviewed By: chessai
Differential Revision: D26892322
fbshipit-source-id: 68ca4644759198fc79d963ae080495c3f2d4a923
Summary: due to exploit in T85548324, factoring the input to get a smaller parse tree (the existing one parses tail recursively, whereas this one uses ruleIntersect which is still bad, but slightly better).
Differential Revision: D26657170
fbshipit-source-id: fe3a738073b4d30ae401521bb692f4a4bba48d96
Summary: There are a handful of more spelling for russian numbers [20, 30 .. 90] that we aren't handling. Additionally, we optimise for recall over precision by allowing some invalid spellings that could be understandable typos.
Reviewed By: patapizza
Differential Revision: D26285711
Pulled By: chessai
fbshipit-source-id: fd8a8f373d228a526e79b22326eff48bb966310d
Summary:
Add rules:
- `hkd` as HKD, and related rules (prefix and suffix)
- dollar and <amount-of-money> rule
- dollar and a half rule
- intersection for <amount-of-money> and `a half`
Changed:
- dime and dollar rules now have improved coverage
Reviewed By: girifb
Differential Revision: D26191724
Pulled By: chessai
fbshipit-source-id: bf63b6eaa751fb96dcf341fa2b66db06a6eeca79
Summary:
Adding . in between kilogram units used to be extracted as a Numeral
instead of Quantity.
Pull Request resolved: https://github.com/facebook/duckling/pull/570
Reviewed By: patapizza
Differential Revision: D26199687
Pulled By: chessai
fbshipit-source-id: 65e39f20296946d5762d7180b12878f4e66ea701
Summary:
In general there are some clashes between time formats `hhmm` and date formats `ddmm`. For example, depending on context, `22.10` can mean clock time ten past ten or the twenty second of october. In general it's correct to interpret this as clock time, as Duckling currently does.
But there are some cases not currently covered by Duckling where we have more unambiguous dates, e.g. `12.03.2018` and `27.11`. These are included here (in addition to midnight `24:00` which was also missing).
#### Changes:
- Bug in `ruleDdmm` regex meant that dates on the format `dd/mm` where `mm > 9` were not parsed
- `ruleYyyymmdd` now also parses dots and forward slashes, i.e. `2012.05.14` and `2012/05/14`
- New rule `rule2400` parses `24:00` and `24.00` (I elected not to include it in `ruleMidnighteodendOfDay` as it has grain minute rather than day)
- New rule `ruleDmm` parses `1/10`, `9.12` etc
- New rule `ruleDDm` parses `10/3`, `11.1` etc
- New rule `ruleDdDotMm` parses `25.02`, `31.10` etc
- `ruleDdmmyyyy` now also parses dots, i.e. `03.10.1983`
- New tests
Pull Request resolved: https://github.com/facebook/duckling/pull/395
Reviewed By: patapizza
Differential Revision: D26193069
Pulled By: chessai
fbshipit-source-id: cf711807fa1d40be2303f2426d74ded40c2e23b3
Summary:
This PR adds UAH currency Type and examples to EN and RU Corpus
Pull Request resolved: https://github.com/facebook/duckling/pull/433
Reviewed By: girifb
Differential Revision: D25102990
Pulled By: chessai
fbshipit-source-id: ed40e8dfcf145a65c7e6d87158da0efacb32e256
Summary: adds a new rule that parses year intervals such as "1960 - 1961". see inline comments for heuristics.
Reviewed By: patapizza
Differential Revision: D25840835
fbshipit-source-id: 851a5b1c78440cbf065bf9f20a05c78d4967ea3c
Summary: adds a rule for 'the day after tomorrow' in Romanian. regenerates classifiers.
Reviewed By: girifb
Differential Revision: D26155042
fbshipit-source-id: 80005ab94a10f9fbf242c9a712bd040e4f6bc477
Summary:
**2nd set of changes from pull request https://github.com/facebook/duckling/issues/516
Supporting Cantonese and more common expressions in Chinese.
Adding rules file for Duration/ZH.
Pull Request resolved: https://github.com/facebook/duckling/pull/523
Reviewed By: haoxuany
Differential Revision: D23428901
Pulled By: chessai
fbshipit-source-id: 6d04c97b63bac966eb61d77cab2f08f7543dbbf0
Summary:
Currently values like 1000.000 (in Dutch . is thousand separator) are not recognised, as the ruleDecimalWithThousandsSeparator requires the decimal part (e.g. 1000.000,34) to be present. This PR adds some data and changes the ruleDecimalWithThousandsSeparator to make the decimal part optional.
Pull Request resolved: https://github.com/facebook/duckling/pull/504
Reviewed By: patapizza, girifb
Differential Revision: D26078885
Pulled By: chessai
fbshipit-source-id: b1679c713e1d17a168d34a3cc556b6c36a571d75
Summary:
**1st set of changes from pull request https://github.com/facebook/duckling/issues/516
Supporting more common expressions, such as fraction, half, dozen, in Chinese.
Pull Request resolved: https://github.com/facebook/duckling/pull/522
Reviewed By: patapizza
Differential Revision: D23428893
Pulled By: chessai
fbshipit-source-id: 3454ac70a4bfff90dc282560916a0fae9969f521
Summary:
* "at the moment" is considered identical to "now".
* "ASAP" is considered identical to "from now"
Pull Request resolved: https://github.com/facebook/duckling/pull/405
Reviewed By: patapizza
Differential Revision: D26009483
Pulled By: chessai
fbshipit-source-id: addf4c509e69d413cae279601c64f72710eba11f
Summary:
This pull request is to add support for Telugu language (Numerical Dimension) to Duckling
Pull Request resolved: https://github.com/facebook/duckling/pull/470
Differential Revision: D25546700
Pulled By: chessai
fbshipit-source-id: 1d88ee27da8a577a4a79ff31be8cb55ed6444c4e
Summary:
Egyptian Arabic is a dialect of Arabic that is mostly a spoken language that is used in everyday communications.
This PR adds new locale to Arabic to support the differences between Modern Standard Arabic (MSA) and Egyptian Arabic (EG).
I have mainly depended on the different locales of Spanish that are supported by Duckling to create the new Egyptian Arabic locale.
New modifications are added to the `Numeral` dimension since I didn't spot differences in other dimensions.
Pull Request resolved: https://github.com/facebook/duckling/pull/554
Reviewed By: patapizza
Differential Revision: D25543502
Pulled By: chessai
fbshipit-source-id: 4cbb7be78a52071c8681380077f0b4dc033a60de
Summary:
Crore (1e7) and Lakh (1e5) are both commonly used to describe an amount of Indian currency. Common abbreviations are "Cr" (Crore) and "lkh", "L", "lac" (lakh).
Additionally, common spellings of "crore" include "karor" and "koti"
Reviewed By: patapizza
Differential Revision: D25550546
fbshipit-source-id: 0c1479d9027431cb0d1182b5117eabca6f939cb2
Summary:
'miej' in Polish is the imperative form of the verb 'mieć' (to have). "mniej więcej" means "more or less" and it was the intention here.
Pull Request resolved: https://github.com/facebook/duckling/pull/426
Reviewed By: patapizza, girifb
Differential Revision: D25546380
Pulled By: chessai
fbshipit-source-id: 1047b83109cab917f1f4dbe87b667f8ccd2fb92d
Summary:
Improves the recognition of German time approximation language and removes a single error in the rule of <time-of-day> approximately.
Pull Request resolved: https://github.com/facebook/duckling/pull/435
Reviewed By: patapizza
Differential Revision: D24934281
Pulled By: chessai
fbshipit-source-id: 641bcb6a7e5c26e66c735fe13bccae9b7a8909ae
Summary:
Add support for additional Hindi numbers like 300, 81, 150, 1000, 1520. These are not supported in the current master version.
Pull Request resolved: https://github.com/facebook/duckling/pull/552
Reviewed By: ashwinp-fb, girifb
Differential Revision: D25072230
Pulled By: chessai
fbshipit-source-id: 35277a2349384bcf44a20e74852113f5c010e618
Summary:
Found a lacking frequent duration in German and a small typo in the existing one.
Pull Request resolved: https://github.com/facebook/duckling/pull/509
Reviewed By: patapizza
Differential Revision: D24690104
Pulled By: chessai
fbshipit-source-id: b49a7a636abf5b92f2fe7c0d5b2ca2fe64acbaa2
Summary:
Spanish (ES) will now have all the same quantity rules as English (EN) (which I think is the most-supported language), plus more.
This includes the following:
* bowls - (bol(es)?|tazón(es)?|cuencos?|platos? (soperos?)|(hondos?)) (EN does not currently have this)
* cups - (tazas?)
* dishes - (platos?|fuentes?) (EN does not currently have this)
* grams - (((m(ili)?)|(k(ilo)?))?g(ramo)?s?)
* ounces - ((onzas?)|oz)
* pints - (pintas?) (EN does not currently have this)
* pounds - ((lb|libra)s?)
* quarts - (cuartos? de galón) (EN does not currently have this)
* tablespoons - (cucharadas? (grande)?) (EN does not currently have this)
* teaspoons - (cucharaditas?) (EN does not currently have this)
Reviewed By: patapizza
Differential Revision: D24628214
fbshipit-source-id: 2e8d500661f30fa0928cb7d3f21470afc01e2285
Summary:
Pull Request resolved: https://github.com/facebook/duckling/pull/533
In recent versions of Data.Some the name of the constructor, `This` has changed name to `Some`. This has become rather problematic for us to migrate so we're just going to remove the dependency. The meat of this diff is adding the type `Seal` to `Duckling.Types`. That type replaces `Some`.
Reviewed By: pepeiborra
Differential Revision: D23929459
fbshipit-source-id: 8ff4146ecba4f1119a17899961b2d877547f6e4f
Summary:
"so" is an adverb in German: https://github.com/wit-ai/wit/issues/1860
It's also a short form for "Sonntag" (Sunday); making the dot mandatory.
Reviewed By: haoxuany
Differential Revision: D22900791
fbshipit-source-id: 8dc873f79a21ca2add074f9c664e84fae56f1e67
Summary: Currently the term "coming" is being treated the same way as "this" or "current". The expected treatment should be the same as the term "next".
Reviewed By: chinmay87
Differential Revision: D22435156
fbshipit-source-id: b0b20d8a38014267fb7d037b685ce126f602bda7
Summary:
Current:
"seis cero cinco pm" [dimension Time] -> "cero cinco pm" or "5 pm"
here the term "seis" was dropped because it was treated as "6" in "Numeral" dimension.
Expected:
"seis cero cinco pm" -> "6:05 pm"
The root cause was that the rule "<hour-of-day> <integer> (as relative minutes)" dropped the first term "hour-of-day" if it was parsed as a latent token.
Reviewed By: chinmay87
Differential Revision: D22553028
fbshipit-source-id: abc92bb369c23d2b3084641eab2a2dabb87dbc66
Summary:
There are two rules for parsing "manana" (dimension: Time): one is resolved to "morning"; while the other is resolved to "tomorrow". And the first (or "morning") rule resolves to a LATENT result; while the second (or "tomorrow") rule resolves to a NON-LATENT result.
If the duckling is called with "latent" option turned off, the "tomorrow" rule prevails. However, if the duckling is invoked with "latent" option turned on, the "morning" rule is preferred.
The solution (for now) is to steer the classifier towards "tomorrow" rule by adding large number of (same) examples for "tomorrow" rule.
Reviewed By: chinmay87
Differential Revision: D22425277
fbshipit-source-id: 2f139eec0c38b9b5227f27d9f09f6264e7cf86cd
Summary:
The root cause is this lacking of support for the composition of numerals in ES.
For example, "mil novecientos noventa" is parsed 3 individual numbers: 1000, 900 and 90 correspondingly. Instead, the expected result is a single numeral value that is the sum of aforementioned three numbers. The same expection can be extended to the composition with arbitrary number of numeral values.
Reviewed By: chinmay87
Differential Revision: D22192034
fbshipit-source-id: 476489145b83297b82d88f3451020c867e2d08aa
Summary:
Current:
"first monday of last month" -> the date of first monday starting from current time. Note here the term "last month" is dropped
Expected:
"first monday of last month" -> the date of first monday of previous month.
Reviewed By: chinmay87
Differential Revision: D22300243
fbshipit-source-id: 16622860c52ec2ce9c7a7bcd6094192255aa5a0b
Summary:
Current:
"twelve zero three" -> 12:00pm
Expected:
"twelve zero three" -> 12:03pm
The root cause was that duckling doesn't support this kind of pattern for timestamp. The uniqueness here was that the number "three" was spelled as "zero three" that Duckling failed to understand.
Reviewed By: chinmay87
Differential Revision: D22313140
fbshipit-source-id: 9e481a142a16b94c61b1770e7f8be036497419f8
Summary:
current:
last friday in october -> the date of Friday of previous week
expected:
last friday in october -> the data of last Friday of month october
Reviewed By: chinmay87
Differential Revision: D22201326
fbshipit-source-id: 1983c1b9c24aa356977af7def42d5ba07c7f08be
Summary:
Current:
"seis dos de lar tarde" -> "dos de lar tarde" or 2pm; note
that the term "seis" is dropped.
Expected:
"seis dos de lar tarde" -> "seis dos de lar tarde"
or 6:02pm
Pull Request resolved: https://github.com/facebook/duckling/pull/496
Test Plan: H.io $ debug (makeLocale ES Nothing) "seis dos de la tarde" [This Time]
Reviewed By: chinmay87
Differential Revision: D22054328
Pulled By: yuanbing
fbshipit-source-id: 1ecb05885fc506176cc04768aa158279c7e7fd4f
Summary:
There are two types of ES phrases for timestamp to support:
1. "para las seis cero dos pm"
2. "para las 6 0 2 pm"
The solution is to:
1. added a new rule to parse two-digit number between 1 and 9 (inclusive);
2. modified the regex pattern to support additional optional phrase "para" in front of "las".
Reviewed By: chinmay87
Differential Revision: D22218800
fbshipit-source-id: 58f692beb6f10834c0ab639b31bf239bf4a1970e
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/498
Test Plan:
In haxlsh:
H.io $ debug (makeLocale ES Nothing) "dos hora y treinta y cinco minutos" [This Duration]
Reviewed By: chinmay87
Differential Revision: D22054695
Pulled By: yuanbing
fbshipit-source-id: b4486141bf7ccb0e538e40ce40fadd7daef374a8
Summary:
This fix is to add support to parse alternative phrase, in ES, for "noon".
Currently the supported ES phrase for "noon" is "mediodia", the alternative form is "medio<whitespace*>dia".
Reviewed By: chinmay87
Differential Revision: D22188049
fbshipit-source-id: 798b83be75798f3b0d695a0f01a65dc84af98e22
Summary:
the rule is updated to conform with natural expression of "ordinal day of month".
Pull Request resolved: https://github.com/facebook/duckling/pull/495
Differential Revision: D22054297
Pulled By: yuanbing
fbshipit-source-id: d9d8e00311d4d3121685ab5b09f6c1f52f3077c9
Summary:
Please note that the major diff with the
existing rule for next week is that the new
phrase doesn't have the leading "la" or anything with
similar meaning.
Pull Request resolved: https://github.com/facebook/duckling/pull/493
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: patapizza
Differential Revision: D21981169
Pulled By: yuanbing
fbshipit-source-id: 7478d1262c3a4599d359b485b28a547ad5f44b76
Summary:
The root cause was the error in parsing the ES numeral value [1-9] that spelled with two words instead of one.
For example "cero dos" should be parsed the as "dos". Currently it's being as two numeral values: 0 and 3.
Reviewed By: chinmay87
Differential Revision: D22162804
fbshipit-source-id: 949956935a21e742f6788e7afa788ff728dd9a8d
Summary:
the new rules could parse phrases in the form of
xxx upcoming weeks
upcoming xxx weeks
Pull Request resolved: https://github.com/facebook/duckling/pull/491
Test Plan: Imported from GitHub, without a Test Plan: line.
Differential Revision: D21959647
Pulled By: chinmay87
fbshipit-source-id: a062a8c7a6c2e23b921b1099b886fa589c69c454
Summary:
while computing a score used to rank in Duckling, it currently sums up the log likelihoods learned during training. While ranking, the goal is to find the (same span) parse candidate which is _more_ likely to lead to a *correct* parse. However, the old logic was summing up the "more confident of the two classes" log likelihood.From what I understand this is the part which feels wrong.
I created an example of two rules:
#1. a rule where the classifier learns that the rule is very confidently NOT the correct parse.
- okdata (positive class) is very low confidence (high negative number prior)
- kodata (negative class) is very high confidence (low negative number prior)
#2. a rule where the classifier is confident that it is the correct parse, but not Very Confident.
- okdata (positive class) is high confidence (nonzero, but low negative number prior)
- kodata (negative class) is very low confidence (high negative number prior)
these two rules match the same regex, thus the same span. While duckling parses it, it turns out, that rule #1 ranks higher than rule #2. The reason why is because #1 is MORE confident that it is the INCORRECT (does not contribute to) parse than rule #2. Does this make sense?
to solve this problem, I changed the ranking score estimation to use only the positive class scores (okdata). In the example above, it fixes it so rule #2 would end up ranking higher because the positive class confidence is higher than #1's positive class confidence.
Would really love some deeper input from Duckling experts. I re-learned haskell and learned haxl to craft a small example here, and I am very new to Duckling (just started reading the ranking code on Friday). I know Duckling is battle-tested but I also don't believe that means a bug can't exist. And further, this specific bug may not happen a whole lot for 2 reasons:
- there are not a lot of rules which end up higher negative confidence than positive (requires enough negative corpus examples over positive ones)
- ranking uses span width first, and only when the spans are equivalent does the score based ranking come into play. So it requires that 2 rules match the same span before any actual score calculation even matters.
Reviewed By: patapizza
Differential Revision: D22009276
fbshipit-source-id: 13491689d39d810da526fa4bb8b6e526d4cafd35
Summary:
Current:
if the fractional hour expression describes the hour fraction with term like "quarter or quarters", then duckling couldn't correctly recognize it.
Expected:
Duckling should be able to identify this kind of expression and parse it correctly.
Fix:
Add new rule to parse the fractional hour pattern that contains the keyword like "quarter or quarters".
Pull Request resolved: https://github.com/facebook/duckling/pull/485
Test Plan: Imported from GitHub, without a Test Plan: line.
Reviewed By: haoxuany
Differential Revision: D21850804
Pulled By: chinmay87
fbshipit-source-id: 818b7b3f37e3f8a6d1a7d579db19fb2cfb2763f4