duckling

mirror of https://github.com/facebook/duckling.git synced 2024-10-26 21:59:44 +03:00

Author	SHA1	Message	Date
Daniel Cartwright	59cb9e0879	Support legalese numerals Summary: Support numerals like "forty-five (45)". Commonly seen in legal documents. There are no classifiers to regenerate. Resolves #216 Reviewed By: stroxler Differential Revision: D28305725 fbshipit-source-id: b9b4e160f630ce3cf462fcf9f2e575738463c313	2021-05-10 10:33:35 -07:00
Steven Troxler	4b44e969c9	Fix all name collisions on the main `Types.hs` Summary: I think they all fell into one of two categories: - names colliding with field names, but where there was already an existing pattern (e.g. d for dimension, v for value) and we just had to be consistent - cases that were best fixed by turning NamedFieldPuns on, which I did Reviewed By: chessai Differential Revision: D28213245 fbshipit-source-id: 18fbd61771e12da11ce03b98b74af51d1e837787	2021-05-07 06:20:03 -07:00
Steven Troxler	ce3614fedd	In Debug.hs, `s/sentence/input/g` Summary: When tracing the code from Debug downward, the unnecessary rename of an argument from `sentence` to `input` creates a context switch. Let's use the same name throughout. Reviewed By: chessai Differential Revision: D28213244 fbshipit-source-id: 22476d958312e5c60cd32ff1e3d0d460cf0c8c79	2021-05-06 08:54:57 -07:00
Steven Troxler	a88b70feb7	Filter with a self-describing function in where Summary: In both `Api.hs` and `Debug.hs` I noticed that I was staring at the code longer than necessary to figure out what a lambda with a destructure and pattern match were doing. Moving it to a function in `where` named `isRelevantDimension` makes skimming easier. Reviewed By: chessai Differential Revision: D28213243 fbshipit-source-id: 344f464dcac7297009c35b19373eef67e0eb9540	2021-05-06 08:54:57 -07:00
Steven Troxler	eba5d0a825	Simple style fixes for outer layers around Engine.hs Summary: Easy style fixes for ExampleMain.hs, Debug.hs, Api.hs, Core.hs Most of these are just lint fixes, but I also made a few not-just-lint changes to conform to some elements of our style guide that I agree with: - if the type signature doesn't fit on one line, then put one type per line with nothing on the first line, so that all types are vertically aligned - makes for a quick skim - try to avoid mixing same-line function args with hanging function args: hang all arguments or none at all to get a more outline-like feel, again better for skimming I was actually able to eliminate all errors for most of these modules - the name collisions I usually give up on were manageable by hiding + easy variable renames Reviewed By: chessai Differential Revision: D28213246 fbshipit-source-id: 1f77d56f2ff8dccfd5f3b534f087c07047b92885	2021-05-06 08:54:56 -07:00
Steven Troxler	0e13d28b4d	Time/EN: Get rid of unnecessary rules Summary: While I was working on fixing #604, I came across the rules `ruleMilitarySpelledOutAMPM(2)`, which were actually capturing some of my test phrases and confusing me. This commit removes them because - they aren't needed: the existing latent spelled-out hour + minute rules plus the "(in the )?(am/pm)" rules together give the same behavior - they are confusingly named - these aren't military times at all, they are spelled-out civilian times Reviewed By: haoxuany Differential Revision: D27848485 fbshipit-source-id: ba1ed16ec22b5139b0b500b44dc91adb1b5e3d82	2021-04-26 06:17:44 -07:00
Steven Troxler	c44c73fe04	Numeral/ES: Add support for additive concatenations Summary: This commit extends Spanish-language support for concatenations of the form "<higher-order-of-magnitude> <lower>", e.g. "doscientos tres" (203) or "cuatro mil veintiuno" (4021) to work not just for hundreds but also for thousands and millions. Reviewed By: chessai Differential Revision: D27858133 fbshipit-source-id: 5c6b227ae7dad9009cd636e7ea49c209480c931a	2021-04-23 09:48:07 -07:00
Steven Troxler	888da76215	Numeral/ES: Add support for 1M, and multiples of 1K/1M Summary: This commit adds two things to Spanish numeral support: - support for millions - support, via hooking into the `isMultipliable` logic used by EN, for composing counts of 2-999 with either "mil" or "millones", which is the standard way to say things like "tres mil" = 3000 Reviewed By: chessai Differential Revision: D27858135 fbshipit-source-id: 980e95bd989f818c5ceaa2bb6c87fe81d3e08366	2021-04-23 09:48:06 -07:00
Steven Troxler	15bba9eba9	Numeral/ES: Refactor hundreds handling to fix bug Summary: This diff refactors our handling of "<hundreds> 0..99" numbers to be more flexible by replacing `ruleNumeralthreePartHundreds` with - a rule for two-part hundreds like "dos cientos" (which is technically incorrect grammar - doscientos is correct - but probably worth keeping) based on a notion of multipliability like that used in EN rules - a rule stating that we can compose hundreds with 0..99 additively The resulting rules are more flexible, and they correctly parse not only grammatically iffy phrases like "dos cientos tres", but also grammatically correct phrases like "doscientos tres". This fixes #380. Reviewed By: chessai Differential Revision: D27858136 fbshipit-source-id: 4a918d84d93ac074f83f6947a8f80cfd11145115	2021-04-23 09:48:06 -07:00
Steven Troxler	6db071069b	Numeral/ES: Style fixes - names, order, etc Summary: While looking into fixing https://github.com/facebook/duckling/issues/380 I was having a bit of trouble navigating the existing rules and guessing what is / is not supported. This diff refactors the Numeral/ES code to be easier to navigate: - rename all the `ruleNumeral{1,2,3,4,5,6}` rules to be descriptive - change the order to be themed from small to large numbers, and make sure the order of defines matches the order of rules at the end of the module - use [20 .. 90] instead of manually specifying the same list out-of-order Reviewed By: chessai Differential Revision: D27858134 fbshipit-source-id: b13983d75b36bb4e2b387ef06fe61066d81ae19a	2021-04-23 09:48:06 -07:00
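The Numeral/ES commits above describe two ways of composing numerals: multiplying a count by a multipliable unit ("tres mil") and adding a lower-order number to a higher-order one ("doscientos tres"). The following is only a minimal sketch of that idea, not Duckling's actual rules. The `Numeral` record, `multiply`, `add`, and `magnitude` are hypothetical names chosen for illustration, and the 2..999 bound is taken from the commit text.

```haskell
-- Hypothetical, simplified numeral token (an assumption for this sketch);
-- Duckling's real numeral data carries more fields.
data Numeral = Numeral
  { value        :: Integer
  , multipliable :: Bool  -- True for units such as "cientos", "mil", "millones"
  } deriving (Eq, Show)

-- Multiplicative composition: "tres" + "mil" gives 3000.
-- The count must be a plain number in 2..999 and the unit must be multipliable.
multiply :: Numeral -> Numeral -> Maybe Numeral
multiply count unit
  | multipliable unit
  , not (multipliable count)
  , value count >= 2
  , value count <= 999
  = Just (Numeral (value count * value unit) False)
  | otherwise = Nothing

-- Additive composition: "doscientos" + "tres" gives 203.
-- The lower part must fit below the lowest non-zero order of the higher part.
add :: Numeral -> Numeral -> Maybe Numeral
add higher lower
  | value lower >= 0
  , value lower < magnitude (value higher)
  = Just (Numeral (value higher + value lower) False)
  | otherwise = Nothing

-- Largest power of ten that divides n, e.g. 200 -> 100 and 4000 -> 1000.
magnitude :: Integer -> Integer
magnitude n
  | n <= 0    = 0
  | otherwise = last (takeWhile (\p -> n `mod` p == 0) (iterate (* 10) 1))
```

Under these assumptions, `multiply (Numeral 3 False) (Numeral 1000 True)` yields a numeral with value 3000, and `add (Numeral 200 False) (Numeral 3 False)` yields one with value 203, while `add (Numeral 203 False) (Numeral 5 False)` is rejected.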
Steven Troxler	bf696ba185	Time/EN: Allow dashes in spelled-out times Summary: It's common to use dashes when spelling out times longhand, e.g. "five-thirty am", but Duckling wasn't handling this at all. This commit adds rules for times spelled out with dashes. The rules explicitly forbid the second of the two times from including digits via a negative match. This is because - it wouldn't be at all idiomatic to write five-26 or five-oh-6 - allowing that pattern clashes with time range parsing, e.g. "9-10 am" should parse as a time range, not as "9:10 am" Reviewed By: chessai Differential Revision: D27848428 fbshipit-source-id: dfe8b98cb38119a16db2a19db47fd3128783e617	2021-04-22 11:47:28 -07:00
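The "no digits after the dash" restriction in the commit above can be illustrated with a small check. This is a rough sketch only: Duckling expresses the restriction as a negative match inside its rule patterns, whereas the hypothetical helpers `splitDash` and `isSpelledOutDashTime` below work on plain strings and do not verify that the parts are valid hours or minutes.

```haskell
import Data.Char (isDigit)

-- Split on the first dash, requiring non-empty text on both sides.
-- (Hypothetical helper, not part of Duckling.)
splitDash :: String -> Maybe (String, String)
splitDash s =
  case break (== '-') s of
    (l, '-' : r) | not (null l), not (null r) -> Just (l, r)
    _                                         -> Nothing

-- Accept a dashed candidate only when the part after the dash has no digits,
-- so that "9-10" stays available for time-range parsing.
isSpelledOutDashTime :: String -> Bool
isSpelledOutDashTime s =
  case splitDash s of
    Just (_, rest) -> not (any isDigit rest)
    Nothing        -> False
```

With this sketch, "five-thirty" is accepted, while "9-10", "five-26", and "five-oh-6" are all rejected.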
Steven Troxler	23ec021b07	Duration/RU: For non-int times, use the coarsest grain Summary: This commit fixes #111, which was an open issue that any non-integer multiple of any unit of time was being converted to seconds. My solution is to write a recursive function `Duration.Helpers.inCoarsestGrain` which, given a grain `g` and double value `v` finds the coarsest grain `g'` such that `v * g` - rounded to the nearest second - has integral units. We call this function only in the case of non-integer multiples, and we start our search from the given grain because nothing coarser would make sense. The code could actually be slightly more efficient if we started at the next-smallest grain, but in the interest of clarity I think this is probably better. Reviewed By: chessai Differential Revision: D27891439 fbshipit-source-id: b048310963eb71337fd91ab4ef3c840134a76e73	2021-04-21 13:32:38 -07:00
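The search described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the code of `Duration.Helpers.inCoarsestGrain`: the `Grain` type, `secondsIn`, and `finer` are hypothetical stand-ins, and only fixed-length grains up to a week are modelled.

```haskell
-- Hypothetical grain type for this sketch, ordered from fine to coarse.
data Grain = Second | Minute | Hour | Day | Week
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Length of one unit of each grain in seconds.
secondsIn :: Grain -> Integer
secondsIn Second = 1
secondsIn Minute = 60
secondsIn Hour   = 3600
secondsIn Day    = 86400
secondsIn Week   = 604800

-- The next finer grain, if there is one.
finer :: Grain -> Maybe Grain
finer Second = Nothing
finer g      = Just (pred g)

-- Starting at grain g, walk towards finer grains until v units of g,
-- rounded to whole seconds, is an integral number of units.
inCoarsestGrain :: Grain -> Double -> (Integer, Grain)
inCoarsestGrain g v = go g
  where
    total :: Integer
    total = round (v * fromIntegral (secondsIn g))
    go g' =
      case total `divMod` secondsIn g' of
        (q, 0) -> (q, g')
        _      -> maybe (total, Second) go (finer g')
```

Under these assumptions, `inCoarsestGrain Hour 1.5` gives `(90,Minute)` and `inCoarsestGrain Day 0.5` gives `(12,Hour)`. The search always terminates because every whole number of seconds is integral at the `Second` grain.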
Steven Troxler	f73277c2b0	Duration/EN: fix lint errors Summary: Minor changes to silence most linter errors - remove redundant import - delete some redundant brackets and tweak bracket locations - switch . to $ to avoid excessive-point-free-style warnings Also linted Duration/Types.hs Reviewed By: chessai Differential Revision: D27891440 fbshipit-source-id: d84287edafe328aeddd78b781618ec5c41944bd8	2021-04-21 13:32:37 -07:00
Steven Troxler	35532b0b7c	Time/EN: Tighten up handling of split times like "five ten" Summary: While debugging an attempt to extend our handling of spelled-out times, I realized that we are being too aggressive in our parsing of times like "five ten", because we'll parse "five nine" as possibly meaning "5:09", which isn't something an English speaker would say (or rather if they did, it's more likely they mean "five (to) six" or something similar). Reviewed By: chessai Differential Revision: D27848429 fbshipit-source-id: 34d783332fd60359ad9b6e7862367453bc93a1d1	2021-04-20 05:31:53 -07:00
Steven Troxler	a250e60cbb	Silence many lint errors on Time/{Helpers,Types}.hs Summary: This commit gets rid of all the easy-to-fix lint warnings on time helper modules: - replacing unnecessary `.` with `$` - Flipping a lambda in a map to an infix operation - Use `ts` for a list of times, not `series` which produces a pretty confusing naming collision There are still quite a lot of lint errors related to name masking, which would be challenging to fix without us coming to an agreement about naming conventions. But at least in my editor, name-masking errors are a lot less visually noisy than other errors (they only highlight the one name) so I don't mind them as much when skimming the code. Reviewed By: chessai Differential Revision: D27842198 fbshipit-source-id: 9091e5349657243b61d7ee169d0d06dd2122ac17	2021-04-19 18:16:55 -07:00
Splinter Suidman	933c287854	NL: Fix spelling of ‘achttien’ (18) (#602 ) Summary: The Dutch numeral and ordinal rules contained a spelling mistake: [‘achttien’](https://en.wiktionary.org/wiki/achttien) (18) was spelt as ‘achtien’ (note the single t). This PR fixes this spelling mistake everywhere in the code base. Pull Request resolved: https://github.com/facebook/duckling/pull/602 Reviewed By: patapizza Differential Revision: D27859001 Pulled By: chessai fbshipit-source-id: 8da0d8b099d49cf6207d4066cee1fc7da68a418e	2021-04-19 18:01:33 -07:00
Steven Troxler	586522b649	Numeral: Silence point-free lint errors in Numeral/Helpers.hs Summary: While looking at bugfixes for some ES and IT numeral parsing, I was looking at the Helpers.hs module and figured I'd silence some lint errors. There are still some name-masking lint issues, but all others have been fixed. Reviewed By: chessai Differential Revision: D27854164 fbshipit-source-id: 5be8d924b033a55608c455074df1c80c8c0019be	2021-04-19 16:47:10 -07:00
Steven Troxler	9bd4c9b7fb	Time/EN: Allow latent match for <part-of-day> <latent-time-of-day> Summary: This fixes #592 in a very conservative way: the reason why `ruleIntersect` does not detect "tonight 815" and "tonight eight fifteen" as it does "tonight 8:15" is because it explicitly forbids the second part of the intersection from being latent, unless it is a year. I don't think it's a good idea to remove the restriction on latent inputs in `ruleIntersect`, so instead I just made a new rule specifically for the intersection of `<part-of-day> <time-of-day>`. It also seems to me that there's a lot of room for this to be too aggressive, for example if I say "tonight 500 people will laugh" the "tonight" and "500" aren't really linked. So, I set the rule to be latent; this may be too conservative to be useful though (do client libraries usually allow latent results?). Reviewed By: chessai Differential Revision: D27842596 fbshipit-source-id: 36ac59e31c632d4864241bce291147a46d52f780	2021-04-19 13:05:50 -07:00
evjava	f1cb3bc87c	Russian(RU) duration improvements (#375 ) Summary: - diminutives for minutes and hours - quarters of an hour - added 'сутки' (24-hour period) Pull Request resolved: https://github.com/facebook/duckling/pull/375 Reviewed By: stroxler Differential Revision: D20332233 Pulled By: chessai fbshipit-source-id: 479858e6c5de856a6965b6193c481e654a6e04fb	2021-04-16 14:31:41 -07:00
leandro.guisandez@pgconocimiento.com	f3e128a07b	Add Ordinal dimension to CA (Catalan) locale Summary: Adds Ordinal rules to Catalan language Reviewed By: patapizza Differential Revision: D27681617 Pulled By: chessai fbshipit-source-id: 145a1117eeff10839484f34a87e9bd685382d42e	2021-04-09 13:05:19 -07:00
leandro.guisandez@pgconocimiento.com	7907812184	Initialise Catalan language with Numeral Summary: Adds Catalan language and Numeral rules for it Reviewed By: haoxuany Differential Revision: D26518604 Pulled By: chessai fbshipit-source-id: e6b4b0ceb9b7931d086c732dd03fb5cbbe062d5b	2021-04-08 14:47:02 -07:00
Amr Keleg	8a8f557002	Add more variants of egp to EN and AR AmountOfMoney dimension (#590 ) Summary: These are popular variants/abbreviations of Egyptian pounds. All these forms are documented on wikipedia (https://en.wikipedia.org/wiki/Egyptian_pound) Pull Request resolved: https://github.com/facebook/duckling/pull/590 Reviewed By: haoxuany Differential Revision: D27598249 Pulled By: chessai fbshipit-source-id: 42ae9115b1def48c58e50a6deb624c3407c029f3	2021-04-07 12:31:41 -07:00
Mustafa ALP	3157d2e553	Time Dimension for TR locale (#584 ) Summary: Added time dimension for Turkish language Pull Request resolved: https://github.com/facebook/duckling/pull/584 Differential Revision: D27235743 Pulled By: chessai fbshipit-source-id: 7419ff7373d942530f0eb35939acb9970b918672	2021-04-06 10:32:18 -07:00
Steven Troxler	4917d426cf	Style tweaks Summary: The facebook internal linters prefer us to avoid excessive point-free style and extra $ where we could instead move existing brackets. Making those style tweaks for Time/EN/Rules.hs because I was looking at the file as part of Reviewed By: chessai Differential Revision: D27108042 fbshipit-source-id: 7c8e76578476ea14d655131943e693c5159b12d2	2021-03-17 10:47:28 -07:00
Steven Troxler	78904b6680	Time - #444 Handle 2-digit date in existing d/m/y rule Summary: The pattern laid out in the bug report https://github.com/facebook/duckling/issues/444 is actually already handled by the pattern `<day-of-month>(ordinal or number)/<named-month>/year`. The problem is purely that the regular expression doesn't match 2-digit years, so the pattern is getting skipped rather than evaluated. This diff fixes the regexp and adds a new example with a 2-digit pattern. This fixes the bug report: ``` > debug (makeLocale EN Nothing) "10-Apr-15" [Seal Time] <day-of-month>(ordinal or number)/<named-month>/year (10-Apr-15) -- integer (numeric) (10) -- -- regex (10) -- regex (-) -- April (Apr) -- -- regex (Apr) -- regex (-) -- regex (15) [Entity {dim = "time", body = "10-Apr-15", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2015-04-10 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2015-04-10 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 9, latent = False, enode = Node {nodeRange = Range 0 9, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 2, token = Token Numeral (NumeralData {value = 10.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 0 2, token = Token RegexMatch (GroupMatch ["10"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"},Node {nodeRange = Range 2 3, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 3 6, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 4}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 3 6, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "April"},Node {nodeRange = Range 6 7, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 7 9, token = Token RegexMatch (GroupMatch ["15"]), children = [], rule = Nothing}], rule = Just "<day-of-month>(ordinal or number)/<named-month>/year"}}] ``` Reviewed By: chessai Differential Revision: D27106007 fbshipit-source-id: 4751672aef807464febef87f6d22d7270bd335df	2021-03-17 10:35:00 -07:00
mustafaalp43@gmail.com	56fd7b0aaf	Feature/Turkish money (#579 ) Summary: Added amount of money dimension for Turkish language Pull Request resolved: https://github.com/facebook/duckling/pull/579 Test Plan: :test Endpoint.Duckling.Test Reviewed By: haoxuany, bugra Differential Revision: D27017300 Pulled By: chessai fbshipit-source-id: e8cb257a2953675f54269ed358948e8cbe38af7b	2021-03-17 10:35:00 -07:00
Steven Troxler	55168db92f	Update classifiers Summary: I was testing an unrelated change (which doesn't change classifier scores) and reran classifiers just to be safe, I noticed that the scores changed. This diff updates them. Reviewed By: chessai Differential Revision: D26892970 fbshipit-source-id: c7da3e3b7d01955f98b287a3ff4e7c1ff2837c7f	2021-03-08 14:02:45 -08:00
Steven Troxler	fc5278855d	Combine duplicated examples Summary: I was looking at adding support for "next week" constructions in Spanish to close https://github.com/facebook/duckling/issues/553 (which it appears has already been handled), when I noticed that the equivalent logic for English has been split into two separate examples: "coming week" isn't in the same example as other equivalent constructs like "upcoming week" and "next week". This diff combines them, which I think is clearer and fewer lines of code Reviewed By: chessai Differential Revision: D26892322 fbshipit-source-id: 68ca4644759198fc79d963ae080495c3f2d4a923	2021-03-08 13:47:06 -08:00
Mustafa ALP	b671d75d02	Typo correction (#574 ) Summary: This commit includes typo corrections for the `half` and `three` equivalents in Turkish Pull Request resolved: https://github.com/facebook/duckling/pull/574 Reviewed By: girifb Differential Revision: D26726718 Pulled By: chessai fbshipit-source-id: 840c2d8e491057b6ccec81562ff64356789f587d	2021-03-03 12:31:33 -08:00
Aaron Yue	f68ebf808a	forward factor parse tree for exploit in T85548324 Summary: due to exploit in T85548324, factoring the input to get a smaller parse tree (the existing one parses tail recursively, whereas this one uses ruleIntersect which is still bad, but slightly better). Differential Revision: D26657170 fbshipit-source-id: fe3a738073b4d30ae401521bb692f4a4bba48d96	2021-02-24 22:20:20 -08:00
Amit Manchanda	37b671389c	add: support for specific times in HI duration (#424 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/424 Reviewed By: girifb Differential Revision: D26411920 Pulled By: chessai fbshipit-source-id: 3f0063e4786688579f2f53f46b31bda5d222d402	2021-02-17 14:48:07 -08:00
Aleksey Landyrev	590651150b	Add Time dimension for RU language Summary: Used `b40e2147a9` as reference Reviewed By: kappa Differential Revision: D24773196 Pulled By: chessai fbshipit-source-id: 7cc008c0ee80f930efd76e39bb16ca91ec94b641	2021-02-12 12:02:44 -08:00
Maurice Döpke	75af12524f	adds german time rule for expressions like: Montag in 3 Wochen (#332 ) Summary: closes https://github.com/facebook/duckling/issues/331 Pull Request resolved: https://github.com/facebook/duckling/pull/332 Reviewed By: girifb Differential Revision: D26283481 Pulled By: chessai fbshipit-source-id: 054c6467a69896ff3ebbd1f9bc0734aadf1b6dbe	2021-02-09 14:33:37 -08:00
Maurice Döpke	998b13bceb	Adds german times rules like "Übernächste Woche" (week after next) (#330 ) Summary: fixes https://github.com/facebook/duckling/issues/329 and allows for recognizing of terms like übernächste woche Pull Request resolved: https://github.com/facebook/duckling/pull/330 Reviewed By: girifb Differential Revision: D26284196 Pulled By: chessai fbshipit-source-id: 160e73668b835c83adb0fd1c396a8a2977e86516	2021-02-09 10:48:32 -08:00
evjava@yandex.ru	ff4a1a5bae	Be more permissive with numerals [20, 90] Summary: There are a handful more spellings for Russian numbers [20, 30 .. 90] that we aren't handling. Additionally, we optimise for recall over precision by allowing some invalid spellings that could be understandable typos. Reviewed By: patapizza Differential Revision: D26285711 Pulled By: chessai fbshipit-source-id: fd8a8f373d228a526e79b22326eff48bb966310d	2021-02-05 15:31:18 -08:00
kcnhk1@gmail.com	3f2f307735	Time - add more common expressions Summary: Added: last <duration> <time> <day-of-month> Reviewed By: haoxuany Differential Revision: D26263977 Pulled By: chessai fbshipit-source-id: b00ece753593a7fabe45bbaa9e1f013860e38d80	2021-02-04 16:32:11 -08:00
Amit Manchanda	c2b280c9ef	add: support for composite duration in hindi (#425 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/425 Reviewed By: girifb Differential Revision: D26263097 Pulled By: chessai fbshipit-source-id: 29605023746a30dc286ffb246eb30fdc4067cbd8	2021-02-04 16:19:23 -08:00
kcnhk1@gmail.com	9a6aeb9b51	Distance - introduce interval rules Reviewed By: haoxuany Differential Revision: D26256269 Pulled By: chessai fbshipit-source-id: 0c3ca267158fd5189fef5540d5bbb903b0dd00b4	2021-02-04 12:02:32 -08:00
kcnhk1@gmail.com	722cc838ff	Volume - extend interval support Reviewed By: haoxuany Differential Revision: D26255089 Pulled By: chessai fbshipit-source-id: e4bdb0aa3c1be55dff0a5577155a3d0469d6762d	2021-02-04 12:02:32 -08:00
kcnhk1@gmail.com	67c1dbe94f	AmountOfMoney - extend interval support Reviewed By: haoxuany Differential Revision: D26254863 Pulled By: chessai fbshipit-source-id: dfc06f9831de2d50c11d252429c4fb9b8c1eb13a	2021-02-04 11:19:19 -08:00
kcnhk1@gmail.com	b6da3929ce	Extend distance rules Summary: Add rules: - one meter and <dist> - <dist> meters and <dist> Reviewed By: girifb Differential Revision: D26191350 Pulled By: chessai fbshipit-source-id: 52c85c94647e98fba866c24d3386eea988f7f58c	2021-02-03 15:01:39 -08:00
kcnhk1@gmail.com	776b1ec64d	extend AmountOfMoney rules Summary: Add rules: - `hkd` as HKD, and related rules (prefix and suffix) - dollar and <amount-of-money> rule - dollar and a half rule - intersection for <amount-of-money> and `a half` Changed: - dime and dollar rules now have improved coverage Reviewed By: girifb Differential Revision: D26191724 Pulled By: chessai fbshipit-source-id: bf63b6eaa751fb96dcf341fa2b66db06a6eeca79	2021-02-03 14:05:30 -08:00
Amr Keleg	e673ba5e84	Quantity/EN: Support k.g k.g. (#570 ) Summary: Kilogram units written with dots in between (k.g, k.g.) used to be extracted as a Numeral instead of a Quantity. Pull Request resolved: https://github.com/facebook/duckling/pull/570 Reviewed By: patapizza Differential Revision: D26199687 Pulled By: chessai fbshipit-source-id: 65e39f20296946d5762d7180b12878f4e66ea701	2021-02-03 12:46:27 -08:00
kcnhk1@gmail.com	496842d16a	Extend numeral rules Summary: - Extend fraction rule - add mixed fraction rules - add prefix of 10/100/10_000 rules Reviewed By: girifb Differential Revision: D26191175 Pulled By: chessai fbshipit-source-id: c2f6b74602e1b8061e0c556721ad8e36821fdb5c	2021-02-03 11:19:33 -08:00
jfulse	788f63eeac	Parse more date formats in Norwegian (#395 ) Summary: In general there are some clashes between time formats `hhmm` and date formats `ddmm`. For example, depending on context, `22.10` can mean clock time ten past ten or the twenty second of october. In general it's correct to interpret this as clock time, as Duckling currently does. But there are some cases not currently covered by Duckling where we have more unambiguous dates, e.g. `12.03.2018` and `27.11`. These are included here (in addition to midnight `24:00` which was also missing). #### Changes: - Bug in `ruleDdmm` regex meant that dates on the format `dd/mm` where `mm > 9` were not parsed - `ruleYyyymmdd` now also parses dots and forward slashes, i.e. `2012.05.14` and `2012/05/14` - New rule `rule2400` parses `24:00` and `24.00` (I elected not to include it in `ruleMidnighteodendOfDay` as it has grain minute rather than day) - New rule `ruleDmm` parses `1/10`, `9.12` etc - New rule `ruleDDm` parses `10/3`, `11.1` etc - New rule `ruleDdDotMm` parses `25.02`, `31.10` etc - `ruleDdmmyyyy` now also parses dots, i.e. `03.10.1983` - New tests Pull Request resolved: https://github.com/facebook/duckling/pull/395 Reviewed By: patapizza Differential Revision: D26193069 Pulled By: chessai fbshipit-source-id: cf711807fa1d40be2303f2426d74ded40c2e23b3	2021-02-02 23:18:48 -08:00
Maxime Biais	16708d9572	Minor Volume.FR improvement: add "Centilitre" type (#354 ) Summary: Minor Volume.FR improvement: add "Centilitre" type. This is useful for recipe parsing. Pull Request resolved: https://github.com/facebook/duckling/pull/354 Reviewed By: patapizza Differential Revision: D26193246 Pulled By: chessai fbshipit-source-id: ddd551e062b8efeff1e786e30e35815c0c29a34c	2021-02-01 22:48:34 -08:00
kcnhk1@gmail.com	61e06c3aa6	Add initial support for volumes in Chinese Reviewed By: girifb Differential Revision: D26183123 Pulled By: chessai fbshipit-source-id: 1acd27d5172cfb5bccbeb1576700e2c60a8e3907	2021-02-01 16:05:42 -08:00
Igor Kuzmenko	9993911e3b	Adds UAH currency Type and examples to EN and RU Corpus (#433 ) Summary: This PR adds UAH currency Type and examples to EN and RU Corpus Pull Request resolved: https://github.com/facebook/duckling/pull/433 Reviewed By: girifb Differential Revision: D25102990 Pulled By: chessai fbshipit-source-id: ed40e8dfcf145a65c7e6d87158da0efacb32e256	2021-02-01 14:32:24 -08:00
Daniel Cartwright	7193caafb9	parse latent year intervals Summary: adds a new rule that parses year intervals such as "1960 - 1961". see inline comments for heuristics. Reviewed By: patapizza Differential Revision: D25840835 fbshipit-source-id: 851a5b1c78440cbf065bf9f20a05c78d4967ea3c	2021-01-29 16:33:56 -08:00
Daniel Cartwright	33f0c17ee2	implement 'the day after tomorrow' in Romanian Summary: adds a rule for 'the day after tomorrow' in Romanian. regenerates classifiers. Reviewed By: girifb Differential Revision: D26155042 fbshipit-source-id: 80005ab94a10f9fbf242c9a712bd040e4f6bc477	2021-01-29 14:49:13 -08:00
Marcin Armatys	d5fac5f14e	Polish(PL) - Support for seventy, eighty, ninety (#417 ) Summary: Support for polish equivalents of seventy, eighty, ninety. Pull Request resolved: https://github.com/facebook/duckling/pull/417 Reviewed By: patapizza Differential Revision: D26130642 Pulled By: chessai fbshipit-source-id: 4a0be944dcd0a9dea155caae145cf4a38537753f	2021-01-29 11:47:36 -08:00
Nour Shalabi	6346cfe926	Add Arabic rule for a week ago (#379 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/379 Reviewed By: patapizza Differential Revision: D26149123 Pulled By: chessai fbshipit-source-id: 5f0bca88fc1b64da5d93fcf715996d58a972fda2	2021-01-29 11:32:32 -08:00
Arjan Scherpenisse	d095b05060	NL/Duration: Support composite durations (#503 ) Summary: E.g. "1 uur en drie kwartier", "1 dag 4 uur", etc. Pull Request resolved: https://github.com/facebook/duckling/pull/503 Reviewed By: patapizza Differential Revision: D22260615 Pulled By: chessai fbshipit-source-id: 40689f7630b4d5bab498df730528ce6bf768fa89	2021-01-27 11:18:10 -08:00
kckckcng	a82684e723	Time&Duration/ZH: support Cantonese and more common expressions (#516-2) (#523 ) Summary: 2nd set of changes from pull request https://github.com/facebook/duckling/issues/516 Supporting Cantonese and more common expressions in Chinese. Adding rules file for Duration/ZH. Pull Request resolved: https://github.com/facebook/duckling/pull/523 Reviewed By: haoxuany Differential Revision: D23428901 Pulled By: chessai fbshipit-source-id: 6d04c97b63bac966eb61d77cab2f08f7543dbbf0	2021-01-26 15:17:45 -08:00
michaelmarien	28ddc3bff7	NL/amount-of-money (#504 ) Summary: Currently values like 1000.000 (in Dutch . is thousand separator) are not recognised, as the ruleDecimalWithThousandsSeparator requires the decimal part (e.g. 1000.000,34) to be present. This PR adds some data and changes the ruleDecimalWithThousandsSeparator to make the decimal part optional. Pull Request resolved: https://github.com/facebook/duckling/pull/504 Reviewed By: patapizza, girifb Differential Revision: D26078885 Pulled By: chessai fbshipit-source-id: b1679c713e1d17a168d34a3cc556b6c36a571d75	2021-01-26 12:33:14 -08:00
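The Dutch number convention discussed in the commit above ("." as thousands separator, "," as decimal separator, with the decimal part now optional) can be sketched with a small parser. This is an assumption-laden illustration, not the regex used by `ruleDecimalWithThousandsSeparator`: `parseDutchNumber` is a hypothetical helper, and it does not check that the dots sit at valid thousands positions.

```haskell
import Data.Char (isDigit)

-- Parse a number written with "." as thousands separator and an optional
-- ","-separated decimal part. (Hypothetical helper, not part of Duckling.)
parseDutchNumber :: String -> Maybe Double
parseDutchNumber s =
  case break (== ',') s of
    (whole, "")                           -> toDouble whole ""
    (whole, ',' : frac) | not (null frac) -> toDouble whole frac
    _                                     -> Nothing
  where
    toDouble whole frac
      | null digits               = Nothing
      | not (all isDigit digits)  = Nothing
      | not (all isDigit frac)    = Nothing
      | otherwise =
          case reads (digits ++ "." ++ (if null frac then "0" else frac)) of
            [(x, "")] -> Just x
            _         -> Nothing
      where
        -- Drop the thousands separators before reading the integer part.
        digits = filter (/= '.') whole
```

With this sketch, both "1000.000" and "1000.000,34" parse (to one million and to one million plus 0.34 respectively), while input with a trailing comma or non-digit characters is rejected.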
kckckcng	f2798021b6	Numeral/ZH: support more common expressions (#516-1) (#522 ) Summary: 1st set of changes from pull request https://github.com/facebook/duckling/issues/516 Supporting more common expressions, such as fraction, half, dozen, in Chinese. Pull Request resolved: https://github.com/facebook/duckling/pull/522 Reviewed By: patapizza Differential Revision: D23428893 Pulled By: chessai fbshipit-source-id: 3454ac70a4bfff90dc282560916a0fae9969f521	2021-01-21 21:17:54 -08:00
Sam Coope	e9e5507820	Add ASAP, at the moment to EN time (#405 ) Summary: * "at the moment" is considered identical to "now". * "ASAP" is considered identical to "from now" Pull Request resolved: https://github.com/facebook/duckling/pull/405 Reviewed By: patapizza Differential Revision: D26009483 Pulled By: chessai fbshipit-source-id: addf4c509e69d413cae279601c64f72710eba11f	2021-01-21 20:47:40 -08:00
Daniel Cartwright	1ba1aedeba	Correct CDT TimeZone offset Summary: CDT is UTC -5. (-5 hours) * (60 minutes/hour) = -300 minutes. 540 was probably a copy/paste error. Reviewed By: girifb Differential Revision: D25877623 fbshipit-source-id: de4f84f2564cbb154aec95eee63c458c64f8a85f	2021-01-12 14:02:52 -08:00
chessai	40cdb88982	Add CreditCardNumber to common dimensions (#563 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/563 Reviewed By: girifb Differential Revision: D25624047 Pulled By: chessai fbshipit-source-id: b50cf34f4a28bfcbd4a0ca3479debc5a5c118b5e	2021-01-05 13:18:19 -08:00
Wojtek Przechodzeń	10eee56f10	Time/PL - new rules (#538 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/538 Reviewed By: haoxuany Differential Revision: D24640854 Pulled By: chessai fbshipit-source-id: 51eb0d530b143511f79992a91ca8f465b7860b6e	2020-12-16 13:47:49 -08:00
chaitu9701	28cb5ebd2a	Adding Numerical Dimension support for Telugu language (#470 ) Summary: This pull request adds support for the Telugu language (Numerical Dimension) to Duckling Pull Request resolved: https://github.com/facebook/duckling/pull/470 Differential Revision: D25546700 Pulled By: chessai fbshipit-source-id: 1d88ee27da8a577a4a79ff31be8cb55ed6444c4e	2020-12-15 17:48:03 -08:00
Amit Manchanda	724325b02f	add: support for quarter to, quarter past and half in HI (#423 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/423 Reviewed By: girifb Differential Revision: D25573001 Pulled By: chessai fbshipit-source-id: 5474f108e968bdfb53ebc2518b46f28befdeba89	2020-12-15 17:02:28 -08:00
Amr Keleg	703ff13210	Add a new Arabic locale (EG) (#554 ) Summary: Egyptian Arabic is a dialect of Arabic that is mostly a spoken language that is used in everyday communications. This PR adds new locale to Arabic to support the differences between Modern Standard Arabic (MSA) and Egyptian Arabic (EG). I have mainly depended on the different locales of Spanish that are supported by Duckling to create the new Egyptian Arabic locale. New modifications are added to the `Numeral` dimension since I didn't spot differences in other dimensions. Pull Request resolved: https://github.com/facebook/duckling/pull/554 Reviewed By: patapizza Differential Revision: D25543502 Pulled By: chessai fbshipit-source-id: 4cbb7be78a52071c8681380077f0b4dc033a60de	2020-12-15 11:33:40 -08:00
Daniel Cartwright	181037e469	Support abbreviation of Crore and Lakh Summary: Crore (1e7) and Lakh (1e5) are both commonly used to describe an amount of Indian currency. Common abbreviations are "Cr" (Crore) and "lkh", "L", "lac" (lakh). Additionally, common spellings of "crore" include "karor" and "koti" Reviewed By: patapizza Differential Revision: D25550546 fbshipit-source-id: 0c1479d9027431cb0d1182b5117eabca6f939cb2	2020-12-15 11:18:05 -08:00
moozzyk	c33249b4dd	Fix typo in PL Duration Rules (#426 ) Summary: 'miej' in Polish is the imperative form of the verb 'mieć' (to have). "mniej więcej" means "more or less" and it was the intention here. Pull Request resolved: https://github.com/facebook/duckling/pull/426 Reviewed By: patapizza, girifb Differential Revision: D25546380 Pulled By: chessai fbshipit-source-id: 1047b83109cab917f1f4dbe87b667f8ccd2fb92d	2020-12-14 16:32:05 -08:00
Hernan Barijhoff	f053b14676	ES/Ordinal: Fixes "tercero" pattern regex (#477 ) Summary: Missing "tercer" regex in rule Pull Request resolved: https://github.com/facebook/duckling/pull/477 Reviewed By: patapizza Differential Revision: D24934794 Pulled By: chessai fbshipit-source-id: a51f6fe3187749885784bfaacfee09cf26a8df6d	2020-11-19 13:48:43 -08:00
Christoph Flick	d0a6f8114c	Improve german time approximation (#435 ) Summary: Improves the recognition of German time approximation language and removes a single error in the rule of <time-of-day> approximately. Pull Request resolved: https://github.com/facebook/duckling/pull/435 Reviewed By: patapizza Differential Revision: D24934281 Pulled By: chessai fbshipit-source-id: 641bcb6a7e5c26e66c735fe13bccae9b7a8909ae	2020-11-19 13:48:42 -08:00
Sajjad Heydari	700118644c	FA Setup (#520 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/520 Reviewed By: patapizza Differential Revision: D25072459 Pulled By: chessai fbshipit-source-id: 5db72eda36fe166a452b2345cab75fb1508b192b	2020-11-19 12:20:00 -08:00
Harisankar H	11595b7377	Support for more Hindi numbers (#552 ) Summary: Add support for additional Hindi numbers like 300, 81, 150, 1000, 1520. These are not supported in the current master version. Pull Request resolved: https://github.com/facebook/duckling/pull/552 Reviewed By: ashwinp-fb, girifb Differential Revision: D25072230 Pulled By: chessai fbshipit-source-id: 35277a2349384bcf44a20e74852113f5c010e618	2020-11-18 17:04:29 -08:00
Daniel Cartwright	58cf66589f	make duckling time not treat 0:xx and 12:xx ambiguously Reviewed By: haoxuany Differential Revision: D24929661 fbshipit-source-id: 3858d14ef1655f079daa33d2b159e8cb918a70ac	2020-11-12 14:19:04 -08:00
chessai	cdeefe1d4d	ghc88x compat (#550 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/550 Reviewed By: haoxuany Differential Revision: D24844625 Pulled By: chessai fbshipit-source-id: 52dcf5f9488386f7f407535e876bff1207823fe0	2020-11-12 13:47:46 -08:00
Dmitri Osipov	e7264b55c9	adds frequent durations in German (#509 ) Summary: Found a lacking frequent duration in German and a small typo in the existing one. Pull Request resolved: https://github.com/facebook/duckling/pull/509 Reviewed By: patapizza Differential Revision: D24690104 Pulled By: chessai fbshipit-source-id: b49a7a636abf5b92f2fe7c0d5b2ca2fe64acbaa2	2020-11-09 11:18:35 -08:00
Daniel Cartwright	eb043d7018	Quantity rules for Spanish (ES) Summary: Spanish (ES) will now have all the same quantity rules as English (EN) (which I think is the most-supported language), plus more. This includes the following: * bowls - (bol(es)?\|tazón(es)?\|cuencos?\|platos? (soperos?)\|(hondos?)) (EN does not currently have this) * cups - (tazas?) * dishes - (platos?\|fuentes?) (EN does not currently have this) * grams - (((m(ili)?)\|(k(ilo)?))?g(ramo)?s?) * ounces - ((onzas?)\|oz) * pints - (pintas?) (EN does not currently have this) * pounds - ((lb\|libra)s?) * quarts - (cuartos? de galón) (EN does not currently have this) * tablespoons - (cucharadas? (grande)?) (EN does not currently have this) * teaspoons - (cucharaditas?) (EN does not currently have this) Reviewed By: patapizza Differential Revision: D24628214 fbshipit-source-id: 2e8d500661f30fa0928cb7d3f21470afc01e2285	2020-11-09 11:18:35 -08:00
Victor Pothin	bfc75849d2	Adds new rules of accentuation of the Portuguese (#531 ) Summary: Keeps accents consistent, "quinquagésimo" there is no more "Ü". Pull Request resolved: https://github.com/facebook/duckling/pull/531 Reviewed By: patapizza Differential Revision: D23770703 Pulled By: chessai fbshipit-source-id: f8a34c02028faf9f51eca6a016b5bad988a83f04	2020-11-02 12:17:57 -08:00
Josef Svenningsson	7889f396f3	Remove dependency on Data.Some (#533 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/533 In recent versions of Data.Some the name of the constructor, `This` has changed name to `Some`. This has become rather problematic for us to migrate so we're just going to remove the dependency. The meat of this diff is adding the type `Seal` to `Duckling.Types`. That type replaces `Some`. Reviewed By: pepeiborra Differential Revision: D23929459 fbshipit-source-id: 8ff4146ecba4f1119a17899961b2d877547f6e4f	2020-09-28 01:33:01 -07:00
Julien Odent	7ba9ea8aeb	Time/EN: Fix empty group match Summary: sad_palpatine Differential Revision: D23718913 fbshipit-source-id: 363bf9a43d8d1cd77405882bc70a7fa1a1de2dbe	2020-09-15 17:22:00 -07:00
Julien Odent	ef2b1b1b0e	Time/FR: Some speed up Summary: Guarding against grains, shortening regexes. Reviewed By: jtliao Differential Revision: D23387716 fbshipit-source-id: de84d0efa79c4ae10bd9fbf14e82a724fee1a1f2	2020-08-28 09:48:15 -07:00
Arjan Scherpenisse	df2ada617a	NL/Duration: Add "anderhalf uur" (#502 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/502 Reviewed By: patapizza Differential Revision: D22260625 Pulled By: haoxuany fbshipit-source-id: bf44fdab7def19f6dd0e0ef7763c112a3b024396	2020-08-05 15:34:05 -07:00
Julien Odent	3d5e1c3bad	Time/DE: Don't parse "so" Summary: "so" is an adverb in German: https://github.com/wit-ai/wit/issues/1860 It's also a short form for "Sonntag" (Sunday); making the dot mandatory. Reviewed By: haoxuany Differential Revision: D22900791 fbshipit-source-id: 8dc873f79a21ca2add074f9c664e84fae56f1e67	2020-08-03 12:34:49 -07:00
Bing Yuan	a88e0669f7	Fixed the rule for parsing "coming <time cycle>" Summary: Currently the term "coming" is being treated the same way as "this" or "current". The expected treatment should be the same as the term "next". Reviewed By: chinmay87 Differential Revision: D22435156 fbshipit-source-id: b0b20d8a38014267fb7d037b685ce126f602bda7	2020-07-17 13:17:18 -07:00
Bing Yuan	5af4d617ba	Fixed a problem in parsing multi-word timestamp for ES Summary: Current: "seis cero cinco pm" [dimension Time] -> "cero cinco pm" or "5 pm" here the term "seis" was dropped because it was treated as "6" in "Numeral" dimension. Expected: "seis cero cinco pm" -> "6:05 pm" The root cause was that the rule "<hour-of-day> <integer> (as relative minutes)" dropped the first term "hour-of-day" if it was parsed as a latent token. Reviewed By: chinmay87 Differential Revision: D22553028 fbshipit-source-id: abc92bb369c23d2b3084641eab2a2dabb87dbc66	2020-07-17 11:38:43 -07:00
Bing Yuan	780bd0aac5	Fixed the problem parsing "next <day-of-week>" Summary: If the current time is: 07/07/2020 (tuesday), Current: "next saturday" -> 07/11/2020 Expected: "next saturday" -> 07/18/2020 According to Quora (https://www.quora.com/When-is-this-Monday-and-next-Monday-Are-they-the-same#:~:text='Next%20Monday'%20is%20Monday%20of,the%20first%20Monday%20after%20today.), the term "next saturday" means the first saturday in the week after current (this) week, regardless the current day of week. Reviewed By: haoxuany Differential Revision: D22420499 fbshipit-source-id: c2bd28b9fda78ff3cb0418a50c3b302be350b02d	2020-07-15 14:47:41 -07:00
Bing Yuan	9c1ab0de69	Tweak the rule for parsing "tomorrow" in ES Summary: There are two rules for parsing "manana" (dimension: Time): one is resolved to "morning"; while the other is resolved to "tomorrow". And the first (or "morning") rule resolves to a LATENT result; while the second (or "tomorrow") rule resolves to a NON-LATENT result. If the duckling is called with "latent" option turned off, the "tomorrow" rule prevails. However, if the duckling is invoked with "latent" option turned on, the "morning" rule is preferred. The solution (for now) is to steer the classifier towards "tomorrow" rule by adding large number of (same) examples for "tomorrow" rule. Reviewed By: chinmay87 Differential Revision: D22425277 fbshipit-source-id: 2f139eec0c38b9b5227f27d9f09f6264e7cf86cd	2020-07-15 12:08:20 -07:00
Bing Yuan	82e976b77d	Added support for parsing year composed of multiple ES words Summary: The root cause is the lack of support for the composition of numerals in ES. For example, "mil novecientos noventa" is parsed as 3 individual numbers: 1000, 900 and 90 correspondingly. Instead, the expected result is a single numeral value that is the sum of the aforementioned three numbers. The same expectation can be extended to the composition of an arbitrary number of numeral values. Reviewed By: chinmay87 Differential Revision: D22192034 fbshipit-source-id: 476489145b83297b82d88f3451020c867e2d08aa	2020-07-06 17:02:59 -07:00
Bing Yuan	857aa16d06	added support to parse ordinal day-of-week Summary: Current: "first monday of last month" -> the date of first monday starting from current time. Note here the term "last month" is dropped Expected: "first monday of last month" -> the date of first monday of previous month. Reviewed By: chinmay87 Differential Revision: D22300243 fbshipit-source-id: 16622860c52ec2ce9c7a7bcd6094192255aa5a0b	2020-07-06 15:39:57 -07:00
Bing Yuan	c7aed76c5a	added new rule to handle ES phrase for next week (#497 ) Summary: Current: "siguiente semana" -> [] // empty result Expected: "siguiente semana" -> "next week" Pull Request resolved: https://github.com/facebook/duckling/pull/497 Test Plan: haxlsh> H.io $ debug (makeLocale ES Nothing) "siguiente semana" [This Time] Reviewed By: chinmay87 Differential Revision: D22054455 Pulled By: yuanbing fbshipit-source-id: 576e96a49eebace9b5baa382efac2e266e651d8e	2020-07-06 12:50:45 -07:00
Bing Yuan	44007b76d3	Add support for spelled out time of Summary: Current: "twelve zero three" -> 12:00pm Expected: "twelve zero three" -> 12:03pm The root cause was that duckling doesn't support this kind of pattern for timestamp. The uniqueness here was that the number "three" was spelled as "zero three" that Duckling failed to understand. Reviewed By: chinmay87 Differential Revision: D22313140 fbshipit-source-id: 9e481a142a16b94c61b1770e7f8be036497419f8	2020-07-06 12:17:25 -07:00
Bing Yuan	a78aacfc50	Updated the rule to parse "last <day-of-week> of <time>" Summary: current: last friday in october -> the date of Friday of previous week expected: last friday in october -> the date of last Friday of month october Reviewed By: chinmay87 Differential Revision: D22201326 fbshipit-source-id: 1983c1b9c24aa356977af7def42d5ba07c7f08be	2020-06-25 16:04:17 -07:00
Bing Yuan	36a3d2011f	Added new rule to parse ES phrase for time of day (in the afternoon) (#496 ) Summary: Current: "seis dos de la tarde" -> "dos de la tarde" or 2pm; note that the term "seis" is dropped. Expected: "seis dos de la tarde" -> "seis dos de la tarde" or 6:02pm Pull Request resolved: https://github.com/facebook/duckling/pull/496 Test Plan: H.io $ debug (makeLocale ES Nothing) "seis dos de la tarde" [This Time] Reviewed By: chinmay87 Differential Revision: D22054328 Pulled By: yuanbing fbshipit-source-id: 1ecb05885fc506176cc04768aa158279c7e7fd4f	2020-06-25 15:07:32 -07:00
Bing Yuan	eb9ddcbd95	Fixed a problem in parsing ES timestamp Summary: There are two types of ES phrases for timestamp to support: 1. "para las seis cero dos pm" 2. "para las 6 0 2 pm" The solution is to: 1. added a new rule to parse two-digit number between 1 and 9 (inclusive); 2. modified the regex pattern to support additional optional phrase "para" in front of "las". Reviewed By: chinmay87 Differential Revision: D22218800 fbshipit-source-id: 58f692beb6f10834c0ab639b31bf239bf4a1970e	2020-06-25 12:49:39 -07:00
Bing Yuan	1ad3a8514e	added new rule to parse phrase in the pattern "xxx minutes to <hour-of-day>" (#500 ) Summary: Current: 20 minutes to 2pm tomorrow -> 20 minutes (dimension: Time) Expected: 20 minutes to 2pm tomorrow -> 1:40pm of next day (dimension: Time) Pull Request resolved: https://github.com/facebook/duckling/pull/500 Reviewed By: chinmay87 Differential Revision: D22200580 Pulled By: yuanbing fbshipit-source-id: e47e5b5aaf4e3644c7032096caa75672a8543087	2020-06-25 11:21:29 -07:00
Bing Yuan	e570acd2f9	Added new rule support composite duration phrase in ES (#498 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/498 Test Plan: In haxlsh: H.io $ debug (makeLocale ES Nothing) "dos hora y treinta y cinco minutos" [This Duration] Reviewed By: chinmay87 Differential Revision: D22054695 Pulled By: yuanbing fbshipit-source-id: b4486141bf7ccb0e538e40ce40fadd7daef374a8	2020-06-25 09:47:32 -07:00
Bing Yuan	7b2def024e	support "noon" phrase in ES Summary: This fix is to add support to parse alternative phrase, in ES, for "noon". Currently the supported ES phrase for "noon" is "mediodia", the alternative form is "medio<whitespace*>dia". Reviewed By: chinmay87 Differential Revision: D22188049 fbshipit-source-id: 798b83be75798f3b0d695a0f01a65dc84af98e22	2020-06-24 16:36:05 -07:00
Bing Yuan	dddb4adf23	Updated the rule to parse ordinal day of month in ES (#495 ) Summary: the rule is updated to conform with natural expression of "ordinal day of month". Pull Request resolved: https://github.com/facebook/duckling/pull/495 Differential Revision: D22054297 Pulled By: yuanbing fbshipit-source-id: d9d8e00311d4d3121685ab5b09f6c1f52f3077c9	2020-06-24 11:47:22 -07:00
Bing Yuan	195a9d7aa1	Added new rule to support ES phrase for "next week". (#493 ) Summary: Please note that the major diff with the existing rule for next week is that the new phrase doesn't have the leading "la" or anything with similar meaning. Pull Request resolved: https://github.com/facebook/duckling/pull/493 Test Plan: Imported from GitHub, without a Test Plan: line. Reviewed By: patapizza Differential Revision: D21981169 Pulled By: yuanbing fbshipit-source-id: 7478d1262c3a4599d359b485b28a547ad5f44b76	2020-06-24 11:02:24 -07:00
Bing Yuan	8cf3fdb581	Fix a problem with parsing ES time phrase Summary: The root cause was the error in parsing the ES numeral value [1-9] that is spelled with two words instead of one. For example "cero dos" should be parsed the same as "dos". Currently it's being parsed as two numeral values: 0 and 2. Reviewed By: chinmay87 Differential Revision: D22162804 fbshipit-source-id: 949956935a21e742f6788e7afa788ff728dd9a8d	2020-06-22 12:03:15 -07:00
Bing Yuan	097b9260d5	Added new rules to parse phrases for upcoming weeks. (#491 ) Summary: the new rules could parse phrases in the form of xxx upcoming weeks upcoming xxx weeks Pull Request resolved: https://github.com/facebook/duckling/pull/491 Test Plan: Imported from GitHub, without a Test Plan: line. Differential Revision: D21959647 Pulled By: chinmay87 fbshipit-source-id: a062a8c7a6c2e23b921b1099b886fa589c69c454	2020-06-17 14:32:59 -07:00
Cody Ohlsen	474ae1b851	Duckling probabilistic layer bug fix Summary: while computing a score used to rank in Duckling, it currently sums up the log likelihoods learned during training. While ranking, the goal is to find the (same span) parse candidate which is _more_ likely to lead to a correct parse. However, the old logic was summing up the "more confident of the two classes" log likelihood.From what I understand this is the part which feels wrong. I created an example of two rules: #1. a rule where the classifier learns that the rule is very confidently NOT the correct parse. - okdata (positive class) is very low confidence (high negative number prior) - kodata (negative class) is very high confidence (low negative number prior) #2. a rule where the classifier is confident that it is the correct parse, but not Very Confident. - okdata (positive class) is high confidence (nonzero, but low negative number prior) - kodata (negative class) is very low confidence (high negative number prior) these two rules match the same regex, thus the same span. While duckling parses it, it turns out, that rule #1 ranks higher than rule #2. The reason why is because #1 is MORE confident that it is the INCORRECT (does not contribute to) parse than rule #2. Does this make sense? to solve this problem, I changed the ranking score estimation to use only the positive class scores (okdata). In the example above, it fixes it so rule #2 would end up ranking higher because the positive class confidence is higher than #1's positive class confidence. Would really love some deeper input from Duckling experts. I re-learned haskell and learned haxl to craft a small example here, and I am very new to Duckling (just started reading the ranking code on Friday). I know Duckling is battle-tested but I also don't believe that means a bug can't exist. 
And further, this specific bug may not happen a whole lot for 2 reasons: - there are not a lot of rules which end up higher negative confidence than positive (requires enough negative corpus examples over positive ones) - ranking uses span width first, and only when the spans are equivalent does the score based ranking come into play. So it requires that 2 rules match the same span before any actual score calculation even matters. Reviewed By: patapizza Differential Revision: D22009276 fbshipit-source-id: 13491689d39d810da526fa4bb8b6e526d4cafd35	2020-06-12 16:06:11 -07:00
byuan	558b38c1cb	Fixed the problem with parsing fractional hour phrase that contains "quarter" or "quarters" (#485 ) Summary: Current: if the fractional hour expression describes the hour fraction with term like "quarter or quarters", then duckling couldn't correctly recognize it. Expected: Duckling should be able to identify this kind of expression and parse it correctly. Fix: Add new rule to parse the fractional hour pattern that contains the keyword like "quarter or quarters". Pull Request resolved: https://github.com/facebook/duckling/pull/485 Test Plan: Imported from GitHub, without a Test Plan: line. Reviewed By: haoxuany Differential Revision: D21850804 Pulled By: chinmay87 fbshipit-source-id: 818b7b3f37e3f8a6d1a7d579db19fb2cfb2763f4	2020-06-10 12:19:28 -07:00
Bing Yuan	220c0f2d7d	Added support for parsing new ES duration phrases like half hour, quarter of hour. (#489 ) Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/489 Differential Revision: D21959268 Pulled By: chinmay87 fbshipit-source-id: 2b785b44da5437c7b27af098daef551139dad990	2020-06-09 15:16:38 -07:00
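
The ranking change described in commit 474ae1b851 can be illustrated with a small sketch. This is not Duckling's actual code: the `RuleScore` type, its field names, and the numeric scores are hypothetical, and the real ranker sums scores over a parse tree rather than looking at a single rule. It only shows why taking the more confident of the two classes can rank a confidently-wrong rule first, and how using the positive class (okdata) alone reverses that.

```haskell
module Main where

import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical stand-in for one rule's learned class scores.
-- These are not Duckling's real types or field names.
data RuleScore = RuleScore
  { ruleName :: String
  , okScore  :: Double -- log likelihood for the positive class (okdata)
  , koScore  :: Double -- log likelihood for the negative class (kodata)
  }

-- Old behaviour as described in the commit message:
-- score with whichever class is more confident.
oldScore :: RuleScore -> Double
oldScore r = max (okScore r) (koScore r)

-- New behaviour: score with the positive class only.
newScore :: RuleScore -> Double
newScore = okScore

-- Rank rule names from highest to lowest score.
rankBy :: (RuleScore -> Double) -> [RuleScore] -> [String]
rankBy score = map ruleName . sortBy (comparing (Down . score))

-- Made-up numbers mirroring the two rules in the commit message:
-- rule1 is confidently NOT the correct parse, rule2 is fairly
-- confidently the correct parse.
exampleRules :: [RuleScore]
exampleRules =
  [ RuleScore "rule1" (-9.0) (-0.1)
  , RuleScore "rule2" (-1.0) (-8.0)
  ]

main :: IO ()
main = do
  print (rankBy oldScore exampleRules) -- rule1 ranks first
  print (rankBy newScore exampleRules) -- rule2 ranks first
```

As the commit message notes, this only matters when two candidates cover the same span, because span width is compared before the score.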

651 Commits