566 Commits

Author SHA1 Message Date
kcnhk1@gmail.com
3f2f307735 Time - add more common expressions
Summary:
Added:
- last <duration>
- <time> <day-of-month>

Reviewed By: haoxuany

Differential Revision: D26263977

Pulled By: chessai

fbshipit-source-id: b00ece753593a7fabe45bbaa9e1f013860e38d80
2021-02-04 16:32:11 -08:00
Amit Manchanda
c2b280c9ef add: support for composite duration in Hindi (#425)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/425

Reviewed By: girifb

Differential Revision: D26263097

Pulled By: chessai

fbshipit-source-id: 29605023746a30dc286ffb246eb30fdc4067cbd8
2021-02-04 16:19:23 -08:00
kcnhk1@gmail.com
9a6aeb9b51 Distance - introduce interval rules
Reviewed By: haoxuany

Differential Revision: D26256269

Pulled By: chessai

fbshipit-source-id: 0c3ca267158fd5189fef5540d5bbb903b0dd00b4
2021-02-04 12:02:32 -08:00
kcnhk1@gmail.com
722cc838ff Volume - extend interval support
Reviewed By: haoxuany

Differential Revision: D26255089

Pulled By: chessai

fbshipit-source-id: e4bdb0aa3c1be55dff0a5577155a3d0469d6762d
2021-02-04 12:02:32 -08:00
kcnhk1@gmail.com
67c1dbe94f AmountOfMoney - extend interval support
Reviewed By: haoxuany

Differential Revision: D26254863

Pulled By: chessai

fbshipit-source-id: dfc06f9831de2d50c11d252429c4fb9b8c1eb13a
2021-02-04 11:19:19 -08:00
kcnhk1@gmail.com
b6da3929ce Extend distance rules
Summary:
Add rules:
- one meter and <dist>
- <dist> meters and <dist>

Reviewed By: girifb

Differential Revision: D26191350

Pulled By: chessai

fbshipit-source-id: 52c85c94647e98fba866c24d3386eea988f7f58c
2021-02-03 15:01:39 -08:00
kcnhk1@gmail.com
776b1ec64d extend AmountOfMoney rules
Summary:
Add rules:
- `hkd` as HKD, and related rules (prefix and suffix)
- dollar and <amount-of-money> rule
- dollar and a half rule
- intersection for <amount-of-money> and `a half`

Changed:
- dime and dollar rules now have improved coverage

Reviewed By: girifb

Differential Revision: D26191724

Pulled By: chessai

fbshipit-source-id: bf63b6eaa751fb96dcf341fa2b66db06a6eeca79
2021-02-03 14:05:30 -08:00
Amr Keleg
e673ba5e84 Quantity/EN: Support k.g k.g. (#570)
Summary:
Previously, kilogram units written with dots between the letters (e.g. "k.g" or "k.g.") were extracted as a Numeral instead of a Quantity.
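The following Python sketch shows the kind of pattern this change calls for. `KG_PATTERN` and `is_kilogram_unit` are hypothetical names. Duckling's actual Quantity/EN rule is written in Haskell and may differ.

```python
import re

# Hypothetical pattern, not Duckling's actual regex: "kg" with an optional
# dot after each letter, or the spelled-out "kilo"/"kilogram(me)(s)" forms.
KG_PATTERN = re.compile(r"k\.?g\.?|kilo(?:gram(?:me)?)?s?", re.IGNORECASE)


def is_kilogram_unit(token: str) -> bool:
    """Return True if the whole token spells a kilogram unit."""
    return KG_PATTERN.fullmatch(token) is not None


for unit in ["kg", "k.g", "k.g.", "kilograms"]:
    print(unit, is_kilogram_unit(unit))  # each prints True
```

Because each dot is optional (`\.?`), `kg`, `k.g` and `k.g.` all match the same alternative.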

Pull Request resolved: https://github.com/facebook/duckling/pull/570

Reviewed By: patapizza

Differential Revision: D26199687

Pulled By: chessai

fbshipit-source-id: 65e39f20296946d5762d7180b12878f4e66ea701
2021-02-03 12:46:27 -08:00
kcnhk1@gmail.com
496842d16a Extend numeral rules
Summary:
- Extend fraction rule
- add mixed fraction rules
- add prefix of 10/100/10_000 rules

Reviewed By: girifb

Differential Revision: D26191175

Pulled By: chessai

fbshipit-source-id: c2f6b74602e1b8061e0c556721ad8e36821fdb5c
2021-02-03 11:19:33 -08:00
jfulse
788f63eeac Parse more date formats in Norwegian (#395)
Summary:
There are some clashes between the time format `hhmm` and the date format `ddmm`. For example, depending on context, `22.10` can mean the clock time 22:10 (ten past ten in the evening) or the twenty-second of October. Interpreting it as a clock time is usually correct, and that is what Duckling currently does.

However, some unambiguous dates are not currently covered by Duckling, e.g. `12.03.2018` and `27.11`. These are included here, along with midnight written as `24:00`, which was also missing.

#### Changes:

- Bug in the `ruleDdmm` regex meant that dates in the format `dd/mm` where `mm > 9` were not parsed
- `ruleYyyymmdd` now also parses dots and forward slashes, i.e. `2012.05.14` and `2012/05/14`
- New rule `rule2400` parses `24:00` and `24.00` (I elected not to include it in `ruleMidnighteodendOfDay` as it has grain minute rather than day)
- New rule `ruleDmm` parses `1/10`, `9.12` etc
- New rule `ruleDDm` parses `10/3`, `11.1` etc
- New rule `ruleDdDotMm` parses `25.02`, `31.10` etc
- `ruleDdmmyyyy` now also parses dots, i.e. `03.10.1983`
- New tests
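A day/month pattern must admit the two-digit months `10`, `11` and `12` explicitly. The Python sketch below does that and also accepts both `/` and `.` as separators. `parse_day_month` is a hypothetical helper, not the actual `ruleDdmm` regex. It also does not attempt the `hh.mm` versus `dd.mm` disambiguation described above.

```python
import re

# Hypothetical day/month[/year] pattern, not Duckling's actual ruleDdmm.
# Day: 1-31, month: 1-12, each with an optional leading zero, so that
# two-digit months (10, 11, 12) are accepted. Separators: "/" or ".".
DAY = r"(0?[1-9]|[12]\d|3[01])"
MONTH = r"(0?[1-9]|1[0-2])"
DATE = re.compile(rf"{DAY}[./]{MONTH}(?:[./](\d{{4}}))?")


def parse_day_month(text):
    """Return (day, month, year or None) for a dd/mm[/yyyy] string, else None."""
    m = DATE.fullmatch(text)
    if m is None:
        return None
    day, month, year = m.groups()
    return int(day), int(month), (int(year) if year else None)


print(parse_day_month("27.11"))       # (27, 11, None)
print(parse_day_month("12.03.2018"))  # (12, 3, 2018)
print(parse_day_month("32.10"))       # None
```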

Pull Request resolved: https://github.com/facebook/duckling/pull/395

Reviewed By: patapizza

Differential Revision: D26193069

Pulled By: chessai

fbshipit-source-id: cf711807fa1d40be2303f2426d74ded40c2e23b3
2021-02-02 23:18:48 -08:00
Maxime Biais
16708d9572 Minor Volume.FR improvement: add "Centilitre" type (#354)
Summary:
Minor Volume.FR improvement: add "Centilitre" type. This is useful for recipe parsing.

Pull Request resolved: https://github.com/facebook/duckling/pull/354

Reviewed By: patapizza

Differential Revision: D26193246

Pulled By: chessai

fbshipit-source-id: ddd551e062b8efeff1e786e30e35815c0c29a34c
2021-02-01 22:48:34 -08:00
kcnhk1@gmail.com
61e06c3aa6 Add initial support for volumes in Chinese
Reviewed By: girifb

Differential Revision: D26183123

Pulled By: chessai

fbshipit-source-id: 1acd27d5172cfb5bccbeb1576700e2c60a8e3907
2021-02-01 16:05:42 -08:00
Igor Kuzmenko
9993911e3b Adds UAH currency Type and examples to EN and RU Corpus (#433)
Summary:
This PR adds UAH currency Type and examples to EN and RU Corpus

Pull Request resolved: https://github.com/facebook/duckling/pull/433

Reviewed By: girifb

Differential Revision: D25102990

Pulled By: chessai

fbshipit-source-id: ed40e8dfcf145a65c7e6d87158da0efacb32e256
2021-02-01 14:32:24 -08:00
Daniel Cartwright
7193caafb9 parse latent year intervals
Summary: adds a new rule that parses year intervals such as "1960 - 1961". see inline comments for heuristics.

Reviewed By: patapizza

Differential Revision: D25840835

fbshipit-source-id: 851a5b1c78440cbf065bf9f20a05c78d4967ea3c
2021-01-29 16:33:56 -08:00
Daniel Cartwright
33f0c17ee2 implement 'the day after tomorrow' in Romanian
Summary: adds a rule for 'the day after tomorrow' in Romanian. regenerates classifiers.

Reviewed By: girifb

Differential Revision: D26155042

fbshipit-source-id: 80005ab94a10f9fbf242c9a712bd040e4f6bc477
2021-01-29 14:49:13 -08:00
Marcin Armatys
d5fac5f14e Polish(PL) - Support for seventy, eighty, ninety (#417)
Summary:
Support for the Polish equivalents of seventy, eighty and ninety.

Pull Request resolved: https://github.com/facebook/duckling/pull/417

Reviewed By: patapizza

Differential Revision: D26130642

Pulled By: chessai

fbshipit-source-id: 4a0be944dcd0a9dea155caae145cf4a38537753f
2021-01-29 11:47:36 -08:00
Nour Shalabi
6346cfe926 Add Arabic rule for a week ago (#379)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/379

Reviewed By: patapizza

Differential Revision: D26149123

Pulled By: chessai

fbshipit-source-id: 5f0bca88fc1b64da5d93fcf715996d58a972fda2
2021-01-29 11:32:32 -08:00
Arjan Scherpenisse
d095b05060 NL/Duration: Support composite durations (#503)
Summary:
E.g. "1 uur en drie kwartier", "1 dag 4 uur", etc.

Pull Request resolved: https://github.com/facebook/duckling/pull/503

Reviewed By: patapizza

Differential Revision: D22260615

Pulled By: chessai

fbshipit-source-id: 40689f7630b4d5bab498df730528ce6bf768fa89
2021-01-27 11:18:10 -08:00
kckckcng
a82684e723 Time&Duration/ZH: support Cantonese and more common expressions (#516-2) (#523)
Summary:
2nd set of changes from pull request https://github.com/facebook/duckling/issues/516

Supporting Cantonese and more common expressions in Chinese.
Adding rules file for Duration/ZH.

Pull Request resolved: https://github.com/facebook/duckling/pull/523

Reviewed By: haoxuany

Differential Revision: D23428901

Pulled By: chessai

fbshipit-source-id: 6d04c97b63bac966eb61d77cab2f08f7543dbbf0
2021-01-26 15:17:45 -08:00
michaelmarien
28ddc3bff7 NL/amount-of-money (#504)
Summary:
Currently values like 1000.000 (in Dutch . is thousand separator) are not recognised, as the ruleDecimalWithThousandsSeparator requires the decimal part (e.g. 1000.000,34) to be present. This PR adds some data and changes the ruleDecimalWithThousandsSeparator to make the decimal part optional.
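The Python sketch below illustrates what making the decimal part optional means. `DUTCH_NUMBER` and `parse_dutch_number` are hypothetical. The pattern follows the idea of `ruleDecimalWithThousandsSeparator` but is not its actual regex.

```python
import re

# Hypothetical pattern in the spirit of ruleDecimalWithThousandsSeparator:
# "." separates groups of three digits and the ",<digits>" decimal part is
# optional. Not Duckling's actual regex.
DUTCH_NUMBER = re.compile(r"\d+(?:\.\d{3})+(?:,\d+)?")


def parse_dutch_number(text):
    """Parse a Dutch-formatted number such as "1000.000" or "1.000.000,34"."""
    if DUTCH_NUMBER.fullmatch(text) is None:
        return None
    # Drop the thousands separators, then turn the decimal comma into a dot.
    return float(text.replace(".", "").replace(",", "."))


print(parse_dutch_number("1000.000"))     # 1000000.0
print(parse_dutch_number("1000.000,34"))  # 1000000.34
```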

Pull Request resolved: https://github.com/facebook/duckling/pull/504

Reviewed By: patapizza, girifb

Differential Revision: D26078885

Pulled By: chessai

fbshipit-source-id: b1679c713e1d17a168d34a3cc556b6c36a571d75
2021-01-26 12:33:14 -08:00
kckckcng
f2798021b6 Numeral/ZH: support more common expressions (#516-1) (#522)
Summary:
1st set of changes from pull request https://github.com/facebook/duckling/issues/516

Supporting more common expressions, such as fraction, half, dozen, in Chinese.

Pull Request resolved: https://github.com/facebook/duckling/pull/522

Reviewed By: patapizza

Differential Revision: D23428893

Pulled By: chessai

fbshipit-source-id: 3454ac70a4bfff90dc282560916a0fae9969f521
2021-01-21 21:17:54 -08:00
Sam Coope
e9e5507820 Add ASAP, at the moment to EN time (#405)
Summary:
* "at the moment" is considered identical to "now".
* "ASAP" is considered identical to "from now"

Pull Request resolved: https://github.com/facebook/duckling/pull/405

Reviewed By: patapizza

Differential Revision: D26009483

Pulled By: chessai

fbshipit-source-id: addf4c509e69d413cae279601c64f72710eba11f
2021-01-21 20:47:40 -08:00
Daniel Cartwright
1ba1aedeba Correct CDT TimeZone offset
Summary: CDT is UTC-5, so its offset is (-5 hours) * (60 minutes/hour) = -300 minutes. The previous value of 540 was probably a copy/paste error.
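The corrected arithmetic can be checked with Python's standard `datetime` module. This only illustrates the offset in minutes. It is not how Duckling stores its timezone table.

```python
from datetime import datetime, timedelta, timezone

# CDT is UTC-5; expressed in minutes, the unit used in the fix above.
CDT_OFFSET_MINUTES = -5 * 60
CDT = timezone(timedelta(minutes=CDT_OFFSET_MINUTES), "CDT")

noon_utc = datetime(2021, 7, 1, 12, 0, tzinfo=timezone.utc)
print(CDT_OFFSET_MINUTES)             # -300
print(noon_utc.astimezone(CDT).hour)  # 7
```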

Reviewed By: girifb

Differential Revision: D25877623

fbshipit-source-id: de4f84f2564cbb154aec95eee63c458c64f8a85f
2021-01-12 14:02:52 -08:00
chessai
40cdb88982 Add CreditCardNumber to common dimensions (#563)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/563

Reviewed By: girifb

Differential Revision: D25624047

Pulled By: chessai

fbshipit-source-id: b50cf34f4a28bfcbd4a0ca3479debc5a5c118b5e
2021-01-05 13:18:19 -08:00
Wojtek Przechodzeń
10eee56f10 Time/PL - new rules (#538)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/538

Reviewed By: haoxuany

Differential Revision: D24640854

Pulled By: chessai

fbshipit-source-id: 51eb0d530b143511f79992a91ca8f465b7860b6e
2020-12-16 13:47:49 -08:00
chaitu9701
28cb5ebd2a Adding Numerical Dimension support for Telugu language (#470)
Summary:
This pull request adds support for the Telugu language (Numerical Dimension) to Duckling.

Pull Request resolved: https://github.com/facebook/duckling/pull/470

Differential Revision: D25546700

Pulled By: chessai

fbshipit-source-id: 1d88ee27da8a577a4a79ff31be8cb55ed6444c4e
2020-12-15 17:48:03 -08:00
Amit Manchanda
724325b02f add: support for quarter to, quarter past and half in HI (#423)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/423

Reviewed By: girifb

Differential Revision: D25573001

Pulled By: chessai

fbshipit-source-id: 5474f108e968bdfb53ebc2518b46f28befdeba89
2020-12-15 17:02:28 -08:00
Amr Keleg
703ff13210 Add a new Arabic locale (EG) (#554)
Summary:
Egyptian Arabic is a mostly spoken dialect of Arabic used in everyday communication.
This PR adds a new Arabic locale to support the differences between Modern Standard Arabic (MSA) and Egyptian Arabic (EG).
I mainly relied on Duckling's existing Spanish locales as a model for the new Egyptian Arabic locale.
The changes are limited to the `Numeral` dimension, since I didn't spot differences in other dimensions.

Pull Request resolved: https://github.com/facebook/duckling/pull/554

Reviewed By: patapizza

Differential Revision: D25543502

Pulled By: chessai

fbshipit-source-id: 4cbb7be78a52071c8681380077f0b4dc033a60de
2020-12-15 11:33:40 -08:00
Daniel Cartwright
181037e469 Support abbreviation of Crore and Lakh
Summary:
Crore (1e7) and Lakh (1e5) are both commonly used to describe an amount of Indian currency. Common abbreviations are "Cr" (Crore) and "lkh", "L", "lac" (lakh).

Additionally, common spellings of "crore" include "karor" and "koti".
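This Python sketch shows how these abbreviations and spellings map to multipliers. The `MULTIPLIERS` table and the `expand` helper are hypothetical, not Duckling's API. A bare `L` can be ambiguous in general text, so a real rule may need surrounding context.

```python
import re

# Hypothetical multiplier table built from the spellings listed above.
CRORE = 10**7
LAKH = 10**5
MULTIPLIERS = {
    "crore": CRORE, "karor": CRORE, "koti": CRORE, "cr": CRORE,
    "lakh": LAKH, "lac": LAKH, "lkh": LAKH, "l": LAKH,
}

# A number followed by an optional space and a word, e.g. "2.5 Cr".
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)", re.IGNORECASE)


def expand(text):
    """Return the numeric value of e.g. "3 lakh", or None if not recognised."""
    m = AMOUNT.fullmatch(text.strip())
    if m is None:
        return None
    number, unit = m.groups()
    factor = MULTIPLIERS.get(unit.lower())
    return None if factor is None else float(number) * factor


print(expand("2.5 Cr"))  # 25000000.0
print(expand("3 lakh"))  # 300000.0
```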

Reviewed By: patapizza

Differential Revision: D25550546

fbshipit-source-id: 0c1479d9027431cb0d1182b5117eabca6f939cb2
2020-12-15 11:18:05 -08:00
moozzyk
c33249b4dd Fix typo in PL Duration Rules (#426)
Summary:
'miej' in Polish is the imperative form of the verb 'mieć' (to have). "mniej więcej" means "more or less" and it was the intention here.

Pull Request resolved: https://github.com/facebook/duckling/pull/426

Reviewed By: patapizza, girifb

Differential Revision: D25546380

Pulled By: chessai

fbshipit-source-id: 1047b83109cab917f1f4dbe87b667f8ccd2fb92d
2020-12-14 16:32:05 -08:00
Hernan Barijhoff
f053b14676 ES/Ordinal: Fixes "tercero" pattern regex (#477)
Summary:
The rule's regex was missing "tercer".

Pull Request resolved: https://github.com/facebook/duckling/pull/477

Reviewed By: patapizza

Differential Revision: D24934794

Pulled By: chessai

fbshipit-source-id: a51f6fe3187749885784bfaacfee09cf26a8df6d
2020-11-19 13:48:43 -08:00
Christoph Flick
d0a6f8114c Improve German time approximation (#435)
Summary:
Improves recognition of German time-approximation expressions and fixes an error in the <time-of-day> approximately rule.

Pull Request resolved: https://github.com/facebook/duckling/pull/435

Reviewed By: patapizza

Differential Revision: D24934281

Pulled By: chessai

fbshipit-source-id: 641bcb6a7e5c26e66c735fe13bccae9b7a8909ae
2020-11-19 13:48:42 -08:00
Sajjad Heydari
700118644c FA Setup (#520)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/520

Reviewed By: patapizza

Differential Revision: D25072459

Pulled By: chessai

fbshipit-source-id: 5db72eda36fe166a452b2345cab75fb1508b192b
2020-11-19 12:20:00 -08:00
Harisankar H
11595b7377 Support for more Hindi numbers (#552)
Summary:
Add support for additional Hindi numbers like 300, 81, 150, 1000, 1520. These are not supported in the current master version.

Pull Request resolved: https://github.com/facebook/duckling/pull/552

Reviewed By: ashwinp-fb, girifb

Differential Revision: D25072230

Pulled By: chessai

fbshipit-source-id: 35277a2349384bcf44a20e74852113f5c010e618
2020-11-18 17:04:29 -08:00
Daniel Cartwright
58cf66589f make duckling time not treat 0:xx and 12:xx ambiguously
Reviewed By: haoxuany

Differential Revision: D24929661

fbshipit-source-id: 3858d14ef1655f079daa33d2b159e8cb918a70ac
2020-11-12 14:19:04 -08:00
chessai
cdeefe1d4d ghc88x compat (#550)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/550

Reviewed By: haoxuany

Differential Revision: D24844625

Pulled By: chessai

fbshipit-source-id: 52dcf5f9488386f7f407535e876bff1207823fe0
2020-11-12 13:47:46 -08:00
Dmitri Osipov
e7264b55c9 adds frequent durations in German (#509)
Summary:
Adds a frequently used German duration that was missing, and fixes a small typo in an existing one.

Pull Request resolved: https://github.com/facebook/duckling/pull/509

Reviewed By: patapizza

Differential Revision: D24690104

Pulled By: chessai

fbshipit-source-id: b49a7a636abf5b92f2fe7c0d5b2ca2fe64acbaa2
2020-11-09 11:18:35 -08:00
Daniel Cartwright
eb043d7018 Quantity rules for Spanish (ES)
Summary:
Spanish (ES) will now have all the same quantity rules as English (EN) (which I think is the most-supported language), plus more.

This includes the following:
* bowls - (bol(es)?|tazón(es)?|cuencos?|platos? (soperos?)|(hondos?)) (EN does not currently have this)
* cups - (tazas?)
* dishes - (platos?|fuentes?) (EN does not currently have this)
* grams - (((m(ili)?)|(k(ilo)?))?g(ramo)?s?)
* ounces - ((onzas?)|oz)
* pints - (pintas?) (EN does not currently have this)
* pounds - ((lb|libra)s?)
* quarts - (cuartos? de galón) (EN does not currently have this)
* tablespoons - (cucharadas? (grande)?) (EN does not currently have this)
* teaspoons - (cucharaditas?) (EN does not currently have this)
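This Python sketch runs a few of the patterns above against single words. The regexes are copied from the list and matched case-insensitively here. `UNIT_PATTERNS` and `unit_of` are hypothetical. Duckling applies these patterns inside Haskell rules, not through a lookup like this.

```python
import re

# Unit regexes copied from the list above (matched case-insensitively here).
# The table and helper around them are illustrative only.
UNIT_PATTERNS = {
    "cup": r"(tazas?)",
    "gram": r"(((m(ili)?)|(k(ilo)?))?g(ramo)?s?)",
    "ounce": r"((onzas?)|oz)",
    "pint": r"(pintas?)",
    "pound": r"((lb|libra)s?)",
    "teaspoon": r"(cucharaditas?)",
}


def unit_of(word):
    """Return the first unit whose pattern matches the whole word, else None."""
    for unit, pattern in UNIT_PATTERNS.items():
        if re.fullmatch(pattern, word, re.IGNORECASE):
            return unit
    return None


print(unit_of("kilogramos"))  # gram
print(unit_of("onzas"))       # ounce
```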

Reviewed By: patapizza

Differential Revision: D24628214

fbshipit-source-id: 2e8d500661f30fa0928cb7d3f21470afc01e2285
2020-11-09 11:18:35 -08:00
Victor Pothin
bfc75849d2 Adds new rules of accentuation of the Portuguese (#531)
Summary:
Keeps accents consistent with the current Portuguese spelling rules: "quinquagésimo" is no longer written with "ü".

Pull Request resolved: https://github.com/facebook/duckling/pull/531

Reviewed By: patapizza

Differential Revision: D23770703

Pulled By: chessai

fbshipit-source-id: f8a34c02028faf9f51eca6a016b5bad988a83f04
2020-11-02 12:17:57 -08:00
Josef Svenningsson
7889f396f3 Remove dependency on Data.Some (#533)
Summary:
Pull Request resolved: https://github.com/facebook/duckling/pull/533

In recent versions of Data.Some, the constructor `This` was renamed to `Some`. Migrating to the new name has proven problematic for us, so we are removing the dependency instead. The core of this diff adds the type `Seal` to `Duckling.Types`, which replaces `Some`.

Reviewed By: pepeiborra

Differential Revision: D23929459

fbshipit-source-id: 8ff4146ecba4f1119a17899961b2d877547f6e4f
2020-09-28 01:33:01 -07:00
Julien Odent
7ba9ea8aeb Time/EN: Fix empty group match
Summary: sad_palpatine

Differential Revision: D23718913

fbshipit-source-id: 363bf9a43d8d1cd77405882bc70a7fa1a1de2dbe
2020-09-15 17:22:00 -07:00
Julien Odent
ef2b1b1b0e Time/FR: Some speed up
Summary: Guarding against grains, shortening regexes.

Reviewed By: jtliao

Differential Revision: D23387716

fbshipit-source-id: de84d0efa79c4ae10bd9fbf14e82a724fee1a1f2
2020-08-28 09:48:15 -07:00
Arjan Scherpenisse
df2ada617a NL/Duration: Add "anderhalf uur" (#502)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/502

Reviewed By: patapizza

Differential Revision: D22260625

Pulled By: haoxuany

fbshipit-source-id: bf44fdab7def19f6dd0e0ef7763c112a3b024396
2020-08-05 15:34:05 -07:00
Julien Odent
3d5e1c3bad Time/DE: Don't parse "so"
Summary:
"so" is an adverb in German: https://github.com/wit-ai/wit/issues/1860
It is also a short form of "Sonntag" (Sunday), so the dot is now mandatory for that abbreviation.

Reviewed By: haoxuany

Differential Revision: D22900791

fbshipit-source-id: 8dc873f79a21ca2add074f9c664e84fae56f1e67
2020-08-03 12:34:49 -07:00
Bing Yuan
a88e0669f7 Fixed the rule for parsing "coming <time cycle>"
Summary: Currently the term "coming" is being treated the same way as "this" or "current". The expected treatment should be the same as the term "next".

Reviewed By: chinmay87

Differential Revision: D22435156

fbshipit-source-id: b0b20d8a38014267fb7d037b685ce126f602bda7
2020-07-17 13:17:18 -07:00
Bing Yuan
5af4d617ba Fixed a problem in parsing multi-word timestamps for ES
Summary:
Current:
"seis cero cinco pm" [dimension Time] -> "cero cinco pm" or "5 pm"
Here the term "seis" was dropped because it was treated as "6" in the Numeral dimension.

Expected:
"seis cero cinco pm" -> "6:05 pm"

The root cause was that the rule "<hour-of-day> <integer> (as relative minutes)" dropped the first term "hour-of-day" if it was parsed as a latent token.

Reviewed By: chinmay87

Differential Revision: D22553028

fbshipit-source-id: abc92bb369c23d2b3084641eab2a2dabb87dbc66
2020-07-17 11:38:43 -07:00
Bing Yuan
780bd0aac5 Fixed the problem parsing "next <day-of-week>"
Summary:
If the current time is: 07/07/2020 (tuesday),
Current:
"next saturday" -> 07/11/2020
Expected:
"next saturday" -> 07/18/2020

According to
Quora (https://www.quora.com/When-is-this-Monday-and-next-Monday-Are-they-the-same#:~:text='Next%20Monday'%20is%20Monday%20of,the%20first%20Monday%20after%20today.),

the term "next saturday" means the Saturday in the week after the current (this) week, regardless of the current day of the week.
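The rule described above can be sketched in Python with the standard `datetime` module. The sketch assumes weeks start on Monday, since the commit does not say which convention Duckling uses. `next_weekday` is a hypothetical helper, not Duckling code.

```python
from datetime import date, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Return the given weekday (Monday=0 ... Sunday=6) in the week after today's.

    Assumes weeks start on Monday.
    """
    start_of_this_week = today - timedelta(days=today.weekday())
    start_of_next_week = start_of_this_week + timedelta(weeks=1)
    return start_of_next_week + timedelta(days=weekday)


SATURDAY = 5
print(next_weekday(date(2020, 7, 7), SATURDAY))  # 2020-07-18
```

Under this convention the result depends only on which week "today" falls in, not on the day within that week. That is the behaviour the commit describes.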

Reviewed By: haoxuany

Differential Revision: D22420499

fbshipit-source-id: c2bd28b9fda78ff3cb0418a50c3b302be350b02d
2020-07-15 14:47:41 -07:00
Bing Yuan
9c1ab0de69 Tweak the rule for parsing "tomorrow" in ES
Summary:
There are two rules for parsing "manana" (dimension: Time): one resolves to "morning" and the other to "tomorrow". The "morning" rule produces a LATENT result, while the "tomorrow" rule produces a NON-LATENT result.

If Duckling is called with the "latent" option turned off, the "tomorrow" rule prevails. However, if Duckling is invoked with the "latent" option turned on, the "morning" rule is preferred.

The solution (for now) is to steer the classifier towards the "tomorrow" rule by adding a large number of identical examples for it.

Reviewed By: chinmay87

Differential Revision: D22425277

fbshipit-source-id: 2f139eec0c38b9b5227f27d9f09f6264e7cf86cd
2020-07-15 12:08:20 -07:00
Bing Yuan
82e976b77d Added support for parsing year composed of multiple ES words
Summary:
The root cause is the lack of support for composing numerals in ES.

For example, "mil novecientos noventa" is parsed as three individual numbers: 1000, 900 and 90. The expected result is instead a single numeral whose value is the sum of those three numbers (1990). The same expectation extends to compositions of an arbitrary number of numeral values.
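This Python sketch shows the additive composition described above. The lexicon is a tiny hypothetical one, large enough only for the examples. `compose_additive` is not a Duckling function.

```python
# Tiny hypothetical lexicon, only large enough for the examples here.
ES_NUMERALS = {"mil": 1000, "novecientos": 900, "noventa": 90, "cinco": 5}


def compose_additive(phrase):
    """Sum consecutive Spanish numeral words into a single value.

    "y" (used between tens and units, as in "noventa y cinco") is skipped.
    Purely additive: multiplicative forms such as "dos mil" (2 * 1000)
    would need a separate rule and are outside this sketch.
    """
    return sum(ES_NUMERALS[word] for word in phrase.lower().split() if word != "y")


print(compose_additive("mil novecientos noventa"))          # 1990
print(compose_additive("mil novecientos noventa y cinco"))  # 1995
```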

Reviewed By: chinmay87

Differential Revision: D22192034

fbshipit-source-id: 476489145b83297b82d88f3451020c867e2d08aa
2020-07-06 17:02:59 -07:00
Bing Yuan
857aa16d06 added support to parse ordinal day-of-week
Summary:
Current:
"first monday of last month" -> the date of the first monday counting from the current time. Note that the term "last month" is dropped.

Expected:
"first monday of last month" -> the date of first monday of previous month.
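The expected behaviour can be sketched in Python with the standard `datetime` module. `first_weekday_of_previous_month` is a hypothetical helper for illustration. It is not how Duckling composes its ordinal and month rules.

```python
from datetime import date, timedelta


def first_weekday_of_previous_month(today: date, weekday: int) -> date:
    """First `weekday` (Monday=0 ... Sunday=6) of the month before `today`'s."""
    # Step back one day from the 1st of this month to land in the previous
    # month, then jump to that month's 1st.
    first_of_this_month = today.replace(day=1)
    first_of_prev_month = (first_of_this_month - timedelta(days=1)).replace(day=1)
    # Days to advance from the 1st until we reach the requested weekday.
    offset = (weekday - first_of_prev_month.weekday()) % 7
    return first_of_prev_month + timedelta(days=offset)


MONDAY = 0
print(first_weekday_of_previous_month(date(2020, 7, 6), MONDAY))  # 2020-06-01
```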

Reviewed By: chinmay87

Differential Revision: D22300243

fbshipit-source-id: 16622860c52ec2ce9c7a7bcd6094192255aa5a0b
2020-07-06 15:39:57 -07:00