329 Commits

Author SHA1 Message Date
andhai
9509e042dc DE-Numeral-complex-German-numerals (#699)
Summary:
The commit adds a rule and an underlying parser for German numeral expressions representing (integer) numbers smaller than 1 million. Unlike in English, those numbers are written as single words, e.g. "neunhundertsiebenundachtzigtausendsechshundertvierundfünfzig" (987654). Other rules are simplified or removed to eliminate redundancies.
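
To make the decomposition concrete, here is a minimal Python sketch of how such a single-word numeral could be split into its parts. It is not the Haskell parser added by this commit; the function names and the small vocabulary tables are assumptions chosen for illustration only.

```python
# Illustrative sketch only; NOT Duckling's Haskell implementation.
# All names and tables below are assumptions made for this example.

UNITS = {"ein": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
         "sechs": 6, "sieben": 7, "acht": 8, "neun": 9}
TEENS = {"zehn": 10, "elf": 11, "zwölf": 12, "dreizehn": 13,
         "vierzehn": 14, "fünfzehn": 15, "sechzehn": 16,
         "siebzehn": 17, "achtzehn": 18, "neunzehn": 19}
TENS = {"zwanzig": 20, "dreißig": 30, "vierzig": 40, "fünfzig": 50,
        "sechzig": 60, "siebzig": 70, "achtzig": 80, "neunzig": 90}
STANDALONE = {**UNITS, "eins": 1, **TEENS, **TENS}


def below_100(word: str) -> int:
    """Parse a numeral word for 1..99, e.g. 'vierundfünfzig' -> 54."""
    if word in STANDALONE:
        return STANDALONE[word]
    # In German the unit precedes the tens: "vier" + "und" + "fünfzig".
    unit, sep, tens = word.partition("und")
    if sep and unit in UNITS and tens in TENS:
        return UNITS[unit] + TENS[tens]
    raise ValueError(f"not a numeral below 100: {word!r}")


def below_1000(word: str) -> int:
    """Parse a numeral word for 1..999, e.g. 'sechshundertvier' -> 604."""
    head, sep, tail = word.partition("hundert")
    if not sep:
        return below_100(word)
    if head and head not in UNITS:
        raise ValueError(f"bad hundreds prefix: {head!r}")
    hundreds = UNITS[head] if head else 1  # bare "hundert" means 100
    return hundreds * 100 + (below_100(tail) if tail else 0)


def parse_german_numeral(word: str) -> int:
    """Parse a single-word German numeral below 1 million."""
    head, sep, tail = word.lower().partition("tausend")
    if not sep:
        return below_1000(head)  # no "tausend": head is the whole word
    thousands = below_1000(head) if head else 1  # bare "tausend" means 1000
    return thousands * 1000 + (below_1000(tail) if tail else 0)
```

Splitting on "tausend" first and on "hundert" second matters here: "hundert" itself contains the letters "und", so the unit-and-tens split is only safe once the hundreds part has been removed.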

Pull Request resolved: https://github.com/facebook/duckling/pull/699

Reviewed By: patapizza

Differential Revision: D37716120

Pulled By: stroxler

fbshipit-source-id: 90b26e253259c5bc1aaa76f3972537c2361f6bb3
2022-07-14 16:47:23 -07:00
Julien Odent
03c6197283 Time: Don't accept negative years
Summary:
In theory, years can be negative. In practice, this is rarely seen and when it is, it's usually with the B.C. postfix.
Updating `ruleYearLatent` to not accept negative years for all languages.

Reviewed By: codyohl

Differential Revision: D33895871

fbshipit-source-id: 818000104da825aab91be7fa2a72704aa350a91a
2022-02-02 05:34:31 -08:00
Alex Kapranoff
30b3d29e86 (Time/TimeGrain/Ordinal)/RU_XX: several extra Time forms for Russian
Summary:
Some changes were originally suggested by me during the review of
https://github.com/facebook/duckling/pull/474.
Others are new.

1. "Day after tomorrow/before yesterday"
2. Ordinals in the form of number+suffix like "8th of March"
3. Tuesdays require a special preposition.
4. Support "Yo" (U+0451) in "fourth" and "during daylight".
5. Support special preposition for "next week".
6. Support "one before last" adjective for time grains.
7. Proper suffixes for "quarter" grain.
8. Support "at midnight".
9. Support alternative flag for "afternoon".

Changes in Ordinal and TimeGrain are all driven by the new examples in the
corpus for Time.

There are also a couple of bugfixes:
1. A hidden Latin "e" was present in an otherwise Cyrillic regex.
2. Wrong order of options in a regex separated with "|" prevented some matches.
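
Both bug classes are easy to reproduce outside Duckling. The Python sketch below is an illustration under assumptions: English words stand in for the Russian alternatives, the helper names are made up for this example, and the actual regexes from the commit are not shown here.

```python
import re
import unicodedata

# Alternation is tried left to right: the first alternative that matches wins.
# Putting the shorter option first therefore cuts the match short.
SHORT_FIRST = re.compile(r"tue|tuesday")
LONG_FIRST = re.compile(r"tuesday|tue")


def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


def latin_letters_in(text):
    """Return the indices of Latin letters in `text`.

    In a regex that is meant to be Cyrillic-only, any hit is suspicious,
    because Latin "e" and Cyrillic "е" (U+0435) look identical.
    """
    return [i for i, ch in enumerate(text)
            if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")]


ALL_CYRILLIC = "\u0441\u0440\u0435\u0434\u0430"  # five Cyrillic letters
MIXED_SCRIPT = "\u0441\u0440e\u0434\u0430"       # third letter is Latin "e"
```

With these definitions, `matched_text(SHORT_FIRST, "tuesday")` stops after `"tue"`, while `matched_text(LONG_FIRST, "tuesday")` consumes the whole word; `latin_letters_in(MIXED_SCRIPT)` points at index 2.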

Reviewed By: haoxuany

Differential Revision: D32311714

fbshipit-source-id: 084f6c3893eb5bfd767c267f558b910c6854eb59
2021-11-30 09:49:43 -08:00
Lautaro Emanuel
e1641aeba4 Added missing 'TR' time test cases (#661)
Summary:
Fixes https://github.com/facebook/duckling/issues/660.

Pull Request resolved: https://github.com/facebook/duckling/pull/661

Test Plan: :test Duckling.Time.TR.Tests

Reviewed By: stroxler

Differential Revision: D32145794

Pulled By: chessai

fbshipit-source-id: 4d55043f133b8238e9e8360264a3fbea6af2d022
2021-11-10 10:34:28 -08:00
Ovidiu Nistor
dd70d80dc1 Add Japanese time dimension (#646)
Summary:
Add the most common rules for Japanese time dimension.

Pull Request resolved: https://github.com/facebook/duckling/pull/646

Reviewed By: stroxler

Differential Revision: D30675005

Pulled By: chessai

fbshipit-source-id: 917aa98b5cfe0c73d207b1f51b80d8e17a1c7e6a
2021-09-21 12:28:06 -07:00
Daniel Cartwright
72f45e8e2c Recognise hh:mm:ss as not an interval
Summary: An interval regex was overzealous and matching too much, so `hh:mm:ss` was getting parsed as an interval instead of a time.
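
The commit message does not show the regex involved, so the following Python sketch only illustrates the general failure mode. All three patterns and the `classify` helper are hypothetical and are not taken from Duckling.

```python
import re

# Hypothetical "overzealous" interval pattern: it accepts ":" as well as "-"
# between the two parts, so it also swallows a plain hh:mm:ss clock time.
LOOSE_INTERVAL = re.compile(r"(\d{1,2}:\d{2})\s*[-:]\s*(\d{1,2})")

# Stricter variants (also assumptions): an interval needs a dash, and a
# clock time is exactly three colon-separated fields in valid ranges.
STRICT_INTERVAL = re.compile(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}(?::\d{2})?)")
CLOCK = re.compile(r"([01]?\d|2[0-3]):([0-5]\d):([0-5]\d)")


def classify(text: str) -> str:
    """Label `text` as a clock time, an interval, or unknown."""
    if CLOCK.fullmatch(text):
        return "time"
    if STRICT_INTERVAL.fullmatch(text):
        return "interval"
    return "unknown"
```

Here `LOOSE_INTERVAL` matches `"12:30:45"` in full, which is the kind of over-matching the summary describes, whereas `classify("12:30:45")` yields `"time"` and `classify("12:30-14:00")` yields `"interval"`.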

Reviewed By: patapizza

Differential Revision: D30608223

fbshipit-source-id: b24c18146070f15ada80b9401e67f0c0aefef7d8
2021-08-27 12:23:43 -07:00
Filipe Pereira
d8888e2ff8 Ca time improvements (#639)
Summary:
Some time recognition improvements for Catalan:
- morning should be a time range recognised until noon
- "dema" can also be used for tomorrow (besides "demà")
- "se" alone should not be understood as September

Pull Request resolved: https://github.com/facebook/duckling/pull/639

Reviewed By: stroxler

Differential Revision: D30312076

Pulled By: chessai

fbshipit-source-id: 1a42bbd7eecc4f5690145ee9cadb8eccae8edd08
2021-08-16 10:46:40 -07:00
Filipe Pereira
57dab83ad3 PT time improvements 2 (#636)
Summary:
Fixed rules for PT time expressions like "amanhã à noite", "dia 17", "dia 15 às 18"

Pull Request resolved: https://github.com/facebook/duckling/pull/636

Reviewed By: stroxler

Differential Revision: D30138416

Pulled By: chessai

fbshipit-source-id: 5265d44e7ddce5eee8cd7266df9254389a10b139
2021-08-05 13:47:41 -07:00
Filipe Pereira
fc7950a68f ES time improvements (#634)
Summary:
New rules for ES time expressions like "3 Marzo", "Marzo 3".

Pull Request resolved: https://github.com/facebook/duckling/pull/634

Reviewed By: girifb

Differential Revision: D30110631

Pulled By: chessai

fbshipit-source-id: e6add868535522d243ccf1dab2443e6cd3f7f8b2
2021-08-05 10:48:47 -07:00
Filipe Pereira
a6499228af FR time improvement (#635)
Summary:
Fixed recognition for month "Juil" (abbreviation of "Juillet")

Pull Request resolved: https://github.com/facebook/duckling/pull/635

Reviewed By: stroxler

Differential Revision: D30115291

Pulled By: chessai

fbshipit-source-id: e04d6e7952f85f4ca061540a3967908bcd4f1ebd
2021-08-04 17:31:08 -07:00
Filipe Pereira
fe4f77bdc0 PT time improvements (#633)
Summary:
New rules for PT time expressions like "5 Maio", "Maio 5", "5 Maio 2022".

Pull Request resolved: https://github.com/facebook/duckling/pull/633

Reviewed By: stroxler

Differential Revision: D30114330

Pulled By: chessai

fbshipit-source-id: f56418d95efa1d7488957b8b8083daec3193949b
2021-08-04 17:31:07 -07:00
Amr Keleg
79ac8f63f9 Add isArabic rule (#577)
Summary:
Fixes https://github.com/facebook/duckling/issues/437, fixes https://github.com/facebook/duckling/issues/571

Pull Request resolved: https://github.com/facebook/duckling/pull/577

Reviewed By: stroxler

Differential Revision: D29664126

Pulled By: chessai

fbshipit-source-id: b6365699231527b0869322c798e32a21328f1071
2021-07-12 13:37:23 -07:00
Daniel Cartwright
ed291c2a3a ES (Spanish) Time - add rule for 'next <day-of-week>'
Summary: Resolves #623. Add rule for 'proximo <day-of-week>'

Reviewed By: stroxler

Differential Revision: D29002419

fbshipit-source-id: 7d5fb04b66fe068ae2906b63ede44009e80f1a3c
2021-06-09 20:33:12 -07:00
Damien Gallet
28e38679a7 Time.FR > add rule for years in twentieth century (#357)
Summary:
In Time.FR, add support for birthdates like "15 juin 72"
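
As a rough illustration of the behavior, the Python sketch below expands a two-digit year into the twentieth century. The function name and the rule that every two-digit year maps to 19yy are assumptions for this example; the commit message does not state the exact range the rule accepts.

```python
from datetime import date

# French month names; the table is part of the example, not Duckling's data.
MONTHS_FR = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}


def parse_fr_birthdate(text: str) -> date:
    """Parse '<day> <month> <yy>' and read yy as 19yy (assumed rule)."""
    day, month, year = text.lower().split()
    if len(year) != 2 or not year.isdigit():
        raise ValueError("expected a two-digit year")
    if month not in MONTHS_FR:
        raise ValueError(f"unknown month: {month!r}")
    return date(1900 + int(year), MONTHS_FR[month], int(day))
```

For example, `parse_fr_birthdate("15 juin 72")` returns `date(1972, 6, 15)`.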

Pull Request resolved: https://github.com/facebook/duckling/pull/357

Reviewed By: patapizza

Differential Revision: D26193322

Pulled By: chessai

fbshipit-source-id: d22efea81aad31af8baa2f7f9afdaf1a75c0dc10
2021-06-04 13:04:12 -07:00
Daniel Cartwright
8cb77a43c7 Add custom isRangeValid implementation for ZH
Summary: Fixes #313

Reviewed By: stroxler

Differential Revision: D28364035

fbshipit-source-id: 7fe3dba75410d217747a0d7a6f7df611ac26ec70
2021-06-04 12:48:32 -07:00
tuantvk
25b39f4a8b Time/VI: update rule Sunday (#611)
Summary:
In Vietnamese, Sunday is "chủ nhật" or "Chúa nhật" (the Catholic form).

Pull Request resolved: https://github.com/facebook/duckling/pull/611

Reviewed By: haoxuany

Differential Revision: D28399277

Pulled By: chessai

fbshipit-source-id: 26aa7c76cf1f8b8c2ba32e049f7f470a140e3d92
2021-06-04 12:18:27 -07:00
leandro.guisandez@pgconocimiento.com
5d8d99bbf4 Init
Summary: Initialise Time for CA (Catalan) language

Reviewed By: stroxler

Differential Revision: D28455273

Pulled By: chessai

fbshipit-source-id: be9a4d61692ba4fb32986e161e9fdd6d25a357dc
2021-05-18 13:50:19 -07:00
Steven Troxler
81ab073acf Move Candidate to Ranking/Types.hs
Summary:
In my opinion putting `Candidate` into the core `Types.hs`
is a mistake - it's used exclusively in the ranking stage, so cluttering
the core tokenizing and recursive parsing / value resolution logic in
`Duckling.Types` with this irrelevant datatype makes things less clear
than if we keep it in the `Ranking` modules.

Reviewed By: chessai

Differential Revision: D28462902

fbshipit-source-id: cd4bb88c4a16945265e8f21c8808b06ae3383559
2021-05-18 11:50:17 -07:00
Daniel Cartwright
13513d30a5 Regenerate classifiers
Summary: Some classifiers were a bit out of date. They needed regenerating.

Reviewed By: girifb

Differential Revision: D28399234

fbshipit-source-id: 2780dbe5478a5386a2b6062dec8696736b3ce723
2021-05-13 14:02:35 -07:00
Steven Troxler
d6587dafbb Fix excessive-free-point-style lint errors on Rank.hs
Summary: Just replace `.` with `$`, also tweaked the spacing a bit for skimmability

Reviewed By: chessai

Differential Revision: D28411898

fbshipit-source-id: d18b9ef5db99b82d150231080c89f812f709f409
2021-05-13 11:18:09 -07:00
Steven Troxler
3eafced0fa Get rid of name clash warnings in Extraction.hs
Summary: Use targeted imports to avoid clash on `node` variable name

Reviewed By: chessai

Differential Revision: D28411902

fbshipit-source-id: 4a81e35a6aa601015685ccab3f571e919e9025c8
2021-05-13 11:03:19 -07:00
chessai
ccdf27ad1d FR: add nth <time> of <time> rules (#596)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/596

Reviewed By: stroxler

Differential Revision: D27722743

Pulled By: chessai

fbshipit-source-id: a9136fef2a26e87269bca8212ae07d3d7fe04977
2021-05-11 11:32:13 -07:00
Steven Troxler
0e13d28b4d Time/EN: Get rid of unnecessary rules
Summary:
While I was working on fixing #604, I came across the rules
`ruleMilitarySpelledOutAMPM(2)`, which were actually capturing
some of my test phrases and confusing me.

This commit removes them because
- they aren't needed: the existing latent spelled-out hour + minute rules plus
  the "(in the )?(am/pm)" rules together give the same behavior
- they are confusingly named - these aren't military times at all, they are
  spelled-out civilian times

Reviewed By: haoxuany

Differential Revision: D27848485

fbshipit-source-id: ba1ed16ec22b5139b0b500b44dc91adb1b5e3d82
2021-04-26 06:17:44 -07:00
Steven Troxler
c44c73fe04 Numeral/ES: Add support for additive concatenations
Summary:
This commit extends Spanish-language support for concatenations
of the form "<higher-order-of-magnitude> <lower>", e.g.
"doscientos tres" (203) or "cuatro mil veintiuno" (4021), to work
not just for hundreds but also for thousands and millions.

Reviewed By: chessai

Differential Revision: D27858133

fbshipit-source-id: 5c6b227ae7dad9009cd636e7ea49c209480c931a
2021-04-23 09:48:07 -07:00
Steven Troxler
888da76215 Numeral/ES: Add support for 1M, and multiples of 1K/1M
Summary:
This commit adds two things to Spanish numeral support:
- support for millions
- support, via hooking into the `isMultipliable` logic used by EN, for
  composing counts of 2-999 with either "mil" or "millones", which is
  the standard way to say things like "tres mil" = 3000

Reviewed By: chessai

Differential Revision: D27858135

fbshipit-source-id: 980e95bd989f818c5ceaa2bb6c87fe81d3e08366
2021-04-23 09:48:06 -07:00
Steven Troxler
15bba9eba9 Numeral/ES: Refactor hundreds handling to fix bug
Summary:
This diff refactors our handling of "<hundreds> 0..99" numbers
to be more flexible by replacing `ruleNumeralthreePartHundreds`
with
- a rule for two-part hundreds like "dos cientos" (which is technically
  incorrect grammar - doscientos is correct - but probably worth keeping) based
  on a notion of multipliability like that used in EN rules
- a rule stating that we can compose hundreds with 0..99 additively

The resulting rules are more flexible, and they correctly parse not only
grammatically iffy phrases like "dos cientos tres", but also grammatically
correct phrases like "doscientos tres". This fixes #380.
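
The two ideas, multiplying a small count by a multipliable word and then adding a lower-order remainder, can be sketched in a few lines of Python. This is a simplified left-to-right fold with a tiny made-up vocabulary, not Duckling's rule set; `parse_es_numeral` and the tables are assumptions for illustration, and the sketch accepts some word orders a real grammar would reject.

```python
# Illustrative sketch only; names and tables are assumptions.
SIMPLE = {
    "un": 1, "uno": 1, "dos": 2, "tres": 3, "cuatro": 4, "cinco": 5,
    "veintiuno": 21, "cien": 100, "doscientos": 200, "trescientos": 300,
}
HUNDRED = "cientos"  # multipliable: "dos cientos" -> 2 * 100
SCALES = {"mil": 1_000, "millón": 1_000_000, "millones": 1_000_000}


def parse_es_numeral(text: str) -> int:
    """Combine Spanish number words multiplicatively and additively."""
    total = 0  # finished thousands/millions groups
    group = 0  # value below 1000 currently being built
    for token in text.lower().split():
        if token in SIMPLE:
            group += SIMPLE[token]            # additive composition
        elif token == HUNDRED:
            group = max(group, 1) * 100       # multiplicative composition
        elif token in SCALES:
            total += max(group, 1) * SCALES[token]
            group = 0
        else:
            raise ValueError(f"unknown numeral word: {token!r}")
    return total + group
```

Both `parse_es_numeral("dos cientos tres")` and `parse_es_numeral("doscientos tres")` give 203, and the same fold handles "tres mil" (3000) once thousands are treated as multipliable too.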

Reviewed By: chessai

Differential Revision: D27858136

fbshipit-source-id: 4a918d84d93ac074f83f6947a8f80cfd11145115
2021-04-23 09:48:06 -07:00
Steven Troxler
9bd4c9b7fb Time/EN: Allow latent match for <part-of-day> <latent-time-of-day>
Summary:
This fixes #592 in a very conservative way: the reason why `ruleIntersect` does
not detect "tonight 815" and "tonight eight fifteen" as it does "tonight 8:15"
is because it explicitly forbids the second part of the intersection from being
latent, unless it is a year.

I don't think it's a good idea to remove the restriction on latent inputs in
`ruleIntersect`, so instead I just made a new rule specifically for the
intersection of `<part-of-day> <time-of-day>`.

It also seems to me that there's a lot of room for this to be too aggressive,
for example if I say "tonight 500 people will laugh" the "tonight" and "500"
aren't really linked. So, I set the rule to be latent; this may be too conservative
to be useful though (do client libraries usually allow latent results?).
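
The trade-off described above can be modeled with a toy example. The Python sketch below is an assumption-laden illustration: `Token`, `intersect`, and `part_of_day_time_of_day` are hypothetical names that mirror the prose, not Duckling's actual types or rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    text: str
    kind: str     # e.g. "part-of-day", "time-of-day", "year"
    latent: bool  # True when the match is only plausible in context


def intersect(first: Token, second: Token) -> Optional[Token]:
    """General intersection: refuses a latent second part unless it is a year."""
    if second.latent and second.kind != "year":
        return None
    return Token(f"{first.text} {second.text}", "time", latent=False)


def part_of_day_time_of_day(first: Token, second: Token) -> Optional[Token]:
    """Dedicated rule: accepts a latent time of day, but marks the result latent."""
    if first.kind == "part-of-day" and second.kind == "time-of-day":
        return Token(f"{first.text} {second.text}", "time", latent=True)
    return None


tonight = Token("tonight", "part-of-day", latent=False)
explicit = Token("8:15", "time-of-day", latent=False)
bare = Token("815", "time-of-day", latent=True)
```

In this model `intersect(tonight, explicit)` succeeds, `intersect(tonight, bare)` returns `None`, and `part_of_day_time_of_day(tonight, bare)` produces a result that callers only see if they ask for latent matches.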

Reviewed By: chessai

Differential Revision: D27842596

fbshipit-source-id: 36ac59e31c632d4864241bce291147a46d52f780
2021-04-19 13:05:50 -07:00
leandro.guisandez@pgconocimiento.com
7907812184 Initialise Catalan language with Numeral
Summary: Adds Catalan language and Numeral rules for it

Reviewed By: haoxuany

Differential Revision: D26518604

Pulled By: chessai

fbshipit-source-id: e6b4b0ceb9b7931d086c732dd03fb5cbbe062d5b
2021-04-08 14:47:02 -07:00
Mustafa ALP
3157d2e553 Time Dimension for TR locale (#584)
Summary:
Added time dimension for Turkish language

Pull Request resolved: https://github.com/facebook/duckling/pull/584

Differential Revision: D27235743

Pulled By: chessai

fbshipit-source-id: 7419ff7373d942530f0eb35939acb9970b918672
2021-04-06 10:32:18 -07:00
Steven Troxler
55168db92f Update classifiers
Summary:
I was testing an unrelated change (which doesn't change
classifier scores) and reran the classifiers just to be safe. I noticed
that the scores changed.

This diff updates them.

Reviewed By: chessai

Differential Revision: D26892970

fbshipit-source-id: c7da3e3b7d01955f98b287a3ff4e7c1ff2837c7f
2021-03-08 14:02:45 -08:00
Aleksey Landyrev
590651150b Add Time dimension for RU language
Summary: Used b40e2147a9 as reference

Reviewed By: kappa

Differential Revision: D24773196

Pulled By: chessai

fbshipit-source-id: 7cc008c0ee80f930efd76e39bb16ca91ec94b641
2021-02-12 12:02:44 -08:00
Maurice Döpke
75af12524f adds german time rule for expressions like: Montag in 3 Wochen (#332)
Summary:
closes https://github.com/facebook/duckling/issues/331

Pull Request resolved: https://github.com/facebook/duckling/pull/332

Reviewed By: girifb

Differential Revision: D26283481

Pulled By: chessai

fbshipit-source-id: 054c6467a69896ff3ebbd1f9bc0734aadf1b6dbe
2021-02-09 14:33:37 -08:00
Maurice Döpke
998b13bceb Adds german times rules like "Übernächste Woche" (week after next) (#330)
Summary:
Fixes https://github.com/facebook/duckling/issues/329 and allows recognition of terms like "übernächste Woche".

Pull Request resolved: https://github.com/facebook/duckling/pull/330

Reviewed By: girifb

Differential Revision: D26284196

Pulled By: chessai

fbshipit-source-id: 160e73668b835c83adb0fd1c396a8a2977e86516
2021-02-09 10:48:32 -08:00
kcnhk1@gmail.com
3f2f307735 Time - add more common expressions
Summary:
Added:
last <duration>
<time> <day-of-month>

Reviewed By: haoxuany

Differential Revision: D26263977

Pulled By: chessai

fbshipit-source-id: b00ece753593a7fabe45bbaa9e1f013860e38d80
2021-02-04 16:32:11 -08:00
Daniel Cartwright
33f0c17ee2 implement 'the day after tomorrow' in Romanian
Summary: adds a rule for 'the day after tomorrow' in Romanian. regenerates classifiers.

Reviewed By: girifb

Differential Revision: D26155042

fbshipit-source-id: 80005ab94a10f9fbf242c9a712bd040e4f6bc477
2021-01-29 14:49:13 -08:00
Nour Shalabi
6346cfe926 Add Arabic rule for a week ago (#379)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/379

Reviewed By: patapizza

Differential Revision: D26149123

Pulled By: chessai

fbshipit-source-id: 5f0bca88fc1b64da5d93fcf715996d58a972fda2
2021-01-29 11:32:32 -08:00
Arjan Scherpenisse
d095b05060 NL/Duration: Support composite durations (#503)
Summary:
E.g. "1 uur en drie kwartier", "1 dag 4 uur", etc.
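
A composite duration is simply the sum of its "<amount> <unit>" parts. The Python sketch below shows that idea with a deliberately tiny Dutch vocabulary; the function name and tables are assumptions for illustration and do not reflect how the Duckling rule is written.

```python
from datetime import timedelta

# Small illustrative vocabularies (assumptions, far from complete).
NUMBER_WORDS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vijf": 5}
UNIT_LENGTHS = {
    "dag": timedelta(days=1), "dagen": timedelta(days=1),
    "uur": timedelta(hours=1),
    "kwartier": timedelta(minutes=15),
    "minuut": timedelta(minutes=1), "minuten": timedelta(minutes=1),
}


def parse_nl_duration(text: str) -> timedelta:
    """Sum '<amount> <unit>' pairs, ignoring the connective 'en'."""
    words = [w for w in text.lower().split() if w != "en"]
    if not words or len(words) % 2 != 0:
        raise ValueError("expected '<amount> <unit>' pairs")
    total = timedelta()
    for amount, unit in zip(words[0::2], words[1::2]):
        if unit not in UNIT_LENGTHS:
            raise ValueError(f"unknown unit: {unit!r}")
        if amount.isdigit():
            count = int(amount)
        elif amount in NUMBER_WORDS:
            count = NUMBER_WORDS[amount]
        else:
            raise ValueError(f"unknown amount: {amount!r}")
        total += count * UNIT_LENGTHS[unit]
    return total
```

With this sketch, "1 uur en drie kwartier" comes out as one hour and 45 minutes, and "1 dag 4 uur" as one day and four hours.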

Pull Request resolved: https://github.com/facebook/duckling/pull/503

Reviewed By: patapizza

Differential Revision: D22260615

Pulled By: chessai

fbshipit-source-id: 40689f7630b4d5bab498df730528ce6bf768fa89
2021-01-27 11:18:10 -08:00
kckckcng
a82684e723 Time&Duration/ZH: support Cantonese and more common expressions (#516-2) (#523)
Summary:
2nd set of changes from pull request https://github.com/facebook/duckling/issues/516

Supporting Cantonese and more common expressions in Chinese.
Adding rules file for Duration/ZH.

Pull Request resolved: https://github.com/facebook/duckling/pull/523

Reviewed By: haoxuany

Differential Revision: D23428901

Pulled By: chessai

fbshipit-source-id: 6d04c97b63bac966eb61d77cab2f08f7543dbbf0
2021-01-26 15:17:45 -08:00
kckckcng
f2798021b6 Numeral/ZH: support more common expressions (#516-1) (#522)
Summary:
1st set of changes from pull request https://github.com/facebook/duckling/issues/516

Supporting more common expressions, such as fraction, half, dozen, in Chinese.

Pull Request resolved: https://github.com/facebook/duckling/pull/522

Reviewed By: patapizza

Differential Revision: D23428893

Pulled By: chessai

fbshipit-source-id: 3454ac70a4bfff90dc282560916a0fae9969f521
2021-01-21 21:17:54 -08:00
Sam Coope
e9e5507820 Add ASAP, at the moment to EN time (#405)
Summary:
* "at the moment" is considered identical to "now".
* "ASAP" is considered identical to "from now"

Pull Request resolved: https://github.com/facebook/duckling/pull/405

Reviewed By: patapizza

Differential Revision: D26009483

Pulled By: chessai

fbshipit-source-id: addf4c509e69d413cae279601c64f72710eba11f
2021-01-21 20:47:40 -08:00
Wojtek Przechodzeń
10eee56f10 Time/PL - new rules (#538)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/538

Reviewed By: haoxuany

Differential Revision: D24640854

Pulled By: chessai

fbshipit-source-id: 51eb0d530b143511f79992a91ca8f465b7860b6e
2020-12-16 13:47:49 -08:00
chaitu9701
28cb5ebd2a Adding Numerical Dimension support for Telugu language (#470)
Summary:
This pull request is to add support for Telugu language (Numerical Dimension) to Duckling

Pull Request resolved: https://github.com/facebook/duckling/pull/470

Differential Revision: D25546700

Pulled By: chessai

fbshipit-source-id: 1d88ee27da8a577a4a79ff31be8cb55ed6444c4e
2020-12-15 17:48:03 -08:00
Christoph Flick
d0a6f8114c Improve german time approximation (#435)
Summary:
Improves the recognition of German time approximation language and removes a single error in the rule of <time-of-day> approximately.

Pull Request resolved: https://github.com/facebook/duckling/pull/435

Reviewed By: patapizza

Differential Revision: D24934281

Pulled By: chessai

fbshipit-source-id: 641bcb6a7e5c26e66c735fe13bccae9b7a8909ae
2020-11-19 13:48:42 -08:00
Sajjad Heydari
700118644c FA Setup (#520)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/520

Reviewed By: patapizza

Differential Revision: D25072459

Pulled By: chessai

fbshipit-source-id: 5db72eda36fe166a452b2345cab75fb1508b192b
2020-11-19 12:20:00 -08:00
Harisankar H
11595b7377 Support for more Hindi numbers (#552)
Summary:
Add support for additional Hindi numbers like 300, 81, 150, 1000, 1520. These are not supported in the current master version.

Pull Request resolved: https://github.com/facebook/duckling/pull/552

Reviewed By: ashwinp-fb, girifb

Differential Revision: D25072230

Pulled By: chessai

fbshipit-source-id: 35277a2349384bcf44a20e74852113f5c010e618
2020-11-18 17:04:29 -08:00
Dmitri Osipov
e7264b55c9 adds frequent durations in German (#509)
Summary:
Found a missing frequent duration in German and a small typo in an existing one.

Pull Request resolved: https://github.com/facebook/duckling/pull/509

Reviewed By: patapizza

Differential Revision: D24690104

Pulled By: chessai

fbshipit-source-id: b49a7a636abf5b92f2fe7c0d5b2ca2fe64acbaa2
2020-11-09 11:18:35 -08:00
Josef Svenningsson
7889f396f3 Remove dependency on Data.Some (#533)
Summary:
Pull Request resolved: https://github.com/facebook/duckling/pull/533

In recent versions of Data.Some, the constructor `This` has been renamed to `Some`. Migrating has become rather problematic for us, so we're just going to remove the dependency. The meat of this diff is adding the type `Seal` to `Duckling.Types`. That type replaces `Some`.

Reviewed By: pepeiborra

Differential Revision: D23929459

fbshipit-source-id: 8ff4146ecba4f1119a17899961b2d877547f6e4f
2020-09-28 01:33:01 -07:00
Julien Odent
7ba9ea8aeb Time/EN: Fix empty group match
Summary: sad_palpatine

Differential Revision: D23718913

fbshipit-source-id: 363bf9a43d8d1cd77405882bc70a7fa1a1de2dbe
2020-09-15 17:22:00 -07:00
Julien Odent
ef2b1b1b0e Time/FR: Some speed up
Summary: Guarding against grains, shortening regexes.

Reviewed By: jtliao

Differential Revision: D23387716

fbshipit-source-id: de84d0efa79c4ae10bd9fbf14e82a724fee1a1f2
2020-08-28 09:48:15 -07:00
Bing Yuan
5af4d617ba Fixed a problem in parsing multi-word timestamp for ES
Summary:
Current:
"seis cero cinco pm" [dimension Time] -> "cero cinco pm" or "5 pm"
here the term "seis" was dropped because it was treated as "6" in "Numeral" dimension.

Expected:
"seis cero cinco pm" -> "6:05 pm"

The root cause was that the rule "<hour-of-day> <integer> (as relative minutes)" dropped the first term "hour-of-day" if it was parsed as a latent token.
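
The expected behavior can be stated as a small Python sketch: the first number word is the hour and must be kept, and the remaining digit words form the minutes. The function name and word table are assumptions for illustration; this is not the Duckling rule itself.

```python
# Illustrative sketch only; names and the word table are assumptions.
NUMBER_WORDS = {
    "cero": 0, "uno": 1, "una": 1, "dos": 2, "tres": 3, "cuatro": 4,
    "cinco": 5, "seis": 6, "siete": 7, "ocho": 8, "nueve": 9,
    "diez": 10, "once": 11, "doce": 12,
}


def parse_spelled_time(text: str) -> str:
    """Parse '<hour> <minute digits...> am|pm' spelled out in Spanish."""
    *number_words, meridiem = text.lower().split()
    if meridiem not in ("am", "pm") or not number_words:
        raise ValueError("expected '<hour> [<digits>] am|pm'")
    if any(w not in NUMBER_WORDS for w in number_words):
        raise ValueError("unknown number word")
    values = [NUMBER_WORDS[w] for w in number_words]
    hour, minute_digits = values[0], values[1:]  # keep the hour token
    minute = 0
    for digit in minute_digits:
        if digit > 9:
            raise ValueError("minutes must be spelled digit by digit")
        minute = minute * 10 + digit
    if not 1 <= hour <= 12 or minute > 59:
        raise ValueError("hour or minute out of range")
    return f"{hour}:{minute:02d} {meridiem}"
```

For the phrase from the summary, `parse_spelled_time("seis cero cinco pm")` returns `"6:05 pm"` rather than dropping "seis".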

Reviewed By: chinmay87

Differential Revision: D22553028

fbshipit-source-id: abc92bb369c23d2b3084641eab2a2dabb87dbc66
2020-07-17 11:38:43 -07:00