23 Commits

Author SHA1 Message Date
Chinmay Deshmukh
5ac990bbe2 Return latent entities
Summary: Add an option to return latent time entities. This can be used when one is pretty certain that the input contains a datetime.

Reviewed By: patapizza

Differential Revision: D7254245

fbshipit-source-id: e9e0503cace2691804056fcebdc18fd9090fb181
2018-03-19 14:45:27 -07:00
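The latent-entities option from the commit above could be sketched roughly as follows. This is a minimal illustration rather than the library's actual code: the `Options` record, its `withLatent` field, the `Entity` type, and `selectEntities` are all hypothetical names introduced here.

```haskell
-- Hypothetical option record; the field name is an assumption of this sketch.
newtype Options = Options { withLatent :: Bool }

-- Hypothetical entity: the matched text plus a flag marking it as latent,
-- i.e. plausible only when a datetime is already expected in the input.
data Entity = Entity
  { body   :: String
  , latent :: Bool
  } deriving (Eq, Show)

-- Keep latent entities only when the caller opted in.
selectEntities :: Options -> [Entity] -> [Entity]
selectEntities opts = filter keep
  where
    keep e = withLatent opts || not (latent e)
```

With the option off, a bare "7" flagged as latent would be dropped; with it on, the same entity would be returned alongside the unambiguous ones.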
Bora Tunca
002dec5614 Support lowercase time zones
Summary: Support lowercase time zones

Reviewed By: patapizza

Differential Revision: D7114137

fbshipit-source-id: 906c4fa5305aea8990e8a5d480ceeec4b584d7db
2018-03-01 14:30:29 -08:00
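One plausible way to accept lowercase time zones, as the commit above describes, is to normalize the abbreviation before the lookup. The table and the `lookupZone` name below are assumptions made for illustration; the commit message does not say how the change was implemented.

```haskell
import Data.Char (toUpper)

-- Hypothetical table: zone abbreviation to UTC offset in minutes.
zoneOffsets :: [(String, Int)]
zoneOffsets =
  [ ("UTC", 0)
  , ("PST", -480)
  , ("CET", 60)
  ]

-- Upper-case the input first so "pst", "Pst" and "PST" all resolve alike.
lookupZone :: String -> Maybe Int
lookupZone name = lookup (map toUpper name) zoneOffsets
```

Normalizing at the lookup keeps the table itself in a single canonical case.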
Michael Azzam
8fbd6a0414 Consolidate days of week and months
Summary: Refactor days of week and months rules to avoid mindless copy-pasta.

Reviewed By: patapizza

Differential Revision: D6438622

fbshipit-source-id: 489261b53ba50f6624996ad4581c2bf1206a8cc1
2017-11-29 17:00:37 -08:00
Julien Odent
fb10a6e6ba Time/EN: Parse spelled out times + AM/PM
Summary:
When using speech recognition, we might see things like "six thirty six a m" or
"ten thirty p m".
Also fixed the argument order of `timeOfDayAMPM` to be more idiomatic.

Reviewed By: blandinw

Differential Revision: D6316542

fbshipit-source-id: 0008c049040219b3a1dd80d9e4661ba8a246fa7f
2017-11-14 13:30:26 -08:00
Julien Odent
305358b2f7 Time/PT: don't parse 'ter'
Summary:
`ter` also means `to have`, which is very common.
`ter` is a very rare form of `Tuesday`, used only in calendar-like contexts.

Reviewed By: blandinw

Differential Revision: D6066506

fbshipit-source-id: c8cd231
2017-10-16 12:04:25 -07:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
* `Rules/<Lang>.hs` exposes
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can also expose `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
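The `makeLocale` smart constructor and the MM/DD versus DD/MM example from the commit above might look roughly like this. The sketch is simplified under stated assumptions: the `Lang` and `Region` values are a tiny hypothetical subset, falling back to a region-less locale on an invalid pair is an assumed behavior, and `DateOrder` and `dateOrder` are names invented here.

```haskell
-- Hypothetical subsets of the supported languages and regions.
data Lang = EN | PT deriving (Eq, Show)
data Region = GB | US | BR deriving (Eq, Show)

-- In a real module the constructor would stay unexported, so callers
-- could only build a Locale through makeLocale.
data Locale = Locale Lang (Maybe Region) deriving (Eq, Show)

-- Which regions are valid for which language (assumed for this sketch).
allowedRegions :: Lang -> [Region]
allowedRegions EN = [GB, US]
allowedRegions PT = [BR]

-- Smart constructor: an invalid (Lang, Region) pair degrades to the
-- language-only locale instead of producing an inconsistent value.
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang Nothing = Locale lang Nothing
makeLocale lang (Just region)
  | region `elem` allowedRegions lang = Locale lang (Just region)
  | otherwise = Locale lang Nothing

-- Locale-specific behavior: how to read a numeric date like 10/11.
data DateOrder = MonthFirst | DayFirst deriving (Eq, Show)

dateOrder :: Locale -> DateOrder
dateOrder (Locale EN (Just GB)) = DayFirst
dateOrder (Locale EN _)         = MonthFirst
dateOrder (Locale PT _)         = DayFirst
```

Because every `Locale` is built through `makeLocale`, code such as `dateOrder` never has to handle a pair like `(PT, US)`.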
Julien Odent
b3fb913a23 Time: don't parse subsequent numbers
Summary:
- Fixes #89. "<number> <number>" would parse as Time if in the right range.
- Applied the same rule for all languages. Note that for Italian and Polish, I updated "<hour> <minute>" tests to be in the form "at <hour> <minute>".
- Replaced `liftM2` with more generic `and|or . sequence [f1, f2, ...]`.

Reviewed By: blandinw

Differential Revision: D5992879

fbshipit-source-id: 5409ffb
2017-10-06 12:19:29 -07:00
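The `and|or . sequence [f1, f2, ...]` idiom mentioned in the commit above relies on the function monad from `base`: `sequence` turns a list of predicates into one function that returns the list of their results. `liftM2 (&&) f1 f2` combines exactly two predicates, whereas the list form takes any number. The predicates and the `allOf`/`anyOf` names below are hypothetical and only illustrate the idiom.

```haskell
-- Hypothetical predicates on an Int standing in for a numeral's value.
isHourValue :: Int -> Bool
isHourValue n = n >= 0 && n < 24

isMinuteValue :: Int -> Bool
isMinuteValue n = n >= 0 && n < 60

-- sequence fs :: a -> [Bool] applies every predicate to the same input;
-- `and` then requires all of them to hold.
allOf :: [a -> Bool] -> a -> Bool
allOf fs = and . sequence fs

-- Same shape, but a single passing predicate is enough.
anyOf :: [a -> Bool] -> a -> Bool
anyOf fs = or . sequence fs
```

For instance, `allOf [isHourValue, isMinuteValue] 30` is `False` because 30 is not a valid hour, while `anyOf` over the same list accepts it.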
Julien Odent
83ea150d94 Convert back escaped characters in rules
Summary:
We noticed that using UTF-8 characters directly in regexes works.
Hence this converts the escaped characters back, for readability and maintenance.

Reviewed By: blandinw

Differential Revision: D5787146

fbshipit-source-id: e5a4b9a
2017-09-07 12:49:33 -07:00
Julien Odent
9f856cec48 Time/PT: don't parse 'um' alone
Summary: In Portuguese, "um" means the numeral "one" and the article "a".

Reviewed By: bfiss

Differential Revision: D5703396

fbshipit-source-id: 92ed04f
2017-08-24 21:49:36 -07:00
Atyansh Jaiswal
4e96a15c15 Refactored weekend rules to use the weekend helper for all languages
Summary: This is a simple refactor that uses the weekend helper for all languages

Reviewed By: patapizza

Differential Revision: D5677330

fbshipit-source-id: 9984539
2017-08-22 10:34:24 -07:00
Bartosz Nitka
cdd2f1c9cb Don't produce empty tokens from interval
Summary:
This is analogous to
"[Duckling] Don't produce trivially empty Tokens".
That change covered intersect; this one
deals with interval.

Reviewed By: patapizza

Differential Revision: D5039215

fbshipit-source-id: 95bd821
2017-05-10 16:04:22 -07:00
Bartosz Nitka
878f85b9e1 Codemod intersectMB to intersect
Summary:
`intersectMB` was a name used for the purpose of migrating.
This is the last part of the migration.

Reviewed By: patapizza

Differential Revision: D4906098

fbshipit-source-id: a70af78
2017-04-18 10:19:20 -07:00
Bartosz Nitka
fe39a55a4c Use intervalMB instead of interval
Summary:
This continues the work from:
"[Duckling] Don't produce trivially empty Tokens"
All the Rules should use intervalMB from now on.

Reviewed By: patapizza

Differential Revision: D4906072

fbshipit-source-id: 277b961
2017-04-18 10:19:20 -07:00
Bartosz Nitka
3d18cf5ea9 Don't produce trivially empty Tokens
Summary:
We can detect certain kinds of contradictions sooner;
producing a token with an unresolvable Predicate is wasteful.
For a text like:
```
"Demain apres midi 14h 15 h 16h vendredi 14 a 15h"
```
it could produce 7000 tokens with empty predicates.
After this change it produces none and we get a 4x improvement in
time and 6x improvement in allocations.

Note I only covered `ruleIntersect*` here. I need to do this for
other instances as well.

Reviewed By: JonCoens

Differential Revision: D4871078

fbshipit-source-id: 9f0e7ad
2017-04-11 16:35:05 -07:00
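The idea in the commit above, rejecting contradictory intersections before a token is ever produced, can be sketched with a `Maybe`-returning intersection. The `Pred` record here is a heavily simplified, hypothetical stand-in for the real predicate type, and the shape of `intersectMB` is assumed for illustration.

```haskell
-- Hypothetical predicate: each field is an optional constraint.
data Pred = Pred
  { predHour      :: Maybe Int
  , predDayOfWeek :: Maybe Int
  } deriving (Eq, Show)

-- Merge one constraint. The outer Maybe signals a contradiction;
-- the inner Maybe is the (possibly absent) merged constraint.
mergeField :: Eq a => Maybe a -> Maybe a -> Maybe (Maybe a)
mergeField Nothing y = Just y
mergeField x Nothing = Just x
mergeField (Just x) (Just y)
  | x == y    = Just (Just x)
  | otherwise = Nothing

-- Intersection that fails early: Nothing means the combined predicate
-- could never match, so no token should be built from it.
intersectMB :: Pred -> Pred -> Maybe Pred
intersectMB (Pred h1 d1) (Pred h2 d2) =
  Pred <$> mergeField h1 h2 <*> mergeField d1 d2
```

In this sketch, intersecting "14h" with "15h" yields `Nothing` immediately, so a rule built on it would produce no token instead of one carrying an empty predicate.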
Bartosz Nitka
1cf8496967 tt helper for returning Time Tokens
Summary:
This is a very common pattern (>1k occurrences).
Replacing it with something shorter makes the rules a bit less
boilerplate-y.
Feel free to bikeshed the name, I can easily redo the codemod.

Reviewed By: patapizza

Differential Revision: D4848864

fbshipit-source-id: 7baeee3
2017-04-10 12:34:43 -07:00
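A helper like the `tt` introduced in the commit above would plausibly just wrap a time value into an optional token. The `TimeData` and `Token` types below are simplified, hypothetical stand-ins, and the two `noon` productions exist only to show the before and after shape.

```haskell
-- Hypothetical, simplified payload and token types.
data TimeData = TimeData { hour :: Int, minute :: Int }
  deriving (Eq, Show)

data Token = TimeToken TimeData | NumeralToken Int
  deriving (Eq, Show)

-- The helper: wrap a time value as a successful production result.
tt :: TimeData -> Maybe Token
tt = Just . TimeToken

-- Before: every production spelled out the wrapping by hand.
noonVerbose :: [Token] -> Maybe Token
noonVerbose _ = Just (TimeToken (TimeData 12 0))

-- After: the same production using the helper.
noon :: [Token] -> Maybe Token
noon _ = tt (TimeData 12 0)
```

Both productions return the same value; the helper only removes the repeated wrapping.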
Bartosz Nitka
f46539ced2 Type for Closed/Open intervals
Summary:
This makes the code easier to read.
I'm not attached to naming, but this is
standard terminology from topology.

Reviewed By: JonCoens, patapizza

Differential Revision: D4848740

fbshipit-source-id: 79c2c20
2017-04-07 12:19:17 -07:00
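A dedicated type for closed and open intervals, as in the commit above, replaces an anonymous flag with a self-describing argument. The sketch below is an assumption-laden illustration: `IntervalType` and `hoursIn` are hypothetical names, and `Open` is taken here to exclude only the upper endpoint.

```haskell
-- Self-describing replacement for a bare Bool parameter.
data IntervalType = Open | Closed
  deriving (Eq, Show)

-- Whole hours covered by an interval. In this sketch, Closed includes
-- the upper endpoint and Open excludes it (an assumption made here).
hoursIn :: IntervalType -> Int -> Int -> [Int]
hoursIn Closed from to = [from .. to]
hoursIn Open   from to = [from .. to - 1]
```

A call such as `hoursIn Closed 9 12` reads unambiguously at the call site, which a positional `True` or `False` would not.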
Bartosz Nitka
290ca48e25 Fix 4:23am returning 5:23am
Summary:
This is the easiest way to fix it, but talking offline
with Julien, we may need to revisit.
It basically gets rid of time series where we were
producing intervals that are not a multiple of the grain.

Reviewed By: patapizza

Differential Revision: D4841759

fbshipit-source-id: 1c4742a
2017-04-06 11:04:16 -07:00
Bartosz Nitka
bd94622f64 Move tests to tests and exes to exe
Summary:
This works around https://github.com/haskell/cabal/issues/4350
If we don't do this, files get compiled multiple times
and cabal is unhappy.

Reviewed By: patapizza

Differential Revision: D4782749

fbshipit-source-id: 5bbe425
2017-03-27 16:04:24 -07:00
Julien Odent
54c9448fba Rename Number to Numeral
Summary: For consistency with the dimension name.

Reviewed By: JonCoens

Differential Revision: D4722216

fbshipit-source-id: 82c56d3
2017-03-16 13:49:16 -07:00
Julien Odent
33fa98734a Fix 'no dia 20'
Summary:
* 'no dia 20' (on the 20th)
* Unifying two rules into one, with a day grain

See https://github.com/wit-ai/wit/issues/388

Reviewed By: blandinw

Differential Revision: D4715780

fbshipit-source-id: e990954
2017-03-15 13:49:17 -07:00
Jonathan Coens
41800a3171 Move onto dependent-sum instead of custom local data Some
Summary:
No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat.
Instead of a `Read` instance I created a `fromName` function.

Reviewed By: zilberstein

Differential Revision: D4710014

fbshipit-source-id: 1d4e86d
2017-03-15 10:34:17 -07:00
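The combination of an existential `Some` wrapper and a `fromName` function in place of a `Read` instance, as described in the commit above, can be sketched as follows. To keep the example dependent on `base` only, `Some` is defined locally as a stand-in for the type that `dependent-sum` provides; the `Dimension` constructors, their payload types, and the name strings are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical, trimmed-down dimension tag indexed by its payload type.
data Dimension a where
  Numeral :: Dimension Double
  Time    :: Dimension String

-- Local stand-in for the existential wrapper from `dependent-sum`,
-- defined here only so the sketch needs nothing beyond base.
data Some tag where
  This :: tag a -> Some tag

-- Total parser from a name to a wrapped dimension. Unlike a Read
-- instance, an unknown name is an ordinary Nothing, not a parse error.
fromName :: String -> Maybe (Some Dimension)
fromName "number" = Just (This Numeral)
fromName "time"   = Just (This Time)
fromName _        = Nothing

-- Inverse direction, useful for round-tripping.
toName :: Some Dimension -> String
toName (This Numeral) = "number"
toName (This Time)    = "time"
```

Hiding the type index behind `Some` lets dimensions with different payload types live in one list or map, while `fromName` keeps name parsing explicit and total.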
Bartosz Nitka
28d53fce30 Remove ruleIntersect2
Summary:
It is no longer necessary after D4676812 and D4698788.
`"I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday"` now works in
less than a second; it used to take 10s.

The test suite also got 3s faster.

Reviewed By: patapizza

Differential Revision: D4701890

fbshipit-source-id: 107a55f
2017-03-14 05:04:12 -07:00
FBShipIt
3f8e52e70a Initial commit
fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192
2017-03-08 10:33:56 -08:00