95 Commits

Author SHA1 Message Date
Matt Schultz
ff9b54ad43 Added English fractional Numeral rule (ex: "3/4", "1/2", "5/7")
Summary:
Also added real-world test to English `Quantity` corpus ("3/4 cup", as a culinary example)
Closes https://github.com/facebookincubator/duckling/pull/14

Reviewed By: patapizza

Differential Revision: D5035990

Pulled By: niteria

fbshipit-source-id: c1b8f65
2017-05-10 07:04:16 -07:00
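
To make the fractional Numeral rule in the commit above concrete, here is a minimal Python sketch of how text such as "3/4" can be turned into a number. It is not the project's Haskell rule: the names `FRACTION_RE` and `parse_fraction`, the exact pattern, and the handling of a zero denominator are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: two runs of digits separated by a slash, e.g. "3/4".
FRACTION_RE = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_fraction(text):
    """Return the first 'a/b' fraction found in text as a float, or None."""
    m = FRACTION_RE.search(text)
    if m is None:
        return None
    numerator, denominator = int(m.group(1)), int(m.group(2))
    if denominator == 0:
        return None  # assumption: "1/0" is rejected rather than raising
    return numerator / denominator


# The culinary example from the commit summary.
print(parse_fraction("3/4 cup"))
```

A quantity rule could then combine the parsed value with the unit that follows it ("cup"), which is what the added corpus example exercises.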
Matt Schultz
e0d48b08a3 Expanded English matching rules for miles and inches
Summary:
Adds canonical abbreviation matching rules for miles ("mi") and inches (`"` and "in").
Closes https://github.com/facebookincubator/duckling/pull/11

Differential Revision: D5035991

Pulled By: niteria

fbshipit-source-id: 3a0c423
2017-05-10 06:49:33 -07:00
Bartosz Nitka
9e88e95faa More instructions for pcre on macOS
Summary:
It turns out that `brew` can install `pcre` successfully yet still
fail to install the development headers in `/usr/local/include/`
if that path is not writable. `brew doctor` should detect that
and related problems.

Relevant ticket: #8

Reviewed By: patapizza, JonCoens

Differential Revision: D5031503

fbshipit-source-id: ba0b8e8
2017-05-09 14:34:21 -07:00
Bartosz Nitka
9ee46831da license-file: vs license-files:
Summary: Doh

Reviewed By: JonCoens

Differential Revision: D5011285

fbshipit-source-id: 8e980eb
2017-05-05 10:34:22 -07:00
Bartosz Nitka
46191fdcb8 Fix license spec in the .cabal file
Reviewed By: JonCoens

Differential Revision: D5011092

fbshipit-source-id: 432fe48
2017-05-05 09:52:09 -07:00
Bartosz Nitka
3cc5d85ebd Add a travis script
Summary:
This adds a travis script, so that we get feedback when
we push or on pull requests.

It builds and runs tests.

We currently only test with GHC 8.0.2; other versions
are broken for reasons given in the script.

I relaxed the version of `time` in preparation for GHC 8.2.

It also adds an icon in the README.md

Reviewed By: JonCoens

Differential Revision: D5002255

fbshipit-source-id: 47ff3af
2017-05-04 09:19:16 -07:00
Bartosz Nitka
d171c547dd Remove redundant constraints in .cabal
Summary: See title.

Reviewed By: JonCoens

Differential Revision: D5002238

fbshipit-source-id: 0de239c
2017-05-04 09:19:16 -07:00
Bartosz Nitka
6327b614f9 Add instructions about PCRE for macOS
Summary: This should resolve #8 and #6.

Reviewed By: xich

Differential Revision: D5000220

fbshipit-source-id: b713931
2017-05-03 18:04:19 -07:00
Matteo
e11014dc4b Volume for IT lang
Summary:
I noticed that several dimensions are missing for the IT language; this patch adds the Volume dimension.

Regards
Matteo
Closes https://github.com/facebookincubator/duckling/pull/4

Reviewed By: JonCoens

Differential Revision: D4986389

Pulled By: patapizza

fbshipit-source-id: 314d33e
2017-05-02 11:19:14 -07:00
Noon van der Silk
88639e8a56 fix default port in doc
Summary:
When run as-is, it launches on port 8000, not 8080.
Closes https://github.com/facebookincubator/duckling/pull/1

Differential Revision: D4981822

Pulled By: patapizza

fbshipit-source-id: 3352a80
2017-05-02 07:34:25 -07:00
Julien Odent
e85d2f507c README nits
Summary:
* moved `AmountOfMoney` up for alphabetical order
* added example request for HTTP server example

Reviewed By: JonCoens

Differential Revision: D4978108

fbshipit-source-id: e98de49
2017-05-01 09:19:18 -07:00
Julien Odent
1c3ccf671f Remove redundant duration rules
Summary: This rule was already present (in `Duckling/Duration/Rules.hs`).

Reviewed By: niteria

Differential Revision: D4956233

fbshipit-source-id: 9e8ca64
2017-04-28 11:34:40 -07:00
Julien Odent
d3d3703015 HE: Time
Summary:
Time dimension for Hebrew.
Commented out the failing tests, which also fail in the Clojure version.

Reviewed By: JonCoens

Differential Revision: D4970308

fbshipit-source-id: b455142
2017-04-28 10:04:35 -07:00
Julien Odent
ab2c89df4f IT: Temperature
Summary: Temperature dimension for Italian.

Reviewed By: JonCoens

Differential Revision: D4970338

fbshipit-source-id: 024802e
2017-04-28 10:04:35 -07:00
Bartosz Nitka
74936df848 Make matching anywhere vs at pos obvious
Summary:
This change refactors the Engine to use one code path
when `lookupItem` is looking for the first token `Node`
matching the rule, and a different one for the subsequent
tokens.

This division gives us better invariants and, more importantly,
lets us do full-text regexp matches only when necessary.

This should be particularly useful for longer texts.

Reviewed By: patapizza

Differential Revision: D4953918

fbshipit-source-id: e3a69ad
2017-04-28 09:19:20 -07:00
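
The distinction drawn in the commit above, matching anywhere in the text versus matching at a known position, can be illustrated with Python's `re` module. This is only a sketch of the general idea, not the Engine's code: `match_anywhere` and `match_at` are hypothetical names, and the real implementation is in Haskell on top of PCRE.

```python
import re


def match_anywhere(pattern, text):
    """Scan the whole text and return every (start, end) span that matches."""
    return [m.span() for m in pattern.finditer(text)]


def match_at(pattern, text, pos):
    """Try the pattern only at `pos`; no scanning of the rest of the text."""
    m = pattern.match(text, pos)
    return m.span() if m else None


digits = re.compile(r"\d+")
text = "at 14h or 15h"

# First item of a rule: we do not know where it starts, so scan everything.
print(match_anywhere(digits, text))
# Later items: the previous token fixed the position, so check just there.
print(match_at(digits, text, 3))
```

The anchored check fails fast when nothing starts at the given position, which is why avoiding the full scan would matter more as the text grows.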
Julien Odent
9269727617 PT: Bring latest changes
Summary: * PhoneNumber: support for "ramal" as extension keyword

Reviewed By: niteria

Differential Revision: D4959209

fbshipit-source-id: cd12c1f
2017-04-28 08:04:22 -07:00
Julien Odent
5ba2c9e9a1 NB: Bringing latest changes
Summary:
* Numeral: fixed "hundre" (not "hundred")
* Numeral: added "tretti", "søtti"
* Time: updated last times to support "sist"
* Time: christmas days

Reviewed By: niteria

Differential Revision: D4958919

fbshipit-source-id: e4eecf5
2017-04-28 08:04:22 -07:00
Julien Odent
2182d94edb Bring latest updates for ID
Summary: * added one example in `AmountOfMoney`

Reviewed By: niteria

Differential Revision: D4958635

fbshipit-source-id: c70ce7c
2017-04-28 08:04:22 -07:00
Julien Odent
3f40625339 Temperature for Croatian
Summary: Temperature dimension for Croatian

Reviewed By: niteria

Differential Revision: D4958590

fbshipit-source-id: fe6c2e4
2017-04-28 08:04:22 -07:00
Julien Odent
3cc3266e28 Quantity for Croatian
Summary: Quantity dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4958501

fbshipit-source-id: b90c8f6
2017-04-28 08:04:22 -07:00
Julien Odent
0372f4f3da Volume for Croatian
Summary: Volume dimension for Croatian

Reviewed By: niteria

Differential Revision: D4957186

fbshipit-source-id: 63012ad
2017-04-28 08:04:22 -07:00
Julien Odent
0aa4aa56bb Distance for Croatian
Summary: Distance dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4957067

fbshipit-source-id: 232ce30
2017-04-28 08:04:21 -07:00
Julien Odent
35b9101c48 VI: Time
Summary:
* Time dimension for Vietnamese.
* Expose `debugContext`.

Reviewed By: niteria

Differential Revision: D4963594

fbshipit-source-id: 2373735
2017-04-28 08:04:21 -07:00
Julien Odent
e4d4531877 VI: Duration
Summary:
Duration dimension for Vietnamese.
This only uses the common rule.

Reviewed By: niteria

Differential Revision: D4962329

fbshipit-source-id: 9273245
2017-04-28 08:04:21 -07:00
Julien Odent
432ff51bd0 VI: TimeGrain
Summary: TimeGrain dimension for Vietnamese.

Reviewed By: niteria

Differential Revision: D4959399

fbshipit-source-id: e053413
2017-04-28 08:04:21 -07:00
Julien Odent
3314ddc7a4 VI: Ordinal
Summary: Ordinal for Vietnamese.

Reviewed By: niteria

Differential Revision: D4959285

fbshipit-source-id: 7212cc9
2017-04-28 08:04:21 -07:00
Julien Odent
0370c452f1 Time
Summary: Time dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4954399

fbshipit-source-id: 906c4a6
2017-04-26 09:19:27 -07:00
Julien Odent
2d0594576f Duration
Summary: Duration dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947983

fbshipit-source-id: 8e55a7e
2017-04-26 09:19:27 -07:00
Julien Odent
1c15d0bbb2 TimeGrain
Summary: TimeGrain dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947837

fbshipit-source-id: b86d256
2017-04-26 09:19:27 -07:00
Julien Odent
b32696f8eb AmountOfMoney
Summary: AmountOfMoney dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947584

fbshipit-source-id: a20670a
2017-04-26 09:19:27 -07:00
Julien Odent
0f98a42b03 Ordinal
Summary: Ordinal dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947244

fbshipit-source-id: 54bda8f
2017-04-26 09:19:27 -07:00
Julien Odent
840deda7dd Setup + Numeral
Summary: Setup + Numeral dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4946964

fbshipit-source-id: 204429b
2017-04-26 09:19:26 -07:00
Bartosz Nitka
c70cf6d38d Move Duckling.Stash to Duckling.Types.Stash
Summary: This is for consistency with Duckling.Types.Document

Reviewed By: patapizza

Differential Revision: D4948569

fbshipit-source-id: 459565a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
8db73688d7 Move Document and helpers to a fresh module
Summary:
Document had its internal details leaked over 2 files.
This consolidates it.

It took a long time to make this perf neutral (now it's even a tiny
win), for reasons I don't completely understand.
The INLINE pragma on byteStringFromPos I semi-understand,
but I also had to move isRangeValid to Document and that's
a bit of a mystery.

Reviewed By: patapizza

Differential Revision: D4948449

fbshipit-source-id: ffb251a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
924516103b Revert Duckling part of 'clean up unused imports'
Summary: it doesn't take .cabal into account

Reviewed By: patapizza

Differential Revision: D4938400

fbshipit-source-id: 8bc99a5
2017-04-24 07:34:27 -07:00
Julien Odent
dbe9e73541 Duration
Summary: Duration dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930403

fbshipit-source-id: 690db8f
2017-04-24 06:49:40 -07:00
Julien Odent
efa38401b5 TimeGrain
Summary: TimeGrain dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930294

fbshipit-source-id: 9c0f0da
2017-04-24 06:49:40 -07:00
Julien Odent
f5f4889770 Ordinal
Summary: Ordinal dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930162

fbshipit-source-id: 02545ae
2017-04-24 06:49:40 -07:00
Julien Odent
bd96d3dd95 Setup + Numeral
Summary: Setup for Hebrew + Numeral dimension

Reviewed By: niteria

Differential Revision: D4930041

fbshipit-source-id: 965132b
2017-04-24 06:49:40 -07:00
Bartosz Nitka
b26aa7d84d clean up unused imports
Summary:
This diff was generated by running `hsclimps`

PLEASE TAKE ONE OF THE FOLLOWING ACTIONS AS SOON AS POSSIBLE:
  1) Select Accept and Ship to land this change
  2) If you have issues with this diff, request changes
  3) If you are no longer the owner, add reviewers and update the `.context` file with the appropriate owner

NOTE: If the diff is unable to land because of a merge conflict I will automatically update it for you.

#accept2ship

Reviewed By: niteria

Differential Revision: D4937839

fbshipit-source-id: bb3d330
2017-04-24 05:19:24 -07:00
Bartosz Nitka
7f7cc70d72 Make first pass more obvious
Summary:
Separating out the first pass lets us avoid repeated filtering
and makes the structure of the algorithm a bit more clear.

Previously `Stash.null` was used as a test for being part of
the first pass or not, but that is a bit indirect. Encoding
the algorithm structure (the state automaton) as function calls
lets us make additional assumptions.

It also has a nice side effect of costs being attributed to
first/subsequent passes in the profile.

I also prepend to `matches` because it's likely to be bigger.

Reviewed By: patapizza

Differential Revision: D4922195

fbshipit-source-id: 0aec79f
2017-04-20 11:49:15 -07:00
Bartosz Nitka
878f85b9e1 Codemod intersectMB to intersect
Summary:
`intersectMB` was a name used for the purpose of migrating.
This is the last part of the migration.

Reviewed By: patapizza

Differential Revision: D4906098

fbshipit-source-id: a70af78
2017-04-18 10:19:20 -07:00
Bartosz Nitka
fe39a55a4c Use intervalMB instead of interval
Summary:
This continues the work from:
"[Duckling] Don't produce trivially empty Tokens"
All the Rules should use intervalMB from now on.

Reviewed By: patapizza

Differential Revision: D4906072

fbshipit-source-id: 277b961
2017-04-18 10:19:20 -07:00
Bartosz Nitka
a91e787bb7 Derive Eq, Show for TimeIntervalType
Summary: This is always useful to have.

Reviewed By: patapizza

Differential Revision: D4864208

fbshipit-source-id: b879893
2017-04-18 08:19:20 -07:00
Bartosz Nitka
879b103ca3 Fix indexing problems with new regexp matcher
Summary:
My change had a couple of problems:
* utf8 character width logic was completely wrong for characters that need 3 or 4 bytes
* `Array.listArray (start, end)` produces an array where `end` is a valid index
* because of ^ the `arraySize` logic also has to change

Reviewed By: watashi, darshankapashi

Differential Revision: D4894355

fbshipit-source-id: 8d07dfd
2017-04-14 15:49:17 -07:00
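
The first bullet in the commit above concerns UTF-8 character widths. As a reference for what correct width logic looks like, here is a small Python sketch that derives the width from the lead byte and builds a character-to-byte offset table. The function names are hypothetical and this is not the project's code; it only shows the standard UTF-8 lead-byte ranges.

```python
def utf8_width(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if 0xC0 <= lead_byte < 0xE0:
        return 2  # 110xxxxx
    if 0xE0 <= lead_byte < 0xF0:
        return 3  # 1110xxxx
    if 0xF0 <= lead_byte < 0xF8:
        return 4  # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte")


def char_to_byte_offsets(text):
    """Byte offset of each character, plus one final entry for end of text."""
    data = text.encode("utf-8")
    offsets, i = [], 0
    while i < len(data):
        offsets.append(i)
        i += utf8_width(data[i])
    offsets.append(i)  # offset just past the last character
    return offsets
```

For a text of n characters the table has n + 1 entries. The second bullet is a related off-by-one concern on the Haskell side: the bounds passed to `listArray` are inclusive, so an array built with `(start, end)` holds `end - start + 1` elements.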
Bartosz Nitka
e7aeef5436 Avoid allocations and encoding in regexp matching
Summary: The rationale is explained in a new Note.

Reviewed By: patapizza

Differential Revision: D4884104

fbshipit-source-id: 81f36ee
2017-04-14 12:19:21 -07:00
Bartosz Nitka
3d18cf5ea9 Don't produce trivially empty Tokens
Summary:
We can detect certain kinds of contradictions sooner;
producing a token with an unresolvable Predicate is wasteful.
For a text like:
```
"Demain apres midi 14h 15 h 16h vendredi 14 a 15h"
```
it could produce 7000 tokens with empty predicates.
After this change it produces none and we get a 4x improvement in
time and 6x improvement in allocations.

Note that I only covered `ruleIntersect*` here. I need to do this for
other instances as well.

Reviewed By: JonCoens

Differential Revision: D4871078

fbshipit-source-id: 9f0e7ad
2017-04-11 16:35:05 -07:00
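
The idea in the commit above, refusing to build a token when two constraints can never both hold, can be sketched in Python as follows. The representation is an assumption made purely for illustration: constraints are plain dicts of field values, whereas Duckling's actual `Predicate` type is a richer Haskell structure, and `intersect` and `combine_all` here are hypothetical helpers rather than the project's functions.

```python
def intersect(a, b):
    """Merge two field constraints; return None if they contradict."""
    merged = dict(a)
    for field, value in b.items():
        if field in merged and merged[field] != value:
            return None  # e.g. hour=14 and hour=15 can never both hold
        merged[field] = value
    return merged


def combine_all(constraints):
    """Pair up constraints, keeping only combinations that can resolve."""
    tokens = []
    for i, a in enumerate(constraints):
        for b in constraints[i + 1:]:
            merged = intersect(a, b)
            if merged is not None:  # contradictions are dropped immediately
                tokens.append(merged)
    return tokens


parts = [{"hour": 14}, {"hour": 15}, {"hour": 16}, {"weekday": "friday"}]
print(len(combine_all(parts)))
```

Because contradictory pairs never become tokens, they also never feed later combination steps, which is where the savings compound on inputs with many competing hours.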
Kevin Cros
62bc5a317b Using hashmap look up instead of 'case of'
Summary: Updating regex with hashmap look ups.

Reviewed By: patapizza

Differential Revision: D4848178

fbshipit-source-id: 4d5ded8
2017-04-11 11:04:20 -07:00
ADAM LIU
928139569c Refactor of Duckling.Numeral.TR to hashmap lookup
Summary: Update of TR Rules hashmap

Reviewed By: patapizza

Differential Revision: D4860819

fbshipit-source-id: 6f5a722
2017-04-11 09:34:23 -07:00
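
The two commits above replace branch-per-word matching with a table lookup. A rough Python equivalent of that pattern is shown below. The word table is hypothetical (English words are used only for readability), and the names `NUMERAL_WORDS`, `NUMERAL_RE`, and `numeral_value` are assumptions; the real rules are Haskell and use their own tables.

```python
import re

# Hypothetical word table; adding a numeral means adding one entry here.
NUMERAL_WORDS = {"one": 1, "two": 2, "three": 3}

# The regex alternation is generated from the same table, so the pattern
# and the values cannot drift apart.
NUMERAL_RE = re.compile(
    r"\b(" + "|".join(NUMERAL_WORDS) + r")\b", re.IGNORECASE
)


def numeral_value(text):
    """Return the value of the first numeral word in text, or None."""
    m = NUMERAL_RE.search(text)
    if m is None:
        return None
    # A single dictionary lookup replaces a chain of per-word branches.
    return NUMERAL_WORDS.get(m.group(1).lower())


print(numeral_value("Two cups"))
```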
Bartosz Nitka
f7b3f2ed73 Detect interval contradictions sooner
Summary:
So far, contradictions from intersection only
propagated through intersection. With this change
they also propagate through intervals, and
intervals can generate contradictions themselves.

Reviewed By: patapizza

Differential Revision: D4864160

fbshipit-source-id: 8348267
2017-04-10 16:35:27 -07:00