Summary:
Also added real-world test to English `Quantity` corpus ("3/4 cup", as a culinary example)
Closes https://github.com/facebookincubator/duckling/pull/14
Reviewed By: patapizza
Differential Revision: D5035990
Pulled By: niteria
fbshipit-source-id: c1b8f65
Summary:
Turns out that `brew` can succeed installing `pcre`, but still
not install development headers in `/usr/local/include/`
if that path is not writable. `brew doctor` should find that
and related problems.
Relevant ticket: #8
Reviewed By: patapizza, JonCoens
Differential Revision: D5031503
fbshipit-source-id: ba0b8e8
Summary:
This adds a travis script, so that we get feedback when
we push or on pull requests.
It builds and runs tests.
We currently only test with GHC 8.0.2, other versions
are broken for reasons given in the script.
I relaxed the version of `time` in preparation for GHC 8.2.
It also adds an icon in the README.md
Reviewed By: JonCoens
Differential Revision: D5002255
fbshipit-source-id: 47ff3af
Summary:
I notice that there are several missing dimensions for the IT language: this patch is for the Volume dimension
Regards
Matteo
Closes https://github.com/facebookincubator/duckling/pull/4
Reviewed By: JonCoens
Differential Revision: D4986389
Pulled By: patapizza
fbshipit-source-id: 314d33e
Summary:
* moved `AmountOfMoney` up for alphabetical order
* added example request for HTTP server example
Reviewed By: JonCoens
Differential Revision: D4978108
fbshipit-source-id: e98de49
Summary:
Time dimension for Hebrew.
Commented out the failing tests that actually also fail in Clojure.
Reviewed By: JonCoens
Differential Revision: D4970308
fbshipit-source-id: b455142
Summary:
This change refactors the Engine to use a different
code path for when we're calling `lookupItem` to find
a first token `Node` matching the rule and a different
one for subsequent ones.
This division lets us get better invariants and more importantly
do full text regexp matches only when necessary.
This should be particularly useful for longer texts.
Reviewed By: patapizza
Differential Revision: D4953918
fbshipit-source-id: e3a69ad
Summary:
Duration dimension for Vietnamese.
This only uses the common rule.
Reviewed By: niteria
Differential Revision: D4962329
fbshipit-source-id: 9273245
Summary:
Document had its internal details leaked over 2 files.
This consolidates it.
It took a long time to make this perf neutral (now it's even a tiny
win), for reasons I don't completely understand.
The INLINE pragma on byteStringFromPos I semi-understand,
but I also had to move isRangeValid to Document and that's
a bit of a mystery.
Reviewed By: patapizza
Differential Revision: D4948449
fbshipit-source-id: ffb251a
Summary:
This diff was generated by running `hsclimps`
PLEASE TAKE ONE OF THE FOLLOWING ACTIONS AS SOON AS POSSIBLE:
1) Select Accept and Ship to land this change
2) If you have issues with this diff, request changes
3) If you are no longer the owner, add reviewers and update the `.context` file with the appropriate owner
NOTE: If the diff is unable to land because of a merge conflict I will automatically update it for you.
#accept2ship
Reviewed By: niteria
Differential Revision: D4937839
fbshipit-source-id: bb3d330
Summary:
Separating out the first pass lets us avoid repeated filtering
and makes the structure of the algorithm a bit more clear.
Previously `Stash.null` was used as a test for being part of
the first pass or not, but that is a bit indirect. Encoding
the algorithm structure (the state automaton) as function calls
lets us make additional assumptions.
It also has a nice side effect of costs being attributed to
first/subsequent passes in the profile.
I also prepend to `matches` because it's likely to be bigger.
Reviewed By: patapizza
Differential Revision: D4922195
fbshipit-source-id: 0aec79f
Summary:
`intersectMB` was a name used for the purpose of migrating.
This is the last part of the migration.
Reviewed By: patapizza
Differential Revision: D4906098
fbshipit-source-id: a70af78
Summary:
This continues the work from:
"[Duckling] Don't produce trivially empty Tokens"
All the Rules should use intervalMB from now on.
Reviewed By: patapizza
Differential Revision: D4906072
fbshipit-source-id: 277b961
Summary:
My change had a couple of problems:
* utf8 character width logic was completely wrong for characters that need 3 or 4 bytes
* `Array.listArray (start, end)` produces an array where `end` is a valid index
* because of ^ the `arraySize` logic also has to change
Reviewed By: watashi, darshankapashi
Differential Revision: D4894355
fbshipit-source-id: 8d07dfd
Summary:
We can detect certain kinds of contradictions sooner,
producing a token with an unresolvable Predicate is wasteful.
For a text like:
```
"Demain apres midi 14h 15 h 16h vendredi 14 a 15h"
```
it could produce 7000 tokens with empty predicates.
After this change it produces none and we get a 4x improvement in
time and 6x improvement in allocations.
Note I only covered `ruleIntersect*` here. I need to do this for
other instances as well.
Reviewed By: JonCoens
Differential Revision: D4871078
fbshipit-source-id: 9f0e7ad
Summary:
So far contradictions from intersection only
propagated through intersection. This change
makes it so that it also propagates through intervals
and lets intervals also generate contradictions.
Reviewed By: patapizza
Differential Revision: D4864160
fbshipit-source-id: 8348267