Commit Graph

79 Commits

Author SHA1 Message Date
Julien Odent
5ba2c9e9a1 NB: Bringing latest changes
Summary:
* Numeral: fixed "hundre" (not "hundred")
* Numeral: added "tretti", "søtti"
* Time: updated last times to support "sist"
* Time: Christmas days

Reviewed By: niteria

Differential Revision: D4958919

fbshipit-source-id: e4eecf5
2017-04-28 08:04:22 -07:00
Julien Odent
2182d94edb Bring latest updates for ID
Summary: * added one example in `AmountOfMoney`

Reviewed By: niteria

Differential Revision: D4958635

fbshipit-source-id: c70ce7c
2017-04-28 08:04:22 -07:00
Julien Odent
3f40625339 Temperature for Croatian
Summary: Temperature dimension for Croatian

Reviewed By: niteria

Differential Revision: D4958590

fbshipit-source-id: fe6c2e4
2017-04-28 08:04:22 -07:00
Julien Odent
3cc3266e28 Quantity for Croatian
Summary: Quantity dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4958501

fbshipit-source-id: b90c8f6
2017-04-28 08:04:22 -07:00
Julien Odent
0372f4f3da Volume for Croatian
Summary: Volume dimension for Croatian

Reviewed By: niteria

Differential Revision: D4957186

fbshipit-source-id: 63012ad
2017-04-28 08:04:22 -07:00
Julien Odent
0aa4aa56bb Distance for Croatian
Summary: Distance dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4957067

fbshipit-source-id: 232ce30
2017-04-28 08:04:21 -07:00
Julien Odent
35b9101c48 VI: Time
Summary:
* Time dimension for Vietnamese.
* Expose `debugContext`.

Reviewed By: niteria

Differential Revision: D4963594

fbshipit-source-id: 2373735
2017-04-28 08:04:21 -07:00
Julien Odent
e4d4531877 VI: Duration
Summary:
Duration dimension for Vietnamese.
This only uses the common rule.

Reviewed By: niteria

Differential Revision: D4962329

fbshipit-source-id: 9273245
2017-04-28 08:04:21 -07:00
Julien Odent
432ff51bd0 VI: TimeGrain
Summary: TimeGrain dimension for Vietnamese.

Reviewed By: niteria

Differential Revision: D4959399

fbshipit-source-id: e053413
2017-04-28 08:04:21 -07:00
Julien Odent
3314ddc7a4 VI: Ordinal
Summary: Ordinal for Vietnamese.

Reviewed By: niteria

Differential Revision: D4959285

fbshipit-source-id: 7212cc9
2017-04-28 08:04:21 -07:00
Julien Odent
0370c452f1 Time
Summary: Time dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4954399

fbshipit-source-id: 906c4a6
2017-04-26 09:19:27 -07:00
Julien Odent
2d0594576f Duration
Summary: Duration dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947983

fbshipit-source-id: 8e55a7e
2017-04-26 09:19:27 -07:00
Julien Odent
1c15d0bbb2 TimeGrain
Summary: TimeGrain dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947837

fbshipit-source-id: b86d256
2017-04-26 09:19:27 -07:00
Julien Odent
b32696f8eb AmountOfMoney
Summary: AmountOfMoney dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947584

fbshipit-source-id: a20670a
2017-04-26 09:19:27 -07:00
Julien Odent
0f98a42b03 Ordinal
Summary: Ordinal dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4947244

fbshipit-source-id: 54bda8f
2017-04-26 09:19:27 -07:00
Julien Odent
840deda7dd Setup + Numeral
Summary: Setup + Numeral dimension for Croatian.

Reviewed By: niteria

Differential Revision: D4946964

fbshipit-source-id: 204429b
2017-04-26 09:19:26 -07:00
Bartosz Nitka
c70cf6d38d Move Duckling.Stash to Duckling.Types.Stash
Summary: This is for consistency with Duckling.Types.Document

Reviewed By: patapizza

Differential Revision: D4948569

fbshipit-source-id: 459565a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
8db73688d7 Move Document and helpers to a fresh module
Summary:
Document had its internal details leaked over 2 files.
This consolidates it.

It took a long time to make this perf neutral (now it's even a tiny
win), for reasons I don't completely understand.
The INLINE pragma on byteStringFromPos I semi-understand,
but I also had to move isRangeValid to Document and that's
a bit of a mystery.

Reviewed By: patapizza

Differential Revision: D4948449

fbshipit-source-id: ffb251a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
924516103b Revert Duckling part of 'clean up unused imports'
Summary: it doesn't take .cabal into account

Reviewed By: patapizza

Differential Revision: D4938400

fbshipit-source-id: 8bc99a5
2017-04-24 07:34:27 -07:00
Julien Odent
dbe9e73541 Duration
Summary: Duration dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930403

fbshipit-source-id: 690db8f
2017-04-24 06:49:40 -07:00
Julien Odent
efa38401b5 TimeGrain
Summary: TimeGrain dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930294

fbshipit-source-id: 9c0f0da
2017-04-24 06:49:40 -07:00
Julien Odent
f5f4889770 Ordinal
Summary: Ordinal dimension for Hebrew.

Reviewed By: niteria

Differential Revision: D4930162

fbshipit-source-id: 02545ae
2017-04-24 06:49:40 -07:00
Julien Odent
bd96d3dd95 Setup + Numeral
Summary: Setup for Hebrew + Numeral dimension

Reviewed By: niteria

Differential Revision: D4930041

fbshipit-source-id: 965132b
2017-04-24 06:49:40 -07:00
Bartosz Nitka
b26aa7d84d clean up unused imports
Summary:
This diff was generated by running `hsclimps`

Reviewed By: niteria

Differential Revision: D4937839

fbshipit-source-id: bb3d330
2017-04-24 05:19:24 -07:00
Bartosz Nitka
7f7cc70d72 Make first pass more obvious
Summary:
Separating out the first pass lets us avoid repeated filtering
and makes the structure of the algorithm a bit more clear.

Previously `Stash.null` was used as a test for being part of
the first pass or not, but that is a bit indirect. Encoding
the algorithm structure (the state automaton) as function calls
lets us make additional assumptions.

It also has a nice side effect of costs being attributed to
first/subsequent passes in the profile.

I also prepend to `matches` because it's likely to be bigger.
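
A minimal sketch of the structure described above, assuming a toy model in which a token is an `Int` and a rule maps the known tokens to derived ones. The names `Rule`, `firstPass`, `saturate` and `parse` are illustrative and are not taken from Duckling's Engine:

```haskell
import Data.List (nub)

-- Toy model: a token is an Int and a rule derives tokens from known ones.
-- These types are illustrative assumptions, not Duckling's Engine types.
type Token = Int
type Rule = [Token] -> [Token]

-- First pass: every rule runs against an empty stash, so there is no need
-- to test the stash for emptiness to know which pass we are in.
firstPass :: [Rule] -> [Token]
firstPass rules = nub (concatMap ($ []) rules)

-- Subsequent passes: apply the rules until no new token appears.
saturate :: [Rule] -> [Token] -> [Token]
saturate rules stash
  | null new = stash
  | otherwise = saturate rules (new ++ stash) -- prepend the smaller list
  where
    new = nub [t | r <- rules, t <- r stash, t `notElem` stash]

-- The two phases are separate function calls rather than one loop.
parse :: [Rule] -> [Token]
parse rules = saturate rules (firstPass rules)
```

Because `(++)` is linear in its left argument, prepending the new tokens to the larger accumulated list is the cheaper direction.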

Reviewed By: patapizza

Differential Revision: D4922195

fbshipit-source-id: 0aec79f
2017-04-20 11:49:15 -07:00
Bartosz Nitka
878f85b9e1 Codemod intersectMB to intersect
Summary:
`intersectMB` was a name used for the purpose of migrating.
This is the last part of the migration.

Reviewed By: patapizza

Differential Revision: D4906098

fbshipit-source-id: a70af78
2017-04-18 10:19:20 -07:00
Bartosz Nitka
fe39a55a4c Use intervalMB instead of interval
Summary:
This continues the work from:
"[Duckling] Don't produce trivially empty Tokens"
All the Rules should use intervalMB from now on.

Reviewed By: patapizza

Differential Revision: D4906072

fbshipit-source-id: 277b961
2017-04-18 10:19:20 -07:00
Bartosz Nitka
a91e787bb7 Derive Eq, Show for TimeIntervalType
Summary: This is always useful to have.

Reviewed By: patapizza

Differential Revision: D4864208

fbshipit-source-id: b879893
2017-04-18 08:19:20 -07:00
Bartosz Nitka
879b103ca3 Fix indexing problems with new regexp matcher
Summary:
My change had a couple of problems:
* utf8 character width logic was completely wrong for characters that need 3 or 4 bytes
* `Array.listArray (start, end)` produces an array where `end` is a valid index
* because of the inclusive upper bound, the `arraySize` logic also has to change
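
The two pitfalls in the list above can be illustrated with a small sketch. The helper names `utf8Width`, `byteOffsets` and `offsetArray` are hypothetical and are not the functions changed in this commit:

```haskell
import Data.Array (Array, bounds, listArray)
import Data.Char (ord)

-- Number of bytes needed to encode a character in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80 = 1
  | n < 0x800 = 2
  | n < 0x10000 = 3
  | otherwise = 4
  where
    n = ord c

-- Byte offset at which each character starts, followed by the total
-- byte length as a final entry.
byteOffsets :: String -> [Int]
byteOffsets = scanl (+) 0 . map utf8Width

-- listArray takes inclusive bounds: n elements starting at index 0
-- need the bounds (0, n - 1), not (0, n).
offsetArray :: String -> Array Int Int
offsetArray s = listArray (0, length offsets - 1) offsets
  where
    offsets = byteOffsets s
```

For a two-character ASCII string, `bounds (offsetArray "ab")` is `(0, 2)`: three offsets, with index 2 valid.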

Reviewed By: watashi, darshankapashi

Differential Revision: D4894355

fbshipit-source-id: 8d07dfd
2017-04-14 15:49:17 -07:00
Bartosz Nitka
e7aeef5436 Avoid allocations and encoding in regexp matching
Summary: The rationale is explained in a new Note.

Reviewed By: patapizza

Differential Revision: D4884104

fbshipit-source-id: 81f36ee
2017-04-14 12:19:21 -07:00
Bartosz Nitka
3d18cf5ea9 Don't produce trivially empty Tokens
Summary:
We can detect certain kinds of contradictions sooner;
producing a token with an unresolvable Predicate is wasteful.
For a text like:
```
"Demain apres midi 14h 15 h 16h vendredi 14 a 15h"
```
it could produce 7000 tokens with empty predicates.
After this change it produces none and we get a 4x improvement in
time and 6x improvement in allocations.

Note I only covered `ruleIntersect*` here. I need to do this for
other instances as well.
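
A minimal sketch of the idea, assuming a toy predicate that only says "the hour equals h". The type `HourIs`, the helper `intersectAll` and the signature given to `intersectMB` here are assumptions for illustration; Duckling's real `Predicate` is richer:

```haskell
import Data.Maybe (mapMaybe)

-- Toy predicate: "the hour equals h". Illustrative only.
newtype HourIs = HourIs Int deriving (Eq, Show)

-- Intersecting two predicates yields Nothing when they contradict,
-- so the caller never builds a token that cannot resolve.
intersectMB :: HourIs -> HourIs -> Maybe HourIs
intersectMB a b
  | a == b = Just a
  | otherwise = Nothing

-- Only satisfiable combinations survive as tokens.
intersectAll :: [HourIs] -> [HourIs] -> [HourIs]
intersectAll xs ys = mapMaybe (uncurry intersectMB) [(x, y) | x <- xs, y <- ys]
```

With many candidate hours in one sentence, most pairs contradict each other, which is why filtering them at construction time saves both time and allocations.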

Reviewed By: JonCoens

Differential Revision: D4871078

fbshipit-source-id: 9f0e7ad
2017-04-11 16:35:05 -07:00
Kevin Cros
62bc5a317b Using hashmap look up instead of 'case of'
Summary: Updating regex with hashmap look ups.
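
A minimal sketch of this kind of refactor, assuming an English word table for illustration. `Data.Map.Strict` from `containers` stands in for a hash map so the example needs no extra packages, and the names `numeralCase`, `numeralMap` and `numeralLookup` are hypothetical:

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Before: one case alternative per matched word.
numeralCase :: String -> Maybe Integer
numeralCase w = case map toLower w of
  "one" -> Just 1
  "two" -> Just 2
  "three" -> Just 3
  _ -> Nothing

-- After: the same mapping as data, looked up by key.
numeralMap :: Map.Map String Integer
numeralMap = Map.fromList [("one", 1), ("two", 2), ("three", 3)]

numeralLookup :: String -> Maybe Integer
numeralLookup w = Map.lookup (map toLower w) numeralMap
```

Keeping the table as data also makes it easy to derive the matching regex alternatives from the map's keys instead of maintaining two lists by hand.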

Reviewed By: patapizza

Differential Revision: D4848178

fbshipit-source-id: 4d5ded8
2017-04-11 11:04:20 -07:00
ADAM LIU
928139569c Refactor of Duckling.Numeral.TR to hashmap lookup
Summary: Update of TR Rules hashmap

Reviewed By: patapizza

Differential Revision: D4860819

fbshipit-source-id: 6f5a722
2017-04-11 09:34:23 -07:00
Bartosz Nitka
f7b3f2ed73 Detect interval contradictions sooner
Summary:
So far, contradictions from intersection only
propagated through intersection. With this change
they also propagate through intervals,
and intervals can generate contradictions themselves.

Reviewed By: patapizza

Differential Revision: D4864160

fbshipit-source-id: 8348267
2017-04-10 16:35:27 -07:00
Bartosz Nitka
1cf8496967 tt helper for returning Time Tokens
Summary:
This is a very common pattern (>1k occurrences).
Replacing it with something shorter makes the rules a bit less
boilerplate-y.
Feel free to bikeshed the name, I can easily redo the codemod.
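
A minimal sketch of what such a helper could look like, with toy stand-ins for the types. `TimeData`, `Token` and the two example productions are simplified assumptions, not Duckling's definitions:

```haskell
-- Toy stand-ins for Duckling's types; the real Token and TimeData differ.
newtype TimeData = TimeData String deriving (Eq, Show)

data Token = TimeToken TimeData | OtherToken deriving (Eq, Show)

-- Wrap a time value as the optional token a rule production returns.
tt :: TimeData -> Maybe Token
tt = Just . TimeToken

-- The same production written without and with the helper.
noonVerbose :: Maybe Token
noonVerbose = Just . TimeToken $ TimeData "12:00"

noonShort :: Maybe Token
noonShort = tt $ TimeData "12:00"
```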

Reviewed By: patapizza

Differential Revision: D4848864

fbshipit-source-id: 7baeee3
2017-04-10 12:34:43 -07:00
Bartosz Nitka
f46539ced2 Type for Closed/Open intervals
Summary:
This makes the code easier to read.
I'm not attached to naming, but this is
standard terminology from topology.
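
A minimal sketch of how a dedicated type reads at a call site compared with a bare `Bool` flag. The `contains` function and its endpoint semantics (closed includes both endpoints, open excludes both) follow the textbook definitions and are an assumption; Duckling's own handling of interval bounds may differ:

```haskell
-- Named constructors instead of a Bool flag.
data IntervalType = Open | Closed deriving (Eq, Show)

-- Membership test for an interval given as a (lower, upper) pair.
contains :: Ord a => IntervalType -> (a, a) -> a -> Bool
contains Closed (lo, hi) x = lo <= x && x <= hi
contains Open (lo, hi) x = lo < x && x < hi
```

A call such as `contains Closed (1, 3) 3` states its intent directly, where `contains True (1, 3) 3` would not.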

Reviewed By: JonCoens, patapizza

Differential Revision: D4848740

fbshipit-source-id: 79c2c20
2017-04-07 12:19:17 -07:00
Jonathan Coens
b3ca32104d Simple example HTTP server
Summary: Runs a `snap` server to return the supported targets as well as do parsing. It's a bit kludgy, but gets the job done.

Reviewed By: patapizza

Differential Revision: D4813197

fbshipit-source-id: 0fa165b
2017-04-06 17:04:48 -07:00
ADAM LIU
572ff95adf Update RU Rules to HashMap lookups
Summary: Update of RU Rules hashmap

Reviewed By: patapizza

Differential Revision: D4840947

fbshipit-source-id: 00cb679
2017-04-06 15:49:17 -07:00
Bartosz Nitka
78ecaa3728 Derive NFData for Entity
Summary: This makes benchmarking easier.

Reviewed By: JonCoens

Differential Revision: D4846839

fbshipit-source-id: 9cc8dfa
2017-04-06 15:34:43 -07:00
Bartosz Nitka
290ca48e25 Fix 4:23am returning 5:23am
Summary:
This is the easiest way to fix it, but after talking offline
with Julien, we may need to revisit it.
It basically gets rid of time series where we were
producing intervals that are not a multiple of the grain.

Reviewed By: patapizza

Differential Revision: D4841759

fbshipit-source-id: 1c4742a
2017-04-06 11:04:16 -07:00
Amelia Wilson
70ef9b1bbe using hashmap lookups
Summary: converting large regex lookups to hashmap lookups in Duckling/Numeral/FR/Rules.hs and Duckling/Ordinal/FR/Rules.hs

Reviewed By: patapizza

Differential Revision: D4836336

fbshipit-source-id: 2241a3a
2017-04-05 12:20:10 -07:00
Jonathan Coens
7c47431ce5 Upgrade to stackage 8.8
Summary: Just a little bounds bump

Reviewed By: patapizza

Differential Revision: D4835536

fbshipit-source-id: d51fbb8
2017-04-05 11:19:31 -07:00
Jonathan Coens
e2da9bc7fb Upgrade to stackage 8.6
Summary: Moves to the 8.6 resolver, updates package limits, and fixes errors due to upgrade.

Reviewed By: patapizza

Differential Revision: D4810924

fbshipit-source-id: c8a64a9
2017-04-04 15:19:41 -07:00
Bartosz Nitka
e37bb7c186 Duckling monad for Engine
Summary:
This converts the code to monadic style, so that
we can in the future:
* stop threading the `Document` parameter everywhere
* keep some state, like regexp match cache (I've already checked that it makes a substantial difference)

There should be no difference in performance or behavior
at this point.
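
A minimal sketch of the first benefit listed above, using a hand-written reader-style monad so that it depends only on `base`. The `Document` wrapper, the `Duckling` newtype as defined here, and the helper names are illustrative assumptions; the sketch does not cover the state (such as a regexp match cache) mentioned in the second item:

```haskell
-- A toy document type; Duckling's Document carries more than raw text.
newtype Document = Document String

-- A computation that can read the Document without receiving it
-- as an explicit argument.
newtype Duckling a = Duckling { runDuckling :: Document -> a }

instance Functor Duckling where
  fmap f (Duckling g) = Duckling (f . g)

instance Applicative Duckling where
  pure = Duckling . const
  Duckling f <*> Duckling g = Duckling (\d -> f d (g d))

instance Monad Duckling where
  Duckling g >>= k = Duckling (\d -> runDuckling (k (g d)) d)

-- Fetch the document text from the environment.
askText :: Duckling String
askText = Duckling (\(Document s) -> s)

docLength :: Duckling Int
docLength = length <$> askText

wordCount :: Duckling Int
wordCount = do
  s <- askText
  pure (length (words s))

-- Helpers compose without any of them mentioning a Document parameter.
summary :: Duckling (Int, Int)
summary = (,) <$> docLength <*> wordCount
```

The document is supplied once at the edge, for example `runDuckling summary (Document "in two days")`.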

Reviewed By: patapizza

Differential Revision: D4778808

fbshipit-source-id: a167ed8
2017-03-31 14:19:40 -07:00
Julien Odent
78228dea83 Update email
Summary: Setup the correct email.

Reviewed By: JonCoens

Differential Revision: D4806876

fbshipit-source-id: a52f9f8
2017-03-30 16:20:08 -07:00
Bartosz Nitka
a1917a53f3 Make sure regen is rebuilt
Summary:
`stack exe/RegenMain.hs` uses runghc which is a tool
we don't test with often. Making sure the executable
is rebuilt and using it should be enough.

Reviewed By: patapizza

Differential Revision: D4783844

fbshipit-source-id: 459dbc4
2017-03-28 07:49:19 -07:00
Bartosz Nitka
bd94622f64 Move tests to tests and exes to exe
Summary:
This works around https://github.com/haskell/cabal/issues/4350
If we don't do this, files get compiled multiple times
and cabal is unhappy.

Reviewed By: patapizza

Differential Revision: D4782749

fbshipit-source-id: 5bbe425
2017-03-27 16:04:24 -07:00
Christian Bell
02e74cacd6 HashMap lookups for large regexes
Summary: Use HashMaps to speed up string pattern matching for UK (Ukrainian).

Reviewed By: patapizza

Differential Revision: D4747195

fbshipit-source-id: e582dba
2017-03-22 08:49:17 -07:00
Julien Odent
96f365e927 Expose toName
Summary: .

Reviewed By: niteria

Differential Revision: D4753842

fbshipit-source-id: 2e88e86
2017-03-22 08:19:19 -07:00
Bartosz Nitka
b108ab260f Allocate less in lookupRegexp
Summary:
Contrary to my intuitions, this part is the lion's share
of allocations in `lookupRegexp`. I'd have expected `Text`
operations to dwarf it.

It's a bit dubious that we build such big lists that it
matters; perhaps in the future we can explore limiting the
number of matches considered.

Reviewed By: patapizza

Differential Revision: D4745711

fbshipit-source-id: ebdc1aa
2017-03-21 09:19:18 -07:00