740 Commits

Author SHA1 Message Date
Bartosz Nitka
290ca48e25 Fix 4:23am returning 5:23am
Summary:
This is the easiest way to fix it, but after talking offline
with Julien, we may need to revisit it.
It basically gets rid of time series where we were
producing intervals that are not a multiple of the grain.

Reviewed By: patapizza

Differential Revision: D4841759

fbshipit-source-id: 1c4742a
2017-04-06 11:04:16 -07:00
Amelia Wilson
70ef9b1bbe using hashmap lookups
Summary: converting large regex lookups to hashmap lookups in Duckling/Numeral/FR/Rules.hs and Duckling/Ordinal/FR/Rules.hs

Reviewed By: patapizza

Differential Revision: D4836336

fbshipit-source-id: 2241a3a
2017-04-05 12:20:10 -07:00
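A minimal sketch of the idea behind 70ef9b1bbe above: instead of one large regex alternation with a branch per word, match a word generically and resolve it through a map lookup. Everything here is illustrative. The table contents and the names `frenchNumerals` and `lookupNumeral` are hypothetical, and `Data.Map.Strict` stands in for the `HashMap` the commit uses, only so that the sketch runs with the libraries bundled with GHC.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical table of French numeral words; the real rule files are
-- much larger and are not reproduced here.
frenchNumerals :: Map.Map String Integer
frenchNumerals = Map.fromList
  [ ("un", 1)
  , ("deux", 2)
  , ("trois", 3)
  , ("quatre", 4)
  , ("cinq", 5)
  ]

-- Resolve an already-matched word to its value, ignoring case.
-- Returns Nothing when the word is not a known numeral.
lookupNumeral :: String -> Maybe Integer
lookupNumeral word = Map.lookup (map toLower word) frenchNumerals
```

With this shape, adding a word means adding a table entry rather than growing the pattern.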
Jonathan Coens
7c47431ce5 Upgrade to stackage 8.8
Summary: Just a little bounds bump

Reviewed By: patapizza

Differential Revision: D4835536

fbshipit-source-id: d51fbb8
2017-04-05 11:19:31 -07:00
Jonathan Coens
e2da9bc7fb Upgrade to stackage 8.6
Summary: Moves to the 8.6 resolver, updates package limits, and fixes errors due to upgrade.

Reviewed By: patapizza

Differential Revision: D4810924

fbshipit-source-id: c8a64a9
2017-04-04 15:19:41 -07:00
Bartosz Nitka
e37bb7c186 Duckling monad for Engine
Summary:
This converts the code to monadic style, so that
we can in the future:
* stop threading the `Document` parameter everywhere
* keep some state, like regexp match cache (I've already checked that it makes a substantial difference)

There should be no difference in performance or behavior
at this point.

Reviewed By: patapizza

Differential Revision: D4778808

fbshipit-source-id: a167ed8
2017-03-31 14:19:40 -07:00
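One way a monad can remove the need to thread a `Document` parameter, as e37bb7c186 above describes, is a reader-style newtype. The sketch below is an assumption about the general technique, not the definition from the commit: the `Document` record, `askDocument`, `docLength`, `wordCount`, and `summary` are all hypothetical names.

```haskell
-- Hypothetical stand-in for the engine's document type.
newtype Document = Document { documentText :: String }

-- Reader-style monad: a computation that can consult the Document
-- without receiving it as an explicit argument.
newtype Duckling a = Duckling { runDuckling :: Document -> a }

instance Functor Duckling where
  fmap f (Duckling g) = Duckling (f . g)

instance Applicative Duckling where
  pure x = Duckling (const x)
  Duckling f <*> Duckling g = Duckling (\doc -> f doc (g doc))

instance Monad Duckling where
  Duckling g >>= k = Duckling (\doc -> runDuckling (k (g doc)) doc)

-- Fetch the ambient document.
askDocument :: Duckling Document
askDocument = Duckling id

docLength :: Duckling Int
docLength = do
  doc <- askDocument
  pure (length (documentText doc))

wordCount :: Duckling Int
wordCount = do
  doc <- askDocument
  pure (length (words (documentText doc)))

-- Helpers compose without mentioning the Document at all.
summary :: Duckling (Int, Int)
summary = (,) <$> docLength <*> wordCount
```

The document is supplied once, at the call to `runDuckling`. Keeping state such as a match cache would need a richer monad than this reader-only sketch.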
Julien Odent
78228dea83 Update email
Summary: Setup the correct email.

Reviewed By: JonCoens

Differential Revision: D4806876

fbshipit-source-id: a52f9f8
2017-03-30 16:20:08 -07:00
Bartosz Nitka
a1917a53f3 Make sure regen is rebuilt
Summary:
`stack exe/RegenMain.hs` uses runghc which is a tool
we don't test with often. Making sure the executable
is rebuilt and using it should be enough.

Reviewed By: patapizza

Differential Revision: D4783844

fbshipit-source-id: 459dbc4
2017-03-28 07:49:19 -07:00
Bartosz Nitka
bd94622f64 Move tests to tests and exes to exe
Summary:
This works around https://github.com/haskell/cabal/issues/4350
If we don't do this, files get compiled multiple times
and cabal is unhappy.

Reviewed By: patapizza

Differential Revision: D4782749

fbshipit-source-id: 5bbe425
2017-03-27 16:04:24 -07:00
Christian Bell
02e74cacd6 HashMap lookups for large regexes
Summary: Use HashMaps to speed up string pattern matching for UK (Ukrainian).

Reviewed By: patapizza

Differential Revision: D4747195

fbshipit-source-id: e582dba
2017-03-22 08:49:17 -07:00
Julien Odent
96f365e927 Expose toName
Summary: .

Reviewed By: niteria

Differential Revision: D4753842

fbshipit-source-id: 2e88e86
2017-03-22 08:19:19 -07:00
Bartosz Nitka
b108ab260f Allocate less in lookupRegexp
Summary:
Contrary to my intuition, this part is the lion's share
of allocations in `lookupRegexp`. I'd have expected `Text`
operations to dwarf it.

It's a bit dubious that we build such big lists that it
matters, perhaps in the future we can explore limiting the
number of matches considered.

Reviewed By: patapizza

Differential Revision: D4745711

fbshipit-source-id: ebdc1aa
2017-03-21 09:19:18 -07:00
Bartosz Nitka
56a039eef1 Optimize isRangeValid
Summary:
`isRangeValid` was doing lots of random indexing inside a Text.
Since we already have a convenient O(1), indexable `Vector Char`
we can just use it instead.

Reviewed By: patapizza

Differential Revision: D4744297

fbshipit-source-id: b23011b
2017-03-21 08:49:16 -07:00
Bartosz Nitka
58bf36b9f4 Optimize isAdjacent
Summary:
`isAdjacent` was doing a ton of useless copies and
redundant work. By pre-computing a `firstNonAdjacent` table
we can answer every `isAdjacent` query in `O(1)` time and
(almost?) no allocations.

It may be a symptom of algorithmic problems, but we shouldn't
make it more expensive than it needs to be.

Reviewed By: patapizza

Differential Revision: D4744172

fbshipit-source-id: dd70be2
2017-03-21 07:34:24 -07:00
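The precomputed-table idea from 58bf36b9f4 above can be sketched as follows. This assumes that two positions count as adjacent when only whitespace lies between them. The `Document` record, `mkDocument`, and the exact table layout are hypothetical, and `Data.Array` is used here in place of whatever structure the commit relies on.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Char (isSpace)

-- Hypothetical document: its length plus a table where entry i is the
-- first index >= i holding a non-whitespace character (or the length
-- of the text if there is none).
data Document = Document
  { docLength :: Int
  , firstNonAdjacent :: Array Int Int
  }

-- Build the table once with a single right-to-left scan.
mkDocument :: String -> Document
mkDocument s = Document n (listArray (0, n) table)
  where
    n = length s
    table = scanr step n (zip [0 ..] s)
    step (i, c) next = if isSpace c then next else i

-- Positions a and b are adjacent when every character in [a, b) is
-- whitespace. After construction this is one array read per query.
isAdjacent :: Document -> Int -> Int -> Bool
isAdjacent doc a b =
  a >= 0 && a <= b && b <= docLength doc && firstNonAdjacent doc ! a >= b
```

For the text `"9 am"`, position 1 (just after `9`) and position 2 (the start of `am`) are adjacent, while positions 1 and 3 are not, because the `a` at index 2 sits between them.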
Bartosz Nitka
26b1327bcd Make Document type abstract
Summary:
This will let me do smarter things on document construction,
like precomputing where all the whitespace is so that
I can answer `isAdjacent` in O(1) time.

If I'm measuring things right my next diff will cut down
allocations 4x on problematic inputs.

Reviewed By: patapizza

Differential Revision: D4742664

fbshipit-source-id: 7e14e25
2017-03-20 20:49:24 -07:00
Bartosz Nitka
09acefbcf5 Make Show Dimension "law-abiding"
Summary:
`Show` should print things close to source level representation.
I wanted to generate some tests from inputs that cause problems
and there was no way to get source level representation of
Dimension.

Reviewed By: patapizza

Differential Revision: D4723711

fbshipit-source-id: fff658d
2017-03-16 16:34:16 -07:00
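To illustrate the convention that 09acefbcf5 above refers to: a hand-written `Show` instance should produce text that is valid Haskell source, including parentheses when the value appears nested. The type below is a hypothetical, simplified `Dimension` with an invented `CustomDimension` constructor, not the repository's actual type.

```haskell
-- Hypothetical, simplified dimension type for illustration only.
data Dimension
  = Numeral
  | Time
  | CustomDimension String

instance Show Dimension where
  showsPrec _ Numeral = showString "Numeral"
  showsPrec _ Time = showString "Time"
  -- Precedence 10 is function application: parenthesize when this value
  -- is itself an argument, so the output can be pasted back into source.
  showsPrec d (CustomDimension name) =
    showParen (d > 10) $
      showString "CustomDimension " . showsPrec 11 name
```

With this instance, `show (Just (CustomDimension "x"))` yields `Just (CustomDimension "x")`, which can be copied directly into a generated test.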
Julien Odent
e76cee3a6d Rename Finance to AmountOfMoney
Summary: Because it makes more sense.

Reviewed By: JonCoens

Differential Revision: D4721646

fbshipit-source-id: 449bfb4
2017-03-16 14:49:44 -07:00
Julien Odent
54c9448fba Rename Number to Numeral
Summary: For consistency with the dimension name.

Reviewed By: JonCoens

Differential Revision: D4722216

fbshipit-source-id: 82c56d3
2017-03-16 13:49:16 -07:00
Julien Odent
33fa98734a Fix 'no dia 20'
Summary:
* 'no dia 20' (on the 20)
* Unifying two rules into one, with a day grain

See https://github.com/wit-ai/wit/issues/388

Reviewed By: blandinw

Differential Revision: D4715780

fbshipit-source-id: e990954
2017-03-15 13:49:17 -07:00
Julien Odent
1c98c0308c Fix Some in README
Summary: #accept2ship

Reviewed By: niteria

Differential Revision: D4715804

fbshipit-source-id: d53ca9a
2017-03-15 13:19:36 -07:00
Jonathan Coens
41800a3171 Move onto dependent-sum instead of custom local data Some
Summary:
No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat.
Instead of a `Read` instance I created a `fromName` function.

Reviewed By: zilberstein

Differential Revision: D4710014

fbshipit-source-id: 1d4e86d
2017-03-15 10:34:17 -07:00
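A rough sketch of the `fromName` approach mentioned in 41800a3171 above. To stay self-contained it defines a local `Some` wrapper, which is exactly what the commit replaces with the one from `dependent-sum`, so treat that definition as a stand-in. The two-constructor `Dimension` type, the name strings, and `someName` are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical dimension type indexed by the kind of value it produces.
data Dimension a where
  Numeral :: Dimension Double
  Time :: Dimension String

-- Local stand-in for the `Some` wrapper provided by dependent-sum:
-- it hides the type index so dimensions of different types can be
-- returned from one function.
data Some f where
  Some :: f a -> Some f

toName :: Dimension a -> String
toName Numeral = "numeral"
toName Time = "time"

-- Explicit parser used in place of a `Read` instance; unknown names
-- give Nothing instead of a runtime parse error.
fromName :: String -> Maybe (Some Dimension)
fromName "numeral" = Just (Some Numeral)
fromName "time" = Just (Some Time)
fromName _ = Nothing

-- Unwrap just far enough to recover the name.
someName :: Some Dimension -> String
someName (Some dim) = toName dim
```

A plain function returning `Maybe` keeps the failure case visible to callers, which a `Read` instance would hide.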
Bartosz Nitka
d23ae54ab9 .gitignore .stack-work
Summary:
stack creates this directory, we should
prevent it from being committed.

Reviewed By: JonCoens

Differential Revision: D4713790

fbshipit-source-id: 34b723d
2017-03-15 10:04:30 -07:00
Bartosz Nitka
1a251d8e42 Use HashMap.lookupDefault
Summary: This is a small stylistic improvement.

Reviewed By: patapizza

Differential Revision: D4713463

fbshipit-source-id: 47720d3
2017-03-15 08:19:11 -07:00
Julien Odent
1edf62f347 Adding logo
Summary: happy_duck

Reviewed By: niteria

Differential Revision: D4713395

fbshipit-source-id: dd1c141
2017-03-15 08:04:31 -07:00
Julien Odent
ea80ab07d3 Update maintainer email
Summary: .

Reviewed By: niteria

Differential Revision: D4713313

fbshipit-source-id: 4fbeabb
2017-03-15 07:49:12 -07:00
Julien Odent
cc016bb178 Refactoring + return domain
Summary:
* Simplified `Url` to only keep track of what we need (we can change back later)
* Normalize domain: remove subdomains like `www`, `www2` and lower case
* Return the full domain in the JSON value field
* Updated offensive url example

Reviewed By: JonCoens

Differential Revision: D4705403

fbshipit-source-id: e5d11ee
2017-03-14 13:49:20 -07:00
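The domain normalization described in cc016bb178 above could look roughly like this. It is a sketch under stated assumptions: the function name `normalizeDomain` is hypothetical, and "subdomains like `www`, `www2`" is read as "a leading `www` followed by optional digits and a dot". The commit's actual rule may differ.

```haskell
import Data.Char (isDigit, toLower)
import Data.List (stripPrefix)

-- Lower-case the host and drop a leading "www" / "www<digits>" label.
-- Hosts that merely start with the letters "www" are left alone.
normalizeDomain :: String -> String
normalizeDomain raw =
  case stripPrefix "www" lowered of
    Just rest
      | (_, '.' : domain) <- span isDigit rest
      , not (null domain) -> domain
    _ -> lowered
  where
    lowered = map toLower raw
```

For example, `normalizeDomain "WWW2.Example.com"` gives `"example.com"`, while `"wwwhat.com"` is returned unchanged.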
Jonathan Coens
1b91b70c58 codemod DNumber to Numeral
Summary: `DNumber` is a terrible name and was only there for legacy reasons. `Numeral` makes more sense for this dimension, so let's use that instead.

Reviewed By: patapizza

Differential Revision: D4707167

fbshipit-source-id: cd78aa3
2017-03-14 13:34:11 -07:00
Bartosz Nitka
ec39c21593 Make the regexp less dangerous
Summary:
The current regexp matches sequences of numbers of unbounded
length with lots of backtracking. Since phone numbers
are shorter than X=20 characters we can put a bound
on every currently unbounded match.

Additionally we can use groups that don't capture, to
avoid marshalling data that we won't need.

Reviewed By: JonCoens

Differential Revision: D4706862

fbshipit-source-id: 39ca9bb
2017-03-14 12:19:12 -07:00
Julien Odent
2f4ecfba08 Update README
Summary: Doc to extend existing dimension/language support

Reviewed By: JonCoens

Differential Revision: D4706035

fbshipit-source-id: a8ecca4
2017-03-14 11:34:11 -07:00
Julien Odent
483ad4a191 OverloadedStrings for Debug
Summary: #accept2ship

Reviewed By: niteria

Differential Revision: D4705625

fbshipit-source-id: 1245858
2017-03-14 08:34:11 -07:00
Bartosz Nitka
28d53fce30 Remove ruleIntersect2
Summary:
It is no longer necessary after D4676812 and D4698788.
`"I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday"` now works in
less than a second, it used to be 10s.

The test suite also got 3s faster.

Reviewed By: patapizza

Differential Revision: D4701890

fbshipit-source-id: 107a55f
2017-03-14 05:04:12 -07:00
Zejun Wu
3001604548 Clean redundant parentheses to test landcastle
Summary:
Clean redundant parentheses to test landcastle
opt-out-review

Differential Revision:
D4703203

verified-sandcastle

fbshipit-source-id: def175d
2017-03-13 18:19:24 -07:00
Bartosz Nitka
003604dce7 Optimize simple time predicates
Summary:
This is the next step for:
https://fb.facebook.com/groups/527352907463243/permalink/600056483526218/

This:
* changes the time language to be able to track contradictions (`EmptyPredicate`)
* changes the time language to be able to collect non-contradicting pieces, like month and hour and unify them
* provides an efficient way to convert those pieces into (past,future) time series
* adds AMPM predicate runner - there's a bit of overlap with is12H, but it basically works
* changes a test case that was wrong before
* regenerates classifiers, I'm not sure why they changed exactly

Before:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(15.50 secs, 6,171,188,928 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(110.82 secs, 44,031,569,512 bytes)
```

After:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(1.24 secs, 703,020,912 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(9.51 secs, 5,891,109,592 bytes)
```

Reviewed By: JonCoens

Differential Revision: D4676812

fbshipit-source-id: 9810203
2017-03-13 17:04:10 -07:00
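The first two bullets of 003604dce7 above, tracking contradictions and unifying non-contradicting pieces, can be illustrated with a deliberately small model. Only the name `EmptyPredicate` comes from the commit message. The `Pieces` constructor, its two fields, and the functions `unifyField` and `unify` are hypothetical simplifications.

```haskell
-- Simplified model: a predicate is either known to be unsatisfiable or
-- a set of optional constraints on month and hour.
data Predicate
  = EmptyPredicate
  | Pieces
      { pMonth :: Maybe Int
      , pHour :: Maybe Int
      }
  deriving (Eq, Show)

-- Merge one optional constraint. The outer Maybe signals a contradiction
-- (Nothing); the inner Maybe is the merged constraint.
unifyField :: Eq a => Maybe a -> Maybe a -> Maybe (Maybe a)
unifyField Nothing y = Just y
unifyField x Nothing = Just x
unifyField (Just x) (Just y)
  | x == y = Just (Just x)
  | otherwise = Nothing

-- Combine two predicates. Compatible pieces are merged, and any conflict
-- collapses to EmptyPredicate without enumerating any time values.
unify :: Predicate -> Predicate -> Predicate
unify EmptyPredicate _ = EmptyPredicate
unify _ EmptyPredicate = EmptyPredicate
unify (Pieces m1 h1) (Pieces m2 h2) =
  case (unifyField m1 m2, unifyField h1 h2) of
    (Just m, Just h) -> Pieces m h
    _ -> EmptyPredicate
```

Unifying "month 3" with "hour 9" gives a single predicate carrying both constraints, whereas unifying "month 3" with "month 4" is rejected immediately as `EmptyPredicate`.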
Julien Odent
fd80953407 Adding Feb tomorrow
Summary: .

Reviewed By: niteria

Differential Revision: D4700059

fbshipit-source-id: 3d63aa4
2017-03-13 14:04:22 -07:00
Julien Odent
2e50aa5ea0 Fix 'tomorrow July' + IT fixes
Summary:
* we weren't checking the right reference time in `takeNth` and `takeN`
* fixing resulting failing tests for `IT`
* `analyzedNTest` to check that input results in `n` parsed tokens

Reviewed By: niteria

Differential Revision: D4698788

fbshipit-source-id: 2cd4762
2017-03-13 12:04:17 -07:00
Bartosz Nitka
5f6c4fcec3 Make the license field more precise
Summary:
`cabal` is spewing this (it still successfully loads):
```
Warning: 'license: BSD' is not a recognised license. The known licenses are:
GPL, GPL-2, GPL-3, LGPL, LGPL-2.1, LGPL-3, AGPL, AGPL-3, BSD2, BSD3, MIT, ISC,
MPL-2.0, Apache, Apache-2.0, PublicDomain, AllRightsReserved, OtherLicense
```
Looking at the LICENSE file we have in the repo and the Wikipedia page: https://en.wikipedia.org/wiki/BSD_licenses, it looks like we're using BSD3.

Reviewed By: patapizza

Differential Revision: D4697670

fbshipit-source-id: 6c80078
2017-03-13 06:04:10 -07:00
Julien Odent
161889c3e6 README.md + updating cabal
Summary:
* basic `README.md`
* updated `duckling.cabal`

Reviewed By: JonCoens

Differential Revision: D4691967

fbshipit-source-id: 0a5cdf7
2017-03-10 15:04:23 -08:00
Julien Odent
d5690f5e5e CONTRIBUTING.md
Summary:
https://our.intern.facebook.com/intern/dex/open-source/open-source-licenses/#a-contributing-template
Adapted https://github.com/facebook/bistro/blob/master/CONTRIBUTING.md for `Our Development Process`.
Test-driven workflow.

Reviewed By: JonCoens

Differential Revision: D4691472

fbshipit-source-id: d296c77
2017-03-10 14:49:18 -08:00
Julien Odent
ab06262291 Strip off TODO/FIXME
Summary: as the title says

Differential Revision: D4682120

fbshipit-source-id: 3f66286
2017-03-10 12:04:16 -08:00
Julien Odent
69aeff3a71 Fix st build
Summary: `RebindableSyntax` looks for `fromString` in scope.

Reviewed By: JonCoens

Differential Revision: D4675221

fbshipit-source-id: d7ff49d
2017-03-09 10:49:26 -08:00
FBShipIt
3f8e52e70a Initial commit
fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192
2017-03-08 10:33:56 -08:00