30 Commits

Author SHA1 Message Date
Bartosz Nitka
b108ab260f Allocate less in lookupRegexp
Summary:
Contrary to my intuition, this part accounts for the lion's share
of allocations in `lookupRegexp`. I'd have expected `Text`
operations to dwarf it.

It's a bit dubious that we build lists so big that it
matters; perhaps in the future we can explore limiting the
number of matches considered.

Reviewed By: patapizza

Differential Revision: D4745711

fbshipit-source-id: ebdc1aa
2017-03-21 09:19:18 -07:00
Bartosz Nitka
56a039eef1 Optimize isRangeValid
Summary:
`isRangeValid` was doing lots of random indexing inside a Text.
Since we already have a convenient O(1), indexable `Vector Char`
we can just use it instead.
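
A minimal sketch of the idea, under loud assumptions: the log does not show what `isRangeValid` actually checks, so the boundary rule below (a range is valid when the character class changes at both ends) is an illustrative guess, and `Data.Array` stands in for the `Vector Char` mentioned above so the example needs only GHC's bundled libraries. The names `Chars`, `toChars`, `size`, `CharClass`, and `charClass` are hypothetical.

```haskell
import Data.Array (Array, bounds, listArray, (!))
import Data.Char (isAlpha, isDigit)

-- Stand-in for the O(1) indexable character sequence.
type Chars = Array Int Char

toChars :: String -> Chars
toChars s = listArray (0, length s - 1) s

size :: Chars -> Int
size cs = hi - lo + 1
  where
    (lo, hi) = bounds cs

-- Hypothetical character classes used for the boundary test.
data CharClass = Letter | Digit | Other Char
  deriving (Eq)

charClass :: Char -> CharClass
charClass c
  | isAlpha c = Letter
  | isDigit c = Digit
  | otherwise = Other c

-- Assumed rule: the half-open range [start, end) is valid when the
-- character class changes at both of its boundaries.
-- Precondition: 0 <= start < end <= size cs.
-- Each check is a constant-time array lookup.
isRangeValid :: Chars -> Int -> Int -> Bool
isRangeValid cs start end =
  (start == 0 || differs (cs ! (start - 1)) (cs ! start))
    && (end == size cs || differs (cs ! (end - 1)) (cs ! end))
  where
    differs a b = charClass a /= charClass b
```

With this rule, the `9` in `"at 9am"` (range 3 to 4) is accepted, while the `a` of `am` (range 4 to 5) is rejected because it cuts a word in half.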

Reviewed By: patapizza

Differential Revision: D4744297

fbshipit-source-id: b23011b
2017-03-21 08:49:16 -07:00
Bartosz Nitka
58bf36b9f4 Optimize isAdjacent
Summary:
`isAdjacent` was doing a ton of useless copies and
redundant work. By pre-computing a `firstNonAdjacent` table
we can answer every `isAdjacent` query in `O(1)` time with
(almost?) no allocations.

It may be a symptom of algorithmic problems, but we shouldn't
make it more expensive than it needs to be.
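
The table idea can be sketched as follows. This is not the code from the diff: the `Doc` type, `mkDoc`, and the choice of `isSpace` as the separator test are assumptions made for illustration, and `Data.Array` is used so the example depends only on libraries shipped with GHC.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Char (isSpace)

-- Hypothetical stand-in for the document type; only the table is modelled.
newtype Doc = Doc { firstNonAdjacent :: Array Int Int }

-- Build the table once, at construction time.
-- Entry i holds the smallest j >= i whose character is not a separator,
-- or the input length when only separators remain.
mkDoc :: String -> Doc
mkDoc s = Doc (listArray (0, n) table)
  where
    n = length s
    table = scanr step n (zip [0 ..] s)
    step (i, c) next
      | isSpace c = next
      | otherwise = i

-- True when every character in the half-open range [a, b) is a separator.
-- Precondition: 0 <= a <= length of the input.
-- A single array lookup, so no copying and no scanning per query.
isAdjacent :: Doc -> Int -> Int -> Bool
isAdjacent doc a b = b >= a && firstNonAdjacent doc ! a >= b
```

For `"3  pm"`, a token ending at offset 1 and one starting at offset 3 are adjacent, so `isAdjacent (mkDoc "3  pm") 1 3` is `True`; for `"3x pm"` the same query is `False` because `x` sits between them.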

Reviewed By: patapizza

Differential Revision: D4744172

fbshipit-source-id: dd70be2
2017-03-21 07:34:24 -07:00
Bartosz Nitka
26b1327bcd Make Document type abstract
Summary:
This will let me do smarter things on document construction,
like precomputing where all the whitespace is so that
I can answer `isAdjacent` in O(1) time.

If I'm measuring things right, my next diff will cut
allocations 4x on problematic inputs.

Reviewed By: patapizza

Differential Revision: D4742664

fbshipit-source-id: 7e14e25
2017-03-20 20:49:24 -07:00
Bartosz Nitka
09acefbcf5 Make Show Dimension "law-abiding"
Summary:
`Show` should print things close to their source-level representation.
I wanted to generate some tests from inputs that cause problems
and there was no way to get a source-level representation of
`Dimension`.
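
As an illustration of what a source-level `Show` looks like, here is a small sketch. The `Dimension` type below is a hypothetical three-constructor stand-in, not the real type from the repository (the `Custom` constructor in particular is invented for the example); the point is only that `show` should produce text that can be pasted back into a test as a Haskell expression.

```haskell
-- Hypothetical, simplified stand-in for the real Dimension type.
data Dimension
  = Numeral
  | AmountOfMoney
  | Custom String

-- Hand-written instance that mirrors what `deriving Show` would produce:
-- constructor names as written in source, arguments shown at precedence 11,
-- and parentheses added when the value appears as an argument itself.
instance Show Dimension where
  showsPrec _ Numeral = showString "Numeral"
  showsPrec _ AmountOfMoney = showString "AmountOfMoney"
  showsPrec d (Custom name) =
    showParen (d > 10) $ showString "Custom " . showsPrec 11 name
```

With this instance, `show (Just (Custom "x"))` yields `Just (Custom "x")`, which is valid source text for the same value.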

Reviewed By: patapizza

Differential Revision: D4723711

fbshipit-source-id: fff658d
2017-03-16 16:34:16 -07:00
Julien Odent
e76cee3a6d Rename Finance to AmountOfMoney
Summary: Because it makes more sense.

Reviewed By: JonCoens

Differential Revision: D4721646

fbshipit-source-id: 449bfb4
2017-03-16 14:49:44 -07:00
Julien Odent
54c9448fba Rename Number to Numeral
Summary: For consistency with the dimension name.

Reviewed By: JonCoens

Differential Revision: D4722216

fbshipit-source-id: 82c56d3
2017-03-16 13:49:16 -07:00
Julien Odent
33fa98734a Fix 'no dia 20'
Summary:
* 'no dia 20' (on the 20)
* Unifying two rules into one, with a day grain

See https://github.com/wit-ai/wit/issues/388

Reviewed By: blandinw

Differential Revision: D4715780

fbshipit-source-id: e990954
2017-03-15 13:49:17 -07:00
Julien Odent
1c98c0308c Fix Some in README
Summary: #accept2ship

Reviewed By: niteria

Differential Revision: D4715804

fbshipit-source-id: d53ca9a
2017-03-15 13:19:36 -07:00
Jonathan Coens
41800a3171 Move onto dependent-sum instead of custom local data Some
Summary:
No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat.
Instead of a `Read` instance I created a `fromName` function.

Reviewed By: zilberstein

Differential Revision: D4710014

fbshipit-source-id: 1d4e86d
2017-03-15 10:34:17 -07:00
Bartosz Nitka
d23ae54ab9 .gitignore .stack-work
Summary:
`stack` creates this directory; we should
prevent it from being committed.

Reviewed By: JonCoens

Differential Revision: D4713790

fbshipit-source-id: 34b723d
2017-03-15 10:04:30 -07:00
Bartosz Nitka
1a251d8e42 Use HashMap.lookupDefault
Summary: This is a small stylistic improvement.

Reviewed By: patapizza

Differential Revision: D4713463

fbshipit-source-id: 47720d3
2017-03-15 08:19:11 -07:00
Julien Odent
1edf62f347 Adding logo
Summary: happy_duck

Reviewed By: niteria

Differential Revision: D4713395

fbshipit-source-id: dd1c141
2017-03-15 08:04:31 -07:00
Julien Odent
ea80ab07d3 Update maintainer email
Summary: .

Reviewed By: niteria

Differential Revision: D4713313

fbshipit-source-id: 4fbeabb
2017-03-15 07:49:12 -07:00
Julien Odent
cc016bb178 Refactoring + return domain
Summary:
* Simplified `Url` to only keep track of what we need (we can change back later)
* Normalize domain: remove subdomains like `www`, `www2` and convert to lowercase
* Return the full domain in the JSON value field
* Updated offensive url example
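
The normalization step described in the list above could look roughly like this. It is a sketch under assumptions, not the code from the diff: `normalizeDomain` is a hypothetical name, it works on `String` rather than whatever type the real code uses, and treating "`www` followed by optional digits" as the only removable label is a guess based on the two examples given.

```haskell
import Data.Char (isDigit, toLower)

-- Lowercase the domain and drop a leading "www" or "www<digits>" label.
-- The label is only dropped when what remains still contains a dot,
-- so a bare "www.com" is left alone (an assumption of this sketch).
normalizeDomain :: String -> String
normalizeDomain raw =
  case break (== '.') lowered of
    (label, '.' : rest)
      | isWww label && '.' `elem` rest -> rest
    _ -> lowered
  where
    lowered = map toLower raw
    -- "www", "www2", "www42" qualify; "wwwx" and "ww" do not.
    isWww label =
      case splitAt 3 label of
        ("www", digits) -> all isDigit digits
        _ -> False
```

For example, `normalizeDomain "WWW2.Example.COM"` gives `"example.com"`, while `normalizeDomain "wwwx.example.com"` is only lowercased.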

Reviewed By: JonCoens

Differential Revision: D4705403

fbshipit-source-id: e5d11ee
2017-03-14 13:49:20 -07:00
Jonathan Coens
1b91b70c58 codemod DNumber to Numeral
Summary: `DNumber` is a terrible name and was only there for legacy reasons. `Numeral` makes more sense for this dimension, so let's use that instead.

Reviewed By: patapizza

Differential Revision: D4707167

fbshipit-source-id: cd78aa3
2017-03-14 13:34:11 -07:00
Bartosz Nitka
ec39c21593 Make the regexp less dangerous
Summary:
The current regexp matches sequences of numbers of unbounded
length with lots of backtracking. Since phone numbers
are shorter than X=20 characters we can put a bound
on every currently unbounded match.

Additionally we can use groups that don't capture, to
avoid marshalling data that we won't need.

Reviewed By: JonCoens

Differential Revision: D4706862

fbshipit-source-id: 39ca9bb
2017-03-14 12:19:12 -07:00
Julien Odent
2f4ecfba08 Update README
Summary: Doc to extend existing dimension/language support

Reviewed By: JonCoens

Differential Revision: D4706035

fbshipit-source-id: a8ecca4
2017-03-14 11:34:11 -07:00
Julien Odent
483ad4a191 OverloadedStrings for Debug
Summary: #accept2ship

Reviewed By: niteria

Differential Revision: D4705625

fbshipit-source-id: 1245858
2017-03-14 08:34:11 -07:00
Bartosz Nitka
28d53fce30 Remove ruleIntersect2
Summary:
It is no longer necessary after D4676812 and D4698788.
`"I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday"` now works in
less than a second; it used to take 10s.

The test suite also got 3s faster.

Reviewed By: patapizza

Differential Revision: D4701890

fbshipit-source-id: 107a55f
2017-03-14 05:04:12 -07:00
Zejun Wu
3001604548 Clean redundant parentheses to test landcastle
Summary:
Clean redundant parentheses to test landcastle
opt-out-review

Differential Revision: D4703203

verified-sandcastle

fbshipit-source-id: def175d
2017-03-13 18:19:24 -07:00
Bartosz Nitka
003604dce7 Optimize simple time predicates
Summary:
This is the next step for:
https://fb.facebook.com/groups/527352907463243/permalink/600056483526218/

This:
* changes the time language to be able to track contradictions (`EmptyPredicate`)
* changes the time language to be able to collect non-contradicting pieces, like month and hour and unify them
* provides an efficient way to convert those pieces into (past,future) time series
* adds AMPM predicate runner - there's a bit of overlap with is12H, but it basically works
* changes a test case that was wrong before
* regenerates classifiers; I'm not sure exactly why they changed
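
The first two points in the list above (tracking contradictions and unifying non-contradicting pieces) can be sketched as follows. Only the name `EmptyPredicate` comes from the summary; the `Pieces` record, the `Simple` constructor, and the `unify` and `unifyField` functions are hypothetical, and the real time language certainly tracks more than a month and an hour.

```haskell
-- Hypothetical record of the independent pieces a predicate may constrain.
data Pieces = Pieces
  { month :: Maybe Int
  , hour :: Maybe Int
  }
  deriving (Eq, Show)

-- A predicate is either a known contradiction or a set of pieces.
data Predicate
  = EmptyPredicate
  | Simple Pieces
  deriving (Eq, Show)

-- Unify one field: an unconstrained side defers to the other,
-- equal constraints agree, different constraints fail with Nothing.
unifyField :: Eq a => Maybe a -> Maybe a -> Maybe (Maybe a)
unifyField Nothing y = Just y
unifyField x Nothing = Just x
unifyField (Just x) (Just y)
  | x == y = Just (Just x)
  | otherwise = Nothing

-- Combine two predicates, collapsing to EmptyPredicate on any conflict.
unify :: Predicate -> Predicate -> Predicate
unify EmptyPredicate _ = EmptyPredicate
unify _ EmptyPredicate = EmptyPredicate
unify (Simple a) (Simple b) =
  maybe EmptyPredicate Simple $
    Pieces
      <$> unifyField (month a) (month b)
      <*> unifyField (hour a) (hour b)
```

Unifying "month 3" with "hour 10" produces a single predicate constraining both, whereas unifying "month 3" with "month 4" yields `EmptyPredicate`, so a contradictory combination can be discarded without generating any time series for it.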

Before:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(15.50 secs, 6,171,188,928 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(110.82 secs, 44,031,569,512 bytes)
```

After:
```
res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(1.24 secs, 703,020,912 bytes)

res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty)
(9.51 secs, 5,891,109,592 bytes)
```

Reviewed By: JonCoens

Differential Revision: D4676812

fbshipit-source-id: 9810203
2017-03-13 17:04:10 -07:00
Julien Odent
fd80953407 Adding Feb tomorrow
Summary: .

Reviewed By: niteria

Differential Revision: D4700059

fbshipit-source-id: 3d63aa4
2017-03-13 14:04:22 -07:00
Julien Odent
2e50aa5ea0 Fix 'tomorrow July' + IT fixes
Summary:
* we weren't checking the right reference time in `takeNth` and `takeN`
* fixing resulting failing tests for `IT`
* `analyzedNTest` to check that input results in `n` parsed tokens

Reviewed By: niteria

Differential Revision: D4698788

fbshipit-source-id: 2cd4762
2017-03-13 12:04:17 -07:00
Bartosz Nitka
5f6c4fcec3 Make the license field more precise
Summary:
`cabal` is spewing this (it still successfully loads):
```
Warning: 'license: BSD' is not a recognised license. The known licenses are:
GPL, GPL-2, GPL-3, LGPL, LGPL-2.1, LGPL-3, AGPL, AGPL-3, BSD2, BSD3, MIT, ISC,
MPL-2.0, Apache, Apache-2.0, PublicDomain, AllRightsReserved, OtherLicense
```
Looking at the LICENSE file we have in the repo and the Wikipedia page https://en.wikipedia.org/wiki/BSD_licenses, it looks like we're using BSD3.

Reviewed By: patapizza

Differential Revision: D4697670

fbshipit-source-id: 6c80078
2017-03-13 06:04:10 -07:00
Julien Odent
161889c3e6 README.md + updating cabal
Summary:
* basic `README.md`
* updated `duckling.cabal`

Reviewed By: JonCoens

Differential Revision: D4691967

fbshipit-source-id: 0a5cdf7
2017-03-10 15:04:23 -08:00
Julien Odent
d5690f5e5e CONTRIBUTING.md
Summary:
https://our.intern.facebook.com/intern/dex/open-source/open-source-licenses/#a-contributing-template
Adapted https://github.com/facebook/bistro/blob/master/CONTRIBUTING.md for `Our Development Process`.
Test-driven workflow.

Reviewed By: JonCoens

Differential Revision: D4691472

fbshipit-source-id: d296c77
2017-03-10 14:49:18 -08:00
Julien Odent
ab06262291 Strip off TODO/FIXME
Summary: as the title says

Differential Revision: D4682120

fbshipit-source-id: 3f66286
2017-03-10 12:04:16 -08:00
Julien Odent
69aeff3a71 Fix st build
Summary: `RebindableSyntax` looks for `fromString` in scope.

Reviewed By: JonCoens

Differential Revision: D4675221

fbshipit-source-id: d7ff49d
2017-03-09 10:49:26 -08:00
FBShipIt
3f8e52e70a Initial commit
fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192
2017-03-08 10:33:56 -08:00