duckling

mirror of https://github.com/facebook/duckling.git synced 2024-12-19 10:01:43 +03:00

Author	SHA1	Message	Date
Bartosz Nitka	290ca48e25	Fix 4:23am returning 5:23am Summary: This is the easiest way to fix it, but talking offline with Julien, we may need to revisit. It basically gets rid of time series where we were producing intervals that are not a multiple of the grain. Reviewed By: patapizza Differential Revision: D4841759 fbshipit-source-id: 1c4742a	2017-04-06 11:04:16 -07:00
Amelia Wilson	70ef9b1bbe	using hashmap lookups Summary: converting large regex lookups to hashmap lookups in Duckling/Numeral/FR/Rules.hs and Duckling/Ordinal/FR/Rules.hs Reviewed By: patapizza Differential Revision: D4836336 fbshipit-source-id: 2241a3a	2017-04-05 12:20:10 -07:00
Jonathan Coens	7c47431ce5	Upgrade to stackage 8.8 Summary: Just a little bounds bump Reviewed By: patapizza Differential Revision: D4835536 fbshipit-source-id: d51fbb8	2017-04-05 11:19:31 -07:00
Jonathan Coens	e2da9bc7fb	Upgrade to stackage 8.6 Summary: Moves to the 8.6 resolver, updates package limits, and fixes errors due to upgrade. Reviewed By: patapizza Differential Revision: D4810924 fbshipit-source-id: c8a64a9	2017-04-04 15:19:41 -07:00
Bartosz Nitka	e37bb7c186	Duckling monad for Engine Summary: This converts the code to monadic style, so that we can in the future: * stop threading the `Document` parameter everywhere * keep some state, like regexp match cache (I've already checked that it makes a substantial difference) There should be no difference in performance or behavior at this point. Reviewed By: patapizza Differential Revision: D4778808 fbshipit-source-id: a167ed8	2017-03-31 14:19:40 -07:00
Julien Odent	78228dea83	Update email Summary: Setup the correct email. Reviewed By: JonCoens Differential Revision: D4806876 fbshipit-source-id: a52f9f8	2017-03-30 16:20:08 -07:00
Bartosz Nitka	a1917a53f3	Make sure regen is rebuilt Summary: `stack exe/RegenMain.hs` uses runghc which is a tool we don't test with often. Making sure the executable is rebuilt and using it should be enough. Reviewed By: patapizza Differential Revision: D4783844 fbshipit-source-id: 459dbc4	2017-03-28 07:49:19 -07:00
Bartosz Nitka	bd94622f64	Move tests to tests and exes to exe Summary: This works around https://github.com/haskell/cabal/issues/4350 If we don't do this, files get compiled multiple times and cabal is unhappy. Reviewed By: patapizza Differential Revision: D4782749 fbshipit-source-id: 5bbe425	2017-03-27 16:04:24 -07:00
Christian Bell	02e74cacd6	HashMap lookups for large regexes Summary: Use HashMaps to speed up string pattern matching for UK (Ukrainian). Reviewed By: patapizza Differential Revision: D4747195 fbshipit-source-id: e582dba	2017-03-22 08:49:17 -07:00
Julien Odent	96f365e927	Expose toName Summary: . Reviewed By: niteria Differential Revision: D4753842 fbshipit-source-id: 2e88e86	2017-03-22 08:19:19 -07:00
Bartosz Nitka	b108ab260f	Allocate less in lookupRegexp Summary: Contrary to my intuitions, this part is the lion's share of allocations in `lookupRegexp`. I'd have expected `Text` operations to dwarf it. It's a bit dubious that we build lists big enough for this to matter; perhaps in the future we can explore limiting the number of matches considered. Reviewed By: patapizza Differential Revision: D4745711 fbshipit-source-id: ebdc1aa	2017-03-21 09:19:18 -07:00
Bartosz Nitka	56a039eef1	Optimize isRangeValid Summary: `isRangeValid` was doing lots of random indexing inside a Text. Since we already have a convenient O(1), indexable `Vector Char` we can just use it instead. Reviewed By: patapizza Differential Revision: D4744297 fbshipit-source-id: b23011b	2017-03-21 08:49:16 -07:00
Bartosz Nitka	58bf36b9f4	Optimize isAdjacent Summary: `isAdjacent` was doing a ton of useless copies and redundant work. By pre-computing a `firstNonAdjacent` table we can answer every `isAdjacent` query in `O(1)` time with (almost?) no allocations. It may be a symptom of algorithmic problems, but we shouldn't make it more expensive than it needs to be. Reviewed By: patapizza Differential Revision: D4744172 fbshipit-source-id: dd70be2	2017-03-21 07:34:24 -07:00
Bartosz Nitka	26b1327bcd	Make Document type abstract Summary: This will let me do smarter things on document construction, like precomputing where all the whitespace is so that I can answer `isAdjacent` in O(1) time. If I'm measuring things right my next diff will cut down allocations 4x on problematic inputs. Reviewed By: patapizza Differential Revision: D4742664 fbshipit-source-id: 7e14e25	2017-03-20 20:49:24 -07:00
Bartosz Nitka	09acefbcf5	Make Show Dimension "law-abiding" Summary: `Show` should print things close to source level representation. I wanted to generate some tests from inputs that cause problems and there was no way to get source level representation of Dimension. Reviewed By: patapizza Differential Revision: D4723711 fbshipit-source-id: fff658d	2017-03-16 16:34:16 -07:00
Julien Odent	e76cee3a6d	Rename Finance to AmountOfMoney Summary: Because it makes more sense. Reviewed By: JonCoens Differential Revision: D4721646 fbshipit-source-id: 449bfb4	2017-03-16 14:49:44 -07:00
Julien Odent	54c9448fba	Rename Number to Numeral Summary: For consistency with the dimension name. Reviewed By: JonCoens Differential Revision: D4722216 fbshipit-source-id: 82c56d3	2017-03-16 13:49:16 -07:00
Julien Odent	33fa98734a	Fix 'no dia 20' Summary: * 'no dia 20' (on the 20) * Unifying two rules into one, with a day grain See https://github.com/wit-ai/wit/issues/388 Reviewed By: blandinw Differential Revision: D4715780 fbshipit-source-id: e990954	2017-03-15 13:49:17 -07:00
Julien Odent	1c98c0308c	Fix Some in README Summary: #accept2ship Reviewed By: niteria Differential Revision: D4715804 fbshipit-source-id: d53ca9a	2017-03-15 13:19:36 -07:00
Jonathan Coens	41800a3171	Move onto dependent-sum instead of custom local data Some Summary: No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat. Instead of a `Read` instance I created a `fromName` function. Reviewed By: zilberstein Differential Revision: D4710014 fbshipit-source-id: 1d4e86d	2017-03-15 10:34:17 -07:00
Bartosz Nitka	d23ae54ab9	.gitignore .stack-work Summary: stack creates this directory; we should prevent it from being committed. Reviewed By: JonCoens Differential Revision: D4713790 fbshipit-source-id: 34b723d	2017-03-15 10:04:30 -07:00
Bartosz Nitka	1a251d8e42	Use HashMap.lookupDefault Summary: This is a small stylistic improvement. Reviewed By: patapizza Differential Revision: D4713463 fbshipit-source-id: 47720d3	2017-03-15 08:19:11 -07:00
Julien Odent	1edf62f347	Adding logo Summary: happy_duck Reviewed By: niteria Differential Revision: D4713395 fbshipit-source-id: dd1c141	2017-03-15 08:04:31 -07:00
Julien Odent	ea80ab07d3	Update maintainer email Summary: . Reviewed By: niteria Differential Revision: D4713313 fbshipit-source-id: 4fbeabb	2017-03-15 07:49:12 -07:00
Julien Odent	cc016bb178	Refactoring + return domain Summary: * Simplified `Url` to only keep track of what we need (we can change back later) * Normalize domain: remove subdomains like `www`, `www2` and lower case * Return the full domain in the JSON value field * Updated offensive url example Reviewed By: JonCoens Differential Revision: D4705403 fbshipit-source-id: e5d11ee	2017-03-14 13:49:20 -07:00
Jonathan Coens	1b91b70c58	codemod DNumber to Numeral Summary: `DNumber` is a terrible name and was only there because legacy. `Numeral` makes more sense for this dimension, so let's use that instead. Reviewed By: patapizza Differential Revision: D4707167 fbshipit-source-id: cd78aa3	2017-03-14 13:34:11 -07:00
Bartosz Nitka	ec39c21593	Make the regexp less dangerous Summary: The current regexp matches sequences of numbers of unbounded length with lots of backtracking. Since phone numbers are shorter than X=20 characters we can put a bound on every currently unbounded match. Additionally we can use groups that don't capture, to avoid marshalling data that we won't need. Reviewed By: JonCoens Differential Revision: D4706862 fbshipit-source-id: 39ca9bb	2017-03-14 12:19:12 -07:00
Julien Odent	2f4ecfba08	Update README Summary: Doc to extend existing dimension/language support Reviewed By: JonCoens Differential Revision: D4706035 fbshipit-source-id: a8ecca4	2017-03-14 11:34:11 -07:00
Julien Odent	483ad4a191	OverloadedStrings for Debug Summary: #accept2ship Reviewed By: niteria Differential Revision: D4705625 fbshipit-source-id: 1245858	2017-03-14 08:34:11 -07:00
Bartosz Nitka	28d53fce30	Remove ruleIntersect2 Summary: It is no longer necessary after D4676812 and D4698788. `"I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday"` now works in less than a second, it used to be 10s. The test suite also got 3s faster. Reviewed By: patapizza Differential Revision: D4701890 fbshipit-source-id: 107a55f	2017-03-14 05:04:12 -07:00
Zejun Wu	3001604548	Clean redundant parentheses to test landcastle Summary: Clean redundant parentheses to test landcastle opt-out-review Differential Revision: D4703203 verified-sandcastle fbshipit-source-id: def175d	2017-03-13 18:19:24 -07:00
Bartosz Nitka	003604dce7	Optimize simple time predicates Summary: This is the next step for: https://fb.facebook.com/groups/527352907463243/permalink/600056483526218/ This: * changes the time language to be able to track contradictions (`EmptyPredicate`) * changes the time language to be able to collect non-contradicting pieces, like month and hour and unify them * provides an efficient way to convert those pieces into (past,future) time series * adds AMPM predicate runner - there's a bit of overlap with is12H, but it basically works * changes a test case that was wrong before * regenerates classifiers, I'm not sure why they changed exactly Before: ``` res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty) (15.50 secs, 6,171,188,928 bytes) res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty) (110.82 secs, 44,031,569,512 bytes) ``` After: ``` res <- H.io $ let sentence = "10am thurs 4.30 thurs 12pm sat" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty) (1.24 secs, 703,020,912 bytes) res <- H.io $ let sentence = "I have 9 am 12 pm 1 pm 2pm 4 pm 3 pm on Saturday" in (debugTokens sentence $ analyze sentence (testContext {lang = EN}) HashSet.empty) (9.51 secs, 5,891,109,592 bytes) ``` Reviewed By: JonCoens Differential Revision: D4676812 fbshipit-source-id: 9810203	2017-03-13 17:04:10 -07:00
Julien Odent	fd80953407	Adding Feb tomorrow Summary: . Reviewed By: niteria Differential Revision: D4700059 fbshipit-source-id: 3d63aa4	2017-03-13 14:04:22 -07:00
Julien Odent	2e50aa5ea0	Fix 'tomorrow July' + IT fixes Summary: * we weren't checking the right reference time in `takeNth` and `takeN` * fixing resulting failing tests for `IT` * `analyzedNTest` to check that input results in `n` parsed tokens Reviewed By: niteria Differential Revision: D4698788 fbshipit-source-id: 2cd4762	2017-03-13 12:04:17 -07:00
Bartosz Nitka	5f6c4fcec3	Make the license field more precise Summary: `cabal` is spewing this (it still successfully loads): ``` Warning: 'license: BSD' is not a recognised license. The known licenses are: GPL, GPL-2, GPL-3, LGPL, LGPL-2.1, LGPL-3, AGPL, AGPL-3, BSD2, BSD3, MIT, ISC, MPL-2.0, Apache, Apache-2.0, PublicDomain, AllRightsReserved, OtherLicense ``` Looking at the LICENSE file we have in the repo and the wikipedia page: https://en.wikipedia.org/wiki/BSD_licenses, it looks like we're using BSD3. Reviewed By: patapizza Differential Revision: D4697670 fbshipit-source-id: 6c80078	2017-03-13 06:04:10 -07:00
Julien Odent	161889c3e6	README.md + updating cabal Summary: * basic `README.md` * updated `duckling.cabal` Reviewed By: JonCoens Differential Revision: D4691967 fbshipit-source-id: 0a5cdf7	2017-03-10 15:04:23 -08:00
Julien Odent	d5690f5e5e	CONTRIBUTING.md Summary: https://our.intern.facebook.com/intern/dex/open-source/open-source-licenses/#a-contributing-template Adapted https://github.com/facebook/bistro/blob/master/CONTRIBUTING.md for `Our Development Process`. Test-driven workflow. Reviewed By: JonCoens Differential Revision: D4691472 fbshipit-source-id: d296c77	2017-03-10 14:49:18 -08:00
Julien Odent	ab06262291	Strip off TODO/FIXME Summary: as the title says Differential Revision: D4682120 fbshipit-source-id: 3f66286	2017-03-10 12:04:16 -08:00
Julien Odent	69aeff3a71	Fix st build Summary: `RebindableSyntax` looks for `fromString` in scope. Reviewed By: JonCoens Differential Revision: D4675221 fbshipit-source-id: d7ff49d	2017-03-09 10:49:26 -08:00
FBShipIt	3f8e52e70a	Initial commit fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192	2017-03-08 10:33:56 -08:00


740 Commits