duckling

mirror of https://github.com/facebook/duckling.git synced 2024-12-26 13:46:19 +03:00

Author	SHA1	Message	Date
Bartosz Nitka	8db73688d7	Move Document and helpers to a fresh module Summary: Document had its internal details leaked over 2 files. This consolidates it. It took a long time to make this perf neutral (now it's even a tiny win), for reasons I don't completely understand. The INLINE pragma on byteStringFromPos I semi-understand, but I also had to move isRangeValid to Document and that's a bit of a mystery. Reviewed By: patapizza Differential Revision: D4948449 fbshipit-source-id: ffb251a	2017-04-25 16:49:18 -07:00
Bartosz Nitka	924516103b	Revert Duckling part of 'clean up unused imports' Summary: it doesn't take .cabal into account Reviewed By: patapizza Differential Revision: D4938400 fbshipit-source-id: 8bc99a5	2017-04-24 07:34:27 -07:00
Julien Odent	dbe9e73541	Duration Summary: Duration dimension for Hebrew. Reviewed By: niteria Differential Revision: D4930403 fbshipit-source-id: 690db8f	2017-04-24 06:49:40 -07:00
Julien Odent	efa38401b5	TimeGrain Summary: TimeGrain dimension for Hebrew. Reviewed By: niteria Differential Revision: D4930294 fbshipit-source-id: 9c0f0da	2017-04-24 06:49:40 -07:00
Julien Odent	f5f4889770	Ordinal Summary: Ordinal dimension for Hebrew. Reviewed By: niteria Differential Revision: D4930162 fbshipit-source-id: 02545ae	2017-04-24 06:49:40 -07:00
Julien Odent	bd96d3dd95	Setup + Numeral Summary: Setup for Hebrew + Numeral dimension Reviewed By: niteria Differential Revision: D4930041 fbshipit-source-id: 965132b	2017-04-24 06:49:40 -07:00
Bartosz Nitka	b26aa7d84d	clean up unused imports Summary: This diff was generated by running `hsclimps` PLEASE TAKE ONE OF THE FOLLOWING ACTIONS AS SOON AS POSSIBLE: 1) Select Accept and Ship to land this change 2) If you have issues with this diff, request changes 3) If you are no longer the owner, add reviewers and update the `.context` file with the appropriate owner NOTE: If the diff is unable to land because of a merge conflict I will automatically update it for you. #accept2ship Reviewed By: niteria Differential Revision: D4937839 fbshipit-source-id: bb3d330	2017-04-24 05:19:24 -07:00
Bartosz Nitka	7f7cc70d72	Make first pass more obvious Summary: Separating out the first pass lets us avoid repeated filtering and makes the structure of the algorithm a bit more clear. Previously `Stash.null` was used as a test for being part of the first pass or not, but that is a bit indirect. Encoding the algorithm structure (the state automaton) as function calls lets us make additional assumptions. It also has a nice side effect of costs being attributed to first/subsequent passes in the profile. I also prepend to `matches` because it's likely to be bigger. Reviewed By: patapizza Differential Revision: D4922195 fbshipit-source-id: 0aec79f	2017-04-20 11:49:15 -07:00
Bartosz Nitka	878f85b9e1	Codemod intersectMB to intersect Summary: `intersectMB` was a name used for the purpose of migrating. This is the last part of the migration. Reviewed By: patapizza Differential Revision: D4906098 fbshipit-source-id: a70af78	2017-04-18 10:19:20 -07:00
Bartosz Nitka	fe39a55a4c	Use intervalMB instead of interval Summary: This continues the work from: "[Duckling] Don't produce trivially empty Tokens" All the Rules should use intervalMB from now on. Reviewed By: patapizza Differential Revision: D4906072 fbshipit-source-id: 277b961	2017-04-18 10:19:20 -07:00
Bartosz Nitka	a91e787bb7	Derive Eq, Show for TimeIntervalType Summary: This is always useful to have. Reviewed By: patapizza Differential Revision: D4864208 fbshipit-source-id: b879893	2017-04-18 08:19:20 -07:00
Bartosz Nitka	879b103ca3	Fix indexing problems with new regexp matcher Summary: My change had a couple of problems: * utf8 character width logic was completely wrong for characters that need 3 or 4 bytes * `Array.listArray (start, end)` produces an array where `end` is a valid index * because of ^ the `arraySize` logic also has to change Reviewed By: watashi, darshankapashi Differential Revision: D4894355 fbshipit-source-id: 8d07dfd	2017-04-14 15:49:17 -07:00
Bartosz Nitka	e7aeef5436	Avoid allocations and encoding in regexp matching Summary: The rationale is explained in a new Note. Reviewed By: patapizza Differential Revision: D4884104 fbshipit-source-id: 81f36ee	2017-04-14 12:19:21 -07:00
Bartosz Nitka	3d18cf5ea9	Don't produce trivially empty Tokens Summary: We can detect certain kinds of contradictions sooner; producing a token with an unresolvable Predicate is wasteful. For a text like: ``` "Demain apres midi 14h 15 h 16h vendredi 14 a 15h" ``` it could produce 7000 tokens with empty predicates. After this change it produces none and we get a 4x improvement in time and 6x improvement in allocations. Note I only covered `ruleIntersect*` here. I need to do this for other instances as well. Reviewed By: JonCoens Differential Revision: D4871078 fbshipit-source-id: 9f0e7ad	2017-04-11 16:35:05 -07:00
Kevin Cros	62bc5a317b	Using hashmap look up instead of 'case of' Summary: Updating regex with hashmap look ups. Reviewed By: patapizza Differential Revision: D4848178 fbshipit-source-id: 4d5ded8	2017-04-11 11:04:20 -07:00
ADAM LIU	928139569c	Refactor of Duckling.Numeral.TR to hashmap lookup Summary: Update of TR Rules hashmap Reviewed By: patapizza Differential Revision: D4860819 fbshipit-source-id: 6f5a722	2017-04-11 09:34:23 -07:00
Bartosz Nitka	f7b3f2ed73	Detect interval contradictions sooner Summary: So far contradictions from intersection only propagated through intersection. This change makes it so that it also propagates through intervals and lets intervals also generate contradictions. Reviewed By: patapizza Differential Revision: D4864160 fbshipit-source-id: 8348267	2017-04-10 16:35:27 -07:00
Bartosz Nitka	1cf8496967	tt helper for returning Time Tokens Summary: This is a very common pattern (>1k occurrences). Replacing it with something shorter makes the rules a bit less boilerplate-y. Feel free to bikeshed the name, I can easily redo the codemod. Reviewed By: patapizza Differential Revision: D4848864 fbshipit-source-id: 7baeee3	2017-04-10 12:34:43 -07:00
Bartosz Nitka	f46539ced2	Type for Closed/Open intervals Summary: This makes the code easier to read. I'm not attached to naming, but this is standard terminology from topology. Reviewed By: JonCoens, patapizza Differential Revision: D4848740 fbshipit-source-id: 79c2c20	2017-04-07 12:19:17 -07:00
Jonathan Coens	b3ca32104d	Simple example HTTP server Summary: Runs a `snap` server to return the support targets as well as do parsing. It's a bit kludgy, but gets the job done. Reviewed By: patapizza Differential Revision: D4813197 fbshipit-source-id: 0fa165b	2017-04-06 17:04:48 -07:00
ADAM LIU	572ff95adf	Update RU Rules HashMap lookups update Summary: Update of RU Rules hashmap Reviewed By: patapizza Differential Revision: D4840947 fbshipit-source-id: 00cb679	2017-04-06 15:49:17 -07:00
Bartosz Nitka	78ecaa3728	Derive NFData for Entity Summary: This makes benchmarking easier. Reviewed By: JonCoens Differential Revision: D4846839 fbshipit-source-id: 9cc8dfa	2017-04-06 15:34:43 -07:00
Bartosz Nitka	290ca48e25	Fix 4:23am returning 5:23am Summary: This is the easiest way to fix it, but talking offline with Julien, we may need to revisit. It basically gets rid of time series where we were producing intervals that are not a multiple of the grain. Reviewed By: patapizza Differential Revision: D4841759 fbshipit-source-id: 1c4742a	2017-04-06 11:04:16 -07:00
Amelia Wilson	70ef9b1bbe	using hashmap lookups Summary: converting large regex lookups to hashmap lookups in Duckling/Numeral/FR/Rules.hs and Duckling/Ordinal/FR/Rules.hs Reviewed By: patapizza Differential Revision: D4836336 fbshipit-source-id: 2241a3a	2017-04-05 12:20:10 -07:00
Jonathan Coens	7c47431ce5	Upgrade to stackage 8.8 Summary: Just a little bounds bump Reviewed By: patapizza Differential Revision: D4835536 fbshipit-source-id: d51fbb8	2017-04-05 11:19:31 -07:00
Jonathan Coens	e2da9bc7fb	Upgrade to stackage 8.6 Summary: Moves to the 8.6 resolver, updates package limits, and fixes errors due to upgrade. Reviewed By: patapizza Differential Revision: D4810924 fbshipit-source-id: c8a64a9	2017-04-04 15:19:41 -07:00
Bartosz Nitka	e37bb7c186	Duckling monad for Engine Summary: This converts the code to monadic style, so that we can in the future: * stop threading the `Document` parameter everywhere * keep some state, like regexp match cache (I've already checked that it makes a substantial difference) There should be no difference in performance or behavior at this point. Reviewed By: patapizza Differential Revision: D4778808 fbshipit-source-id: a167ed8	2017-03-31 14:19:40 -07:00
Julien Odent	78228dea83	Update email Summary: Setup the correct email. Reviewed By: JonCoens Differential Revision: D4806876 fbshipit-source-id: a52f9f8	2017-03-30 16:20:08 -07:00
Bartosz Nitka	a1917a53f3	Make sure regen is rebuilt Summary: `stack exe/RegenMain.hs` uses runghc which is a tool we don't test with often. Making sure the executable is rebuilt and using it should be enough. Reviewed By: patapizza Differential Revision: D4783844 fbshipit-source-id: 459dbc4	2017-03-28 07:49:19 -07:00
Bartosz Nitka	bd94622f64	Move tests to tests and exes to exe Summary: This works around https://github.com/haskell/cabal/issues/4350 If we don't do this files get compiled multiple times and cabal is unhappy. Reviewed By: patapizza Differential Revision: D4782749 fbshipit-source-id: 5bbe425	2017-03-27 16:04:24 -07:00
Christian Bell	02e74cacd6	HashMap lookups for large regexes Summary: Use HashMaps to speed up string pattern matching for UK (Ukrainian). Reviewed By: patapizza Differential Revision: D4747195 fbshipit-source-id: e582dba	2017-03-22 08:49:17 -07:00
Julien Odent	96f365e927	Expose toName Summary: . Reviewed By: niteria Differential Revision: D4753842 fbshipit-source-id: 2e88e86	2017-03-22 08:19:19 -07:00
Bartosz Nitka	b108ab260f	Allocate less in lookupRegexp Summary: Contrary to my intuitions this part is the lion's share of allocations in `lookupRegexp`. I'd have expected `Text` operations to dwarf it. It's a bit dubious that we build such big lists that it matters, perhaps in the future we can explore limiting the number of matches considered. Reviewed By: patapizza Differential Revision: D4745711 fbshipit-source-id: ebdc1aa	2017-03-21 09:19:18 -07:00
Bartosz Nitka	56a039eef1	Optimize isRangeValid Summary: `isRangeValid` was doing lots of random indexing inside a Text. Since we already have a convenient O(1), indexable `Vector Char` we can just use it instead. Reviewed By: patapizza Differential Revision: D4744297 fbshipit-source-id: b23011b	2017-03-21 08:49:16 -07:00
Bartosz Nitka	58bf36b9f4	Optimize isAdjacent Summary: `isAdjacent` was doing a ton of useless copies and redundant work. By pre-computing a `firstNonAdjacent` table we can answer every `isAdjacent` query in `O(1)` time and (almost?) no allocations. It may be a symptom of algorithmic problems, but we shouldn't make it more expensive than it needs to be. Reviewed By: patapizza Differential Revision: D4744172 fbshipit-source-id: dd70be2	2017-03-21 07:34:24 -07:00
Bartosz Nitka	26b1327bcd	Make Document type abstract Summary: This will let me do smarter things on document construction, like precomputing where all the whitespace is so that I can answer `isAdjacent` in O(1) time. If I'm measuring things right my next diff will cut down allocations 4x on problematic inputs. Reviewed By: patapizza Differential Revision: D4742664 fbshipit-source-id: 7e14e25	2017-03-20 20:49:24 -07:00
Bartosz Nitka	09acefbcf5	Make Show Dimension "law-abiding" Summary: `Show` should print things close to source level representation. I wanted to generate some tests from inputs that cause problems and there was no way to get source level representation of Dimension. Reviewed By: patapizza Differential Revision: D4723711 fbshipit-source-id: fff658d	2017-03-16 16:34:16 -07:00
Julien Odent	e76cee3a6d	Rename Finance to AmountOfMoney Summary: Because it makes more sense. Reviewed By: JonCoens Differential Revision: D4721646 fbshipit-source-id: 449bfb4	2017-03-16 14:49:44 -07:00
Julien Odent	54c9448fba	Rename Number to Numeral Summary: For consistency with the dimension name. Reviewed By: JonCoens Differential Revision: D4722216 fbshipit-source-id: 82c56d3	2017-03-16 13:49:16 -07:00
Julien Odent	33fa98734a	Fix 'no dia 20' Summary: * 'no dia 20' (on the 20th) * Unifying two rules into one, with a day grain See https://github.com/wit-ai/wit/issues/388 Reviewed By: blandinw Differential Revision: D4715780 fbshipit-source-id: e990954	2017-03-15 13:49:17 -07:00
Julien Odent	1c98c0308c	Fix Some in README Summary: #accept2ship Reviewed By: niteria Differential Revision: D4715804 fbshipit-source-id: d53ca9a	2017-03-15 13:19:36 -07:00
Jonathan Coens	41800a3171	Move onto dependent-sum instead of custom local data Some Summary: No need to reinvent the wheel when `dependent-sum` has what we need. I re-export `Some(..)` from `Duckling.Dimensions.Types` to cut down on import bloat. Instead of a `Read` instance I created a `fromName` function. Reviewed By: zilberstein Differential Revision: D4710014 fbshipit-source-id: 1d4e86d	2017-03-15 10:34:17 -07:00
Bartosz Nitka	d23ae54ab9	.gitignore .stack-work Summary: stack creates this directory; we should prevent it from being committed. Reviewed By: JonCoens Differential Revision: D4713790 fbshipit-source-id: 34b723d	2017-03-15 10:04:30 -07:00
Bartosz Nitka	1a251d8e42	Use HashMap.lookupDefault Summary: This is a small stylistic improvement. Reviewed By: patapizza Differential Revision: D4713463 fbshipit-source-id: 47720d3	2017-03-15 08:19:11 -07:00
Julien Odent	1edf62f347	Adding logo Summary: happy_duck Reviewed By: niteria Differential Revision: D4713395 fbshipit-source-id: dd1c141	2017-03-15 08:04:31 -07:00
Julien Odent	ea80ab07d3	Update maintainer email Summary: . Reviewed By: niteria Differential Revision: D4713313 fbshipit-source-id: 4fbeabb	2017-03-15 07:49:12 -07:00
Julien Odent	cc016bb178	Refactoring + return domain Summary: * Simplified `Url` to only keep track of what we need (we can change back later) * Normalize domain: remove subdomains like `www`, `www2` and lower case * Return the full domain in the JSON value field * Updated offensive url example Reviewed By: JonCoens Differential Revision: D4705403 fbshipit-source-id: e5d11ee	2017-03-14 13:49:20 -07:00
Jonathan Coens	1b91b70c58	codemod DNumber to Numeral Summary: `DNumber` is a terrible name and was only there because legacy. `Numeral` makes more sense for this dimension, so let's use that instead. Reviewed By: patapizza Differential Revision: D4707167 fbshipit-source-id: cd78aa3	2017-03-14 13:34:11 -07:00
Bartosz Nitka	ec39c21593	Make the regexp less dangerous Summary: The current regexp matches sequences of numbers of unbounded length with lots of backtracking. Since phone numbers are shorter than X=20 characters we can put a bound on every currently unbounded match. Additionally we can use groups that don't capture, to avoid marshalling data that we won't need. Reviewed By: JonCoens Differential Revision: D4706862 fbshipit-source-id: 39ca9bb	2017-03-14 12:19:12 -07:00
Julien Odent	2f4ecfba08	Update README Summary: Doc to extend existing dimension/language support Reviewed By: JonCoens Differential Revision: D4706035 fbshipit-source-id: a8ecca4	2017-03-14 11:34:11 -07:00

112 Commits
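
Commit 58bf36b9f4 above describes answering every `isAdjacent` query in `O(1)` from a precomputed `firstNonAdjacent` table. The sketch below illustrates that general idea in Python; it is not Duckling's Haskell code. The function names, the use of `str.isspace` as the separator test, and the convention that `a` is the end offset of one token and `b` the start offset of the next are all assumptions made for illustration.

```python
from typing import List


def build_first_non_adjacent(text: str) -> List[int]:
    """For each offset i, store the first offset >= i that is not whitespace.

    The table has len(text) + 1 entries; the last one is a sentinel equal to
    len(text). Assumption: "separator" means Python's str.isspace().
    """
    n = len(text)
    table = [n] * (n + 1)
    # Scan right to left so each entry can reuse the one after it.
    for i in range(n - 1, -1, -1):
        table[i] = table[i + 1] if text[i].isspace() else i
    return table


def is_adjacent(table: List[int], a: int, b: int) -> bool:
    """True if only whitespace lies between offsets a (inclusive) and b (exclusive).

    Each query is one comparison and one table lookup, so it runs in O(1)
    and allocates nothing.
    """
    return b >= a and table[a] >= b
```

With the text `"at  5 pm"`, the token `at` ends at offset 2 and `5` starts at offset 4, so `is_adjacent(table, 2, 4)` holds, while `is_adjacent(table, 2, 6)` does not because `5` sits in between. The table is built once per document in linear time, which is the trade the commit message describes.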