
25 Commits

Author SHA1 Message Date
Daniel Cartwright
69d951220e Make isRangeValid take Lang as input
Summary: There are different implementations of isRangeValid that work well for different languages, so it makes sense to allow a different implementation per language.

Reviewed By: patapizza

Differential Revision: D28362777

fbshipit-source-id: 5f2991d54af3095c8e95cf534e2dd3b4a34dee3a
2021-05-17 13:18:11 -07:00
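A minimal sketch of per-language dispatch. It relies on simplified stand-ins: the two-constructor `Lang`, the `String`-backed `Document`, and the character-class rule used for space-delimited text are illustrative assumptions, not Duckling's actual definitions.

```haskell
import Data.Char (isDigit, isLetter)

-- Hypothetical, simplified stand-ins for Duckling's Lang and Document types.
data Lang = EN | ZH deriving (Eq, Show)

newtype Document = Document String

-- Coarse character classes; every non-alphanumeric character is its own class.
data CharClass = Letter | Digit | Other Char deriving (Eq)

charClass :: Char -> CharClass
charClass c
  | isLetter c = Letter
  | isDigit c = Digit
  | otherwise = Other c

-- Hypothetical validity check for the half-open range [start, end),
-- with a different policy per language.
isRangeValid :: Lang -> Document -> Int -> Int -> Bool
isRangeValid lang (Document s) start end
  | start < 0 || end > n || start >= end = False
  | otherwise = case lang of
      -- Assumption: text written without spaces has no word edges to respect.
      ZH -> True
      -- Space-delimited text: neither edge may split a same-class run.
      EN -> isEdge start && isEdge end
  where
    n = length s
    -- i is an edge if it touches either end of the text or separates
    -- characters of different classes.
    isEdge i = i == 0 || i == n || charClass (s !! (i - 1)) /= charClass (s !! i)
```

Because `Data.Char.isLetter` is true for CJK ideographs, the EN rule would treat a run of Chinese characters as a single word and reject every range inside it; a separate per-language policy avoids that.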
Steven Troxler
323a7df023 Rearrange Engine.hs to top-down ordering
Summary:
Make the code reflect the call graph, which looks roughly like this:
```
parseAndResolve
  runDuckling
  resolveNode
  parseString
    saturateParseString
    parseString1
      matchFirst
         ... low level stuff
      matchFirstAnywhere
         ... low level stuff
```

I found the existing order pretty hard to untangle when I was writing some architecture notes on this module; I think the new ordering will help.

Reviewed By: chessai

Differential Revision: D28441933

fbshipit-source-id: 07c722aa6d4038baa7f14fec84660ecc2736ed2e
2021-05-14 11:50:03 -07:00
Steven Troxler
9151f9e1ab Specify where the note on regex + text lives
Summary:
I spent a surprising amount of time trying to figure out what
this comment was referring to, because it wasn't at all clear to me
that it meant a comment in another file. This change makes it more specific.

Reviewed By: chessai

Differential Revision: D28411103

fbshipit-source-id: 26cd29b47367a7e0d865f616f289fef570544c39
2021-05-13 11:18:11 -07:00
Steven Troxler
eba5d0a825 Simple style fixes for outer layers around Engine.hs
Summary:
Easy style fixes for ExampleMain.hs, Debug.hs, Api.hs, Core.hs

Most of these are just lint fixes, but I also made a few not-just-lint changes
to conform to some elements of our style guide that I agree with:
- if the type signature doesn't fit on one line, then put one type per line
  with nothing on the first line, so that all types are vertically aligned - makes
  for a quick skim
- try to avoid mixing same-line function args with hanging function args: hang
  all arguments or none at all to get a more outline-like feel, again better for
  skimming

I was actually able to eliminate all lint errors for most of these modules: the name
collisions I usually give up on were manageable with `hiding` imports and easy variable renames.

Reviewed By: chessai

Differential Revision: D28213246

fbshipit-source-id: 1f77d56f2ff8dccfd5f3b534f087c07047b92885
2021-05-06 08:54:56 -07:00
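The two layout rules above, sketched on a made-up function (`joinWords`, `hanging` and `oneLine` are purely illustrative; pretend the signature is too long for one line):

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Nothing follows the name on the first line, and each type gets its
-- own line, so the argument types line up vertically on "->".
joinWords
  :: String
  -> [String]
  -> Bool
  -> String
joinWords sep ws shout = applyCase (intercalate sep ws)
  where
    applyCase = if shout then map toUpper else id

-- Either hang every argument on its own line...
hanging :: String
hanging =
  joinWords
    ", "
    ["a", "b", "c"]
    False

-- ...or keep them all on one line.
oneLine :: String
oneLine = joinWords "-" ["x", "y"] True

-- Avoid mixing the two (first argument on the same line, the rest hanging):
--   mixed = joinWords ", "
--             ["a", "b", "c"]
--             False
```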
chessai
cdeefe1d4d ghc88x compat (#550)
Summary: Pull Request resolved: https://github.com/facebook/duckling/pull/550

Reviewed By: haoxuany

Differential Revision: D24844625

Pulled By: chessai

fbshipit-source-id: 52dcf5f9488386f7f407535e876bff1207823fe0
2020-11-12 13:47:46 -08:00
Julien Odent
bf89e34365 Relicense to BSD3
Reviewed By: JoelMarcey

Differential Revision: D15439223

fbshipit-source-id: c5af3cb06318748142fe503945b38beffadfc28a
2019-05-22 10:46:39 -07:00
Ziyang Liu
a3b35880e5 Change value in Entity to typed value instead of JSON
Summary:
Modified `Entity` to use the new `ResolvedVal` data type. Other changes follow naturally. Related issues: https://github.com/facebook/duckling/issues/121 and https://github.com/facebook/duckling/issues/172

Now one can pattern match on the output value, for instance:

```
{-# LANGUAGE GADTs #-}

import Data.Text (Text)
import Duckling.Core
import Duckling.Testing.Types
import qualified Duckling.PhoneNumber.Types as PN

-- Partial: assumes at least one entity is found and that it is a phone number.
parsePhoneNumber :: Text -> Text
parsePhoneNumber input =
  case value entity of
    (RVal PhoneNumber (PN.PhoneNumberValue v)) -> v
  where
    (entity:_) = parse input testContext testOptions [This PhoneNumber]
```

Reviewed By: patapizza

Differential Revision: D7502020

fbshipit-source-id: 76ba7b315cfd0d2c61ff95c855b7c95efc0a401c
2018-04-20 14:18:47 -07:00
Julien Odent
6d7d0ba354 fix build
Summary: `Duckling.Resolve.Options` clashes with `Data.Aeson.Options`.

Reviewed By: niteria

Differential Revision: D7337751

fbshipit-source-id: c9dc633301d45704dbb4975d1942fc5f360a4a44
2018-03-20 10:30:35 -07:00
Chinmay Deshmukh
5ac990bbe2 Return latent entities
Summary: Add an option to return latent time entities. This can be used when one is pretty certain that the input contains a datetime.

Reviewed By: patapizza

Differential Revision: D7254245

fbshipit-source-id: e9e0503cace2691804056fcebdc18fd9090fb181
2018-03-19 14:45:27 -07:00
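One way to picture the option: latent matches are computed as usual but filtered out unless the caller asks for them. This is a sketch under assumptions; the `Options` record with a `withLatent` flag and the `Candidate` type are simplified stand-ins, not Duckling's real types.

```haskell
-- Hypothetical, simplified types for illustration only.
newtype Options = Options {withLatent :: Bool}

data Candidate = Candidate
  { body :: String
  , latent :: Bool -- True when the match is only plausible, e.g. a bare number read as an hour
  }
  deriving (Eq, Show)

-- Latent candidates are dropped unless the caller explicitly opts in.
selectCandidates :: Options -> [Candidate] -> [Candidate]
selectCandidates opts = filter (\c -> withLatent opts || not (latent c))
```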
Julien Odent
bef7a44fa8 Remove redundant brackets and language pragmas
Summary: .

Reviewed By: JonCoens

Differential Revision: D6838082

fbshipit-source-id: 94757bdb80c6d3c29a7a6554429940a1b7403108
2018-01-29 16:45:28 -08:00
Julien Odent
ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility.
* `Rules/<Lang>.hs` exposes:
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility
* Corpus, tests & classifiers
  - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible)
  - Default classifiers are built on existing corpus
  - Locale classifiers are built on
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can also expose `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (en_US) vs DD/MM (en_GB) example to illustrate

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
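A sketch of an opaque `Locale` with a smart constructor, under assumptions: the `Lang`/`Region` subsets, the `supportedRegions` table and the fallback to a language-only locale are hypothetical, and `monthDay` is a toy version of the MM/DD vs DD/MM example rather than Duckling's date rules.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical subsets of languages and regions.
data Lang = EN | FR deriving (Eq, Ord, Show)

data Region = CA | GB | US deriving (Eq, Ord, Show)

-- In a real library the Locale constructor would not be exported,
-- so makeLocale would be the only way to build one.
data Locale = Locale Lang (Maybe Region) deriving (Eq, Show)

-- Hypothetical table of supported (Lang, Region) combinations.
supportedRegions :: Map.Map Lang (Set.Set Region)
supportedRegions =
  Map.fromList
    [ (EN, Set.fromList [CA, GB, US])
    , (FR, Set.fromList [CA])
    ]

-- Smart constructor: an unsupported combination falls back to the
-- language-only locale (an assumed policy for this sketch).
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang (Just region)
  | maybe False (Set.member region) (Map.lookup lang supportedRegions) =
      Locale lang (Just region)
makeLocale lang _ = Locale lang Nothing

-- "<Lang>_<Region>" name, with XX standing in for "no region".
localeName :: Locale -> String
localeName (Locale lang region) = show lang ++ "_" ++ maybe "XX" show region

-- Toy MM/DD vs DD/MM example: read "a/b" as (month, day).
monthDay :: Locale -> Int -> Int -> (Int, Int)
monthDay (Locale EN (Just US)) a b = (a, b) -- en_US: MM/DD
monthDay _ a b = (b, a) -- everything else in this sketch: DD/MM
```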
Bartosz Nitka
74936df848 Make matching anywhere vs at pos obvious
Summary:
This change refactors the Engine to use one code path
when calling `lookupItem` to find the first token `Node`
matching a rule, and a separate one for subsequent
tokens.

This division gives us stronger invariants and, more importantly,
lets us do full-text regexp matches only when necessary.

This should be particularly useful for longer texts.

Reviewed By: patapizza

Differential Revision: D4953918

fbshipit-source-id: e3a69ad
2017-04-28 09:19:20 -07:00
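The split can be sketched with literal substrings standing in for regexps. `matchAnywhere`, `matchAtPos` and `matchRule` are hypothetical names, not Duckling functions; the point is that only the first token of a rule needs a scan over the whole text, while every later token is checked at a single known position.

```haskell
import Control.Monad (foldM)
import Data.List (isPrefixOf, tails)

-- First token of a rule: it may start anywhere, so every offset is tried.
matchAnywhere :: String -> String -> [Int]
matchAnywhere pat text =
  [i | (i, suffix) <- zip [0 ..] (tails text), pat `isPrefixOf` suffix]

-- Subsequent tokens: they must start exactly at pos, so one check suffices.
-- Returns the end offset of the match.
matchAtPos :: String -> String -> Int -> Maybe Int
matchAtPos pat text pos
  | pat `isPrefixOf` drop pos text = Just (pos + length pat)
  | otherwise = Nothing

-- A toy "rule": literal tokens that must directly follow each other.
-- Only the first token pays for a full-text scan.
matchRule :: [String] -> String -> [(Int, Int)]
matchRule [] _ = []
matchRule (first : rest) text =
  [ (start, end)
  | start <- matchAnywhere first text
  , Just end <- [foldM (\pos pat -> matchAtPos pat text pos) (start + length first) rest]
  ]
```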
Bartosz Nitka
c70cf6d38d Move Duckling.Stash to Duckling.Types.Stash
Summary: This is for consistency with Duckling.Types.Document

Reviewed By: patapizza

Differential Revision: D4948569

fbshipit-source-id: 459565a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
8db73688d7 Move Document and helpers to a fresh module
Summary:
Document had its internal details leaked across two files.
This consolidates it.

It took a long time to make this perf neutral (now it's even a tiny
win), for reasons I don't completely understand.
The INLINE pragma on byteStringFromPos I semi-understand,
but I also had to move isRangeValid to Document and that's
a bit of a mystery.

Reviewed By: patapizza

Differential Revision: D4948449

fbshipit-source-id: ffb251a
2017-04-25 16:49:18 -07:00
Bartosz Nitka
924516103b Revert Duckling part of 'clean up unused imports'
Summary: it doesn't take .cabal into account

Reviewed By: patapizza

Differential Revision: D4938400

fbshipit-source-id: 8bc99a5
2017-04-24 07:34:27 -07:00
Bartosz Nitka
b26aa7d84d clean up unused imports
Summary:
This diff was generated by running `hsclimps`

Reviewed By: niteria

Differential Revision: D4937839

fbshipit-source-id: bb3d330
2017-04-24 05:19:24 -07:00
Bartosz Nitka
7f7cc70d72 Make first pass more obvious
Summary:
Separating out the first pass lets us avoid repeated filtering
and makes the structure of the algorithm a bit more clear.

Previously `Stash.null` was used as a test for being part of
the first pass or not, but that is a bit indirect. Encoding
the algorithm structure (the state automaton) as function calls
lets us make additional assumptions.

It also has a nice side effect of costs being attributed to
first/subsequent passes in the profile.

I also prepend new results to `matches`, because `matches` is likely to be the bigger list and `(++)` only has to traverse its left argument.

Reviewed By: patapizza

Differential Revision: D4922195

fbshipit-source-id: 0aec79f
2017-04-20 11:49:15 -07:00
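A toy version of the idea, assuming a generic fixpoint loop rather than Duckling's actual parser: the first pass is a separate function instead of a check on whether the accumulated results are empty, and each round's new items are prepended to the accumulator. `saturate` is hypothetical and terminates only if `nextPass` eventually returns no new items.

```haskell
-- firstPass derives the initial items from the raw input; nextPass derives
-- new items from those found in the previous round. Calling them as separate
-- functions makes the phase explicit instead of testing "is the stash empty?".
saturate :: (input -> [a]) -> ([a] -> [a]) -> input -> [a]
saturate firstPass nextPass input = loop initial initial
  where
    initial = firstPass input
    -- new: items found in the last round; acc: everything found so far.
    -- (++) copies only its left argument, so prepending the (usually
    -- smaller) batch of new items to acc keeps each step cheap.
    loop [] acc = acc
    loop new acc =
      let newer = nextPass new
       in loop newer (newer ++ acc)
```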
Bartosz Nitka
e7aeef5436 Avoid allocations and encoding in regexp matching
Summary: The rationale is explained in a new Note.

Reviewed By: patapizza

Differential Revision: D4884104

fbshipit-source-id: 81f36ee
2017-04-14 12:19:21 -07:00
Bartosz Nitka
e37bb7c186 Duckling monad for Engine
Summary:
This converts the code to monadic style, so that
we can in the future:
* stop threading the `Document` parameter everywhere
* keep some state, like regexp match cache (I've already checked that it makes a substantial difference)

There should be no difference in performance or behavior
at this point.

Reviewed By: patapizza

Differential Revision: D4778808

fbshipit-source-id: a167ed8
2017-03-31 14:19:40 -07:00
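A sketch of what such a monad can look like, built from the `transformers` and `containers` packages that ship with GHC. It goes one step beyond this commit by including the regexp-cache idea it mentions as future work; `Duckling`, `runWithDocument`, `lookupPattern` and the substring search standing in for regexps are all assumptions for illustration.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify')
import Data.List (isPrefixOf, tails)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: the document is a String, and the cache maps a
-- search pattern to the offsets where it occurs.
type Document = String

type Cache = Map.Map String [Int]

-- The Document is read-only context; the cache is mutable state.
type Duckling a = ReaderT Document (State Cache) a

runWithDocument :: Document -> Duckling a -> a
runWithDocument doc action = evalState (runReaderT action doc) Map.empty

-- Find every offset of pat. No Document argument is threaded by hand,
-- and repeated lookups of the same pattern are served from the cache.
lookupPattern :: String -> Duckling [Int]
lookupPattern pat = do
  cached <- lift (gets (Map.lookup pat))
  case cached of
    Just hits -> pure hits
    Nothing -> do
      doc <- ask
      let hits = [i | (i, rest) <- zip [0 ..] (tails doc), pat `isPrefixOf` rest]
      lift (modify' (Map.insert pat hits))
      pure hits

-- Number of distinct patterns looked up so far.
cacheSize :: Duckling Int
cacheSize = lift (gets Map.size)
```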
Bartosz Nitka
b108ab260f Allocate less in lookupRegexp
Summary:
Contrary to my intuition, this part accounts for the lion's share
of allocations in `lookupRegexp`. I'd have expected `Text`
operations to dwarf it.

It's a bit dubious that we build lists big enough for this to
matter; perhaps in the future we can explore limiting the
number of matches considered.

Reviewed By: patapizza

Differential Revision: D4745711

fbshipit-source-id: ebdc1aa
2017-03-21 09:19:18 -07:00
Bartosz Nitka
56a039eef1 Optimize isRangeValid
Summary:
`isRangeValid` was doing lots of random indexing inside a Text.
Since we already have a convenient O(1), indexable `Vector Char`
we can just use it instead.

Reviewed By: patapizza

Differential Revision: D4744297

fbshipit-source-id: b23011b
2017-03-21 08:49:16 -07:00
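Random indexing into `Text` is costly because `Data.Text.index` is O(n). The sketch below builds an array once per document so each lookup is O(1); it uses an unboxed `UArray` from the `array` package in place of the commit's `Vector Char`, and its boundary rule (alphanumeric vs. not) is a simplifying assumption, not Duckling's.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.Char (isAlphaNum)

-- Built once per document; afterwards each (!) lookup is O(1).
toCharArray :: String -> UArray Int Char
toCharArray s = listArray (0, length s - 1) s

-- Position i (0 <= i <= length) is a boundary if it touches either end
-- of the text or separates an alphanumeric from a non-alphanumeric char.
isBoundary :: UArray Int Char -> Int -> Bool
isBoundary arr i
  | i <= lo || i > hi = True
  | otherwise = isAlphaNum (arr ! (i - 1)) /= isAlphaNum (arr ! i)
  where
    (lo, hi) = bounds arr

-- Hypothetical range check: both ends of [start, end) must be boundaries.
isRangeValid :: UArray Int Char -> Int -> Int -> Bool
isRangeValid arr start end = start < end && isBoundary arr start && isBoundary arr end
```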
Bartosz Nitka
58bf36b9f4 Optimize isAdjacent
Summary:
`isAdjacent` was doing a ton of useless copies and
redundant work. By pre-computing a `firstNonAdjacent` table
we can answer every `isAdjacent` query in `O(1)` time and
(almost?) no allocations.

It may be a symptom of algorithmic problems, but we shouldn't
make it more expensive than it needs to be.

Reviewed By: patapizza

Differential Revision: D4744172

fbshipit-source-id: dd70be2
2017-03-21 07:34:24 -07:00
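A sketch of such a precomputed table, assuming "adjacent" means that only whitespace separates the end of one token from the start of the next (Duckling's real rule may accept other separators). `buildFirstNonAdjacent` is a hypothetical name; building the table is one O(n) right-to-left pass, and every query afterwards is a single array lookup.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Char (isSpace)

-- table ! i is the index of the first non-whitespace character at or
-- after i, or n (the text length) if there is none. One right-to-left
-- pass, O(n) time, done once per document.
buildFirstNonAdjacent :: String -> UArray Int Int
buildFirstNonAdjacent s = listArray (0, n) (scanr step n (zip [0 ..] s))
  where
    n = length s
    step (i, c) next = if isSpace c then next else i

-- A token ending at `end` (exclusive) is adjacent to a token starting at
-- `start` when only whitespace lies between them: an O(1) check.
-- Assumes 0 <= end <= n.
isAdjacent :: UArray Int Int -> Int -> Int -> Bool
isAdjacent table end start = end <= start && table ! end >= start
```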
Bartosz Nitka
26b1327bcd Make Document type abstract
Summary:
This will let me do smarter things on document construction,
like precomputing where all the whitespace is so that
I can answer `isAdjacent` in O(1) time.

If I'm measuring things right my next diff will cut down
allocations 4x on problematic inputs.

Reviewed By: patapizza

Differential Revision: D4742664

fbshipit-source-id: 7e14e25
2017-03-20 20:49:24 -07:00
Julien Odent
ab06262291 Strip off TODO/FIXME
Summary: as the title says

Differential Revision: D4682120

fbshipit-source-id: 3f66286
2017-03-10 12:04:16 -08:00
FBShipIt
3f8e52e70a Initial commit
fbshipit-source-id: 301a10f448e9623aa1c953544f42de562909e192
2017-03-08 10:33:56 -08:00