restrict dimensions to only those specified (#625)

Summary:
Resolves https://github.com/facebook/duckling/issues/624

Before patch (specifying quantity and numeral, but time still shows up):
```
❯ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="June 21 and 3 cups of sugar"&dims="[\"quantity\",\"numeral\"]"' | jq
[
  {
    "body": "June 21",
    "start": 1,
    "value": {
      "values": [
        {
          "value": "2021-06-21T00:00:00.000-07:00",
          "grain": "day",
          "type": "value"
        },
        {
          "value": "2022-06-21T00:00:00.000-07:00",
          "grain": "day",
          "type": "value"
        },
        {
          "value": "2023-06-21T00:00:00.000-07:00",
          "grain": "day",
          "type": "value"
        }
      ],
      "value": "2021-06-21T00:00:00.000-07:00",
      "grain": "day",
      "type": "value"
    },
    "end": 8,
    "dim": "time",
    "latent": false
  },
  {
    "body": "3 cups of sugar",
    "start": 13,
    "value": {
      "value": 3,
      "type": "value",
      "product": "sugar",
      "unit": "cup"
    },
    "end": 28,
    "dim": "quantity",
    "latent": false
  }
]
```

After patch (time no longer shows up):
```
❯ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="June 21 and 3 cups of sugar"&dims="[\"quantity\",\"numeral\"]"' | jq
[
  {
    "body": "3 cups of sugar",
    "start": 13,
    "value": {
      "value": 3,
      "type": "value",
      "product": "sugar",
      "unit": "cup"
    },
    "end": 28,
    "dim": "quantity",
    "latent": false
  }
]
```

Pull Request resolved: https://github.com/facebook/duckling/pull/625

Reviewed By: stroxler

Differential Revision: D28851759

Pulled By: chessai

fbshipit-source-id: d3b3f33092c7e60bf29886939488ed562a213c35
chessai 2021-06-03 10:22:54 -07:00 committed by Facebook GitHub Bot
parent 878beb7aa1
commit 99e1dce9c4
2 changed files with 16 additions and 5 deletions

@ -35,9 +35,9 @@ $ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at ei
In the example application, all dimensions are enabled by default. Provide the parameter `dims` to specify which ones you want. Examples:
```
Identify credit card numbers only:
-$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="[\"credit-card-number\"]"'
+$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
-$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="[\"quantity\",\"numeral\"]"'
+$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'
```
See `exe/ExampleMain.hs` for an example on how to integrate Duckling in your

@ -27,6 +27,7 @@ import System.Environment (lookupEnv)
import TextShow
import Text.Read (readMaybe)
import qualified Data.ByteString.Lazy as LBS
+import qualified Data.ByteString.Lazy.Char8 as LBS8
import qualified Data.HashMap.Strict as HashMap
import qualified Data.Text as Text
import qualified Data.Text.Encoding as Text
@ -109,6 +110,7 @@ parseHandler tzs = do
Just tx -> do
let timezone = parseTimeZone tz
now <- liftIO $ currentReftime tzs timezone
let
lang = parseLang l
@ -118,9 +120,18 @@ parseHandler tzs = do
}
options = Options {withLatent = parseLatent latent}
+cleanupDims =
+LBS8.filter (/= '\\') -- strip out escape chars people throw in
+. stripSuffix "\"" -- remove trailing double quote
+. stripPrefix "\"" -- remove leading double quote
+where
+stripSuffix suffix str = fromMaybe str $ LBS.stripSuffix suffix str
+stripPrefix prefix str = fromMaybe str $ LBS.stripPrefix prefix str
dims = fromMaybe (allDimensions lang) $ do
-queryDims <- ds
-txtDims <- decode @[Text] $ LBS.fromStrict queryDims
+queryDims <- fmap (cleanupDims . LBS.fromStrict) ds
+txtDims <- decode @[Text] queryDims
pure $ mapMaybe parseDimension txtDims
parsedResult = parse (Text.decodeUtf8 tx) context options dims
@ -138,7 +149,7 @@ parseHandler tzs = do
fromCustomName :: Text -> Maybe (Seal Dimension)
fromCustomName name = HashMap.lookup name m
m = HashMap.fromList
-[ -- ("my-dimension", This (CustomDimension MyDimension))
+[ -- ("my-dimension", Seal (CustomDimension MyDimension))
]
parseTimeZone :: Maybe ByteString -> Text
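
The `cleanupDims` step in the hunk above can be tried in isolation. The following is a minimal standalone sketch, not the code as it sits in `parseHandler`: the top-level type signature, the `main` driver, and the two sample inputs are assumptions added for illustration, and only the `bytestring` package is used. It shows the likely reason the old code ignored `dims`: a value wrapped in an extra pair of double quotes is a JSON string rather than a JSON array, so `decode @[Text]` would return `Nothing` and `fromMaybe` would fall back to `allDimensions lang`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (fromMaybe)
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Lazy.Char8 as LBS8

-- Same pipeline as in the diff, lifted to a top-level function.
-- The type signature is an addition for this sketch.
cleanupDims :: LBS.ByteString -> LBS.ByteString
cleanupDims =
    LBS8.filter (/= '\\')  -- drop every backslash (escape chars from the shell)
  . stripSuffix "\""       -- remove one trailing double quote, if present
  . stripPrefix "\""       -- remove one leading double quote, if present
  where
    -- LBS.stripSuffix / LBS.stripPrefix return Nothing when there is no
    -- match; fall back to the unchanged input in that case.
    stripSuffix suffix str = fromMaybe str $ LBS.stripSuffix suffix str
    stripPrefix prefix str = fromMaybe str $ LBS.stripPrefix prefix str

main :: IO ()
main = do
  -- Hypothetical input: the quoted, backslash-escaped form from the
  -- commit message's curl command.
  LBS8.putStrLn $ cleanupDims "\"[\\\"quantity\\\",\\\"numeral\\\"]\""
  -- Hypothetical input: an already clean JSON array passes through unchanged.
  LBS8.putStrLn $ cleanupDims "[\"quantity\",\"numeral\"]"
```

Both calls print `["quantity","numeral"]`, which is the form `decode @[Text]` accepts. One trade-off worth noting: because the filter removes every backslash, a dimension name that legitimately contained one would be altered; that seems acceptable here, since none of the dimension names in the examples above contain backslashes.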