duckling/Duckling/Numeral/IT/Corpus.hs
Julien Odent ab0ad0256e Locales support
Summary:
* Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`, which carries an optional `Region`; the change is backward-compatible.
* `Rules/<Lang>.hs` exposes:
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward compatibility
* Corpus, tests & classifiers
  - One classifier per locale, with a default classifier (`<Lang>_XX`) used when no locale is provided (backward-compatible)
  - Default classifiers are built on the existing corpus
  - Locale classifiers are built on the language corpus plus locale-specific examples:
  - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>`
  - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as training set.
  - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can also expose `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure runtime works as expected.
* MM/DD (`en_US`) vs DD/MM (`en_GB`) example to illustrate locale-specific rules
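
The `makeLocale` smart constructor described above can be pictured with a minimal, self-contained sketch. It is not the library's code: the two-value `Lang` and `Region` enumerations, the `regionsFor` table, and the fallback to a region-less locale for invalid pairs are assumptions made for illustration. The `<Lang>_XX` rendering follows the default classifier naming in this summary.

```haskell
-- Hypothetical, trimmed-down enumerations (assumption: the real library
-- covers many more languages and regions).
data Lang = EN | IT
  deriving (Eq, Show)

data Region = GB | US
  deriving (Eq, Show)

-- In a real module the export list would omit the `Locale` data constructor,
-- so that callers can only build values through `makeLocale`.
data Locale = Locale Lang (Maybe Region)
  deriving (Eq)

-- Render as "<Lang>_<Region>", with "XX" standing in for "no region".
instance Show Locale where
  show (Locale lang Nothing) = show lang ++ "_XX"
  show (Locale lang (Just region)) = show lang ++ "_" ++ show region

-- Illustrative table of regions accepted for each language (assumption).
regionsFor :: Lang -> [Region]
regionsFor EN = [GB, US]
regionsFor IT = []

-- Smart constructor: keep the region only when it is valid for the language;
-- otherwise fall back to the language-only locale (assumed behaviour).
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang (Just region)
  | region `elem` regionsFor lang = Locale lang (Just region)
makeLocale lang _ = Locale lang Nothing
```

With these definitions, `makeLocale IT Nothing` is the region-less Italian locale, and an unsupported pair such as `makeLocale IT (Just GB)` collapses to the same value rather than producing an invalid locale.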

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00


-- Copyright (c) 2016-present, Facebook, Inc.
-- All rights reserved.
--
-- This source code is licensed under the BSD-style license found in the
-- LICENSE file in the root directory of this source tree. An additional grant
-- of patent rights can be found in the PATENTS file in the same directory.
{-# LANGUAGE OverloadedStrings #-}

module Duckling.Numeral.IT.Corpus
  ( corpus ) where

import Prelude
import Data.String

import Duckling.Locale
import Duckling.Numeral.Types
import Duckling.Resolve
import Duckling.Testing.Types

corpus :: Corpus
corpus = (testContext {locale = makeLocale IT Nothing}, allExamples)

allExamples :: [Example]
allExamples = concat
  [ examples (NumeralValue 0)
             [ "0"
             , "nulla"
             , "zero"
             ]
  , examples (NumeralValue 1)
             [ "1"
             , "uno"
             , "Un"
             ]
  , examples (NumeralValue 2)
             [ "2"
             , "due"
             ]
  , examples (NumeralValue 3)
             [ "3"
             , "tre"
             ]
  , examples (NumeralValue 4)
             [ "4"
             , "quattro"
             ]
  , examples (NumeralValue 5)
             [ "5"
             , "cinque"
             ]
  , examples (NumeralValue 6)
             [ "6"
             , "sei"
             ]
  , examples (NumeralValue 7)
             [ "7"
             , "sette"
             ]
  , examples (NumeralValue 8)
             [ "8"
             , "otto"
             ]
  , examples (NumeralValue 9)
             [ "9"
             , "nove"
             ]
  , examples (NumeralValue 10)
             [ "10"
             , "dieci"
             ]
  , examples (NumeralValue 33)
             [ "33"
             , "trentatré"
             , "0033"
             ]
  , examples (NumeralValue 11)
             [ "11"
             , "Undici"
             ]
  , examples (NumeralValue 12)
             [ "12"
             , "dodici"
             ]
  , examples (NumeralValue 13)
             [ "13"
             , "tredici"
             ]
  , examples (NumeralValue 14)
             [ "14"
             , "quattordici"
             ]
  , examples (NumeralValue 15)
             [ "15"
             , "quindici"
             ]
  , examples (NumeralValue 16)
             [ "16"
             , "sedici"
             ]
  , examples (NumeralValue 17)
             [ "17"
             , "diciassette"
             ]
  , examples (NumeralValue 18)
             [ "18"
             , "diciotto"
             ]
  , examples (NumeralValue 19)
             [ "19"
             , "diciannove"
             ]
  , examples (NumeralValue 20)
             [ "20"
             , "venti"
             ]
  , examples (NumeralValue 1.1)
             [ "1,1"
             , "1,10"
             , "01,10"
             ]
  , examples (NumeralValue 0.77)
             [ "0,77"
             , ",77"
             ]
  , examples (NumeralValue 100000)
             [ "100.000"
             , "100000"
             , "100K"
             , "100k"
             ]
  , examples (NumeralValue 3000000)
             [ "3M"
             , "3000K"
             , "3000000"
             , "3.000.000"
             ]
  , examples (NumeralValue 1200000)
             [ "1.200.000"
             , "1200000"
             , "1,2M"
             , "1200K"
             , ",0012G"
             ]
  , examples (NumeralValue (-1200000))
             [ "- 1.200.000"
             , "-1200000"
             , "meno 1.200.000"
             , "negativo 1200000"
             , "-1,2M"
             , "-1200K"
             , "-,0012G"
             ]
  ]
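
The digit-based examples in this corpus follow Italian conventions: `.` groups thousands, `,` marks the decimal part, and an optional `K`/`M`/`G` suffix scales the value. The sketch below is a standalone illustration of that convention only. It is not how Duckling resolves numerals (which is done by the rules in `Rules.hs`), and every name in it (`parseItalianNumber`, `parseUnsigned`, `parsePlain`, `suffixMultiplier`, `readDigits`) is hypothetical. It uses `Rational` so the arithmetic is exact, does not check that thousands groups have three digits, and does not handle word forms such as `meno` or `negativo`.

```haskell
import Data.Char (digitToInt, isDigit, isSpace, toUpper)
import Data.Ratio ((%))

-- Multiplier for the optional magnitude suffix (case-insensitive).
suffixMultiplier :: Char -> Maybe Rational
suffixMultiplier c = case toUpper c of
  'K' -> Just 1000
  'M' -> Just 1000000
  'G' -> Just 1000000000
  _   -> Nothing

-- Read a (possibly empty) string of ASCII digits as an Integer.
readDigits :: String -> Integer
readDigits = foldl (\acc d -> acc * 10 + toInteger (digitToInt d)) 0

-- Parse "<int>[,<frac>]" where "." inside <int> is a thousands separator.
parsePlain :: String -> Maybe Rational
parsePlain s
  | null intDigits && null fracDigits = Nothing
  | not (all isDigit (intDigits ++ fracDigits)) = Nothing
  | otherwise =
      Just (readDigits intDigits % 1
            + readDigits fracDigits % (10 ^ length fracDigits))
  where
    (grouped, rest) = break (== ',') s
    intDigits = filter (/= '.') grouped
    fracDigits = drop 1 rest

-- Split off an optional trailing K/M/G suffix and scale accordingly.
parseUnsigned :: String -> Maybe Rational
parseUnsigned s
  | null s = Nothing
  | otherwise = case suffixMultiplier (last s) of
      Just m  -> (* m) <$> parsePlain (init s)
      Nothing -> parsePlain s

-- Entry point: ignore whitespace and accept an optional leading minus sign.
parseItalianNumber :: String -> Maybe Rational
parseItalianNumber raw = case filter (not . isSpace) raw of
  '-' : rest -> negate <$> parseUnsigned rest
  other      -> parseUnsigned other
```

Under these assumptions, `"1.200.000"`, `"1,2M"`, `"1200K"`, and `",0012G"` all map to the same value, which mirrors how the corpus groups them under a single `NumeralValue`.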