duckling/Duckling/Numeral/UK/Corpus.hs
Julien Odent ab0ad0256e Locales support
Summary:
* Locale support for the library, following `<Lang>_<Region>`, using the ISO 639-1 code for `<Lang>` and the ISO 3166-1 alpha-2 code for `<Region>` (#33)
* `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations
* API: `Context`'s `lang` parameter has been replaced by `locale`; the `Region` is optional, preserving backward compatibility.
* `Rules/<Lang>.hs` exposes:
  - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs`
  - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs`
  - `defaultRules`: `langRules` + specific rules from select locales to ensure backward compatibility
* Corpus, tests & classifiers
  - One classifier per locale, with a default classifier (`<Lang>_XX`) when no locale is provided (backward-compatible)
  - Default classifiers are built on the existing corpus
  - Locale classifiers are built on:
    - `<Dimension>/<Lang>/Corpus.hs`, which exposes a `corpus` common to all locales of `<Lang>`
    - `<Dimension>/<Lang>/<Region>/Corpus.hs`, which exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`).
  - Locale classifiers use the language corpus extended with the locale examples as their training set.
  - Locale examples must use the same `Context` (i.e. reference time) as the language corpus.
  - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can also expose `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`.
  - Tests run against each classifier to make sure the runtime works as expected.
* MM/DD (en_US) vs. DD/MM (en_GB) example to illustrate
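The `Locale` smart constructor described above can be sketched as follows. This is a hypothetical, simplified model rather than Duckling's actual `Duckling.Locale` module: the constructor lists, the `supportedRegions` table and the fallback behaviour are all assumptions. The idea is that the `Locale` data constructor stays private, so every value goes through `makeLocale`, which drops any region the language does not support. Note that `UK` here is the ISO 639-1 code for the Ukrainian language, as in `makeLocale UK Nothing` in the corpus file.

```haskell
-- Hypothetical, simplified sketch; not Duckling's actual definitions.
data Lang = EN | UK   -- ISO 639-1 language codes (UK = Ukrainian)
  deriving (Eq, Show)

data Region = GB | US -- ISO 3166-1 alpha-2 region codes
  deriving (Eq, Show)

-- A real module would export the Locale type but not this constructor,
-- so makeLocale is the only way to build a Locale (an opaque type).
data Locale = Locale Lang (Maybe Region)
  deriving (Eq)

-- Render as <Lang>_<Region>, using XX when no region is set.
instance Show Locale where
  show (Locale lang region) = show lang ++ "_" ++ maybe "XX" show region

-- Assumption: a hard-coded table of the regions each language supports.
supportedRegions :: Lang -> [Region]
supportedRegions EN = [GB, US]
supportedRegions UK = []

-- Keep the region only if the (Lang, Region) pair is supported;
-- otherwise fall back to the language-level default locale.
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang (Just region)
  | region `elem` supportedRegions lang = Locale lang (Just region)
makeLocale lang _ = Locale lang Nothing
```

With these assumptions, `makeLocale EN (Just GB)` shows as `EN_GB`, while `makeLocale UK (Just US)` falls back to `UK_XX`. Falling back rather than returning `Maybe Locale` keeps call sites simple; a stricter variant could reject unsupported pairs outright.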

Reviewed By: JonCoens, blandinw

Differential Revision: D6038096

fbshipit-source-id: f29c28d
2017-10-13 08:34:21 -07:00
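The training-set layout described in the summary (a language corpus extended with locale-specific `allExamples`, both sharing one `Context`) can be modelled in a few lines. The `Context`, `Example` and `Corpus` types below are hypothetical stand-ins with only the fields needed for the illustration, and `localeCorpus` is an illustrative helper, not a Duckling function.

```haskell
-- Hypothetical stand-ins for the real types; only the fields needed here.
data Context = Context
  { referenceTime :: String  -- assumption: a plain string, not a real timestamp type
  , locale        :: String  -- assumption: locale rendered as "<Lang>_<Region>"
  } deriving (Eq, Show)

-- Simplification: each example pairs an input text with a bare expected value.
type Example = (String, Double)

-- A corpus pairs one shared Context with its examples.
type Corpus = (Context, [Example])

-- Hypothetical helper: build a locale's training set from the language corpus.
-- The reference time is reused unchanged, only the locale is switched, and the
-- locale-specific examples are appended to the language-wide ones.
localeCorpus :: String -> Corpus -> [Example] -> Corpus
localeCorpus loc (ctx, langExamples) localeExamples =
  (ctx { locale = loc }, langExamples ++ localeExamples)
```

Because the locale examples inherit the language corpus's reference time, any time-relative example resolves against the same "now" in both training sets, which is the constraint the summary places on locale examples.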


-- Copyright (c) 2016-present, Facebook, Inc.
-- All rights reserved.
--
-- This source code is licensed under the BSD-style license found in the
-- LICENSE file in the root directory of this source tree. An additional grant
-- of patent rights can be found in the PATENTS file in the same directory.


{-# LANGUAGE OverloadedStrings #-}

module Duckling.Numeral.UK.Corpus
  ( corpus ) where

import Prelude
import Data.String

import Duckling.Locale
import Duckling.Numeral.Types
import Duckling.Resolve
import Duckling.Testing.Types

corpus :: Corpus
corpus = (testContext {locale = makeLocale UK Nothing}, allExamples)

allExamples :: [Example]
allExamples = concat
  [ examples (NumeralValue 0)
             [ "0"
             , "нуль"
             ]
  , examples (NumeralValue 1)
             [ "1"
             , "один"
             ]
  , examples (NumeralValue 2)
             [ "2"
             , "02"
             , "два"
             ]
  , examples (NumeralValue 3)
             [ "3"
             , "три"
             , "03"
             ]
  , examples (NumeralValue 4)
             [ "4"
             , "чотири"
             , "04"
             ]
  , examples (NumeralValue 5)
             [ "п‘ять"
             , "5"
             , "05"
             ]
  , examples (NumeralValue 33)
             [ "33"
             , "тридцять три"
             , "0033"
             ]
  , examples (NumeralValue 14)
             [ "14"
             , "чотирнадцять"
             ]
  , examples (NumeralValue 16)
             [ "16"
             , "шістнадцять"
             ]
  , examples (NumeralValue 17)
             [ "17"
             , "сімнадцять"
             ]
  , examples (NumeralValue 18)
             [ "18"
             , "вісімнадцять"
             ]
  , examples (NumeralValue 525)
             [ "п‘ятсот двадцять п‘ять"
             , "525"
             ]
  , examples (NumeralValue 1.1)
             [ "1.1"
             , "1.10"
             , "01.10"
             , "1 крапка 1"
             , "один крапка один"
             ]
  , examples (NumeralValue 0.77)
             [ "0.77"
             , ".77"
             ]
  , examples (NumeralValue 100000)
             [ "100000"
             , "100к"
             , "100К"
             ]
  , examples (NumeralValue 3000000)
             [ "3М"
             , "3000К"
             , "3000000"
             ]
  , examples (NumeralValue 1200000)
             [ "1200000"
             , "1.2М"
             , "1200К"
             , ".0012Г"
             ]
  , examples (NumeralValue (-1200000))
             [ "-1200000"
             , "мінус 1200000"
             , "-1.2М"
             , "-1200К"
             , "-.0012Г"
             ]
  ]
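Several digit-based examples above rely on a multiplier suffix: `К`/`к` for thousands, `М` for millions and `Г` for billions, so `.0012Г` is 0.0012 × 10^9 = 1,200,000 and `3000К` is 3,000,000. The sketch below resolves only these digit forms (optional leading `-`, optional decimal point, optional suffix) to check that arithmetic. It is a hypothetical helper, not how Duckling's UK rules are implemented; it assumes the suffix letters are Cyrillic and deliberately ignores word forms such as `нуль` or `мінус 1200000`.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical multiplier table covering the suffixes used in the corpus
-- (assumed Cyrillic): К/к = thousand, М = million, Г = billion.
suffixMultiplier :: Char -> Maybe Double
suffixMultiplier c = lookup c [('к', 1e3), ('К', 1e3), ('М', 1e6), ('Г', 1e9)]

-- Value of a string of ASCII digits; the empty string counts as 0.
digitsValue :: String -> Double
digitsValue = fromInteger . foldl step 0
  where
    step acc c = acc * 10 + toInteger (digitToInt c)

-- Parse an unsigned decimal such as "1200000", "01.10" or ".0012".
parseUnsigned :: String -> Maybe Double
parseUnsigned s = case break (== '.') s of
  (ds, "")
    | not (null ds), all isDigit ds -> Just (digitsValue ds)
  (ds, '.' : fs)
    | all isDigit ds, not (null fs), all isDigit fs ->
        Just (digitsValue ds + digitsValue fs / 10 ^ length fs)
  _ -> Nothing

-- Unsigned decimal followed by an optional multiplier suffix.
resolveUnsigned :: String -> Maybe Double
resolveUnsigned s
  | not (null s), Just m <- suffixMultiplier (last s) =
      (* m) <$> parseUnsigned (init s)
  | otherwise = parseUnsigned s

-- Resolve a digit-based numeral with an optional leading '-'.
resolveNumeral :: String -> Maybe Double
resolveNumeral ('-' : rest) = negate <$> resolveUnsigned rest
resolveNumeral s = resolveUnsigned s
```

Loaded in GHCi, `resolveNumeral "-1.2М"` should give a value close to `-1200000`. Comparisons against the corpus values need a small tolerance, because decimals such as `0.0012` and `1.2` are not exact in binary floating point.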