duckling/Duckling/Locale.hs

151 lines
2.5 KiB
Haskell
Raw Normal View History

Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
-- Copyright (c) 2016-present, Facebook, Inc.
-- All rights reserved.
--
-- This source code is licensed under the BSD-style license found in the
-- LICENSE file in the root directory of this source tree.
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE NoRebindableSyntax #-}
module Duckling.Locale
( Lang(..)
, Locale(..)
, Region
( AU
, BE
, BZ
, CL
, CN
, CO
, EG
, GB
, HK
, IE
, IN
, JM
, MO
, MX
, NZ
, PE
, PH
, TT
, TW
, US
, VE
, ZA
)
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
, allLocales
, makeLocale
) where
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
import Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet
import Data.Hashable
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
import GHC.Generics
import Prelude
import TextShow (TextShow)
import qualified TextShow as TS
import Duckling.Region hiding
( AR
, CA
, ES
, NL
)
import qualified Duckling.Region as R
( Region
( AR
, CA
, ES
, NL
)
)
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
-- | ISO 639-1 Language.
-- See https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
data Lang
= AF
| AR
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| BG
| BN
| CA
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| CS
| DA
| DE
| EL
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| EN
| ES
| ET
| FA
| FI
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| FR
| GA
| HE
| HI
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| HR
| HU
| ID
| IS
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| IT
| JA
| KA
| KN
| KM
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| KO
| LO
| ML
| MN
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| MY
| NB
| NE
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| NL
| PL
| PT
| RO
| RU
| SK
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| SV
| SW
| TA
| TE
| TH
Locales support Summary: * Locales support for the library, following `<Lang>_<Region>` with ISO 639-1 code for `<Lang>` and ISO 3166-1 alpha-2 code for `<Region>` (#33) * `Locale` opaque type (composite of `Lang` and `Region`) with `makeLocale` smart constructor to only allow valid `(Lang, Region)` combinations * API: `Context`'s `lang` parameter has been replaced by `locale`, with optional `Region` and backward compatibility. * `Rules/<Lang>.hs` exposes - `langRules`: cross-locale rules for `<Lang>`, from `<Dimension>/<Lang>/Rules.hs` - `localeRules`: locale-specific rules, from `<Dimension>/<Lang>/<Region>/Rules.hs` - `defaultRules`: `langRules` + specific rules from select locales to ensure backward-compatibility * Corpus, tests & classifiers - 1 classifier per locale, with default classifier (`<Lang>_XX`) when no locale provided (backward-compatible) - Default classifiers are built on existing corpus - Locale classifiers are built on - `<Dimension>/<Lang>/Corpus.hs` exposes a common `corpus` to all locales of `<Lang>` - `<Dimension>/<Lang>/<Region>/Corpus.hs` exposes `allExamples`: a list of examples specific to the locale (following `<Dimension>/<Lang>/<Region>/Rules.hs`). - Locale classifiers use the language corpus extended with the locale examples as training set. - Locale examples need to use the same `Context` (i.e. reference time) as the language corpus. - For backward compatibility, `<Dimension>/<Lang>/Corpus.hs` can expose also `defaultCorpus`, which is `corpus` augmented with specific examples. This is controlled by `getDefaultCorpusForLang` in `Duckling.Ranking.Generate`. - Tests run against each classifier to make sure runtime works as expected. * MM/DD (en_US) vs DD/MM (en_GB) example to illustrate Reviewed By: JonCoens, blandinw Differential Revision: D6038096 fbshipit-source-id: f29c28d
2017-10-13 18:15:32 +03:00
| TR
| UK
| VI
| ZH
deriving (Bounded, Enum, Eq, Generic, Hashable, Ord, Read, Show)
instance TextShow Lang where
showb = TS.fromString . show
data Locale = Locale Lang (Maybe Region)
deriving (Eq, Generic, Hashable, Ord)
instance Show Locale where
show (Locale lang Nothing) = show lang ++ "_XX"
show (Locale lang (Just region)) = show lang ++ "_" ++ show region
instance TextShow Locale where
showb = TS.fromString . show
makeLocale :: Lang -> Maybe Region -> Locale
makeLocale lang Nothing = Locale lang Nothing
makeLocale lang (Just region)
| HashSet.member region locales = Locale lang $ Just region
| otherwise = Locale lang Nothing
where
locales = HashMap.lookupDefault HashSet.empty lang allLocales
allLocales :: HashMap Lang (HashSet Region)
allLocales =
HashMap.fromList
[ (AR, HashSet.fromList [EG])
, (EN, HashSet.fromList [AU, BZ, R.CA, GB, IN, IE, JM, NZ, PH, ZA, TT, US])
, (ES, HashSet.fromList [R.AR, CL, CO, R.ES, MX, PE, VE])
, (NL, HashSet.fromList [BE, R.NL])
, (ZH, HashSet.fromList [CN, HK, MO, TW])
]