fairseq/tests/gpu/transformer_quantization_config.yaml

# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

# This file defines example configuration arguments for quantizing
# a transformer model with product quantization.
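#
# How the sections below are consumed (a rough summary based on the fairseq
# product-quantization utilities in fairseq/modules/quantization/pq; see that
# module for the authoritative semantics):
#
#   - n_centroids: number of PQ centroids per layer type. "key" names the
#     module attribute used to look up an entry in "value"; "*" acts as a
#     wildcard default, so every Linear and Embedding layer gets 8 centroids.
#   - block_sizes: size of the weight sub-vectors assigned to centroids.
#     With key "fuzzy_name", the entry whose name (fc, attn, emb) appears as
#     a substring of the layer name is used.
#   - layers_to_quantize: regular expressions matched against module names;
#     each entry selects one group of decoder layers to quantize, one group
#     per step of iterative PQ.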

n_centroids:
    Linear:
        key: in_features
        value: {"*": 8}
    Embedding:
        key: embedding_dim
        value: {"*": 8}

block_sizes:
    Linear:
        key: fuzzy_name
        value: {fc: 8, attn: 4, emb: 4}
    Embedding:
        key: fuzzy_name
        value: {emb: 8}
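
# Worked example under the semantics sketched above: a module named
# decoder.layers.0.fc1 is a Linear layer whose name contains "fc", so its
# weights would be split into blocks of size 8, and the "*" wildcard (which
# matches any in_features value) assigns it 8 centroids.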

layers_to_quantize:
    - decoder\.layers\.\d+\.fc[12]
    - decoder\.embed_tokens\.embeddings\.[012]\.[01]
    - decoder\.layers\.\d+\.self_attn\.(k_proj|v_proj|q_proj|out_proj)
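
# Example usage, following the recipe in examples/quant_noise (paths are
# placeholders; the elided flags depend on your training setup):
#
#   fairseq-train --task language_modeling /path/to/data \
#       ... \
#       --quantization-config-path transformer_quantization_config.yaml
#
# Programmatically, the file is plain YAML, e.g.:
#
#   import yaml
#   with open("transformer_quantization_config.yaml") as f:
#       config = yaml.safe_load(f)
#   print(config["layers_to_quantize"])  # regex strings exactly as written above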