fairseq/tests/gpu/transformer_quantization_config.yaml

# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

# This file defines example configuration arguments for quantizing
# a transformer model with product quantization.
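#
# How the sections below are consumed (a rough summary based on the fairseq
# product-quantization utilities in fairseq/modules/quantization/pq; see that
# module for the authoritative semantics):
#
#   - n_centroids: number of PQ centroids per layer type. "key" names the
#     module attribute used to look up an entry in "value"; "*" acts as a
#     wildcard default, so every Linear and Embedding layer gets 8 centroids.
#   - block_sizes: size of the weight sub-vectors assigned to centroids.
#     With key "fuzzy_name", the entry whose name (fc, attn, emb) appears as
#     a substring of the layer name is used.
#   - layers_to_quantize: regular expressions matched against module names;
#     each entry selects one group of decoder layers to quantize, one group
#     per step of iterative PQ.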

n_centroids:
    Linear:
        key: in_features
        value: {"*": 8}
    Embedding:
        key: embedding_dim
        value: {"*": 8}

block_sizes:
    Linear:
        key: fuzzy_name
        value: {fc: 8, attn: 4, emb: 4}
    Embedding:
        key: fuzzy_name
        value: {emb: 8}
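
# Worked example under the semantics sketched above: a module named
# decoder.layers.0.fc1 is a Linear layer whose name contains "fc", so its
# weights would be split into blocks of size 8, and the "*" wildcard (which
# matches any in_features value) assigns it 8 centroids.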

layers_to_quantize:
    - decoder\.layers\.\d+\.fc[12]
    - decoder\.embed_tokens\.embeddings\.[012]\.[01]
    - decoder\.layers\.\d+\.self_attn\.(k_proj|v_proj|q_proj|out_proj)
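
# Example usage, following the recipe in examples/quant_noise (paths are
# placeholders; the elided flags depend on your training setup):
#
#   fairseq-train --task language_modeling /path/to/data \
#       ... \
#       --quantization-config-path transformer_quantization_config.yaml
#
# Programmatically, the file is plain YAML, e.g.:
#
#   import yaml
#   with open("transformer_quantization_config.yaml") as f:
#       config = yaml.safe_load(f)
#   print(config["layers_to_quantize"])  # regex strings exactly as written above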