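# rust-bert/utils/convert_model.py
#
# Converts a PyTorch weights file into an intermediate NumPy archive
# (model.npz) and then into a rust_model.ot file by running the
# convert-tensor binary through Cargo. Both files are written to the
# directory that contains the source weights.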

import argparse
import numpy as np
import subprocess
import sys
import torch

from pathlib import Path
from torch import Tensor
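
# Illustrative sketch, not part of the original script: the hypothetical
# helper below mirrors the order in which the main loop rewrites each weight
# name. Legacy LayerNorm names are renamed first ("gamma" -> "weight",
# "beta" -> "bias"; str.replace matches the substring anywhere in the name),
# then --prefix is prepended, and finally --suffix keeps only the text after
# the last ".". Because --suffix runs last, it discards any --prefix.
# The main loop does not call this function.
def transform_key(key, prefix=None, suffix=False):
    """Return `key` after the renaming steps applied in the main loop.

    >>> transform_key("encoder.LayerNorm.gamma")
    'encoder.LayerNorm.weight'
    >>> transform_key("encoder.LayerNorm.gamma", prefix="model.", suffix=True)
    'weight'
    """
    key = key.replace("gamma", "weight").replace("beta", "bias")
    if prefix:
        key = prefix + key
    if suffix:
        key = key.split(".")[-1]
    return key
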

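# Illustrative sketch, not part of the original script: the hypothetical
# helper below shows how a --dtype value such as "float16" is applied to one
# weight array. np.dtype() parses the string, astype() performs the cast and
# returns a new array, and np.ascontiguousarray() guarantees a C-contiguous
# buffer, which matters for non-contiguous views such as transposed weights.
# ndarray.nbytes gives the size of the array data in bytes.
# The main loop does not call this function.
def convert_array(array, dtype=None):
    """Optionally cast `array` to `dtype`; always return a C-contiguous array.

    >>> w = np.ones((4, 3), dtype=np.float32).T
    >>> out = convert_array(w, "float16")
    >>> out.dtype == np.float16, out.shape, out.nbytes
    (True, (3, 4), 24)
    """
    if dtype is not None:
        array = array.astype(np.dtype(dtype))
    return np.ascontiguousarray(array)

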
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "source_file", help="Absolute path to the PyTorch weights file to convert"
    )
    parser.add_argument(
        "--skip_embeddings",
        action="store_true",
        help="Skip shared embeddings",
    )
    parser.add_argument(
        "--skip_lm_head", action="store_true", help="Skip language model head"
    )
    parser.add_argument("--prefix", help="Prepend a prefix to every weight name")
    parser.add_argument(
        "--suffix",
        action="store_true",
        help="Split weight names on '.' and keep only the last part (applied after --prefix)",
    )
    parser.add_argument(
        "--dtype",
        help="Cast weights to a specific NumPy dtype (float32, float16, ...)",
    )
    args = parser.parse_args()

    source_file = Path(args.source_file)
    target_folder = source_file.parent

    weights = torch.load(str(source_file), map_location="cpu")

    nps = {}
    for k, v in weights.items():
        # Rename legacy LayerNorm parameter names (gamma/beta) to weight/bias.
        k = k.replace("gamma", "weight").replace("beta", "bias")
        if args.skip_embeddings:
            if k in {
                "model.encoder.embed_tokens.weight",
                "encoder.embed_tokens.weight",
                "model.decoder.embed_tokens.weight",
                "decoder.embed_tokens.weight",
            }:
                continue
        if args.skip_lm_head:
            if k in {
                "lm_head.weight",
            }:
                continue
        if args.prefix:
            k = args.prefix + k
        if args.suffix:
            k = k.split(".")[-1]
        if isinstance(v, Tensor):
            tensor = v.cpu().numpy()
            if args.dtype is not None:
                nps[k] = np.ascontiguousarray(tensor.astype(np.dtype(args.dtype)))
            else:
                nps[k] = np.ascontiguousarray(tensor)
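            # nbytes is the size of the array data; sys.getsizeof omits the data
            # for arrays that do not own their buffer (e.g. views of torch tensors).
            print(f"converted {k} - {nps[k].nbytes} bytes")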
        else:
            print(f"skipped non-tensor object: {k}")
    np.savez(target_folder / "model.npz", **nps)

    source = str(target_folder / "model.npz")
    target = str(target_folder / "rust_model.ot")

    # Cargo.toml at the repository root, two levels above this file.
    toml_location = Path(__file__).resolve().parent.parent / "Cargo.toml"
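    # Convert the .npz archive with the convert-tensor binary; arguments after
    # "--" are forwarded to the binary. check=True raises if the command fails.
    subprocess.run(
        [
            "cargo",
            "run",
            "--bin=convert-tensor",
            f"--manifest-path={toml_location}",
            "--",
            source,
            target,
        ],
        check=True,
    )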