# Triton-AML

Triton-AML is a Triton custom backend that runs Marian in the AzureML Inference Environment. It is one implementation of the Triton Backend Shared Library interface.

This backend is compiled against a static library of Marian at a specific version.

Layout:

- `marian_backend`: Triton Marian backend source code
- `src`: modified Marian source files and `CMakeLists.txt`
- `Dockerfile`: compiles the backend together with the static library of Marian
- `build.sh`: a simple shell script that runs the Docker build and extracts the generated `libtriton_marian.so`

## Usage

Run `./build.sh` to produce the Triton Marian backend shared library.

Any user can place `libtriton_marian.so` in either of the following locations:

- `<model_repository>/<model_name>/<version_directory>/libtriton_marian.so`
- `<model_repository>/<model_name>/libtriton_marian.so`
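As a concrete illustration, the two placements above can be sketched as follows. The model name `nlxseq2seq`, the version directory `1`, and the placeholder `.so` file are assumptions for demonstration only; in practice you would copy the library produced by `build.sh`.

```shell
#!/bin/sh
set -e
# Placeholder standing in for the backend library built by build.sh.
: > libtriton_marian.so
# Per-version placement: <model_repository>/<model_name>/<version_directory>/
mkdir -p model_repository/nlxseq2seq/1
cp libtriton_marian.so model_repository/nlxseq2seq/1/
# Per-model placement: <model_repository>/<model_name>/
cp libtriton_marian.so model_repository/nlxseq2seq/
```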

Members of the AzureML Inference team can instead place it in the following location in the aml-triton base image:

- `<backend_directory>/marian/libtriton_marian.so`

Here `<backend_directory>` is `/opt/tritonserver/backends` by default.
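The base-image placement follows the same pattern; a minimal sketch, using a local stand-in directory instead of the real image's `/opt/tritonserver/backends`, and a placeholder `.so` file:

```shell
#!/bin/sh
set -e
# Stand-in for the image's backend directory; inside the real aml-triton
# base image this would be /opt/tritonserver/backends.
BACKEND_DIR=./backends
: > libtriton_marian.so   # placeholder for the compiled backend
mkdir -p "$BACKEND_DIR/marian"
cp libtriton_marian.so "$BACKEND_DIR/marian/"
```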

## Make changes

If you want to compile against another version of Marian, replace `RUN git checkout youki/quantize-embedding` in the Dockerfile with the branch or tag you want, copy that version's `CMakeLists.txt` over the old one, add `src/cmarian.cpp` to it, and adjust it so that Marian builds as a static library.
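The Dockerfile edit can also be scripted. A minimal sketch, assuming GNU `sed` and using a demo file that stands in for the real Dockerfile; the branch name `my-marian-branch` is illustrative:

```shell
#!/bin/sh
set -e
# Demo stand-in containing only the line of interest from the real Dockerfile.
cat > Dockerfile.demo <<'EOF'
RUN git checkout youki/quantize-embedding
EOF
# Point the build at a different Marian branch or tag.
sed -i 's|git checkout youki/quantize-embedding|git checkout my-marian-branch|' Dockerfile.demo
cat Dockerfile.demo
```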

## Limitation

For now, this backend only serves the nlxseq2seq model: some values are hard-coded in the `ModelState::SetMarianConfigPath` function, so changes must be made there to run other models with Marian.