Set your model resources, dependencies, and more
model_name
The name of your model.

description
A description of your model.
model_class_name (default: Model)
The name of the class that defines your Truss model. Note that this class must implement at least a predict method.
model_module_dir (default: model)
Folder in the Truss where the model class is found.
data_dir (default: data/)
Folder for data files in your Truss. Note that you can access this data from within your model like so:
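A minimal sketch of reading from the data directory inside model.py. Truss passes the data directory path to the model's constructor as the data_dir keyword argument; the file name weights.bin here is hypothetical:

```python
from pathlib import Path


class Model:
    def __init__(self, **kwargs):
        # Truss passes the path to the data folder via the "data_dir" kwarg
        self._data_dir = Path(kwargs["data_dir"])
        self._weights = None

    def load(self):
        # Hypothetical file placed under data/ in the Truss
        weights_path = self._data_dir / "weights.bin"
        self._weights = weights_path.read_bytes()

    def predict(self, model_input):
        return {"num_weight_bytes": len(self._weights)}
```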
packages (default: packages/)
Folder in the Truss for your custom packages. Inside the packages folder, you can place your own code that you want to reference inside model.py. Here is an example:
Imagine you have the project setup below. In model.py, the package can be imported like this:
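A hypothetical layout (the package name my_package and the utils module are illustrative, not from the original docs):

```
my-truss/
├── model/
│   └── model.py
└── packages/
    └── my_package/
        ├── __init__.py
        └── utils.py
```

Because the packages folder is placed on the Python path at runtime, model.py can import the package by name:

```
from my_package import utils
```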
external_package_dirs
A list of folders outside the Truss that contain packages you want to reference in model.py. For example, imagine the folder super_cool_awesome_plugin/ is outside the truss. In stable-diffusion/config.yaml, the path to your external package needs to be specified. For the example above, the config.yaml would look like this:
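A sketch of the config entry; the relative path assumes super_cool_awesome_plugin/ sits next to the stable-diffusion/ Truss:

```yaml
external_package_dirs:
- ../super_cool_awesome_plugin/
```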
In stable-diffusion/model/model.py, the super_cool_awesome_plugin/ package can be imported like so:
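A sketch of the import, assuming the plugin exposes a top-level module named super_cool_awesome_plugin:

```
import super_cool_awesome_plugin
```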
environment_variables
Do not store secret values here! Passing secrets through environment_variables is not secure; see the secrets arg for information on properly managing secrets.

model_metadata
Set any additional metadata for your model in this catch-all field.
requirements_file
Path to a requirements file with your Python dependencies, as an alternative to listing them under requirements.

requirements
List your Python dependencies here, or point to a requirements file with requirements_file. We strongly recommend pinning versions in your requirements.
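An illustrative requirements list with pinned versions (the package names and version numbers are placeholders):

```yaml
requirements:
- torch==2.3.0
- transformers==4.41.0
```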
resources
The resources section is where you specify the compute resources that your model needs. This includes CPU, memory, and GPU resources. If you need a GPU, you must also set resources.use_gpu to true.
resources.cpu
The CPU available to your model; 1000m and 1 are equivalent. Fractional CPU amounts can be requested using millicpus. For example, 500m is half of a CPU core.
resources.memory
The memory available to your model; 1Gi and 1024Mi are equivalent.
resources.use_gpu
Whether or not a GPU is required for this model.

resources.accelerator
Which GPU you would like for your instance. You can use the : operator to request multiple GPUs on your instance.
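Putting the resources fields together; the specific values and the A10G accelerator are illustrative, and A10G:2 shows the : operator requesting two GPUs:

```yaml
resources:
  cpu: "3"
  memory: 14Gi
  use_gpu: true
  accelerator: A10G:2
```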
secrets
A mapping of secret names to placeholder values. Never store the actual value of a secret in your config; instead, set the real value in your deployment environment.
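An illustrative secrets entry; the secret name is a placeholder, and the real value is supplied at deploy time rather than in config.yaml:

```yaml
secrets:
  hf_access_token: null
```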
system_packages
Specify any system packages that you would typically install using apt on a Debian operating system.
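An illustrative list of system packages (the specific packages are placeholders):

```yaml
system_packages:
- ffmpeg
- libsm6
```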
python_version
The Python version to use, such as py39.
base_image
The base_image option is used if you need to bring your own custom base image. Custom base images are useful if there are scripts that need to run at build time, or dependencies that are complicated to install. After creating a custom base image, you can specify it in this field. See Custom Base Images for more detail on how to use these.
base_image.image
The image to use, e.g. nvcr.io/nvidia/nemo:23.03.

base_image.python_executable_path
The path to the Python executable in the image, e.g. /usr/bin/python.
Tying it together, a custom base image configuration might look like this:
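Using the two example values above:

```yaml
base_image:
  image: nvcr.io/nvidia/nemo:23.03
  python_executable_path: /usr/bin/python
```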
runtime
Runtime settings for your model instance.

runtime.predict_concurrency (default: 1)
This field governs how much concurrency can run in the predict method of your model. This is useful if you have a model that supports parallelism and you'd like to take advantage of it. By default, this value is set to 1, meaning that predict can only run for one request at a time. This protects the GPU from being over-utilized, and is a good default for many models. See How to configure concurrency for more detail on how to set this value.
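For example, to allow two requests in predict at once (the value 2 is illustrative):

```yaml
runtime:
  predict_concurrency: 2
```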
external_data
Use external_data if you have data that you want bundled into your image at build time. This is useful if you have a large amount of data that you want available to your model. By including it at build time, you reduce the cold-start time of your instance, as the data is already available in the image. You can use it like so:
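A sketch of an external_data entry; the URL and file names are placeholders:

```yaml
external_data:
- url: https://example.com/my-model-weights.bin
  local_data_path: data/weights.bin
```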
external_data.<list_item>.url
The URL to download the data from.

external_data.<list_item>.local_data_path
The path in the image where the data will be downloaded to.

external_data.<list_item>.name
An optional name for the data item.
build
The build section is used to define options for custom model servers. The two main model servers we support are TGI and vLLM. These are highly optimized servers that are built to support specific LLMs. See the following examples for how to use each of these:

Example configuration for TGI, running Falcon-7B:
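A sketch of such a configuration. The exact arguments accepted vary by model server; model_id and endpoint here follow common TGI usage and should be checked against the model server's documentation:

```yaml
build:
  model_server: TGI
  arguments:
    model_id: tiiuae/falcon-7b
    endpoint: generate_stream
```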
build.model_server
Either VLLM for vLLM, or TGI for TGI.

build.arguments
The arguments to pass to the model server.
model_cache
The model_cache section is used for caching model weights at build time. This is one of the biggest levers for decreasing cold start times, as downloading weights can be one of the lengthiest parts of starting a new model instance. Using this section ensures that model weights are cached at build time. See the model cache guide for the full details on how to use this field.

With model_cache, there are multiple backends supported, not just Hugging Face. You can also cache weights stored on GCS, for instance.

model_cache.<list_item>.repo_id
The repo to cache, e.g. madebyollin/sdxl-vae-fp16-fix for a Hugging Face repo, or gcs://path-to-my-bucket for a GCS bucket.
model_cache.<list_item>.revision
The revision to cache, e.g. main.

model_cache.<list_item>.allow_patterns
Patterns of files to include when caching.

model_cache.<list_item>.ignore_patterns
Patterns of files to ignore when caching.
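Putting the model_cache fields together; the repo is the Hugging Face example above, and the file patterns are illustrative:

```yaml
model_cache:
- repo_id: madebyollin/sdxl-vae-fp16-fix
  revision: main
  allow_patterns:
  - "*.safetensors"
  ignore_patterns:
  - "*.bin"
```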