Set your model resources, dependencies, and more
model_name
The name of your model.

description
A description of your model.
model_class_name (default: Model)
The name of the class that defines your Truss model. Note that this class must implement at least a predict method.
model_module_dir (default: model)
Folder in the Truss where the model class is found.
data_dir (default: data/)
Folder for data files in your Truss. Note that you can access this data from within your model like so:
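A minimal sketch of reading from the data directory inside model.py. Truss passes the data directory path to the model's constructor as the data_dir keyword argument; the file name weights.bin here is hypothetical:

```python
from pathlib import Path


class Model:
    def __init__(self, **kwargs):
        # Truss passes the path to the data folder via the "data_dir" kwarg
        self._data_dir = Path(kwargs["data_dir"])
        self._weights = None

    def load(self):
        # Hypothetical file placed under data/ in the Truss
        weights_path = self._data_dir / "weights.bin"
        self._weights = weights_path.read_bytes()

    def predict(self, model_input):
        return {"num_weight_bytes": len(self._weights)}
```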
packages (default: packages/)
Folder in the Truss for your custom packages. Inside the packages folder, you can place your own code that you want to reference inside model.py. Here is an example:
Imagine you have the project setup below. In model.py, the package can be imported like this:
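A hypothetical layout (the package name my_package and the utils module are illustrative, not from the original docs):

```
my-truss/
├── model/
│   └── model.py
└── packages/
    └── my_package/
        ├── __init__.py
        └── utils.py
```

Because the packages folder is placed on the Python path at runtime, model.py can import the package by name:

```
from my_package import utils
```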
external_package_dirs
A list of folders outside the Truss that contain packages you want to reference in model.py. For example, imagine the folder super_cool_awesome_plugin/ is outside the truss. In stable-diffusion/config.yaml, the path to your external package needs to be specified. For the example above, the config.yaml would look like this:
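A sketch of the config entry; the relative path assumes super_cool_awesome_plugin/ sits next to the stable-diffusion/ Truss:

```yaml
external_package_dirs:
- ../super_cool_awesome_plugin/
```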
In stable-diffusion/model/model.py, the super_cool_awesome_plugin/ package can be imported like so:
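A sketch of the import, assuming the plugin exposes a top-level module named super_cool_awesome_plugin:

```
import super_cool_awesome_plugin
```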
environment_variables
Do not store secret values here! Passing secrets through environment_variables is not secure; see the secrets arg for information on properly managing secrets.

model_metadata
Set any additional metadata for your model in this catch-all field.
requirements_file
Path to a requirements file with your Python dependencies, as an alternative to listing them under requirements.

requirements
List your Python dependencies here, or point to a requirements file with requirements_file. We strongly recommend pinning versions in your requirements.
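An illustrative requirements list with pinned versions (the package names and version numbers are placeholders):

```yaml
requirements:
- torch==2.3.0
- transformers==4.41.0
```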
resources
The resources section is where you specify the compute resources that your model needs. This includes CPU, memory, and GPU resources. If you need a GPU, you must also set resources.use_gpu to true.
resources.cpu
The CPU available to your model; 1000m and 1 are equivalent. Fractional CPU amounts can be requested using millicpus. For example, 500m is half of a CPU core.
resources.memory
The memory available to your model; 1Gi and 1024Mi are equivalent.
resources.use_gpu
Whether or not a GPU is required for this model.

resources.accelerator
Which GPU you would like for your instance. You can use the : operator to request multiple GPUs on your instance.
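Putting the resources fields together; the specific values and the A10G accelerator are illustrative, and A10G:2 shows the : operator requesting two GPUs:

```yaml
resources:
  cpu: "3"
  memory: 14Gi
  use_gpu: true
  accelerator: A10G:2
```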
secrets
A mapping of secret names to placeholder values. Never store the actual value of a secret in your config; instead, set the real value in your deployment environment.
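An illustrative secrets entry; the secret name is a placeholder, and the real value is supplied at deploy time rather than in config.yaml:

```yaml
secrets:
  hf_access_token: null
```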
system_packages
Specify any system packages that you would typically install using apt on a Debian operating system.
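An illustrative list of system packages (the specific packages are placeholders):

```yaml
system_packages:
- ffmpeg
- libsm6
```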
python_version
The Python version to use, such as py39.
base_image
The base_image option is used if you need to bring your own custom base image. Custom base images are useful if there are scripts that need to run at build time, or dependencies that are complicated to install. After creating a custom base image, you can specify it in this field. See Custom Base Images for more detail on how to use these.
base_image.image
The image to use, e.g. nvcr.io/nvidia/nemo:23.03.

base_image.python_executable_path
The path to the Python executable in the image, e.g. /usr/bin/python.
Tying it together, a custom base image configuration might look like this:
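Using the two example values above:

```yaml
base_image:
  image: nvcr.io/nvidia/nemo:23.03
  python_executable_path: /usr/bin/python
```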
runtime
Runtime settings for your model instance.

runtime.predict_concurrency (default: 1)
This field governs how much concurrency can run in the predict method of your model. This is useful if you have a model that supports parallelism and you'd like to take advantage of it. By default, this value is set to 1, meaning that predict can only run for one request at a time. This protects the GPU from being over-utilized, and is a good default for many models. See How to configure concurrency for more detail on how to set this value.
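For example, to allow two requests in predict at once (the value 2 is illustrative):

```yaml
runtime:
  predict_concurrency: 2
```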
external_data
Use external_data if you have data that you want bundled into your image at build time. This is useful if you have a large amount of data that you want available to your model. By including it at build time, you reduce the cold-start time of your instance, as the data is already available in the image. You can use it like so:
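A sketch of an external_data entry; the URL and file names are placeholders:

```yaml
external_data:
- url: https://example.com/my-model-weights.bin
  local_data_path: data/weights.bin
```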
external_data.<list_item>.url
The URL to download the data from.

external_data.<list_item>.local_data_path
The path in the image where the data will be downloaded to.

external_data.<list_item>.name
An optional name for the data item.
build
The build section is used to define options for custom model servers. The two main model servers we support are TGI and vLLM. These are highly optimized servers that are built to support specific LLMs. See the following examples for how to use each of these:

Example configuration for TGI, running Falcon-7B:
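A sketch of such a configuration. The exact arguments accepted vary by model server; model_id and endpoint here follow common TGI usage and should be checked against the model server's documentation:

```yaml
build:
  model_server: TGI
  arguments:
    model_id: tiiuae/falcon-7b
    endpoint: generate_stream
```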
build.model_server
Either VLLM for vLLM, or TGI for TGI.

build.arguments
The arguments to pass to the model server.
model_cache
The model_cache section is used for caching model weights at build time. This is one of the biggest levers for decreasing cold start times, as downloading weights can be one of the lengthiest parts of starting a new model instance. Using this section ensures that model weights are cached at build time. See the model cache guide for the full details on how to use this field.

With model_cache, there are multiple backends supported, not just Hugging Face. You can also cache weights stored on GCS, for instance.

model_cache.<list_item>.repo_id
The repo to cache, e.g. madebyollin/sdxl-vae-fp16-fix for a Hugging Face repo, or gcs://path-to-my-bucket for a GCS bucket.
model_cache.<list_item>.revision
The revision to cache, e.g. main.

model_cache.<list_item>.allow_patterns
Patterns of files to include when caching.

model_cache.<list_item>.ignore_patterns
Patterns of files to ignore when caching.
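Putting the model_cache fields together; the repo is the Hugging Face example above, and the file patterns are illustrative:

```yaml
model_cache:
- repo_id: madebyollin/sdxl-vae-fp16-fix
  revision: main
  allow_patterns:
  - "*.safetensors"
  ignore_patterns:
  - "*.bin"
```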