Accelerate cold starts by caching your weights
To enable caching, add `model_cache` to your `config.yml` with a valid `repo_id`. The `model_cache` has a few key configurations:

- `repo_id` (required): The endpoint for your cloud bucket. Currently, we support Hugging Face, Google Cloud Storage, and AWS S3.
- `revision`: The revision to pull. This is only relevant for sources that have revisions. By default, it refers to `main`.
- `allow_patterns`: Only cache files that match the specified patterns. Use Unix shell-style wildcards to write these patterns.
- `ignore_patterns`: Conversely, file patterns to skip, which streamlines the caching process.

`hf_cache` has been renamed to `model_cache`, but don't worry! If you're using `hf_cache` in any of your projects, it will automatically be aliased to `model_cache`.

For a model such as Stable Diffusion XL, a well-written `model_cache` pulls only the model weights it needs by using `allow_patterns`.
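A minimal sketch of such a `model_cache` for Stable Diffusion XL. It assumes that `model_cache` takes a list of entries; the repo ID and file patterns are illustrative assumptions and may need adjusting for your model.

```yaml
# config.yml (sketch): cache only the files Stable Diffusion XL needs.
# The repo ID and the patterns below are assumptions for illustration.
model_cache:
  - repo_id: stabilityai/stable-diffusion-xl-base-1.0
    allow_patterns:
      - "*.json"                     # model and scheduler configs
      - "*.fp16.safetensors"         # half-precision weights only
      - sd_xl_base_1.0.safetensors   # single-file checkpoint
```

Quoting the patterns that start with `*` keeps YAML from reading them as aliases.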
Many repos store the same model weights in several formats (`.bin`, `.safetensors`, `.h5`, `.msgpack`, etc.). You only need one of these most of the time. To minimize cold starts, cache only the weights you need.
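One way to skip redundant formats is `ignore_patterns`. The sketch below assumes a hypothetical repo, `your-org/your-model`, that ships `.safetensors` weights alongside other formats, and that `model_cache` takes a list of entries.

```yaml
# config.yml (sketch): keep .safetensors and skip the duplicate formats.
# `your-org/your-model` is a hypothetical repo ID.
model_cache:
  - repo_id: your-org/your-model
    ignore_patterns:
      - "*.bin"
      - "*.h5"
      - "*.msgpack"
```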
Model weights are cached when you run `truss push`. There is not currently a mechanism for invalidating cached model weights on an existing model.

For a public Hugging Face repo, adding the `model_cache` key with an appropriate `repo_id` should be enough.
However, if you want to deploy a model from a gated repo like Llama 2 to Baseten, there are a few steps you need to take:
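1. Get a Hugging Face API key: Create an API key with `read` access. Make sure you have access to the model you want to serve.
2. Add it to the Baseten secrets manager: Store the API key under the name `hf_access_token`. See the documentation on secrets for details.
3. Update your config: In your Truss's `config.yml`, declare `hf_access_token` under the `secrets` key. Make sure that the `secrets` key only shows up once in your `config.yml`.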
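The config change in the last step can be sketched as follows. It assumes that secrets are declared by name with a placeholder value, while the real value lives only in the Baseten secrets manager.

```yaml
# config.yml (sketch): declare the secret by name; never put the real key here.
secrets:
  hf_access_token: null
```

If your `config.yml` already has a `secrets` key, add `hf_access_token` to that existing mapping instead of creating a second one.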
Weights are cached in the default Hugging Face cache directory, `~/.cache/huggingface/hub/models--{your_model_name}/`. You can change this directory by setting the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variable in your `config.yml`.
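A minimal sketch of overriding the cache location, assuming environment variables are set under the `environment_variables` key of `config.yml`. The path is a hypothetical example.

```yaml
# config.yml (sketch): move the Hugging Face cache to a custom directory.
environment_variables:
  HF_HOME: /app/hf_cache   # hypothetical path
```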
For a Google Cloud Storage bucket, point the `repo_id` in your `model_cache` at the bucket.
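A minimal sketch of a `model_cache` entry for Google Cloud Storage. It assumes the bucket is addressed with a `gs://` URI and that `model_cache` takes a list of entries; `path-to-my-bucket` is a placeholder.

```yaml
# config.yml (sketch): cache weights from a GCS bucket.
model_cache:
  - repo_id: gs://path-to-my-bucket   # placeholder bucket path
```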
For a private bucket, save your service account key as `service_account.json` and add it to the `data` directory of your Truss.
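The resulting layout might look like the sketch below. It assumes the standard Truss layout and a hypothetical Truss directory named `your-truss`.

```
your-truss/
├── config.yml
├── data/
│   └── service_account.json
└── model/
    └── model.py
```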
If you keep your Truss under version control, add `service_account.json` to your `.gitignore` file. You don't want to accidentally expose your service account key.

Weights are cached at `/app/model_cache/{your_bucket_name}`.
For an AWS S3 bucket, point the `repo_id` in your `model_cache` at the bucket.
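A minimal sketch of a `model_cache` entry for S3. It assumes the bucket is addressed with an `s3://` URI and that `model_cache` takes a list of entries; `path-to-my-bucket` is a placeholder.

```yaml
# config.yml (sketch): cache weights from an S3 bucket.
model_cache:
  - repo_id: s3://path-to-my-bucket   # placeholder bucket path
```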
For a private bucket, first find your `aws_access_key_id`, `aws_secret_access_key`, and `aws_region` in your AWS dashboard. Create a file named `s3_credentials.json` and add these credentials to it. Place this file in the `data` directory of your Truss.
The `aws_session_token` key can also be included, but it is optional. The `s3_credentials.json` file is a JSON object that uses these key names.
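A sketch of `s3_credentials.json` using the keys named above. All values are placeholders, and the region is a hypothetical example; omit `aws_session_token` if you do not use one.

```json
{
  "aws_access_key_id": "YOUR-ACCESS-KEY-ID",
  "aws_secret_access_key": "YOUR-SECRET-ACCESS-KEY",
  "aws_region": "us-west-2",
  "aws_session_token": "YOUR-SESSION-TOKEN"
}
```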
If you keep your Truss under version control, add `s3_credentials.json` to your `.gitignore` file. You don't want to accidentally expose your AWS credentials.

Weights are cached at `/app/model_cache/{your_bucket_name}`.