A guide to setting concurrency for your model
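Predict concurrency controls how many requests can run the `predict` function on your Truss at once. Requests beyond that limit wait instead of entering the `predict` function concurrently.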
To get a sense for why this matters, let’s recap the structure of a Truss:
Preprocessing covers any work that has to happen before the `predict` function runs. For instance, if you are running an image classification model and need to download images from S3, this is a good place to do it.

Predict concurrency then limits how many requests can be inside the `predict` function at once. The most common need here is to limit access to the GPU, since multiple requests running on the GPU at the same time could cause serious degradation in performance.
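To make the idea concrete, here is a minimal sketch of gating one stage of a request behind a concurrency limit. It is not how Truss is implemented: it uses a plain `asyncio.Semaphore`, and the names `handle_request`, `serve`, `run_demo`, and `PREDICT_CONCURRENCY` are hypothetical. It also assumes that work done before `predict` is not subject to the limit.

```python
import asyncio

# Hypothetical value standing in for the configured predict concurrency.
PREDICT_CONCURRENCY = 2


async def handle_request(request_id, predict_slots, stats):
    # Work before predict is not gated in this sketch, so every request
    # can be here at once (for example, waiting on an S3 download).
    await asyncio.sleep(0.01)

    # The predict stage is gated: at most `predict_concurrency` requests
    # hold a slot at any moment, and the rest wait here.
    async with predict_slots:
        stats["in_predict"] += 1
        stats["peak"] = max(stats["peak"], stats["in_predict"])
        await asyncio.sleep(0.01)  # stand-in for GPU inference
        stats["in_predict"] -= 1

    return request_id


async def serve(num_requests, predict_concurrency=PREDICT_CONCURRENCY):
    predict_slots = asyncio.Semaphore(predict_concurrency)
    stats = {"in_predict": 0, "peak": 0}
    results = await asyncio.gather(
        *(handle_request(i, predict_slots, stats) for i in range(num_requests))
    )
    # `peak` is the largest number of requests seen inside predict at once.
    return list(results), stats["peak"]


def run_demo(num_requests=8, predict_concurrency=PREDICT_CONCURRENCY):
    return asyncio.run(serve(num_requests, predict_concurrency))
```

Calling `run_demo(8, 2)` sends eight requests through the sketch and returns their results together with the peak number of requests that were inside the gated stage at the same time, which never exceeds the limit.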
Unlike Concurrency Target, which is configured in the Baseten UI, Predict Concurrency is configured as part of the Truss Config (in the `config.yaml` file).
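As a rough illustration, a `config.yaml` that lets two requests into `predict` at once might include the fragment below. The key name `predict_concurrency` under `runtime` comes from the Truss config reference, not from this page, and the value `2` is an arbitrary example, so confirm both against the reference for your Truss version.

```yaml
# Sketch only: allow up to two requests inside predict at the same time.
runtime:
  predict_concurrency: 2
```

A higher value mainly helps when `predict` spends time waiting on something other than the GPU. For GPU-bound inference, a low value avoids the contention described above.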