The model server supports two types of endpoints:

- Completions — Follows the same API as the OpenAI Completions API
- ChatCompletions — Follows the same API as the OpenAI ChatCompletions API
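As a rough sketch of how the endpoint choice might appear in config.yaml — the `build` and `arguments` keys, as well as the example model name, are assumptions for illustration:

config.yaml

```yaml
build:
  arguments:
    endpoint: ChatCompletions  # or Completions, per the list above
    model: facebook/opt-125M   # illustrative Hugging Face model ID
```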
The `model_server` parameter allows you to specify a supported model server, such as TGI or vLLM.
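For example, the model server could be selected like this in config.yaml; the surrounding keys and values are assumptions for illustration:

config.yaml

```yaml
build:
  model_server: VLLM           # or TGI, depending on which backend you want
  arguments:
    endpoint: Completions
    model: facebook/opt-125M
```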
Another important parameter to configure when deploying vLLM is `predict_concurrency`. One of the main benefits of vLLM is continuous batching, in which multiple requests are processed at the same time. Without increasing `predict_concurrency`, you cannot take advantage of this feature.
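A minimal sketch of how this might be set in config.yaml, assuming the setting lives under a `runtime` section; the value of 128 is illustrative:

config.yaml

```yaml
runtime:
  predict_concurrency: 128   # allow up to 128 requests to be handled concurrently
```

With a value higher than one, the server can accept many requests at once and let continuous batching process them together.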