Configuration¶
Catalogs are JSON arrays of ModelAliasSpec entries.
Canonical schema shape¶
[
{
"alias": "embed/default",
"task": "embed",
"provider_id": "local/candle",
"model_id": "sentence-transformers/all-MiniLM-L6-v2",
"revision": null,
"warmup": "lazy",
"required": false,
"timeout": 10,
"load_timeout": 600,
"retry": {
"max_attempts": 3,
"initial_backoff_ms": 100
},
"options": null
}
]
Field constraints¶
alias: string matching.+/.+.task: one ofembed,rerank,generate.warmup: one ofeager,lazy,background.timeout,load_timeout: integer >= 1.retry.max_attempts,retry.initial_backoff_ms: integer >= 1.options: object or null, strict provider-specific keys only.
Provider options reference¶
| Provider ID | Allowed option keys | Notes |
|---|---|---|
local/candle |
cache_dir |
Per-model local cache path |
local/fastembed |
cache_dir |
Per-model local cache path |
local/mistralrs |
isq, force_cpu, paged_attention, max_num_seqs, chat_template, tokenizer_json, embedding_dimensions, gguf_files |
Quantization and local runtime tuning |
remote/openai |
api_key_env |
Override env var name for API key |
remote/gemini |
api_key_env |
Override env var name for API key |
remote/vertexai |
api_token_env, project_id, location, publisher, embedding_dimensions |
OAuth token + project/location metadata |
remote/mistral |
api_key_env |
Override env var name for API key |
remote/anthropic |
api_key_env, anthropic_version |
anthropic_version defaults to 2023-06-01 |
remote/voyageai |
api_key_env |
Override env var name for API key |
remote/cohere |
api_key_env, input_type |
input_type used for embedding mode |
remote/azure-openai |
api_key_env, resource_name, api_version |
resource_name required; api_version default 2024-10-21 |
Provider-specific model/config links:
Runtime builder paths¶
- Programmatic catalog:
.catalog(Vec<ModelAliasSpec>) - JSON string catalog:
.catalog_from_str(&str) - JSON file catalog:
.catalog_from_file(path)
Helpful APIs¶
runtime.contains_alias(alias)runtime.prefetch_all()runtime.prefetch(&[aliases])runtime.embedding(alias)runtime.reranker(alias)runtime.generator(alias)