# local/mistralrs

## Uni-Xervo support

- Provider ID: `local/mistralrs`
- Feature flag: `provider-mistralrs`
- Capabilities: `embed`, `generate`
- Pipeline types: `text` (default), `vision`, `diffusion`, `speech`
## Pipeline types

| Pipeline | Description | Output |
|---|---|---|
| `text` | Standard LLM text generation (default) | `result.text` |
| `vision` | Image + text understanding | `result.text` |
| `diffusion` | Text-to-image generation | `result.images` |
| `speech` | Text-to-audio synthesis | `result.audio` |
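The pipeline-to-output mapping in the table can be expressed as a small lookup. This is a minimal sketch that assumes a generation result is available as a plain dict with optional `text`, `images`, and `audio` keys; `OUTPUT_FIELD` and `read_output` are hypothetical names, not part of the Uni-Xervo API.

```python
from typing import Any, Mapping

# Hypothetical lookup mirroring the pipeline table: which result field
# carries the primary output for each pipeline type.
OUTPUT_FIELD = {
    "text": "text",
    "vision": "text",
    "diffusion": "images",
    "speech": "audio",
}


def read_output(result: Mapping[str, Any], pipeline: str = "text") -> Any:
    """Return the primary output of a result dict for the given pipeline.

    Assumes `result` is a plain dict; the real GenerationResult type may
    expose these fields as attributes instead.
    """
    if pipeline not in OUTPUT_FIELD:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    field = OUTPUT_FIELD[pipeline]
    value = result.get(field)
    if value is None:
        raise LookupError(f"{pipeline} result has no {field!r} output")
    return value


print(read_output({"text": "Hello!"}))  # Hello!
```

Both `text` and `vision` read from the same field, so callers only need to branch on the output type, not on every pipeline.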
## Uni-Xervo provider options

### Common options (all pipelines)

| Option | Type | Description |
|---|---|---|
| `pipeline` | string | Pipeline type: `text`, `vision`, `diffusion`, `speech`. Default: `text` |
| `dtype` | string | Model precision: `auto`, `f16`, `bf16`, `f32`. See [Dtype](#dtype) |
| `force_cpu` | boolean | Force CPU inference |
### Text pipeline options

| Option | Type | Description |
|---|---|---|
| `isq` | string | In-situ quantization type (e.g. `Q4K`, `Q8_0`) |
| `paged_attention` | boolean | Enable paged attention |
| `max_num_seqs` | integer > 0 | Maximum concurrent sequences |
| `chat_template` | string | Custom chat template |
| `tokenizer_json` | string | Path to `tokenizer.json` |
| `embedding_dimensions` | integer > 0 | Override output dimensions for embeddings (`embed` task only) |
| `gguf_files` | array of strings | GGUF filenames to load in GGUF mode |
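The Type column can be checked before a catalog entry reaches the provider. A minimal sketch, assuming the options arrive as a parsed JSON object (a Python dict); `TEXT_OPTION_TYPES` and `check_text_option_types` are hypothetical helpers. It verifies only the documented types and does not know which `isq` strings mistral.rs actually accepts.

```python
from typing import Any, Callable, Dict, List


def _is_str(v: Any) -> bool:
    return isinstance(v, str)


def _is_bool(v: Any) -> bool:
    return isinstance(v, bool)


def _is_pos_int(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and v > 0


def _is_str_list(v: Any) -> bool:
    return isinstance(v, list) and all(isinstance(x, str) for x in v)


# Hypothetical type table mirroring the "Text pipeline options" table.
TEXT_OPTION_TYPES: Dict[str, Callable[[Any], bool]] = {
    "isq": _is_str,
    "paged_attention": _is_bool,
    "max_num_seqs": _is_pos_int,
    "chat_template": _is_str,
    "tokenizer_json": _is_str,
    "embedding_dimensions": _is_pos_int,
    "gguf_files": _is_str_list,
}


def check_text_option_types(options: Dict[str, Any]) -> List[str]:
    """Return type errors for text options; keys outside the table are skipped."""
    errors = []
    for key, value in options.items():
        check = TEXT_OPTION_TYPES.get(key)
        if check is not None and not check(value):
            errors.append(f"{key}: unexpected value {value!r}")
    return errors
```

For example, `check_text_option_types({"isq": "Q4K", "max_num_seqs": 0})` returns `["max_num_seqs: unexpected value 0"]`. Keys that are not text options (such as `dtype`) are ignored here; whether they are allowed for a pipeline is a separate check.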
### Diffusion pipeline options

| Option | Type | Description |
|---|---|---|
| `diffusion_loader_type` | string | Required. One of: `flux`, `flux_offloaded` |
### Speech pipeline options

| Option | Type | Description |
|---|---|---|
| `speech_loader_type` | string | Required. One of: `dia` |
### Pipeline-specific option validity

| Option | text | vision | diffusion | speech |
|---|---|---|---|---|
| `pipeline` | Yes | Yes | Yes | Yes |
| `dtype` | Yes | Yes | Yes | Yes |
| `force_cpu` | Yes | Yes | Yes | Yes |
| `isq` | Yes | No | No | No |
| `paged_attention` | Yes | No | No | No |
| `max_num_seqs` | Yes | No | No | No |
| `chat_template` | Yes | No | No | No |
| `tokenizer_json` | Yes | No | No | No |
| `embedding_dimensions` | Yes | No | No | No |
| `gguf_files` | Yes | No | No | No |
| `diffusion_loader_type` | No | No | Yes | No |
| `speech_loader_type` | No | No | No | Yes |
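The validity matrix, together with the required loader types, can be enforced with a small check. This is a minimal sketch under the assumption that options are a parsed JSON dict and that unrecognized keys should be rejected (the provider's actual handling of unknown keys may differ); `ALLOWED`, `REQUIRED`, and `validate_pipeline_options` are hypothetical.

```python
from typing import Any, Dict, List

COMMON = {"pipeline", "dtype", "force_cpu"}

# Allowed option keys per pipeline, copied from the validity matrix.
ALLOWED = {
    "text": COMMON | {"isq", "paged_attention", "max_num_seqs", "chat_template",
                      "tokenizer_json", "embedding_dimensions", "gguf_files"},
    "vision": set(COMMON),
    "diffusion": COMMON | {"diffusion_loader_type"},
    "speech": COMMON | {"speech_loader_type"},
}

# Required option and its permitted values, per pipeline.
REQUIRED = {
    "diffusion": ("diffusion_loader_type", {"flux", "flux_offloaded"}),
    "speech": ("speech_loader_type", {"dia"}),
}


def validate_pipeline_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    pipeline = options.get("pipeline", "text")  # text is the default pipeline
    if pipeline not in ALLOWED:
        return [f"unknown pipeline: {pipeline!r}"]
    errors = [
        f"{key} is not valid for the {pipeline} pipeline"
        for key in sorted(options)
        if key not in ALLOWED[pipeline]
    ]
    if pipeline in REQUIRED:
        key, values = REQUIRED[pipeline]
        if key not in options:
            errors.append(f"{key} is required for the {pipeline} pipeline")
        elif not isinstance(options[key], str) or options[key] not in values:
            errors.append(f"{key} must be one of {sorted(values)}")
    return errors
```

For instance, `validate_pipeline_options({"pipeline": "vision", "isq": "Q4K"})` returns `["isq is not valid for the vision pipeline"]`, and a diffusion entry without `diffusion_loader_type` is reported as missing its required loader.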
Authoritative Uni-Xervo option schema:
## Dtype

Model precision control. Available on all four pipeline types.

| Value | Description |
|---|---|
| `auto` | Automatic selection (BF16 on GPU, F32 on CPU) |
| `f16` | 16-bit floating point |
| `bf16` | Brain floating point 16 |
| `f32` | 32-bit floating point |
Default resolution logic, in order of precedence:

- Explicit `dtype` value in catalog options, if set
- `f32` when running on CPU or without GPU support
- `auto` otherwise
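The precedence rules can be sketched as a single function. This assumes the catalog options are a dict and that GPU availability is known as a boolean; `resolve_dtype` and its `gpu_available` parameter are hypothetical stand-ins for however the provider actually detects hardware.

```python
from typing import Any, Mapping

VALID_DTYPES = ("auto", "f16", "bf16", "f32")


def resolve_dtype(options: Mapping[str, Any], gpu_available: bool) -> str:
    """Apply the documented default resolution order for `dtype`."""
    explicit = options.get("dtype")
    if explicit is not None:
        # 1. An explicit catalog value always wins.
        if explicit not in VALID_DTYPES:
            raise ValueError(f"unsupported dtype: {explicit!r}")
        return explicit
    # 2. CPU execution, whether forced or for lack of a GPU, falls back to f32.
    if options.get("force_cpu", False) or not gpu_available:
        return "f32"
    # 3. Otherwise defer to automatic selection (BF16 on GPU per the table).
    return "auto"
```

Note that an explicit `dtype` takes priority even when `force_cpu` is set, because it is checked first.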
## Available models
local/mistralrs delegates model support to the upstream mistral.rs engine.
Authoritative model/support references:
- mistral.rs docs: https://ericlbuehler.github.io/mistral.rs/
- mistral.rs repository: https://github.com/EricLBuehler/mistral.rs
## Generation API

The Uni-Xervo generation API exposes the following options:

- `max_tokens`
- `temperature`
- `top_p`
- `width` (diffusion only)
- `height` (diffusion only)
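The diffusion-only restriction on `width` and `height` can be checked per request. A minimal sketch, assuming request options are a dict and that unknown keys should be flagged; `check_generation_params` is a hypothetical helper and does not enforce any value ranges, since none are documented here.

```python
from typing import Any, Dict, List

GENERATION_PARAMS = {"max_tokens", "temperature", "top_p", "width", "height"}
DIFFUSION_ONLY = {"width", "height"}


def check_generation_params(params: Dict[str, Any], pipeline: str = "text") -> List[str]:
    """Flag unknown options and diffusion-only options used elsewhere."""
    errors = []
    for key in sorted(params):
        if key not in GENERATION_PARAMS:
            errors.append(f"unknown generation parameter: {key}")
        elif key in DIFFUSION_ONLY and pipeline != "diffusion":
            errors.append(f"{key} applies to the diffusion pipeline only")
    return errors
```

For example, `check_generation_params({"width": 512}, "text")` returns `["width applies to the diffusion pipeline only"]`.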
`GenerationResult` output fields:

- `text`: generated text (text and vision pipelines)
- `usage`: optional token usage stats
- `images`: generated images (diffusion pipeline)
- `audio`: generated audio (speech pipeline)
## Example catalog entries

### Text generation (basic)

```json
{
  "alias": "generate/local",
  "task": "generate",
  "provider_id": "local/mistralrs",
  "model_id": "mistralai/Mistral-7B-Instruct-v0.2",
  "options": {
    "isq": "Q4K",
    "paged_attention": true,
    "max_num_seqs": 8
  }
}
```
### Text generation with GGUF

```json
{
  "alias": "generate/gguf",
  "task": "generate",
  "provider_id": "local/mistralrs",
  "model_id": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
  "options": {
    "gguf_files": ["mistral-7b-instruct-v0.2.Q4_K_M.gguf"]
  }
}
```
### Text generation with ISQ + dtype

```json
{
  "alias": "generate/isq",
  "task": "generate",
  "provider_id": "local/mistralrs",
  "model_id": "mistralai/Mistral-7B-Instruct-v0.2",
  "options": {
    "isq": "Q8_0",
    "dtype": "bf16"
  }
}
```
### Vision

```json
{
  "alias": "vision/qwen",
  "task": "generate",
  "provider_id": "local/mistralrs",
  "model_id": "Qwen/Qwen2-VL-2B-Instruct",
  "options": {
    "pipeline": "vision",
    "dtype": "bf16"
  }
}
```
### Diffusion (image generation)

```json
{
  "alias": "image/flux",
  "task": "generate",
  "provider_id": "local/mistralrs",
  "model_id": "black-forest-labs/FLUX.1-schnell",
  "options": {
    "pipeline": "diffusion",
    "diffusion_loader_type": "flux"
  }
}
```
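Catalog entries like these are plain JSON, so a tool can read them with the standard library. The sketch below assumes, for illustration only, that several entries are stored together as a JSON array (the real catalog file layout may differ); `pipelines_by_alias` is a hypothetical helper that reports which pipeline each `local/mistralrs` alias uses, applying the `text` default when `pipeline` is omitted.

```python
import json
from typing import Dict

# Three of the example entries above, assumed to be stored as one JSON array.
CATALOG_JSON = """
[
  {"alias": "generate/local", "task": "generate", "provider_id": "local/mistralrs",
   "model_id": "mistralai/Mistral-7B-Instruct-v0.2",
   "options": {"isq": "Q4K", "paged_attention": true, "max_num_seqs": 8}},
  {"alias": "vision/qwen", "task": "generate", "provider_id": "local/mistralrs",
   "model_id": "Qwen/Qwen2-VL-2B-Instruct",
   "options": {"pipeline": "vision", "dtype": "bf16"}},
  {"alias": "image/flux", "task": "generate", "provider_id": "local/mistralrs",
   "model_id": "black-forest-labs/FLUX.1-schnell",
   "options": {"pipeline": "diffusion", "diffusion_loader_type": "flux"}}
]
"""


def pipelines_by_alias(catalog_json: str) -> Dict[str, str]:
    """Map each local/mistralrs alias to its pipeline, defaulting to "text"."""
    entries = json.loads(catalog_json)
    return {
        entry["alias"]: entry.get("options", {}).get("pipeline", "text")
        for entry in entries
        if entry.get("provider_id") == "local/mistralrs"
    }


print(pipelines_by_alias(CATALOG_JSON))
# {'generate/local': 'text', 'vision/qwen': 'vision', 'image/flux': 'diffusion'}
```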