Crate uni_xervo

Unified Rust runtime for local and remote embedding, reranking, and generation models.

Uni-Xervo provides a single, provider-agnostic API for loading and running ML models across a wide range of backends — from local inference engines (Candle, FastEmbed, mistral.rs) to remote API services (OpenAI, Gemini, Anthropic, Cohere, Mistral, Voyage AI, Vertex AI, Azure OpenAI).

§Key concepts

  • ModelRuntime — the central runtime that owns providers and manages a catalog of model aliases.
  • ModelAliasSpec — a declarative specification that maps a human-readable alias (e.g. "embed/default") to a concrete provider + model pair.
  • Providers — pluggable backends that implement ModelProvider. Each provider advertises the tasks it supports and knows how to load models.
  • Traits — EmbeddingModel, RerankerModel, and GeneratorModel are the task-specific interfaces returned by the runtime.
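
The relationship between aliases, providers, and the runtime described above can be sketched with a small, std-only toy model. Every name here (Task, AliasEntry, ToyProvider, ToyRuntime) is hypothetical and exists only for illustration; none of it is the crate's actual API, and the real ModelRuntime and ModelAliasSpec may validate and resolve aliases differently.

```rust
use std::collections::HashMap;

/// Hypothetical task kinds, mirroring the three tasks named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Task {
    Embed,
    Rerank,
    Generate,
}

/// Hypothetical stand-in for an alias specification:
/// a human-readable alias mapped to a provider + model pair.
#[derive(Debug, Clone)]
struct AliasEntry {
    alias: String,
    task: Task,
    provider_id: String,
    model_id: String,
}

/// Hypothetical stand-in for a provider: an id plus the tasks it advertises.
struct ToyProvider {
    id: String,
    tasks: Vec<Task>,
}

/// Toy runtime that owns providers and a catalog of aliases.
#[derive(Default)]
struct ToyRuntime {
    providers: HashMap<String, ToyProvider>,
    catalog: HashMap<String, AliasEntry>,
}

impl ToyRuntime {
    /// Register a provider under its own id.
    fn register_provider(&mut self, provider: ToyProvider) {
        self.providers.insert(provider.id.clone(), provider);
    }

    /// Add an alias, rejecting unknown providers and unsupported tasks.
    fn add_alias(&mut self, entry: AliasEntry) -> Result<(), String> {
        let provider = self
            .providers
            .get(&entry.provider_id)
            .ok_or_else(|| format!("unknown provider: {}", entry.provider_id))?;
        if !provider.tasks.contains(&entry.task) {
            return Err(format!(
                "provider {} does not support {:?}",
                provider.id, entry.task
            ));
        }
        self.catalog.insert(entry.alias.clone(), entry);
        Ok(())
    }

    /// Resolve an alias to its (provider_id, model_id) pair, if present.
    fn resolve(&self, alias: &str) -> Option<(&str, &str)> {
        self.catalog
            .get(alias)
            .map(|e| (e.provider_id.as_str(), e.model_id.as_str()))
    }
}
```

In this sketch, callers only ever name the alias; which provider and model sit behind it is a catalog detail that can change without touching call sites. That is the design idea the concepts above describe, under the assumptions stated.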

§Quick start

// Note: the `.await?` calls below must run inside an async context
// whose return type is a `Result`.
use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::runtime::ModelRuntime;
use uni_xervo::provider::candle::LocalCandleProvider;

// Declare an alias that maps "embed/local" to a concrete provider + model pair.
let spec = ModelAliasSpec {
    alias: "embed/local".into(),
    task: ModelTask::Embed,
    provider_id: "local/candle".into(),
    model_id: "sentence-transformers/all-MiniLM-L6-v2".into(),
    revision: None,
    warmup: Default::default(),
    required: true,
    timeout: None,
    load_timeout: None,
    retry: None,
    options: serde_json::Value::Null,
};

// Build the runtime with the Candle provider and a one-entry alias catalog.
let runtime = ModelRuntime::builder()
    .register_provider(LocalCandleProvider::new())
    .catalog(vec![spec])
    .build()
    .await?;

// Look up the embedding model by alias and embed a batch of text.
let model = runtime.embedding("embed/local").await?;
let embeddings = model.embed(vec!["Hello, world!"]).await?;

§Modules

api
Public API types for configuring models, catalogs, and runtime behavior.
cache
Model and weight cache directory resolution.
error
Error types for the Uni-Xervo runtime.
provider
Provider implementations for local and remote model backends.
reliability
Reliability primitives: circuit breaker, instrumented model wrappers with timeout and retry support, and metrics emission.
runtime
The core runtime that manages providers, catalogs, and loaded model instances.
traits
Core traits that every provider and model implementation must satisfy.