
Performance Tuning Guide

This guide covers strategies for optimizing Uni's performance across query execution, storage, indexing, and resource utilization.

Performance Overview

Uni's performance characteristics:

| Operation | Typical Latency | Optimization Target |
|---|---|---|
| Point lookup | 2-5ms | Index usage |
| 1-hop traversal | 4-8ms | Adjacency cache |
| Vector KNN (k=10) | 1-3ms | Index tuning |
| Aggregation (1M rows) | 50-200ms | Predicate pushdown |
| Bulk insert (10K) | 5-10ms | Batch size |

Query Optimization

1. Use Predicate Pushdown

Push filters down to the storage layer to reduce I/O:

// Good: Filter pushed to Lance
MATCH (p:Paper)
WHERE p.year > 2020 AND p.venue = 'NeurIPS'
RETURN p.title

// Bad: Filter applied after full scan
MATCH (p:Paper)
WHERE p.title CONTAINS 'Transformer'  // Cannot push CONTAINS
RETURN p.title

Pushable Predicates:

  • =, <>, <, >, <=, >=
  • IN [list]
  • IS NULL, IS NOT NULL
  • AND combinations of the above

Non-Pushable Predicates:

  • CONTAINS, STARTS WITH, ENDS WITH
  • Function calls: lower(x) = 'value'
  • OR across different properties

2. Limit Early

Apply LIMIT as early as possible:

// Good: Limit applied early in pipeline
MATCH (p:Paper)
WHERE p.year > 2020
RETURN p.title
ORDER BY p.year DESC
LIMIT 10

// Bad: Process all then limit
MATCH (p:Paper)-[:CITES]->(cited)
WITH p, COUNT(cited) AS citation_count
ORDER BY citation_count DESC
RETURN p.title, citation_count
LIMIT 10  // All citations computed before limit

3. Project Only Needed Properties

Don't fetch unnecessary properties:

// Good: Only fetch needed properties
MATCH (p:Paper)
RETURN p.title, p.year

// Bad: Fetch all properties
MATCH (p:Paper)
RETURN p  // Loads all properties including large ones

// Worse: Return unused properties
MATCH (p:Paper)
RETURN p.title, p.abstract, p.embedding  // embedding loaded but unused

4. Use Indexes

Ensure indexes exist for filter properties:

-- Check if index is used
EXPLAIN MATCH (p:Paper) WHERE p.year = 2023 RETURN p.title

-- Create index if missing
CREATE INDEX paper_year FOR (p:Paper) ON (p.year)

5. Optimize Traversal Patterns

Structure patterns for efficient execution:

// Good: Filter before traverse
MATCH (p:Paper)
WHERE p.year > 2020
MATCH (p)-[:CITES]->(cited)
RETURN p.title, cited.title

// Good: Traverse from smaller set
MATCH (seed:Paper {title: 'Attention Is All You Need'})
MATCH (seed)-[:CITES]->(cited)
RETURN cited.title

// Bad: Full cross-product
MATCH (p1:Paper), (p2:Paper)
WHERE p1.title = p2.title  // Cartesian join
RETURN p1, p2

Index Tuning

Vector Index Configuration

HNSW Tuning

| Parameter | Default | Low Latency | High Recall |
|---|---|---|---|
| m | 16 | 16 | 48 |
| ef_construction | 200 | 100 | 500 |
| ef_search | 100 | 50 | 200 |

// High recall configuration
CREATE VECTOR INDEX paper_embeddings
FOR (p:Paper) ON p.embedding
OPTIONS {
  index_type: "hnsw",
  metric: "cosine",
  m: 48,
  ef_construction: 500
}

IVF_PQ Tuning

| Parameter | Default | Memory Optimized | Recall Optimized |
|---|---|---|---|
| num_partitions | 256 | 512 | 256 |
| num_sub_vectors | 8 | 8 | 48 |
| num_probes | 20 | 10 | 50 |
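To see why num_sub_vectors dominates the memory side of this trade-off, a back-of-envelope estimate helps. The sketch below is not Uni's API; it assumes the standard IVF_PQ layout (each vector compressed to one byte per sub-vector against 256-centroid codebooks, plus full-precision coarse centroids):

```rust
/// Rough memory estimate for an IVF_PQ index: each vector stores
/// `num_sub_vectors` one-byte codes (256 centroids per sub-quantizer),
/// plus coarse centroids and the PQ codebooks, both in f32.
fn ivf_pq_index_bytes(
    num_vectors: usize,
    dim: usize,
    num_partitions: usize,
    num_sub_vectors: usize,
) -> usize {
    let codes = num_vectors * num_sub_vectors; // 1 byte per sub-vector code
    let centroids = num_partitions * dim * 4; // f32 coarse centroids
    let codebooks = num_sub_vectors * 256 * (dim / num_sub_vectors) * 4; // f32 codebooks
    codes + centroids + codebooks
}

fn main() {
    // 1M 768-dim vectors with the default config from the table above:
    // 8 MB of codes plus under 2 MB of centroids and codebooks.
    let bytes = ivf_pq_index_bytes(1_000_000, 768, 256, 8);
    println!("~{:.1} MB", bytes as f64 / (1024.0 * 1024.0)); // → "~9.1 MB"
}
```

Going from 8 to 48 sub-vectors multiplies the per-vector code size sixfold, which is the cost of the recall-optimized column.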

Scalar Index Selection

| Query Pattern | Index Type | Notes |
|---|---|---|
| Equality (= value) | Hash | O(1) lookup |
| Range (> value) | BTree | Range scan |
| Low cardinality | Bitmap | Efficient for categories |
| High cardinality, unique | Hash | Best for IDs |

-- Hash for exact match (faster)
CREATE INDEX paper_doi FOR (p:Paper) ON (p.doi) OPTIONS { type: "hash" }

-- BTree for range queries
CREATE INDEX paper_year FOR (p:Paper) ON (p.year) OPTIONS { type: "btree" }

-- Bitmap for categories
CREATE INDEX paper_venue FOR (p:Paper) ON (p.venue) OPTIONS { type: "bitmap" }

Composite Indexes

Create composite indexes for common filter combinations:

-- Composite index for common query pattern
CREATE INDEX paper_venue_year FOR (p:Paper) ON (p.venue, p.year)

-- Query uses the composite index
MATCH (p:Paper)
WHERE p.venue = 'NeurIPS' AND p.year > 2020
RETURN p.title

Storage Optimization

Batch Size Tuning

Tune batch sizes for your workload:

# Import with larger batches (more memory, faster)
uni import data --batch-size 50000 ...

# Import with smaller batches (less memory)
uni import data --batch-size 5000 ...

Guidelines:

  • Increase batch size if memory allows (faster)
  • Decrease if OOM errors occur
  • The default (10000) is good for most cases
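The memory cost of a batch is roughly rows × average row width, so the guidelines above can be turned into a simple chooser. This is a sketch, not a Uni helper; `avg_row_bytes` is an estimate you would measure for your own data:

```rust
/// Pick the largest batch size that fits a memory budget, clamped to
/// the 5K-50K range discussed above. Hypothetical helper for sizing,
/// not part of Uni's API.
fn choose_batch_size(memory_budget_bytes: usize, avg_row_bytes: usize) -> usize {
    let rows_that_fit = memory_budget_bytes / avg_row_bytes;
    rows_that_fit.clamp(5_000, 50_000)
}

fn main() {
    // 256 MB budget with ~2 KB rows hits the upper clamp of 50K;
    // a 16 MB budget with the same rows yields ~8K.
    println!("{}", choose_batch_size(256 * 1024 * 1024, 2 * 1024)); // → 50000
    println!("{}", choose_batch_size(16 * 1024 * 1024, 2 * 1024)); // → 8192
}
```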

L0 Buffer Configuration

Tune the in-memory write buffer:

let config = WriteConfig {
    max_mutations_before_flush: 10000,  // Flush threshold
    max_l0_size_bytes: 128 * 1024 * 1024,  // 128 MB max
    auto_flush: true,
};

Trade-offs:

  • Larger L0: better write throughput, higher memory use, longer recovery
  • Smaller L0: lower memory use, more frequent flushes, faster recovery

Compaction

Trigger compaction after bulk operations:

# Manual compaction
uni compact --path ./storage

# Compaction levels
uni compact --path ./storage --level l1  # L0 → L1 only
uni compact --path ./storage --level l2  # Full compaction

Cache Configuration

Adjacency Cache

The CSR adjacency cache is critical for traversal performance:

let storage = StorageManager::with_config(
    path,
    schema_manager,
    StorageConfig {
        adjacency_cache_size: 1_000_000,  // Max cached vertices
        adjacency_cache_ttl: Duration::from_secs(3600),
    }
);

Sizing Guidelines:

  • Size for your "hot" working set
  • Monitor the cache hit ratio
  • Increase if traversals remain slow after warmup
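To translate `adjacency_cache_size` (a vertex count) into a memory budget, a lower-bound estimate assuming a plain CSR layout with 8-byte offsets and neighbor IDs can help; Uni's actual per-entry overhead will differ:

```rust
/// Back-of-envelope size of a CSR adjacency cache: one 8-byte offset
/// per cached vertex plus one 8-byte neighbor ID per cached edge.
/// Treat this as a lower bound, ignoring per-entry bookkeeping.
fn adjacency_cache_bytes(cached_vertices: usize, avg_degree: usize) -> usize {
    let offsets = cached_vertices * 8;
    let neighbors = cached_vertices * avg_degree * 8;
    offsets + neighbors
}

fn main() {
    // 1M hot vertices at an average out-degree of 20 is ~160 MB,
    // so the cache size from the config above is a real memory commitment.
    let bytes = adjacency_cache_bytes(1_000_000, 20);
    println!("~{} MB", bytes / (1024 * 1024)); // → "~160 MB"
}
```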

Property Cache

Configure the property LRU cache:

let prop_manager = PropertyManager::with_config(
    storage,
    schema_manager,
    PropertyConfig {
        cache_capacity: 100_000,  // Cached property entries
        batch_load_size: 1000,    // Properties per batch load
    }
);

Query Analysis

EXPLAIN

View the query plan without execution:

uni query "MATCH (p:Paper) WHERE p.year > 2020 RETURN p.title" \
    --explain --path ./storage

Output:

Query Plan:
├── Project [p.title]
│   └── Scan [:Paper]
│         ↳ Index: paper_year (year > 2020)
│         ↳ Pushdown: year > 2020

Estimated rows: 5,000
Index usage: BTree (paper_year)

PROFILE

Execute with timing breakdown:

uni query "MATCH (p:Paper)-[:CITES]->(c) RETURN COUNT(c)" \
    --profile --path ./storage

Output:

┌───────────┐
│ COUNT(c)  │
├───────────┤
│ 45,231    │
└───────────┘

Execution Profile:
  Parse:      0.8ms
  Plan:       1.2ms
  Execute:    42.3ms
    ├── Scan:       12.1ms (28.6%)  [10,000 rows]
    ├── Traverse:   24.5ms (57.9%)  [45,231 edges]
    └── Aggregate:   5.7ms (13.5%)  [1 row]
  Total:      44.3ms

Identifying Bottlenecks

| Profile Pattern | Likely Cause | Solution |
|---|---|---|
| High Scan time | No index, large result set | Add index, add filters |
| High Traverse time | Cold cache, many edges | Warm cache, limit hops |
| High Aggregate time | Large group count | Add LIMIT, pre-aggregate |
| High memory | Large intermediate results | Stream results, limit |

Parallel Execution

Morsel-Driven Parallelism

Uni uses morsel-driven parallelism for large queries:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         PARALLEL EXECUTION                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Source Data: [────────────────────────────────────────────────]           │
│                         │                                                   │
│                         ▼                                                   │
│   Morsels:     [────] [────] [────] [────] [────] [────]                   │
│                  │       │       │       │       │       │                  │
│                  ▼       ▼       ▼       ▼       ▼       ▼                  │
│   Workers:     [W1]   [W2]   [W3]   [W4]   [W1]   [W2]                     │
│                  │       │       │       │       │       │                  │
│                  └───────┴───────┴───────┴───────┴───────┘                  │
│                                   │                                         │
│                                   ▼                                         │
│   Merge:                     [Results]                                      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
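The scheme in the diagram can be sketched with standard library scoped threads: workers pull fixed-size row ranges ("morsels") from a shared counter until the source is exhausted, then partial results are merged. This illustrates the execution model only; it is not Uni's executor:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Morsel-driven parallel aggregation: each worker repeatedly claims
/// the next morsel (a contiguous row range) from a shared atomic
/// counter, computes a local partial sum, and the partials are merged.
fn parallel_sum(data: &[i64], morsel_size: usize, workers: usize) -> i64 {
    let next = AtomicUsize::new(0);
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(|| {
                    let mut local = 0i64;
                    loop {
                        // Claim the next morsel; exit when the source is drained.
                        let start = next.fetch_add(morsel_size, Ordering::Relaxed);
                        if start >= data.len() {
                            break;
                        }
                        let end = (start + morsel_size).min(data.len());
                        local += data[start..end].iter().sum::<i64>();
                    }
                    local
                })
            })
            .collect();
        // Merge step: combine each worker's partial aggregate.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<i64>()
    })
}

fn main() {
    let data: Vec<i64> = (1..=100_000).collect();
    println!("{}", parallel_sum(&data, 4096, 8)); // → 5000050000
}
```

Because workers claim morsels dynamically rather than being assigned fixed partitions up front, a slow morsel does not stall the other workers.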

Concurrency Configuration

let executor = Executor::with_config(
    storage,
    ExecutorConfig {
        worker_threads: 8,       // Parallel workers
        morsel_size: 4096,       // Rows per morsel
        max_concurrent_io: 16,   // Parallel I/O operations
    }
);

Guidelines:

  • Set workers to the CPU core count
  • Increase morsel size for simpler queries
  • Decrease morsel size for complex operators


Memory Management

Memory Budget

Monitor and limit memory usage:

# Monitor memory during query
RUST_LOG=uni=debug uni query "..." 2>&1 | grep -i memory

# Set memory limits
export UNI_MAX_MEMORY_MB=4096

Reducing Memory Usage

  1. Smaller batch sizes: --batch-size 5000
  2. Smaller caches: Reduce cache capacities
  3. Stream large results: Use SKIP/LIMIT pagination
  4. Avoid large intermediates: Filter early

Memory Profile

// Enable memory tracking
let storage = StorageManager::with_config(
    path,
    schema_manager,
    StorageConfig {
        enable_memory_tracking: true,
        memory_limit_bytes: 4 * 1024 * 1024 * 1024,  // 4 GB
    }
);

// Query memory stats
let stats = storage.memory_stats();
println!("Adjacency cache: {} MB", stats.adjacency_cache_mb);
println!("Property cache: {} MB", stats.property_cache_mb);
println!("L0 buffer: {} MB", stats.l0_buffer_mb);

I/O Optimization

Object Store Configuration

For S3/GCS backends:

let storage = StorageManager::new_with_object_store(
    "s3://bucket/path",
    schema_manager,
    ObjectStoreConfig {
        max_connections: 32,
        connect_timeout: Duration::from_secs(10),
        read_timeout: Duration::from_secs(30),
        retry_attempts: 3,
    }
);

Local Cache for Remote Storage

let storage = StorageManager::new_with_cache(
    "s3://bucket/path",
    schema_manager,
    CacheConfig {
        local_cache_path: "/tmp/uni-cache",
        max_cache_size_bytes: 10 * 1024 * 1024 * 1024,  // 10 GB
        eviction_policy: EvictionPolicy::LRU,
    }
);

Read-Ahead

Configure read-ahead for sequential scans:

let config = StorageConfig {
    read_ahead_size: 64 * 1024 * 1024,  // 64 MB
    prefetch_enabled: true,
};

Benchmarking

Built-in Benchmarks

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench -- vector_search

# Save baseline
cargo bench -- --save-baseline main

# Compare to baseline
cargo bench -- --baseline main

Custom Benchmarks

use criterion::{criterion_group, criterion_main, Criterion};

fn benchmark_traversal(c: &mut Criterion) {
    // Assumes a helper that opens a populated store for benchmarking.
    let storage = setup_storage();
    let executor = Executor::new(storage);

    c.bench_function("1-hop traversal", |b| {
        b.iter(|| {
            let query = "MATCH (p:Paper)-[:CITES]->(c) RETURN COUNT(c)";
            executor.execute(query).unwrap()
        })
    });
}

criterion_group!(benches, benchmark_traversal);
criterion_main!(benches);

Performance Checklist

Before deploying to production:

  • Indexes created for filter properties
  • Vector indexes tuned for recall/latency trade-off
  • Batch sizes tuned for workload
  • Cache sizes appropriate for working set
  • Queries use pushable predicates where possible
  • LIMIT applied early in query patterns
  • Only needed properties projected
  • Memory limits configured
  • I/O timeouts set for remote storage
  • Monitoring enabled for cache hit rates

Next Steps