Benchmarks¶
Measured performance data for Uni across ingestion, querying, graph traversal, vector search, and graph algorithms. All numbers come from the Criterion benchmark suite (cargo bench) and are reproducible.
Last updated: 2026-04-01
Executive Summary¶
| Workload | Performance | Dataset | Notes |
|---|---|---|---|
| Point Lookup | 3.0 ms | 1K vertices | By name, after flush |
| 1-Hop Traversal | 4.7 ms | 1K vertices | CSR adjacency, warm cache |
| Vector KNN (k=10) | 6.9 ms | 1K vectors, 128d | HNSW indexed |
| Aggregation (AVG) | 3.1 ms | 1K vertices | Single-column aggregate |
| Full Scan COUNT | 26 ms | 1K vertices | Scales linearly with data |
| Hybrid Vector+Graph | 13.7 ms | 1K n / 1K e | Vector filter + 1-hop |
| Micro: Raw Vector Search | 1.9 ms | 1K vectors, 128d | Direct storage layer |
| Micro: Neighbor Access | 2.0 ns | 1K vertices, 50-deg | CSR slice lookup |
Test Environment¶
All benchmarks run on a single machine using Criterion 0.5 with `harness = false`. Each measurement uses 10 samples (comprehensive/mutation/algo suites) or 100 samples (micro suite). Datasets use in-memory or temp-dir storage with `auto_flush_threshold: 100_000`.
Benchmark Configurations¶
Three dataset sizes used for scaling analysis:
| Config | Vertices | Edges | Edge Density | Properties per Vertex |
|---|---|---|---|---|
| Small | 1,000 | 1,000 | 1.0 e/v | 3 (name, age, embedding[128]) |
| Medium | 5,000 | 2,500 | 0.5 e/v | 3 (name, age, embedding[128]) |
| Large | 8,000 | 12,000 | 1.5 e/v | 3 (name, age, embedding[128]) |
Schema: Person label with name (String), age (Int32), embedding (Vector[128]). Edge type: KNOWS.
Ingestion Performance¶
Cypher INSERT Throughput¶
Per-statement ingestion via CREATE inside a transaction. Each vertex carries a name, age, and 128-dimensional embedding vector.
| Method | 1K vertices | 5K vertices | 8K vertices |
|---|---|---|---|
| String interpolation | 1.89 s | 10.47 s | 12.59 s |
| Parameterized query | 1.51 s | 8.61 s | 13.14 s |
| Per-vertex rate (parameterized) | 662 v/s | 581 v/s | 609 v/s |
Parameterized queries are ~20% faster at 1K scale because the plan cache can reuse the parsed and planned query and substitute only the parameter values. At 8K the advantage disappears (parameterized is marginally slower) as L0 buffer pressure dominates.
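The caching effect can be illustrated with a toy sketch (not Uni's actual planner): a cache keyed by query text gets one miss and many hits for a parameterized statement, but misses on every string-interpolated statement because each is textually unique. The `PlanCache` type and its fields here are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a plan cache keyed by query text. Parameterized
// queries reuse one entry; interpolated queries miss every time.
struct PlanCache {
    plans: HashMap<String, String>, // query text -> "compiled plan" (stand-in)
    hits: usize,
    misses: usize,
}

impl PlanCache {
    fn new() -> Self {
        PlanCache { plans: HashMap::new(), hits: 0, misses: 0 }
    }

    // Return the cached plan for `query`, "planning" (here: formatting a
    // string) on a miss. Real parse/plan cost is what the cache amortizes.
    fn get_or_plan(&mut self, query: &str) -> &str {
        if self.plans.contains_key(query) {
            self.hits += 1;
        } else {
            self.misses += 1;
            self.plans.insert(query.to_string(), format!("plan({query})"));
        }
        self.plans.get(query).unwrap()
    }
}

fn main() {
    let mut cache = PlanCache::new();
    // Parameterized: identical text every time -> 1 miss, 999 hits.
    for _ in 0..1000 {
        cache.get_or_plan("CREATE (:Person {name: $name, age: $age})");
    }
    // Interpolated: unique text every time -> 1000 misses.
    for i in 0..1000 {
        let q = format!("CREATE (:Person {{name: 'p{i}', age: {i}}})");
        cache.get_or_plan(&q);
    }
    assert_eq!((cache.hits, cache.misses), (999, 1001));
    println!("hits={}, misses={}", cache.hits, cache.misses);
}
```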
Flush Performance (L0 to L1)¶
Flush converts the in-memory L0 buffer to persistent Lance columnar storage:
| Dataset | Flush Time | Throughput |
|---|---|---|
| 1K vertices + 500 edges | 748 ms | ~2.0K entities/sec |
| 5K vertices + 1.25K edges | 1.19 s | ~5.3K entities/sec |
| 8K vertices + 6K edges | 1.54 s | ~9.1K entities/sec |
Flush time grows far more slowly than entity count at these scales (2.1x time for roughly 9x entities). Throughput improves with batch size because Arrow serialization and Lance write overhead are amortized across the batch.
Mutation Performance¶
Benchmarked with in-memory Uni instances, 100-node operations per measurement:
| Operation | Time | Per-Op Rate |
|---|---|---|
| CREATE 100 nodes | 137.8 ms | 1.38 ms/node |
| SET 100 properties | 6.7 ms | 67 us/prop |
| DELETE 100 nodes | 3.5 ms | 35 us/node |
| CREATE 50 + MATCH | 74.1 ms | 1.48 ms/create + query |
| MERGE 100 nodes (50 create + 50 match) | 336.8 ms | 3.37 ms/merge |
Key observations:
- MERGE is ~2.4x slower than CREATE because each MERGE must first check existence (read + conditional write), while CREATE is write-only.
- SET and DELETE are fast because they operate on already-indexed vertices in the L0 buffer.
- Transaction overhead is included in all measurements (tx begin + commit).
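The MERGE-vs-CREATE gap above comes down to read-before-write semantics. A minimal illustrative sketch (an in-memory map, not Uni's internals): CREATE is a blind insert, while MERGE must check existence first and only write on the miss path.

```rust
use std::collections::HashMap;

// Illustrative only: CREATE is write-only, MERGE is read + conditional write.
// The extra existence check is why MERGE measures ~2.4x slower per op.

fn create(store: &mut HashMap<String, u32>, name: &str, age: u32) {
    // Blind insert; duplicate handling is the caller's problem.
    store.insert(name.to_string(), age);
}

fn merge(store: &mut HashMap<String, u32>, name: &str, age: u32) -> bool {
    // Existence check first (the "read"), then a conditional write.
    if store.contains_key(name) {
        false // matched an existing vertex; nothing created
    } else {
        store.insert(name.to_string(), age);
        true // created
    }
}

fn main() {
    let mut store = HashMap::new();
    create(&mut store, "alice", 30);
    assert!(!merge(&mut store, "alice", 30)); // match path: no write
    assert!(merge(&mut store, "bob", 25));    // create path: write
    assert_eq!(store.len(), 2);
}
```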
Query Performance¶
Point Lookup¶
Single-vertex retrieval by string property after bulk insert + flush:
| Dataset | Latency | Notes |
|---|---|---|
| 1K vertices | 3.01 ms | Scan-based (no BTree index) |
| 5K vertices | 2.92 ms | Nearly constant |
| 8K vertices | 3.44 ms | Nearly constant |
Point lookup scales approximately O(1) with dataset size — the query planner pushes the filter predicate into the Lance scan, enabling predicate pushdown at the storage layer.
Aggregation¶
Full-scan aggregations over all vertices:
| Query | 1K n | 5K n | 8K n | Scaling |
|---|---|---|---|---|
| COUNT(n) | 25.9 ms | 108.1 ms | 186.1 ms | Linear |
| GROUP BY n.age, COUNT(n) | 26.4 ms | 108.7 ms | 181.8 ms | Linear |
| AVG(n.age) | 3.10 ms | 2.93 ms | 3.18 ms | Constant |
COUNT and GROUP BY require a full table scan — 8x more data yields ~7x more time. AVG uses columnar statistics and returns in constant time regardless of dataset size.
Indexed Queries¶
Queries using BTree, HNSW, or fulltext indexes created via Cypher DDL:
| Query Type | 1K n | 5K n | 8K n | Scaling |
|---|---|---|---|---|
| Scalar equality (age = 25) | 2.82 ms | 3.08 ms | 3.36 ms | ~Constant |
| Scalar range (age >= 20 AND age <= 30) | 3.07 ms | 3.45 ms | 3.78 ms | ~Constant |
| Vector KNN (k=10, HNSW) | 6.89 ms | 9.85 ms | 11.29 ms | Sub-linear |
| Fulltext match (exact name) | 3.01 ms | 3.26 ms | 3.27 ms | ~Constant |
BTree and fulltext indexes deliver nearly constant-time lookups. Vector KNN scales sub-linearly (O(log n)) due to the HNSW graph structure.
Index Creation¶
Time to build an index on an already-populated and flushed dataset:
| Index Type | 1K n | 5K n | 8K n |
|---|---|---|---|
| Scalar BTree | 1.47 ms | 2.08 ms | 3.18 ms |
| Vector HNSW | 1.44 ms | 2.13 ms | 2.76 ms |
| Fulltext | 1.36 ms | 2.17 ms | 2.93 ms |
Index build time grows sub-linearly with dataset size at these scales (roughly 2x time for 8x data), and all three index types have comparable build times.
Order By + Limit¶
`MATCH (n:Person) RETURN n.name, n.age ORDER BY ... LIMIT 10`:
| Sort | 1K n | 5K n | 8K n |
|---|---|---|---|
| ORDER BY n.age | 2.93 ms | 3.45 ms | 3.97 ms |
| ORDER BY n.name DESC | 2.98 ms | 3.82 ms | 4.60 ms |
Sub-linear scaling thanks to the LIMIT 10 clause — the engine uses a top-N heap instead of a full sort.
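The top-N strategy can be sketched with a bounded heap: to return the k smallest values, keep a max-heap of size k and evict the largest whenever it overflows. This is O(n log k) time and O(k) memory instead of an O(n log n) full sort. A minimal standalone sketch (not Uni's actual operator):

```rust
use std::collections::BinaryHeap;

// Top-N via a bounded max-heap: only the k smallest values survive,
// mirroring what ORDER BY <col> ASC LIMIT k needs to produce.
fn smallest_k(values: &[i64], k: usize) -> Vec<i64> {
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k + 1);
    for &v in values {
        heap.push(v);
        if heap.len() > k {
            heap.pop(); // evict the current largest
        }
    }
    heap.into_sorted_vec() // ascending, like the final ORDER BY output
}

fn main() {
    let ages: Vec<i64> = (0..1000).map(|i| (i * 37) % 101).collect();
    let top = smallest_k(&ages, 10);
    assert_eq!(top.len(), 10);
    // Output is sorted ascending.
    assert!(top.windows(2).all(|w| w[0] <= w[1]));
}
```

For `ORDER BY ... DESC` the same structure works with a min-heap (e.g. wrapping values in `std::cmp::Reverse`).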
Graph Traversal Performance¶
Cypher Traversal (via Comprehensive Suite)¶
Starting from a single vertex (Person_0), following KNOWS edges:
| Hops | 1K n / 1K e | 5K n / 2.5K e | 8K n / 12K e | Notes |
|---|---|---|---|---|
| 1-hop | 4.75 ms | 4.31 ms | 5.08 ms | ~Constant |
| 2-hop | 4.86 ms | 4.83 ms | 4.89 ms | ~Constant |
| 3-hop | 4.74 ms | 4.13 ms | 4.98 ms | ~Constant |
Traversal from a single start vertex stays flat regardless of total graph size — the working set (reachable neighbors) is determined by local topology, not total vertex count.
Low-Level Traversal (via Micro Suite)¶
Direct storage-layer traversal benchmarks, bypassing the Cypher parser/planner:
| Benchmark | Time | Notes |
|---|---|---|
| 1-hop, 100-node chain | 750 us | load_subgraph from storage |
| Neighbor iteration (50-degree vertex) | 9.71 ns | CSR in-memory iteration |
| Neighbor access (50-degree vertex) | 2.00 ns | Slice pointer lookup |
The CSR adjacency structure provides near-zero-cost neighbor access — a single slice lookup followed by sequential memory reads.
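In CSR form, the neighbors of vertex `v` live in one contiguous slice, `targets[offsets[v]..offsets[v+1]]`. A minimal sketch of that layout (field names here are illustrative, not Uni's exact types) shows why neighbor access is a single index computation plus a slice borrow:

```rust
// Minimal CSR adjacency sketch. Neighbor access is one bounds-checked
// slice lookup; iteration then reads sequential memory (cache-friendly).
struct Csr {
    offsets: Vec<usize>, // len = num_vertices + 1
    targets: Vec<u32>,   // all neighbor lists, concatenated
}

impl Csr {
    fn neighbors(&self, v: usize) -> &[u32] {
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }
}

fn main() {
    // 3 vertices: 0 -> {1, 2}, 1 -> {2}, 2 -> {}
    let g = Csr {
        offsets: vec![0, 2, 3, 3],
        targets: vec![1, 2, 2],
    };
    assert_eq!(g.neighbors(0), &[1, 2]);
    assert_eq!(g.neighbors(1), &[2]);
    assert!(g.neighbors(2).is_empty());
}
```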
SimpleGraph Construction¶
In-memory graph building performance (10K vertices, 50K edges):
| Strategy | Time | Notes |
|---|---|---|
| Standard (add_vertex + add_edge) | 3.27 ms | Hash map lookups per insert |
| Optimized (with_capacity + add_edge_unchecked) | 2.53 ms | Pre-allocated, no validation |
Pre-allocation saves ~23% by avoiding hash map resizing.
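The trade-off can be sketched with plain standard-library types (this is not Uni's `SimpleGraph` API): growing a map from empty triggers repeated re-allocation and re-hashing, while reserving capacity up front pays one allocation.

```rust
use std::collections::HashMap;

// Standard path: the map starts empty and resizes as vertices arrive,
// and every edge insert does a validated key lookup.
fn build_standard(n: usize) -> HashMap<usize, Vec<usize>> {
    let mut adj = HashMap::new();
    for v in 0..n {
        adj.insert(v, Vec::new());
    }
    for v in 0..n {
        adj.get_mut(&v).unwrap().push((v + 1) % n); // ring graph
    }
    adj
}

// Optimized path: capacity is reserved once, so no incremental resizing.
// (An add_edge_unchecked-style API would additionally skip validation.)
fn build_preallocated(n: usize) -> HashMap<usize, Vec<usize>> {
    let mut adj = HashMap::with_capacity(n);
    for v in 0..n {
        adj.insert(v, vec![(v + 1) % n]);
    }
    adj
}

fn main() {
    let a = build_standard(10_000);
    let b = build_preallocated(10_000);
    assert_eq!(a.len(), b.len());
    assert_eq!(a[&42], b[&42]);
}
```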
Vector Search Performance¶
Raw Vector Search (via Micro Suite)¶
Direct storage-layer vector search, bypassing Cypher:
| Dataset | Dimensions | k | Latency |
|---|---|---|---|
| 1K vectors | 128 | 10 | 1.92 ms |
Cypher Vector Search (via Comprehensive Suite)¶
Vector similarity filter via Cypher (WHERE vector_similarity(...) > 0.8):
| Dataset | Latency | Notes |
|---|---|---|
| 1K n | 6.12 ms | Includes Cypher parse + plan |
| 5K n | 6.91 ms | Sub-linear growth |
| 8K n | 7.80 ms | Sub-linear growth |
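For reference, the per-pair cost that the similarity filter incurs looks like the following sketch, assuming (the page does not say) that `vector_similarity` is cosine similarity over f32 vectors. An index avoids evaluating this against every stored vector.

```rust
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// This is the assumed semantics of vector_similarity, shown for scale:
// one comparison is O(d) over d = 128 dimensions.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let a = vec![1.0_f32; 128];
    let b = vec![1.0_f32; 128];
    // Alternating signs: orthogonal to the all-ones vector.
    let c: Vec<f32> = (0..128)
        .map(|i| if i % 2 == 0 { 1.0 } else { -1.0 })
        .collect();
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical -> 1.0
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);         // orthogonal -> 0.0
}
```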
Indexed KNN¶
`CALL uni.vector.query('Person', 'embedding', $vec, 10)` with HNSW index:
| Dataset | Latency | Notes |
|---|---|---|
| 1K n | 6.89 ms | HNSW index + Cypher overhead |
| 5K n | 9.85 ms | O(log n) scaling |
| 8K n | 11.29 ms | O(log n) scaling |
Hybrid Query Performance¶
Combined vector similarity + graph traversal query:
```cypher
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE vector_similarity(a.embedding, $vec) > 0.8 AND b.age >= 1
RETURN b.name
```
| Dataset | Latency | Notes |
|---|---|---|
| 1K n / 1K e | 13.73 ms | Low edge density (1.0 e/v) |
| 5K n / 2.5K e | 26.30 ms | Low edge density (0.5 e/v) |
| 8K n / 12K e | 92.45 ms | Higher density (1.5 e/v) |
Hybrid queries show super-linear growth with edge density. At 8K/12K, the vector filter matches more start vertices (8x more nodes) and each has more outgoing edges (1.5x density), multiplying the traversal fan-out. This is the most edge-density-sensitive workload.
Graph Algorithm Performance¶
All algorithms benchmarked on random graphs with 1,000 nodes and 5 edges per node (avg degree = 5). Execution includes Cypher parsing, planning, graph projection, and algorithm execution.
| Algorithm | Time | Complexity | Notes |
|---|---|---|---|
| PageRank | 7.24 ms | O(E) per iteration | Iterative convergence |
| WCC | 5.83 ms | O(V + E) | Union-Find with path compression |
| SCC | 6.07 ms | O(V + E) | Tarjan's algorithm |
| Louvain | 10.09 ms | O(E) per iteration | Multi-level community detection |
| Label Propagation | 10.45 ms | O(E) per iteration | Fast community detection |
| Betweenness | 7.68 ms | O(VE) | Sampling (100 sources) |
| Closeness | 7.68 ms | O(VE) | BFS from all vertices |
| Node Similarity | 12.05 ms | O(V^2) | Jaccard on neighbor sets |
| Triangle Count | 6.41 ms | O(E^1.5) | Set intersection |
| K-Core | 6.43 ms | O(V + E) | Iterative degree pruning |
| Random Walk | 7.17 ms | O(V * walk_length) | 5 steps, 1 walk per node |
At 1K nodes / 5K edges, most algorithms complete in 6-12 ms including full Cypher overhead (parse → plan → project → execute → collect). The graph projection step (loading CSR from storage) is amortized across the algorithm execution.
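The WCC row above cites Union-Find with path compression. A minimal standalone sketch of that technique (not Uni's implementation): each union/find runs in near-constant amortized time, which yields the O(V + E) bound for connected components.

```rust
// Union-Find with path halving (a common path-compression variant).
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    // Walk to the root, flattening the path as we go.
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

// Count weakly connected components of an undirected edge list.
fn wcc_count(n: usize, edges: &[(usize, usize)]) -> usize {
    let mut uf = UnionFind::new(n);
    for &(a, b) in edges {
        uf.union(a, b);
    }
    (0..n).filter(|&v| uf.find(v) == v).count() // one root per component
}

fn main() {
    // Components: {0,1,2}, {3,4}, and isolated {5} -> 3 total.
    let edges = [(0, 1), (1, 2), (3, 4)];
    assert_eq!(wcc_count(6, &edges), 3);
}
```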
Scaling Analysis¶
What Scales Linearly (Full-Scan Bound)¶
These operations must touch every row and scale proportionally with data size:
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| COUNT aggregation | 25.9 ms | 108.1 ms | 186.1 ms | 7.2x (data: 8x) |
| GROUP BY aggregation | 26.4 ms | 108.7 ms | 181.8 ms | 6.9x (data: 8x) |
| Flush L0→L1 | 748 ms | 1.19 s | 1.54 s | 2.1x (data: 8x) |
| Cypher ingestion | 1.51 s | 8.61 s | 13.14 s | 8.7x (data: 8x) |
What Stays Constant (Index/Cache Bound)¶
These operations use indexes or local graph structure and are independent of total data size:
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| Point lookup | 3.01 ms | 2.92 ms | 3.44 ms | 1.1x |
| Scalar equality (indexed) | 2.82 ms | 3.08 ms | 3.36 ms | 1.2x |
| Fulltext match (indexed) | 3.01 ms | 3.26 ms | 3.27 ms | 1.1x |
| AVG aggregation | 3.10 ms | 2.93 ms | 3.18 ms | 1.0x |
| 1-hop traversal | 4.75 ms | 4.31 ms | 5.08 ms | 1.1x |
| 3-hop traversal | 4.74 ms | 4.13 ms | 4.98 ms | 1.1x |
What Scales Sub-Linearly (O(log n))¶
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| Vector KNN (HNSW) | 6.89 ms | 9.85 ms | 11.29 ms | 1.6x (data: 8x) |
| Vector similarity filter | 6.12 ms | 6.91 ms | 7.80 ms | 1.3x (data: 8x) |
| ORDER BY + LIMIT | 2.93 ms | 3.45 ms | 3.97 ms | 1.4x (data: 8x) |
What Scales Super-Linearly (Edge-Density Sensitive)¶
| Operation | 1K n/1K e | 5K n/2.5K e | 8K n/12K e | Notes |
|---|---|---|---|---|
| Hybrid vector+graph | 13.73 ms | 26.30 ms | 92.45 ms | 6.7x for 8x nodes + 12x edges |
The hybrid query is sensitive to both node count (more vector matches) and edge density (more traversal fan-out per match). This is expected and can be mitigated with tighter vector thresholds or earlier LIMIT clauses.
Running Benchmarks¶
Benchmark Suites¶
| Suite | File | Focus |
|---|---|---|
| comprehensive | crates/uni/benches/comprehensive.rs | Public API: ingestion, queries, indexes, traversal |
| mutation_benchmarks | crates/uni/benches/mutation_benchmarks.rs | CREATE, SET, DELETE, MERGE |
| algo_benchmarks | crates/uni/benches/algo_benchmarks.rs | Graph algorithms (PageRank, WCC, etc.) |
| micro_benchmarks | crates/uni/benches/micro_benchmarks.rs | Low-level: storage, CSR, vector search |
| pushdown_performance | crates/uni/benches/pushdown_performance.rs | Property pushdown (placeholder) |
Running¶
```sh
# Run all benchmarks
cargo bench

# Run a specific suite
cargo bench --bench comprehensive
cargo bench --bench algo_benchmarks
cargo bench --bench mutation_benchmarks
cargo bench --bench micro_benchmarks

# Run with custom dataset size (comprehensive suite)
BENCH_NODES=5000 BENCH_EDGES=2500 cargo bench --bench comprehensive

# Run only matching benchmarks
BENCH_NODES=1000 cargo bench --bench comprehensive -- traversal

# Algorithm benchmarks with custom graph size
BENCH_NODES=2000 BENCH_EDGES_PER_NODE=10 cargo bench --bench algo_benchmarks

# Save baseline and compare
cargo bench -- --save-baseline main
# ... make changes ...
cargo bench -- --baseline main

# View HTML reports
open target/criterion/report/index.html
```
Profiling¶
```sh
# CPU profiling with perf
perf record cargo bench --bench comprehensive -- traversal
perf report

# Flame graphs
cargo flamegraph --bench comprehensive -- --bench
```
Next Steps¶
- Vectorized Execution — Execution engine details
- Storage Engine — Storage layer internals