Benchmarks¶
Measured performance data for Uni across ingestion, querying, graph traversal, vector search, and graph algorithms. All numbers come from the Criterion benchmark suite (cargo bench) and are reproducible.
Last updated: 2026-04-01
Executive Summary¶
| Workload | Performance | Dataset | Notes |
|---|---|---|---|
| Point Lookup | 3.0 ms | 1K vertices | By name, after flush |
| 1-Hop Traversal | 4.7 ms | 1K vertices | CSR adjacency, warm cache |
| Vector KNN (k=10) | 6.9 ms | 1K vectors, 128d | HNSW indexed |
| Aggregation (AVG) | 3.1 ms | 1K vertices | Single-column aggregate |
| Full Scan COUNT | 26 ms | 1K vertices | Scales linearly with data |
| Hybrid Vector+Graph | 13.7 ms | 1K n / 1K e | Vector filter + 1-hop |
| Micro: Raw Vector Search | 1.9 ms | 1K vectors, 128d | Direct storage layer |
| Micro: Neighbor Access | 2.0 ns | 1K vertices, 50-deg | CSR slice lookup |
Test Environment¶
All benchmarks run on a single machine using Criterion 0.5 with `harness = false`. Each measurement uses 10 samples (comprehensive/mutation/algo suites) or 100 samples (micro suite). Datasets use in-memory or temp-dir storage with `auto_flush_threshold: 100_000`.
Benchmark Configurations¶
Three dataset sizes used for scaling analysis:
| Config | Vertices | Edges | Edge Density | Properties per Vertex |
|---|---|---|---|---|
| Small | 1,000 | 1,000 | 1.0 e/v | 3 (name, age, embedding[128]) |
| Medium | 5,000 | 2,500 | 0.5 e/v | 3 (name, age, embedding[128]) |
| Large | 8,000 | 12,000 | 1.5 e/v | 3 (name, age, embedding[128]) |
Schema: Person label with name (String), age (Int32), embedding (Vector[128]). Edge type: KNOWS.
Ingestion Performance¶
Cypher INSERT Throughput¶
Per-statement ingestion via CREATE inside a transaction. Each vertex carries a name, age, and 128-dimensional embedding vector.
| Method | 1K vertices | 5K vertices | 8K vertices |
|---|---|---|---|
| String interpolation | 1.89 s | 10.47 s | 12.59 s |
| Parameterized query | 1.51 s | 8.61 s | 13.14 s |
| Per-vertex rate (parameterized) | 662 v/s | 581 v/s | 609 v/s |
Parameterized queries are ~20% faster at 1K scale because the plan cache can reuse the parsed and planned query and substitute only the parameter values. At 8K the advantage disappears (parameterized is marginally slower) as L0 buffer pressure dominates.
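The caching effect can be illustrated with a toy sketch (not Uni's actual planner): a cache keyed by query text gets one miss and many hits for a parameterized statement, but misses on every string-interpolated statement because each is textually unique. The `PlanCache` type and its fields here are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a plan cache keyed by query text. Parameterized
// queries reuse one entry; interpolated queries miss every time.
struct PlanCache {
    plans: HashMap<String, String>, // query text -> "compiled plan" (stand-in)
    hits: usize,
    misses: usize,
}

impl PlanCache {
    fn new() -> Self {
        PlanCache { plans: HashMap::new(), hits: 0, misses: 0 }
    }

    // Return the cached plan for `query`, "planning" (here: formatting a
    // string) on a miss. Real parse/plan cost is what the cache amortizes.
    fn get_or_plan(&mut self, query: &str) -> &str {
        if self.plans.contains_key(query) {
            self.hits += 1;
        } else {
            self.misses += 1;
            self.plans.insert(query.to_string(), format!("plan({query})"));
        }
        self.plans.get(query).unwrap()
    }
}

fn main() {
    let mut cache = PlanCache::new();
    // Parameterized: identical text every time -> 1 miss, 999 hits.
    for _ in 0..1000 {
        cache.get_or_plan("CREATE (:Person {name: $name, age: $age})");
    }
    // Interpolated: unique text every time -> 1000 misses.
    for i in 0..1000 {
        let q = format!("CREATE (:Person {{name: 'p{i}', age: {i}}})");
        cache.get_or_plan(&q);
    }
    assert_eq!((cache.hits, cache.misses), (999, 1001));
    println!("hits={}, misses={}", cache.hits, cache.misses);
}
```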
Flush Performance (L0 to L1)¶
Flush converts the in-memory L0 buffer to persistent Lance columnar storage:
| Dataset | Flush Time | Throughput |
|---|---|---|
| 1K vertices + 500 edges | 748 ms | ~2.0K entities/sec |
| 5K vertices + 1.25K edges | 1.19 s | ~5.3K entities/sec |
| 8K vertices + 6K edges | 1.54 s | ~9.1K entities/sec |
Flush time grows far more slowly than entity count at these scales (2.1x time for roughly 9x entities). Throughput improves with batch size because Arrow serialization and Lance write overhead are amortized across the batch.
Mutation Performance¶
Benchmarked with in-memory Uni instances, 100-node operations per measurement:
| Operation | Time | Per-Op Rate |
|---|---|---|
| CREATE 100 nodes | 137.8 ms | 1.38 ms/node |
| SET 100 properties | 6.7 ms | 67 us/prop |
| DELETE 100 nodes | 3.5 ms | 35 us/node |
| CREATE 50 + MATCH | 74.1 ms | 1.48 ms/create + query |
| MERGE 100 nodes (50 create + 50 match) | 336.8 ms | 3.37 ms/merge |
Key observations:
- MERGE is ~2.4x slower than CREATE because each MERGE must first check existence (read + conditional write), while CREATE is write-only.
- SET and DELETE are fast because they operate on already-indexed vertices in the L0 buffer.
- Transaction overhead is included in all measurements (tx begin + commit).
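The MERGE-vs-CREATE gap above comes down to read-before-write semantics. A minimal illustrative sketch (an in-memory map, not Uni's internals): CREATE is a blind insert, while MERGE must check existence first and only write on the miss path.

```rust
use std::collections::HashMap;

// Illustrative only: CREATE is write-only, MERGE is read + conditional write.
// The extra existence check is why MERGE measures ~2.4x slower per op.

fn create(store: &mut HashMap<String, u32>, name: &str, age: u32) {
    // Blind insert; duplicate handling is the caller's problem.
    store.insert(name.to_string(), age);
}

fn merge(store: &mut HashMap<String, u32>, name: &str, age: u32) -> bool {
    // Existence check first (the "read"), then a conditional write.
    if store.contains_key(name) {
        false // matched an existing vertex; nothing created
    } else {
        store.insert(name.to_string(), age);
        true // created
    }
}

fn main() {
    let mut store = HashMap::new();
    create(&mut store, "alice", 30);
    assert!(!merge(&mut store, "alice", 30)); // match path: no write
    assert!(merge(&mut store, "bob", 25));    // create path: write
    assert_eq!(store.len(), 2);
}
```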
Query Performance¶
Point Lookup¶
Single-vertex retrieval by string property after bulk insert + flush:
| Dataset | Latency | Notes |
|---|---|---|
| 1K vertices | 3.01 ms | Scan-based (no BTree index) |
| 5K vertices | 2.92 ms | Nearly constant |
| 8K vertices | 3.44 ms | Nearly constant |
Point lookup scales approximately O(1) with dataset size — the query planner pushes the filter predicate into the Lance scan, enabling predicate pushdown at the storage layer.
Aggregation¶
Full-scan aggregations over all vertices:
| Query | 1K n | 5K n | 8K n | Scaling |
|---|---|---|---|---|
| COUNT(n) | 25.9 ms | 108.1 ms | 186.1 ms | Linear |
| GROUP BY n.age, COUNT(n) | 26.4 ms | 108.7 ms | 181.8 ms | Linear |
| AVG(n.age) | 3.10 ms | 2.93 ms | 3.18 ms | Constant |
COUNT and GROUP BY require a full table scan — 8x more data yields ~7x more time. AVG uses columnar statistics and returns in constant time regardless of dataset size.
Indexed Queries¶
Queries using BTree, HNSW, or fulltext indexes created via Cypher DDL:
| Query Type | 1K n | 5K n | 8K n | Scaling |
|---|---|---|---|---|
| Scalar equality (age = 25) | 2.82 ms | 3.08 ms | 3.36 ms | ~Constant |
| Scalar range (age >= 20 AND age <= 30) | 3.07 ms | 3.45 ms | 3.78 ms | ~Constant |
| Vector KNN (k=10, HNSW) | 6.89 ms | 9.85 ms | 11.29 ms | Sub-linear |
| Fulltext match (exact name) | 3.01 ms | 3.26 ms | 3.27 ms | ~Constant |
BTree and fulltext indexes deliver nearly constant-time lookups. Vector KNN scales sub-linearly (O(log n)) due to the HNSW graph structure.
Index Creation¶
Time to build an index on an already-populated and flushed dataset:
| Index Type | 1K n | 5K n | 8K n |
|---|---|---|---|
| Scalar BTree | 1.47 ms | 2.08 ms | 3.18 ms |
| Vector HNSW | 1.44 ms | 2.13 ms | 2.76 ms |
| Fulltext | 1.36 ms | 2.17 ms | 2.93 ms |
Index build time grows sub-linearly with dataset size at these scales (roughly 2x time for 8x data), and all three index types have comparable build times.
Order By + Limit¶
`MATCH (n:Person) RETURN n.name, n.age ORDER BY ... LIMIT 10`:
| Sort | 1K n | 5K n | 8K n |
|---|---|---|---|
| ORDER BY n.age | 2.93 ms | 3.45 ms | 3.97 ms |
| ORDER BY n.name DESC | 2.98 ms | 3.82 ms | 4.60 ms |
Sub-linear scaling thanks to the LIMIT 10 clause — the engine uses a top-N heap instead of a full sort.
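The top-N strategy can be sketched with a bounded heap: to return the k smallest values, keep a max-heap of size k and evict the largest whenever it overflows. This is O(n log k) time and O(k) memory instead of an O(n log n) full sort. A minimal standalone sketch (not Uni's actual operator):

```rust
use std::collections::BinaryHeap;

// Top-N via a bounded max-heap: only the k smallest values survive,
// mirroring what ORDER BY <col> ASC LIMIT k needs to produce.
fn smallest_k(values: &[i64], k: usize) -> Vec<i64> {
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k + 1);
    for &v in values {
        heap.push(v);
        if heap.len() > k {
            heap.pop(); // evict the current largest
        }
    }
    heap.into_sorted_vec() // ascending, like the final ORDER BY output
}

fn main() {
    let ages: Vec<i64> = (0..1000).map(|i| (i * 37) % 101).collect();
    let top = smallest_k(&ages, 10);
    assert_eq!(top.len(), 10);
    // Output is sorted ascending.
    assert!(top.windows(2).all(|w| w[0] <= w[1]));
}
```

For `ORDER BY ... DESC` the same structure works with a min-heap (e.g. wrapping values in `std::cmp::Reverse`).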
Graph Traversal Performance¶
Cypher Traversal (via Comprehensive Suite)¶
Starting from a single vertex (Person_0), following KNOWS edges:
| Hops | 1K n / 1K e | 5K n / 2.5K e | 8K n / 12K e | Notes |
|---|---|---|---|---|
| 1-hop | 4.75 ms | 4.31 ms | 5.08 ms | ~Constant |
| 2-hop | 4.86 ms | 4.83 ms | 4.89 ms | ~Constant |
| 3-hop | 4.74 ms | 4.13 ms | 4.98 ms | ~Constant |
Traversal from a single start vertex stays flat regardless of total graph size — the working set (reachable neighbors) is determined by local topology, not total vertex count.
Low-Level Traversal (via Micro Suite)¶
Direct storage-layer traversal benchmarks, bypassing the Cypher parser/planner:
| Benchmark | Time | Notes |
|---|---|---|
| 1-hop, 100-node chain | 750 us | load_subgraph from storage |
| Neighbor iteration (50-degree vertex) | 9.71 ns | CSR in-memory iteration |
| Neighbor access (50-degree vertex) | 2.00 ns | Slice pointer lookup |
The CSR adjacency structure provides near-zero-cost neighbor access — a single slice lookup followed by sequential memory reads.
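In CSR form, the neighbors of vertex `v` live in one contiguous slice, `targets[offsets[v]..offsets[v+1]]`. A minimal sketch of that layout (field names here are illustrative, not Uni's exact types) shows why neighbor access is a single index computation plus a slice borrow:

```rust
// Minimal CSR adjacency sketch. Neighbor access is one bounds-checked
// slice lookup; iteration then reads sequential memory (cache-friendly).
struct Csr {
    offsets: Vec<usize>, // len = num_vertices + 1
    targets: Vec<u32>,   // all neighbor lists, concatenated
}

impl Csr {
    fn neighbors(&self, v: usize) -> &[u32] {
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }
}

fn main() {
    // 3 vertices: 0 -> {1, 2}, 1 -> {2}, 2 -> {}
    let g = Csr {
        offsets: vec![0, 2, 3, 3],
        targets: vec![1, 2, 2],
    };
    assert_eq!(g.neighbors(0), &[1, 2]);
    assert_eq!(g.neighbors(1), &[2]);
    assert!(g.neighbors(2).is_empty());
}
```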
SimpleGraph Construction¶
In-memory graph building performance (10K vertices, 50K edges):
| Strategy | Time | Notes |
|---|---|---|
| Standard (add_vertex + add_edge) | 3.27 ms | Hash map lookups per insert |
| Optimized (with_capacity + add_edge_unchecked) | 2.53 ms | Pre-allocated, no validation |
Pre-allocation saves ~23% by avoiding hash map resizing.
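The trade-off can be sketched with plain standard-library types (this is not Uni's `SimpleGraph` API): growing a map from empty triggers repeated re-allocation and re-hashing, while reserving capacity up front pays one allocation.

```rust
use std::collections::HashMap;

// Standard path: the map starts empty and resizes as vertices arrive,
// and every edge insert does a validated key lookup.
fn build_standard(n: usize) -> HashMap<usize, Vec<usize>> {
    let mut adj = HashMap::new();
    for v in 0..n {
        adj.insert(v, Vec::new());
    }
    for v in 0..n {
        adj.get_mut(&v).unwrap().push((v + 1) % n); // ring graph
    }
    adj
}

// Optimized path: capacity is reserved once, so no incremental resizing.
// (An add_edge_unchecked-style API would additionally skip validation.)
fn build_preallocated(n: usize) -> HashMap<usize, Vec<usize>> {
    let mut adj = HashMap::with_capacity(n);
    for v in 0..n {
        adj.insert(v, vec![(v + 1) % n]);
    }
    adj
}

fn main() {
    let a = build_standard(10_000);
    let b = build_preallocated(10_000);
    assert_eq!(a.len(), b.len());
    assert_eq!(a[&42], b[&42]);
}
```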
Vector Search Performance¶
Raw Vector Search (via Micro Suite)¶
Direct storage-layer vector search, bypassing Cypher:
| Dataset | Dimensions | k | Latency |
|---|---|---|---|
| 1K vectors | 128 | 10 | 1.92 ms |
Cypher Vector Search (via Comprehensive Suite)¶
Vector similarity filter via Cypher (WHERE vector_similarity(...) > 0.8):
| Dataset | Latency | Notes |
|---|---|---|
| 1K n | 6.12 ms | Includes Cypher parse + plan |
| 5K n | 6.91 ms | Sub-linear growth |
| 8K n | 7.80 ms | Sub-linear growth |
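For reference, the per-pair cost that the similarity filter incurs looks like the following sketch, assuming (the page does not say) that `vector_similarity` is cosine similarity over f32 vectors. An index avoids evaluating this against every stored vector.

```rust
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// This is the assumed semantics of vector_similarity, shown for scale:
// one comparison is O(d) over d = 128 dimensions.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let a = vec![1.0_f32; 128];
    let b = vec![1.0_f32; 128];
    // Alternating signs: orthogonal to the all-ones vector.
    let c: Vec<f32> = (0..128)
        .map(|i| if i % 2 == 0 { 1.0 } else { -1.0 })
        .collect();
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical -> 1.0
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);         // orthogonal -> 0.0
}
```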
Indexed KNN¶
`CALL uni.vector.query('Person', 'embedding', $vec, 10)` with HNSW index:
| Dataset | Latency | Notes |
|---|---|---|
| 1K n | 6.89 ms | HNSW index + Cypher overhead |
| 5K n | 9.85 ms | O(log n) scaling |
| 8K n | 11.29 ms | O(log n) scaling |
Hybrid Query Performance¶
Combined vector similarity + graph traversal query:
```cypher
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE vector_similarity(a.embedding, $vec) > 0.8 AND b.age >= 1
RETURN b.name
```
| Dataset | Latency | Notes |
|---|---|---|
| 1K n / 1K e | 13.73 ms | Low edge density (1.0 e/v) |
| 5K n / 2.5K e | 26.30 ms | Low edge density (0.5 e/v) |
| 8K n / 12K e | 92.45 ms | Higher density (1.5 e/v) |
Hybrid queries show super-linear growth with edge density. At 8K/12K, the vector filter matches more start vertices (8x more nodes) and each has more outgoing edges (1.5x density), multiplying the traversal fan-out. This is the most edge-density-sensitive workload.
Graph Algorithm Performance¶
All algorithms benchmarked on random graphs with 1,000 nodes and 5 edges per node (avg degree = 5). Execution includes Cypher parsing, planning, graph projection, and algorithm execution.
| Algorithm | Time | Complexity | Notes |
|---|---|---|---|
| PageRank | 7.24 ms | O(E) per iteration | Iterative convergence |
| WCC | 5.83 ms | O(V + E) | Union-Find with path compression |
| SCC | 6.07 ms | O(V + E) | Tarjan's algorithm |
| Louvain | 10.09 ms | O(E) per iteration | Multi-level community detection |
| Label Propagation | 10.45 ms | O(E) per iteration | Fast community detection |
| Betweenness | 7.68 ms | O(VE) | Sampling (100 sources) |
| Closeness | 7.68 ms | O(VE) | BFS from all vertices |
| Node Similarity | 12.05 ms | O(V^2) | Jaccard on neighbor sets |
| Triangle Count | 6.41 ms | O(E^1.5) | Set intersection |
| K-Core | 6.43 ms | O(V + E) | Iterative degree pruning |
| Random Walk | 7.17 ms | O(V * walk_length) | 5 steps, 1 walk per node |
At 1K nodes / 5K edges, most algorithms complete in 6-12 ms including full Cypher overhead (parse → plan → project → execute → collect). The graph projection step (loading CSR from storage) is amortized across the algorithm execution.
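The WCC row above cites Union-Find with path compression. A minimal standalone sketch of that technique (not Uni's implementation): each union/find runs in near-constant amortized time, which yields the O(V + E) bound for connected components.

```rust
// Union-Find with path halving (a common path-compression variant).
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    // Walk to the root, flattening the path as we go.
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

// Count weakly connected components of an undirected edge list.
fn wcc_count(n: usize, edges: &[(usize, usize)]) -> usize {
    let mut uf = UnionFind::new(n);
    for &(a, b) in edges {
        uf.union(a, b);
    }
    (0..n).filter(|&v| uf.find(v) == v).count() // one root per component
}

fn main() {
    // Components: {0,1,2}, {3,4}, and isolated {5} -> 3 total.
    let edges = [(0, 1), (1, 2), (3, 4)];
    assert_eq!(wcc_count(6, &edges), 3);
}
```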
Scaling Analysis¶
What Scales Linearly (Full-Scan Bound)¶
These operations must touch every row and scale proportionally with data size:
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| COUNT aggregation | 25.9 ms | 108.1 ms | 186.1 ms | 7.2x (data: 8x) |
| GROUP BY aggregation | 26.4 ms | 108.7 ms | 181.8 ms | 6.9x (data: 8x) |
| Flush L0→L1 | 748 ms | 1.19 s | 1.54 s | 2.1x (data: 8x) |
| Cypher ingestion | 1.51 s | 8.61 s | 13.14 s | 8.7x (data: 8x) |
What Stays Constant (Index/Cache Bound)¶
These operations use indexes or local graph structure and are independent of total data size:
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| Point lookup | 3.01 ms | 2.92 ms | 3.44 ms | 1.1x |
| Scalar equality (indexed) | 2.82 ms | 3.08 ms | 3.36 ms | 1.2x |
| Fulltext match (indexed) | 3.01 ms | 3.26 ms | 3.27 ms | 1.1x |
| AVG aggregation | 3.10 ms | 2.93 ms | 3.18 ms | 1.0x |
| 1-hop traversal | 4.75 ms | 4.31 ms | 5.08 ms | 1.1x |
| 3-hop traversal | 4.74 ms | 4.13 ms | 4.98 ms | 1.1x |
What Scales Sub-Linearly (O(log n))¶
| Operation | 1K | 5K | 8K | Growth Factor (1K→8K) |
|---|---|---|---|---|
| Vector KNN (HNSW) | 6.89 ms | 9.85 ms | 11.29 ms | 1.6x (data: 8x) |
| Vector similarity filter | 6.12 ms | 6.91 ms | 7.80 ms | 1.3x (data: 8x) |
| ORDER BY + LIMIT | 2.93 ms | 3.45 ms | 3.97 ms | 1.4x (data: 8x) |
What Scales Super-Linearly (Edge-Density Sensitive)¶
| Operation | 1K n/1K e | 5K n/2.5K e | 8K n/12K e | Notes |
|---|---|---|---|---|
| Hybrid vector+graph | 13.73 ms | 26.30 ms | 92.45 ms | 6.7x for 8x nodes + 12x edges |
The hybrid query is sensitive to both node count (more vector matches) and edge density (more traversal fan-out per match). This is expected and can be mitigated with tighter vector thresholds or earlier LIMIT clauses.
Running Benchmarks¶
Benchmark Suites¶
| Suite | File | Focus |
|---|---|---|
| comprehensive | crates/uni/benches/comprehensive.rs | Public API: ingestion, queries, indexes, traversal |
| mutation_benchmarks | crates/uni/benches/mutation_benchmarks.rs | CREATE, SET, DELETE, MERGE |
| algo_benchmarks | crates/uni/benches/algo_benchmarks.rs | Graph algorithms (PageRank, WCC, etc.) |
| micro_benchmarks | crates/uni/benches/micro_benchmarks.rs | Low-level: storage, CSR, vector search |
| pushdown_performance | crates/uni/benches/pushdown_performance.rs | Property pushdown (placeholder) |
Running¶
```sh
# Run all benchmarks
cargo bench

# Run a specific suite
cargo bench --bench comprehensive
cargo bench --bench algo_benchmarks
cargo bench --bench mutation_benchmarks
cargo bench --bench micro_benchmarks

# Run with custom dataset size (comprehensive suite)
BENCH_NODES=5000 BENCH_EDGES=2500 cargo bench --bench comprehensive

# Run only matching benchmarks
BENCH_NODES=1000 cargo bench --bench comprehensive -- traversal

# Algorithm benchmarks with custom graph size
BENCH_NODES=2000 BENCH_EDGES_PER_NODE=10 cargo bench --bench algo_benchmarks

# Save baseline and compare
cargo bench -- --save-baseline main
# ... make changes ...
cargo bench -- --baseline main

# View HTML reports
open target/criterion/report/index.html
```
Profiling¶
```sh
# CPU profiling with perf
perf record cargo bench --bench comprehensive -- traversal
perf report

# Flame graphs
cargo flamegraph --bench comprehensive -- --bench
```
Next Steps¶
- Vectorized Execution — Execution engine details
- Storage Engine — Storage layer internals