Storage Engine Internals¶
Uni's storage engine is architected for high-throughput ingestion and low-latency analytics, leveraging a tiered LSM-tree-like structure backed by Lance columnar format. This design enables both OLTP-style writes and OLAP-style analytical queries on the same data.
Architecture Overview¶
Tiered Storage Model¶
L0: Memory Buffer¶
The L0 layer handles all incoming writes with maximum throughput.
pub struct L0Buffer {
/// In-memory graph structure (SimpleGraph)
pub graph: SimpleGraph,
/// Edge tombstones with version tracking
pub tombstones: HashMap<Eid, TombstoneEntry>,
/// Vertex tombstones
pub vertex_tombstones: HashSet<Vid>,
/// Edge version tracking
pub edge_versions: HashMap<Eid, u64>,
/// Vertex version tracking
pub vertex_versions: HashMap<Vid, u64>,
/// Edge properties (separate from graph structure)
pub edge_properties: HashMap<Eid, Properties>,
/// Vertex properties (separate from graph structure)
pub vertex_properties: HashMap<Vid, Properties>,
/// Edge endpoint mapping (eid -> (src, dst, type_id))
pub edge_endpoints: HashMap<Eid, (Vid, Vid, u16)>,
/// Current version number
pub current_version: u64,
/// Mutation counter for flush triggering
pub mutation_count: usize,
/// Write-ahead log reference
pub wal: Option<Arc<WriteAheadLog>>,
/// WAL LSN at last flush
pub wal_lsn_at_flush: u64,
}
Characteristics:
| Property | Value | Notes |
|---|---|---|
| Format | SimpleGraph + HashMap | Row-oriented for inserts |
| Durability | WAL-backed | Survives crashes |
| Latency | ~550ยตs per 1K writes | Memory-speed |
| Capacity | Configurable (default 128MB) | Auto-flush when full |
Write Path:
1. Acquire write lock (single-writer)
2. Append to WAL (sync or async based on config)
3. Insert into SimpleGraph (vertex/edge)
4. Store properties in HashMap
5. Increment mutation counter
6. If threshold reached โ trigger async flush
Auto-Flush Triggers¶
L0 buffer is automatically flushed to L1 (Lance storage) based on two configurable triggers:
| Trigger | Default | Description |
|---|---|---|
| Mutation Count | 10,000 | Flush when mutations exceed threshold |
| Time Interval | 5 seconds | Flush after time elapsed (if mutations > 0) |
Configuration:
let config = UniConfig {
// Flush on mutation count (high-transaction systems)
auto_flush_threshold: 10_000,
// Flush on time interval (low-transaction systems)
auto_flush_interval: Some(Duration::from_secs(5)),
// Minimum mutations before time-based flush triggers
auto_flush_min_mutations: 1,
..Default::default()
};
Flush Decision Logic:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ AUTO-FLUSH DECISION โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ After each write: โ
โ โ
โ mutations >= 10,000? โโโโโYesโโโโโบ FLUSH IMMEDIATELY โ
โ โ โ
โ No โ
โ โ โ
โ โผ โ
โ Background timer (every 5s): โ
โ โ
โ time_since_last_flush >= 5s โ
โ AND mutations >= min_mutations? โโโโโYesโโโโโบ FLUSH โ
โ โ โ
โ No โ
โ โ โ
โ โผ โ
โ Continue buffering โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Interval Tradeoffs:
| Interval | Min Mutations | Data at Risk | I/O Overhead | Use Case |
|---|---|---|---|---|
| 1s | 1 | ~1s | High | Critical data, local storage |
| 5s | 1 | ~5s | Moderate | Default |
| 30s | 100 | ~30s | Low | Cost-sensitive cloud workloads |
| None | - | All unflushed | None | Legacy behavior, WAL-only |
WAL Still Provides Crash Recovery
Regardless of flush interval, the Write-Ahead Log ensures no committed data is lost on crash. Time-based flush determines when data reaches L1/cloud storage for query visibility and durability.
Delta Layer¶
When L0 fills up, it flushes to the Delta layer as Lance datasets.
pub struct DeltaDataset {
/// Lance dataset for deltas
dataset: Arc<Dataset>,
/// Edge type this delta is for
edge_type: EdgeTypeId,
/// Direction (outgoing or incoming)
direction: Direction,
}
pub struct L0Manager {
/// Current L0 buffer
buffer: Arc<RwLock<L0Buffer>>,
/// Storage backend
store: Arc<dyn ObjectStore>,
/// Schema manager reference
schema: Arc<SchemaManager>,
/// Configuration
config: L0ManagerConfig,
}
Flush Process:
L2: Base Layer¶
The L2 layer contains fully compacted, indexed data.
pub struct L2Layer {
/// Main vertex dataset (per label)
vertex_datasets: HashMap<LabelId, Dataset>,
/// Main edge dataset (per type)
edge_datasets: HashMap<EdgeTypeId, Dataset>,
/// Adjacency datasets (per edge type + direction)
adjacency_datasets: HashMap<(EdgeTypeId, Direction), Dataset>,
/// Vector indexes
vector_indexes: HashMap<IndexId, VectorIndex>,
/// Scalar indexes
scalar_indexes: HashMap<IndexId, ScalarIndex>,
}
Compaction Process:
Lance Integration¶
Uni uses Lance as its core columnar format.
Why Lance?¶
| Feature | Benefit |
|---|---|
| Native Vector Support | Built-in HNSW, IVF_PQ indexes |
| Versioning | Time-travel, ACID transactions |
| Fast Random Access | O(1) row lookup by index |
| Columnar Scans | Efficient analytical queries |
| Object Store Native | S3/GCS support built-in |
| Zero-Copy | Arrow-compatible memory layout |
Lance File Format¶
Lance Dataset Structure:
data/
โโโ _versions/ # Version metadata
โ โโโ 1.manifest # Version 1 manifest
โ โโโ 2.manifest # Version 2 manifest
โ โโโ ...
โโโ _indices/ # Index data
โ โโโ vector_idx_001/ # Vector index
โ โ โโโ index.idx
โ โ โโโ ...
โ โโโ scalar_idx_002/ # Scalar index
โโโ data/ # Column data
โโโ part-0.lance # Data fragment
โโโ part-1.lance
โโโ ...
Data Fragment Structure¶
pub struct LanceFragment {
/// Fragment ID
id: u64,
/// Row range in this fragment
row_range: Range<u64>,
/// Physical files for each column
columns: HashMap<String, ColumnFiles>,
/// Fragment-level statistics
stats: FragmentStatistics,
}
pub struct FragmentStatistics {
/// Row count
num_rows: u64,
/// Per-column statistics
column_stats: HashMap<String, ColumnStats>,
}
pub struct ColumnStats {
null_count: u64,
min_value: Option<ScalarValue>,
max_value: Option<ScalarValue>,
distinct_count: Option<u64>,
}
Dataset Organization¶
Vertex Datasets¶
One Lance dataset per vertex label:
storage/
โโโ vertices/
โ โโโ Paper/ # :Paper vertices
โ โ โโโ _versions/
โ โ โโโ _indices/
โ โ โ โโโ embedding_hnsw/ # Vector index
โ โ โ โโโ year_btree/ # Scalar index
โ โ โโโ data/
โ โโโ Author/ # :Author vertices
โ โ โโโ ...
โ โโโ Venue/ # :Venue vertices
โ โโโ ...
Vertex Schema:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ VERTEX DATASET SCHEMA โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ System Columns: โ
โ โโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ Column โ Type โ Description โโ
โ โโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโคโ
โ โ _vid โ UInt64 โ Internal vertex ID (label bits + ID) โโ
โ โ _uid โ Binary(32) โ UniId (32-byte SHA3-256) - optional โโ
โ โ _deleted โ Bool โ Tombstone marker (soft delete) โโ
โ โ _version โ UInt64 โ Last modified version โโ
โ โ ext_id โ String โ External ID (extracted from props) โโ
โ โ _created_at โ Timestamp โ Creation timestamp โโ
โ โ _updated_at โ Timestamp โ Last update timestamp โโ
โ โโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โ User Properties (schema-defined): โ
โ โโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ title โ String โ Paper title โ โ
โ โ year โ Int32 โ Publication year โ โ
โ โ abstract โ String โ Paper abstract (nullable) โ โ
โ โ embeddingโ Vector โ 768-dimensional embedding โ โ
โ โ _doc โ Json โ Document mode flexible fields (deprecated) โ โ
โ โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ Schemaless Properties: โ
โ โโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ overflow_json โ LargeBinary โ JSONB binary for non-schema props โ โ
โ โ โ (JSONB) โ - Automatically queryable โ โ
โ โ โ โ - Query rewriting to JSON functions โ โ
โ โ โ โ - PostgreSQL-compatible format โ โ
โ โโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Edge Datasets¶
One Lance dataset per edge type:
storage/
โโโ edges/
โ โโโ CITES/ # :CITES edges
โ โ โโโ ...
โ โโโ AUTHORED_BY/ # :AUTHORED_BY edges
โ โ โโโ ...
โ โโโ PUBLISHED_IN/
โ โโโ ...
Edge Schema:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ EDGE DATASET SCHEMA โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ System Columns: โ
โ โโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ Column โ Type โ Description โโ
โ โโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโคโ
โ โ _eid โ UInt64 โ Internal edge ID (type bits + ID) โโ
โ โ _src_vid โ UInt64 โ Source vertex VID โโ
โ โ _dst_vid โ UInt64 โ Destination vertex VID โโ
โ โ _deleted โ Bool โ Tombstone marker โโ
โ โ _version โ UInt64 โ Last modified version โโ
โ โ _created_at โ Timestamp โ Creation timestamp โโ
โ โ _updated_at โ Timestamp โ Last update timestamp โโ
โ โโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โ Edge Properties (schema-defined): โ
โ โโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ weight โ Float64 โ Edge weight/score โ โ
โ โ position โ Int32 โ Author position (for AUTHORED_BY) โ โ
โ โ timestampโ Timestampโ When the edge was created โ โ
โ โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ Schemaless Properties: โ
โ โโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ overflow_json โ LargeBinary โ JSONB binary for non-schema props โ โ
โ โ โ (JSONB) โ - Same as vertex overflow support โ โ
โ โโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Adjacency Datasets¶
Optimized for graph traversal (CSR-style):
storage/
โโโ adjacency/
โ โโโ CITES_OUT/ # Outgoing CITES edges
โ โ โโโ ...
โ โโโ CITES_IN/ # Incoming CITES edges (reverse)
โ โ โโโ ...
โ โโโ ...
Adjacency Schema:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ ADJACENCY DATASET SCHEMA โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ Chunked CSR Format (one row per chunk of vertices): โ
โ โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Column โ Type โ Description โ โ
โ โโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ
โ โ chunk_id โ UInt64 โ Chunk identifier โ โ
โ โ vid_start โ UInt64 โ First VID in chunk โ โ
โ โ vid_end โ UInt64 โ Last VID in chunk (exclusive) โ โ
โ โ offsets โ List<UInt64> โ CSR offsets (chunk_size + 1 elements) โ โ
โ โ neighbors โ List<UInt64> โ Neighbor VIDs (flattened) โ โ
โ โ edge_ids โ List<UInt64> โ Edge IDs (parallel to neighbors) โ โ
โ โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ Example (chunk_size=1000): โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ chunk_id: 0 โ โ
โ โ vid_start: 0, vid_end: 1000 โ โ
โ โ offsets: [0, 3, 3, 7, 10, ...] (1001 elements) โ โ
โ โ neighbors: [v5, v12, v99, v4, v6, v8, v42, ...] โ โ
โ โ edge_ids: [e1, e2, e3, e4, e5, e6, e7, ...] โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Write-Ahead Log (WAL)¶
The WAL ensures durability for uncommitted L0 data.
WAL Structure¶
pub struct WriteAheadLog {
/// Object store-backed WAL
store: Arc<dyn ObjectStore>,
/// WAL prefix/path
prefix: Path,
/// In-memory state (buffer + LSN tracking)
state: Mutex<WalState>,
}
struct WalState {
buffer: Vec<Mutation>,
next_lsn: u64,
flushed_lsn: u64,
}
/// WAL segment with LSN for idempotent recovery
pub struct WalSegment {
pub lsn: u64,
pub mutations: Vec<Mutation>,
}
WAL Mutation Types¶
The WAL records mutations using the following enum:
pub enum Mutation {
/// Insert a new edge
InsertEdge {
src_vid: Vid,
dst_vid: Vid,
edge_type: u16,
eid: Eid,
version: u64,
properties: Properties,
},
/// Delete an existing edge
DeleteEdge {
eid: Eid,
src_vid: Vid,
dst_vid: Vid,
edge_type: u16,
version: u64,
},
/// Insert a new vertex
InsertVertex {
vid: Vid,
properties: Properties,
},
/// Delete an existing vertex
DeleteVertex {
vid: Vid,
},
}
Note: The label_id is encoded in the VID itself, so InsertVertex doesn't need a separate label_id field.
Recovery Process¶
impl Wal {
pub async fn recover(&self, l0: &mut L0Buffer) -> Result<()> {
// Find all WAL segments
let segments = self.list_segments()?;
for segment in segments {
let reader = WalReader::open(&segment)?;
while let Some(entry) = reader.next_entry()? {
// Verify CRC
if !entry.verify_crc() {
// Truncate at corruption point
break;
}
// Replay entry
match entry.entry_type {
EntryType::InsertVertex { vid, label_id, props } => {
l0.insert_vertex(vid, label_id, props)?;
}
EntryType::InsertEdge { eid, src, dst, type_id, props } => {
l0.insert_edge(eid, src, dst, type_id, props)?;
}
// ... handle other types
}
}
}
Ok(())
}
}
Snapshot Management¶
Snapshots provide consistent point-in-time views.
Manifest Structure¶
{
"version": 42,
"timestamp": "2024-01-15T10:30:00Z",
"schema_version": 1,
"vertex_datasets": {
"Paper": {
"lance_version": 15,
"row_count": 1000000,
"size_bytes": 524288000
},
"Author": {
"lance_version": 8,
"row_count": 250000,
"size_bytes": 62500000
}
},
"edge_datasets": {
"CITES": {
"lance_version": 12,
"row_count": 5000000,
"size_bytes": 200000000
}
},
"adjacency_datasets": {
"CITES_OUT": { "lance_version": 12 },
"CITES_IN": { "lance_version": 12 }
},
"l1_runs": [
{ "sequence": 100, "created_at": "2024-01-15T10:25:00Z" },
{ "sequence": 101, "created_at": "2024-01-15T10:28:00Z" }
],
"indexes": {
"paper_embeddings": {
"type": "hnsw",
"version": 5,
"row_count": 1000000
},
"paper_year": {
"type": "btree",
"version": 3
}
}
}
Snapshot Operations¶
impl StorageManager {
/// Create a new snapshot (after flush)
pub async fn create_snapshot(&self) -> Result<Snapshot> {
let manifest = Manifest {
version: self.next_version(),
timestamp: Utc::now(),
vertex_datasets: self.collect_vertex_versions(),
edge_datasets: self.collect_edge_versions(),
adjacency_datasets: self.collect_adjacency_versions(),
l1_runs: self.l1_manager.list_runs(),
indexes: self.collect_index_versions(),
};
// Write manifest atomically
self.write_manifest(&manifest).await?;
Ok(Snapshot::new(manifest))
}
/// Open a specific snapshot for reading
pub async fn open_snapshot(&self, version: u64) -> Result<SnapshotReader> {
let manifest = self.read_manifest(version).await?;
Ok(SnapshotReader {
manifest,
vertex_readers: self.open_vertex_readers(&manifest).await?,
edge_readers: self.open_edge_readers(&manifest).await?,
adjacency_cache: self.load_adjacency(&manifest).await?,
})
}
}
Index Storage¶
Vector Index Storage¶
Vector indexes (HNSW, IVF_PQ) are stored within Lance datasets:
pub struct VectorIndexStorage {
/// Lance dataset with index
dataset: Dataset,
/// Index configuration
config: VectorIndexConfig,
}
impl VectorIndexStorage {
pub async fn create_index(
dataset: &mut Dataset,
column: &str,
config: VectorIndexConfig,
) -> Result<()> {
match config.index_type {
IndexType::Hnsw => {
dataset.create_index()
.column(column)
.index_type("IVF_HNSW_SQ")
.metric_type(config.metric.to_lance())
.build()
.await?;
}
IndexType::IvfPq => {
dataset.create_index()
.column(column)
.index_type("IVF_PQ")
.metric_type(config.metric.to_lance())
.num_partitions(config.num_partitions)
.num_sub_vectors(config.num_sub_vectors)
.build()
.await?;
}
}
Ok(())
}
}
Scalar Index Storage¶
Scalar indexes use Lance's built-in index support:
pub struct ScalarIndexStorage {
/// Index metadata
metadata: ScalarIndexMetadata,
}
impl ScalarIndexStorage {
pub async fn create_index(
dataset: &mut Dataset,
column: &str,
index_type: ScalarIndexType,
) -> Result<()> {
dataset.create_index()
.column(column)
.index_type(match index_type {
ScalarIndexType::BTree => "BTREE",
ScalarIndexType::Hash => "HASH",
ScalarIndexType::Bitmap => "BITMAP",
})
.build()
.await?;
Ok(())
}
}
Schemaless Properties and Query Rewriting¶
Overview¶
Uni supports schemaless properties - properties not defined in the schema that can be stored and queried dynamically. This enables flexible data modeling without requiring schema migrations.
Storage Mechanism¶
Properties are stored differently based on whether they're in the schema:
| Property Type | Storage Location | Format | Performance |
|---|---|---|---|
| Schema-defined | Typed Arrow columns | Native type (Int32, String, etc.) | โก Fast (columnar) |
| Non-schema | overflow_json column |
JSONB binary | ๐ก Good (parsed) |
JSONB Binary Format¶
The overflow_json column uses JSONB binary format (PostgreSQL-compatible):
// Schema definition
overflow_json: LargeBinary // Not Utf8!
// Encoding (on write)
let overflow_props = build_overflow_json_column(vertices, schema)?;
// Returns JSONB binary, not JSON string
// Decoding (on read)
let raw_jsonb = jsonb::RawJsonb::new(bytes);
let json_string = raw_jsonb.to_string();
let props: HashMap<String, Value> = serde_json::from_str(&json_string)?;
Benefits: - ๐ Faster than JSON strings (binary representation) - ๐ PostgreSQL-compatible (same JSONB format) - โ๏ธ Enables Lance's optimized built-in JSON UDFs
Automatic Query Rewriting¶
When queries access overflow properties, the planner automatically rewrites them to use Lance's JSON functions:
Original Cypher:
If city and age are not in schema, rewritten to:
WHERE json_get_string(p.overflow_json, 'city') = 'NYC'
RETURN p.name, json_get_string(p.overflow_json, 'age')
Supported JSON Functions:
- json_get_string(overflow_json, key) - Extract string value
- json_get_int(overflow_json, key) - Extract integer value
- json_get_float(overflow_json, key) - Extract float value
- json_get_bool(overflow_json, key) - Extract boolean value
Rewriting Algorithm¶
Mixed Schema + Overflow Queries¶
Queries seamlessly mix both types:
// Schema: Person has 'name' property
// Overflow: 'city' and 'verified' not in schema
MATCH (p:Person)
WHERE p.name = 'Alice' -- Typed column (fast)
AND p.city = 'NYC' -- overflow_json (rewritten)
AND p.verified = true -- overflow_json (rewritten)
RETURN p.name, p.city, p.age -- Mixed access
Planner automatically:
1. Identifies schema vs overflow properties
2. Uses typed columns for schema properties
3. Rewrites overflow property access to JSON functions
4. Ensures overflow_json column is materialized in scan
Performance Characteristics¶
| Metric | Schema Properties | Overflow Properties |
|---|---|---|
| Filtering | โก Fast (columnar predicate pushdown) | ๐ก Good (JSONB parsing) |
| Sorting | โก Fast (native type sorting) | ๐ก Slower (extract then sort) |
| Compression | โก Type-specific (5-20x) | ๐ด Limited (binary blob) |
| Indexing | โ Supported (BTree, Vector, etc.) | โ Not indexed |
| Schema Evolution | โ ๏ธ Requires migration | โ No migration needed |
Usage Guidelines¶
Use Schema Properties for: - โ Frequently queried fields (filters, sorts, aggregations) - โ Core data model properties - โ Properties requiring indexes - โ Performance-critical fields
Use Overflow Properties for: - โ Optional/rare properties - โ User-defined metadata - โ Rapidly evolving schemas - โ Prototyping and experimentation
Example: Schemaless Label¶
// Create label without property definitions
db.schema().label("Document").apply().await?;
// Create with arbitrary properties
db.execute("CREATE (:Document {
title: 'Article',
author: 'Alice',
tags: ['tech', 'ai'],
year: 2024
})").await?;
// All properties stored in overflow_json
db.flush().await?;
// Query works transparently (automatic rewriting)
let results = db.query("
MATCH (d:Document)
WHERE d.author = 'Alice' AND d.year > 2020
RETURN d.title, d.tags
").await?;
Example: Mixed Schema + Overflow¶
// Define core properties in schema
db.schema()
.label("Person")
.property("name", DataType::String) // Schema property
.property("age", DataType::Int) // Schema property
.apply().await?;
// Create with schema + overflow properties
db.execute("CREATE (:Person {
name: 'Bob', -- Schema (typed column)
age: 25, -- Schema (typed column)
city: 'NYC', -- Overflow (overflow_json)
verified: true -- Overflow (overflow_json)
})").await?;
db.flush().await?;
// Query mixing both (transparent to user)
let results = db.query("
MATCH (p:Person)
WHERE p.name = 'Bob' -- Fast: typed column
AND p.city = 'NYC' -- Rewritten: json_get_string(...)
RETURN p.age, p.verified -- Mixed: typed + overflow
").await?;
Implementation Details¶
Write Path:
1. Properties split into schema vs non-schema
2. Schema properties โ Typed Arrow columns
3. Non-schema properties โ Serialized to JSONB binary
4. overflow_json column written to Lance dataset
Read Path:
1. Query planner identifies overflow properties
2. Rewrites expressions to json_get_* functions
3. Ensures overflow_json in scan projection
4. PropertyManager decodes JSONB binary
5. Results merged with typed properties
Flush Behavior:
- L0 buffer stores all properties equally (no distinction)
- During flush, properties partitioned by schema
- build_overflow_json_column() serializes non-schema props
- Both typed columns and overflow_json written to Lance
Compaction:
- Overflow properties preserved through L1 โ L2 compaction
- Main vertices table includes all properties in props_json
- Per-label tables maintain separation (typed + overflow)
Object Store Integration¶
Uni uses the object_store crate for storage abstraction, supporting local filesystems and major cloud providers.
Supported Backends¶
| Backend | URI Scheme | Status |
|---|---|---|
| Local filesystem | file:// or path |
Stable |
| Amazon S3 | s3://bucket/path |
Stable |
| Google Cloud Storage | gs://bucket/path |
Stable |
| Azure Blob Storage | az://container/path |
Stable |
| Memory | (in-memory) | Stable (testing) |
Using Local Storage¶
// Standard local storage
let db = Uni::open("./my-database").build().await?;
// Explicit file:// URI
let db = Uni::open("file:///var/data/uni").build().await?;
// In-memory for testing
let db = Uni::in_memory().build().await?;
Cloud Storage¶
Open databases directly from cloud object stores:
// Amazon S3
let db = Uni::open("s3://my-bucket/graph-data").build().await?;
// Google Cloud Storage
let db = Uni::open("gs://my-bucket/graph-data").build().await?;
// Azure Blob Storage
let db = Uni::open("az://my-container/graph-data").build().await?;
Credential Resolution:
Cloud credentials are resolved automatically using standard environment variables and configuration files:
| Provider | Environment Variables | Config Files |
|---|---|---|
| AWS S3 | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION, AWS_ENDPOINT_URL |
~/.aws/credentials |
| GCS | GOOGLE_APPLICATION_CREDENTIALS |
Application Default Credentials |
| Azure | AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_ACCESS_KEY, AZURE_STORAGE_SAS_TOKEN |
Azure CLI credentials |
Hybrid Mode (Local + Cloud)¶
For optimal performance with cloud storage, use hybrid mode to maintain a local write cache:
use uni_common::CloudStorageConfig;
let cloud = CloudStorageConfig::s3_from_env("my-bucket");
let db = Uni::open("./local-cache")
.hybrid("./local-cache", "s3://my-bucket/graph-data")
.cloud_config(cloud)
.build()
.await?;
Hybrid Mode Operation:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HYBRID MODE โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ WRITES READS โ
โ โโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Client โ โ Query โ โ
โ โโโโโโฌโโโโโ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ
โ โ โ โ
โ โผ โผ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Local L0 โโโโโโโโโโโโโโโโโโโบโ Merge View โ โ
โ โ (WAL+Buffer)โ โ (Local L0 โช Cloud Storage) โ โ
โ โโโโโโโโฌโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โฒ โ
โ โ flush โ โ
โ โผ โ โ
โ โโโโโโโโโโโโโโโ โ โ
โ โ Cloud โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ (S3/GCS) โ โ
โ โโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Benefits: - Low-latency writes: All mutations go to local WAL + L0 buffer - Durable storage: Data flushed to cloud storage on interval/threshold - Cost efficiency: Minimize cloud API calls through batching - Crash recovery: WAL provides local durability before cloud sync
Performance Characteristics¶
Write Performance¶
| Operation | Latency | Throughput | Notes |
|---|---|---|---|
| L0 insert (vertex) | ~50ยตs | ~20K/sec | Memory only |
| L0 insert (edge) | ~30ยตs | ~33K/sec | Memory only |
| L0 batch insert (1K) | ~550ยตs | ~1.8M/sec | Amortized |
| L0 โ L1 flush | ~6ms/1K | - | Lance write |
| L1 โ L2 compact | ~1s/100K | - | Background |
Read Performance¶
| Operation | Latency | Notes |
|---|---|---|
| Point lookup (indexed) | ~2.9ms | BTree index |
| Range scan (indexed) | ~5ms + 0.1ms/row | B-tree index |
| Full scan | ~50ms/100K rows | Columnar |
| Vector KNN (k=10) | ~1.8ms | HNSW index |
Storage Efficiency¶
| Data Type | Compression | Ratio |
|---|---|---|
| Integers | Dictionary + RLE | 5-20x |
| Strings | Dictionary + LZ4 | 3-10x |
| Vectors | No compression | 1x |
| Booleans | Bitmap | 8x |
Configuration Reference¶
Storage behavior is configured via UniConfig (see the Configuration reference). Key storage-related fields include:
auto_flush_threshold,auto_flush_interval,auto_flush_min_mutationswal_enabledcompaction.*(e.g.,max_l1_runs,max_l1_size_bytes)object_store.*index_rebuild.*
Example:
use std::time::Duration;
use uni_db::UniConfig;
let mut config = UniConfig::default();
config.auto_flush_threshold = 10_000;
config.auto_flush_interval = Some(Duration::from_secs(5));
config.compaction.max_l1_runs = 4;
config.compaction.max_l1_size_bytes = 256 * 1024 * 1024;
Next Steps¶
- Vectorized Execution โ Query execution engine
- Query Planning โ From Cypher to physical plan
- Benchmarks โ Performance measurements