Identity Model
Uni gives every entity several identifiers, each serving a different need: fast internal access, stable user-facing references, and coordination-free identity across distributed systems. This document explains the three identity types and their roles.
Overview
Every entity in Uni has multiple identifiers serving different purposes:
| Identity | Bits | Purpose | Locality |
|---|---|---|---|
| VID/EID | 64 | Internal array indexing | Local to database |
| ext_id | Variable | User-provided external ID | User-defined |
| UniId | 256 | Content-addressed provenance | Global / distributed |
┌─────────────────────────────────────────────────────────────────────────────┐
│ VERTEX IDENTITY STACK │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ User-Facing (ext_id) │ │
│ │ "paper_abc123" │ │
│ │ Human-readable, stable across imports │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ resolves to │
│ ┌───────────────────────────────▼─────────────────────────────────────┐ │
│ │ Internal (VID) │ │
│ │ 0x0001_0000_0000_002A │ │
│ │ Fast array indexing, label-encoded │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ content-hashes to │
│ ┌───────────────────────────────▼─────────────────────────────────────┐ │
│ │ Provenance (UniId) │ │
│ │ bafkreihdwdcefgh4dqkjv67uzcmw7o... │ │
│ │ Content-addressed, CRDT-compatible │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Vertex ID (VID)
The Vertex ID is a 64-bit packed integer optimized for O(1) array indexing.
Encoding
┌────────────────────────────────────────────────────────────────────────────┐
│ VID (64 bits) │
├──────────────────┬─────────────────────────────────────────────────────────┤
│ label_id (16) │ local_offset (48) │
├──────────────────┴─────────────────────────────────────────────────────────┤
│ │
│ Example: 0x0001_0000_0000_002A │
│ ────── ────────────── │
│ label offset │
│ (Paper) (42) │
│ │
└────────────────────────────────────────────────────────────────────────────┘
| Component | Bits | Range | Purpose |
|---|---|---|---|
| label_id | 16 | 0 - 65,535 | Identifies vertex type (label) |
| local_offset | 48 | 0 - 281 trillion | Per-label sequential offset |
Usage in Code
use uni::core::Vid;
// Create a VID
let vid = Vid::new(1, 42); // label_id=1 (Paper), offset=42
// Access components
assert_eq!(vid.label_id(), 1);
assert_eq!(vid.local_offset(), 42);
assert_eq!(vid.as_u64(), 0x0001_0000_0000_002A);
// Parse from u64
let vid = Vid::from_u64(0x0001_0000_0000_002A);
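The packing behind these accessors is plain bit arithmetic. The sketch below uses a hypothetical `PackedVid` type (not Uni's actual `Vid`) and assumes the label sits in the top 16 bits and the offset in the low 48, as in the layout diagram. Rejecting oversized offsets with `None` is an illustrative choice, not documented behavior.

```rust
/// Hypothetical stand-in for `uni::core::Vid`; illustrates the bit layout only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedVid(u64);

const OFFSET_BITS: u32 = 48;
const OFFSET_MASK: u64 = (1u64 << OFFSET_BITS) - 1;

impl PackedVid {
    /// Packs a label and an offset; returns `None` if the offset needs more than 48 bits.
    pub fn new(label_id: u16, local_offset: u64) -> Option<Self> {
        if local_offset > OFFSET_MASK {
            return None;
        }
        Some(Self(((label_id as u64) << OFFSET_BITS) | local_offset))
    }

    /// Top 16 bits.
    pub fn label_id(self) -> u16 {
        (self.0 >> OFFSET_BITS) as u16
    }

    /// Low 48 bits.
    pub fn local_offset(self) -> u64 {
        self.0 & OFFSET_MASK
    }

    pub fn as_u64(self) -> u64 {
        self.0
    }
}
```

Because the label occupies the high bits, sorting packed values numerically groups them by label first and by offset second.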
Why This Design?
- O(1) Array Indexing: The offset directly indexes into per-label property arrays
- Label Partitioning: Queries on a single label only scan that label's data
- Dense Storage: Offsets are sequential, enabling compact columnar storage
- Type Safety: Label ID embedded in VID prevents cross-label confusion
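To make the first three points concrete, here is a minimal sketch of per-label dense storage. `TitleColumns` and its single string column are hypothetical; Uni's real columnar storage is not shown in this document. The point is only that a lookup is one map access on the label followed by one array index on the offset.

```rust
use std::collections::HashMap;

/// Hypothetical per-label column store: one dense vector of titles per label.
#[derive(Default)]
pub struct TitleColumns {
    by_label: HashMap<u16, Vec<String>>,
}

impl TitleColumns {
    /// Appends a title under `label_id` and returns the packed 64-bit ID.
    /// The offset is simply the row's position in that label's vector.
    pub fn push(&mut self, label_id: u16, title: &str) -> u64 {
        let column = self.by_label.entry(label_id).or_default();
        let offset = column.len() as u64;
        column.push(title.to_string());
        ((label_id as u64) << 48) | offset
    }

    /// Looks up a title: select the label's column, then index by offset.
    pub fn get(&self, vid: u64) -> Option<&str> {
        let label_id = (vid >> 48) as u16;
        let offset = (vid & ((1u64 << 48) - 1)) as usize;
        self.by_label.get(&label_id)?.get(offset).map(String::as_str)
    }
}
```

A scan restricted to one label touches only that label's vector, which is the partitioning benefit described above.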
Capacity
| Component | Maximum | Practical Limit |
|---|---|---|
| Labels | 65,536 (IDs 0 - 65,535) | Typically 10-100 |
| Vertices per label | 281 trillion | Limited by storage |
| Total vertices | 18 quintillion | Theoretical max |
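These limits follow from the field widths: a 16-bit field has 2^16 distinct values, a 48-bit field has 2^48, and the full 64-bit ID has 2^64. A small helper (hypothetical, for checking the arithmetic only) makes the exact figures easy to verify:

```rust
/// Number of distinct values representable in `bits` bits.
/// Valid for `bits` below 128, since the result is held in a `u128`.
pub fn distinct_values(bits: u32) -> u128 {
    1u128 << bits
}
```

With this, `distinct_values(48)` is 281,474,976,710,656 (about 281 trillion) and `distinct_values(64)` is 18,446,744,073,709,551,616 (about 18 quintillion).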
Edge ID (EID)
Edge IDs follow the same packed 64-bit structure as VIDs.
Encoding
┌────────────────────────────────────────────────────────────────────────────┐
│ EID (64 bits) │
├──────────────────┬─────────────────────────────────────────────────────────┤
│ type_id (16) │ local_offset (48) │
├──────────────────┴─────────────────────────────────────────────────────────┤
│ │
│ Example: 0x0002_0000_0000_0015 │
│ ────── ────────────── │
│ type offset │
│ (CITES) (21) │
│ │
└────────────────────────────────────────────────────────────────────────────┘
Usage
use uni::core::Eid;
let eid = Eid::new(2, 21); // type_id=2 (CITES), offset=21
assert_eq!(eid.type_id(), 2);
assert_eq!(eid.local_offset(), 21);
External ID (ext_id)
The external ID is a user-provided string identifier for human readability.
Characteristics
| Property | Description |
|---|---|
| Type | UTF-8 String |
| Uniqueness | Unique within a label |
| Mutability | Immutable after creation |
| Indexing | Automatically indexed for lookups |
Usage
// Create with external ID
CREATE (p:Paper {id: "arxiv:2106.09685", title: "LoRA"})
// Query by external ID
MATCH (p:Paper {id: "arxiv:2106.09685"})
RETURN p.title
Resolution
External IDs are resolved to VIDs at query time:
ext_id "arxiv:2106.09685"
│
▼
┌─────────────────┐
│ ext_id Index │ (BTree index on ext_id column)
└────────┬────────┘
│
▼
VID 0x0001_0000_0000_002A
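The lookup in the diagram can be sketched with an ordered map. `ExtIdIndex` is a hypothetical in-memory stand-in, not Uni's index implementation. It assumes the key is the pair (label, ext_id), since external IDs are unique only within a label, and that inserting a duplicate is rejected.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the ext_id index.
#[derive(Default)]
pub struct ExtIdIndex {
    // Keyed by (label_id, ext_id) because ext_ids are unique only within a label.
    entries: BTreeMap<(u16, String), u64>,
}

impl ExtIdIndex {
    /// Registers an ext_id; returns `false` and changes nothing if it is already taken.
    pub fn insert(&mut self, label_id: u16, ext_id: &str, vid: u64) -> bool {
        let key = (label_id, ext_id.to_string());
        if self.entries.contains_key(&key) {
            return false;
        }
        self.entries.insert(key, vid);
        true
    }

    /// Resolves an ext_id to its VID with an O(log n) tree lookup.
    pub fn resolve(&self, label_id: u16, ext_id: &str) -> Option<u64> {
        self.entries.get(&(label_id, ext_id.to_string())).copied()
    }
}
```

The same string can therefore be registered under two different labels without conflict, while a second insert under the same label fails.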
UniId
The UniId is a content-addressed identifier for distributed systems and provenance tracking.
Characteristics
| Property | Description |
|---|---|
| Algorithm | SHA3-256 |
| Encoding | Multibase (base32) |
| Length | 44 characters |
| Determinism | Same content → same UID |
Format
┌────────────────────────────────────────────────────────────────────────────┐
│ UniId Structure │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Multibase prefix: 'b' (base32 lowercase) │
│ │
│ Example: bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku │
│ ─ │
│ └── multibase prefix │
│ ──────────────────────────────────────────────────────── │
│ └── base32 encoded SHA3-256 hash (43 chars) │
│ │
└────────────────────────────────────────────────────────────────────────────┘
Generation
UniId is computed from vertex content:
use uni::core::UniId;
use sha3::{Sha3_256, Digest};
// Content to hash
let content = serde_json::json!({
"label": "Paper",
"properties": {
"title": "Attention Is All You Need",
"year": 2017
}
});
// Compute SHA3-256
let mut hasher = Sha3_256::new();
hasher.update(content.to_string().as_bytes());
let hash = hasher.finalize();
// Create UniId
let uid = UniId::from_bytes(&hash);
println!("{}", uid.to_multibase()); // bafkrei...
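Determinism holds only if the same content always serializes to the same bytes before hashing. With `serde_json`, object keys are sorted by default but keep insertion order when the crate's `preserve_order` feature is enabled, so it can help to make the ordering explicit. The sketch below is a hypothetical canonical form (label, then `key=value` lines in sorted key order), not the format Uni uses. It also ignores escaping, which a real format would need for values containing newlines.

```rust
use std::collections::BTreeMap;

/// Builds a canonical byte string for a vertex: the label first, then the
/// properties in ascending key order. Hypothetical format, for illustration only.
pub fn canonical_bytes(label: &str, properties: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = String::new();
    out.push_str(label);
    // BTreeMap iterates in ascending key order, regardless of insertion order.
    for (key, value) in properties {
        out.push('\n');
        out.push_str(key);
        out.push('=');
        out.push_str(value);
    }
    out.into_bytes()
}
```

Feeding these bytes to the hasher instead of an order-dependent string means two writers who set the same properties in a different order still derive the same identifier.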
Use Cases
- Content Deduplication: Same content always produces same UID
- Distributed Sync: UIDs are globally unique without coordination
- Audit Trail: Track data provenance across systems
- CRDT Integration: UIDs enable conflict-free replication across distributed nodes
ID Resolution
VID Lookup by ext_id
Resolution path:
1. Parse pattern → extract ext_id value
2. Query ext_id index → get VID
3. Load vertex data using VID offset
VID Lookup by UniId
Resolution path:
1. Query UID index (separate Lance dataset)
2. Get VID from index
3. Load vertex data using VID
Direction Enum
For edge traversal, Uni uses a Direction enum:
pub enum Direction {
Outgoing, // Source → Destination
Incoming, // Destination ← Source
Both, // Either direction
}
Cypher Syntax Mapping
| Cypher Pattern | Direction |
|---|---|
| (a)-[:TYPE]->(b) | Outgoing from a |
| (a)<-[:TYPE]-(b) | Incoming to a |
| (a)-[:TYPE]-(b) | Both |
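As an illustration of how the three directions differ during traversal, the sketch below filters a flat edge list. The `neighbors` function and the `(source, destination)` tuple representation are assumptions made for this example. Uni's actual adjacency storage is not described here, and the enum is redeclared only to keep the snippet self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Outgoing,
    Incoming,
    Both,
}

/// Returns the neighbors of `vertex` reachable over `edges` in the given direction.
/// Each edge is a `(source, destination)` pair of vertex IDs.
/// With `Both`, a self-loop contributes the vertex twice.
pub fn neighbors(edges: &[(u64, u64)], vertex: u64, direction: Direction) -> Vec<u64> {
    let follow_out = matches!(direction, Direction::Outgoing | Direction::Both);
    let follow_in = matches!(direction, Direction::Incoming | Direction::Both);
    let mut result = Vec::new();
    for &(src, dst) in edges {
        if follow_out && src == vertex {
            result.push(dst);
        }
        if follow_in && dst == vertex {
            result.push(src);
        }
    }
    result
}
```

For the edges `1→2`, `3→1`, and `2→3`, vertex 1 has neighbor 2 when outgoing, neighbor 3 when incoming, and both when the pattern is undirected.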
ID Allocation
IDs are allocated by the IdAllocator:
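use dashmap::DashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use uni::core::{Eid, Vid};

pub struct IdAllocator {
    label_counters: DashMap<u16, AtomicU64>,
    edge_type_counters: DashMap<u16, AtomicU64>,
}

impl IdAllocator {
    /// Returns the next VID for `label_id`; offsets start at 0 for each label.
    pub fn allocate_vid(&self, label_id: u16) -> Vid {
        let offset = self.label_counters
            .entry(label_id)
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::SeqCst);
        Vid::new(label_id, offset)
    }

    /// Returns the next EID for `type_id`, using the per-type counter.
    pub fn allocate_eid(&self, type_id: u16) -> Eid {
        let offset = self.edge_type_counters
            .entry(type_id)
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::SeqCst);
        Eid::new(type_id, offset)
    }
}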
Allocation Properties:
- Thread-safe via atomic operations
- Sequential within each label/type
- Persisted on flush for recovery
- Never reuses IDs (even after deletes)
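The first two properties can be checked with the standard library alone. The sketch below is a hypothetical test harness, not part of Uni: several threads draw offsets from one shared `AtomicU64`, standing in for a single label's counter, and the function returns everything that was handed out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Allocates `per_thread` offsets from each of `threads` threads using one
/// shared atomic counter, and returns every offset handed out, sorted.
pub fn allocate_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let counter = AtomicU64::new(0);
    let mut all = Vec::new();
    thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(scope.spawn(|| {
                // `fetch_add` returns the previous value, so each call yields a fresh offset.
                (0..per_thread)
                    .map(|_| counter.fetch_add(1, Ordering::SeqCst))
                    .collect::<Vec<u64>>()
            }));
        }
        for handle in handles {
            all.extend(handle.join().expect("allocator thread panicked"));
        }
    });
    all.sort_unstable();
    all
}
```

If the counter is correct, the sorted result is exactly `0, 1, 2, ...` with no gaps and no duplicates, whatever the thread interleaving.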
Storage Layout
UID Index Structure
indexes/uid_to_vid/{label}/index.lance
├── _uid: FixedSizeBinary(32) // SHA3-256 hash bytes
└── _vid: UInt64 // Corresponding VID
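Conceptually, each index row pairs 32 raw hash bytes with a VID. The sketch below mirrors that shape in memory with a sorted vector and binary search. `UidRow`, `build_index`, and `lookup` are hypothetical names, and the real index is a Lance dataset rather than a `Vec`, so this only illustrates why fixed-width keys give logarithmic lookups.

```rust
/// One row of a hypothetical in-memory mirror of the UID index:
/// 32 raw hash bytes and the VID they map to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UidRow {
    pub uid: [u8; 32],
    pub vid: u64,
}

/// Sorts rows by hash bytes so they can be binary-searched.
pub fn build_index(mut rows: Vec<UidRow>) -> Vec<UidRow> {
    rows.sort_unstable_by_key(|row| row.uid);
    rows
}

/// Finds the VID for `uid` in a sorted index using O(log n) comparisons.
pub fn lookup(index: &[UidRow], uid: &[u8; 32]) -> Option<u64> {
    index
        .binary_search_by(|row| row.uid.cmp(uid))
        .ok()
        .map(|pos| index[pos].vid)
}
```

Once the VID is found, the label and offset packed inside it locate the vertex data directly.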
Resolution Performance
| Lookup Type | Index | Complexity | Typical Latency |
|---|---|---|---|
| VID direct | None | O(1) | ~10µs |
| ext_id | BTree | O(log n) | ~100µs |
| UniId | BTree | O(log n) | ~100µs |
| Full scan | None | O(n) | Varies |
Best Practices
Choosing External IDs
✓ Good: "user_12345", "arxiv:2106.09685", "isbn:978-0134685991"
✗ Bad: Sequential integers (use VID instead), UUIDs (use UniId)
When to Use Each ID
| Use Case | Recommended ID |
|---|---|
| Internal operations | VID |
| API responses | ext_id |
| Cross-system sync | UniId |
| Human debugging | ext_id |
| Array indexing | VID offset |
Next Steps
- Data Model — Vertices, edges, and properties
- Indexing — Index types and configuration
- Architecture — System overview