Data Model¶
Uni combines Property Graph, Document, and Vector concepts into a unified data model. This document explains the core entities, their relationships, and how to define schemas.
Core Concepts¶
Uni's data model has three primary entity types:
Vertices (Nodes)¶
Vertices represent entities in your domain. Each vertex has:
| Component | Description | Example |
|---|---|---|
| VID | Internal 64-bit identifier | 0x0001_0000_0000_002A |
| Label(s) | Type classification | :Paper, :Author, :Venue |
| Properties | Key-value attributes | {title: "...", year: 2023} |
| External ID | User-provided identifier | "paper_abc123" |
Labels¶
Labels categorize vertices and determine their schema. A vertex has exactly one primary label (stored in VID encoding).
// Create vertices with labels
CREATE (p:Paper {title: "Attention Is All You Need"})
CREATE (a:Author {name: "Ashish Vaswani"})
CREATE (v:Venue {name: "NeurIPS", year: 2017})
Label Best Practices:
- Use singular nouns: :Paper not :Papers
- Use PascalCase: :ResearchPaper not :research_paper
- Be specific: :AcademicPaper vs generic :Document
Properties¶
Properties are strongly-typed key-value pairs defined in the schema.
{
"Paper": {
"title": { "type": "String", "nullable": false },
"year": { "type": "Int32", "nullable": true },
"abstract": { "type": "String", "nullable": true },
"embedding": { "type": "Vector", "dimensions": 768 },
"metadata": { "type": "Json", "nullable": true }
}
}
Edges (Relationships)¶
Edges connect vertices with typed, directed relationships.
| Component | Description | Example |
|---|---|---|
| EID | Internal 64-bit identifier | 0x0002_0000_0000_0015 |
| Type | Relationship classification | :CITES, :AUTHORED_BY |
| Source | Origin vertex (VID) | Paper vertex |
| Destination | Target vertex (VID) | Author vertex |
| Properties | Edge attributes | {position: 1, role: "lead"} |
Edge Direction¶
Edges are always directed (source → destination):
// Paper cites another Paper
(paper1:Paper)-[:CITES]->(paper2:Paper)
// Paper is authored by Author
(paper:Paper)-[:AUTHORED_BY]->(author:Author)
// Query in either direction
MATCH (a:Author)<-[:AUTHORED_BY]-(p:Paper) // Incoming to Author
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author) // Outgoing from Paper
Edge Type Constraints¶
Edge types can constrain which label combinations are valid:
{
"edge_types": {
"CITES": {
"id": 1,
"src_labels": ["Paper"],
"dst_labels": ["Paper"]
},
"AUTHORED_BY": {
"id": 2,
"src_labels": ["Paper"],
"dst_labels": ["Author"]
}
}
}
Edge Properties¶
Edges can carry their own properties:
CREATE (p:Paper)-[:AUTHORED_BY {position: 1, contribution: "lead"}]->(a:Author)
MATCH (p:Paper)-[e:AUTHORED_BY]->(a:Author)
WHERE e.position = 1
RETURN p.title, a.name
Data Types¶
Uni supports a rich set of data types for properties:
Primitive Types¶
| Type | Size | Range / Description | Example |
|---|---|---|---|
String |
Variable | UTF-8 text | "Hello, World" |
Int32 |
4 bytes | -2³¹ to 2³¹-1 | 42 |
Int64 |
8 bytes | -2⁶³ to 2⁶³-1 | 9223372036854775807 |
Float32 |
4 bytes | IEEE 754 single precision | 3.14159 |
Float64 |
8 bytes | IEEE 754 double precision | 3.141592653589793 |
Bool |
1 byte | true / false | true |
Timestamp |
8 bytes | Microsecond precision UTC | "2024-01-15T10:30:00Z" |
Complex Types¶
| Type | Description | Example |
|---|---|---|
Json |
Structured JSON document | {"nested": {"key": [1, 2, 3]}} |
Vector |
Fixed-dimension float32 array | [0.1, -0.2, 0.3, ...] |
List<T> |
Variable-length array | ["a", "b", "c"] |
Vector Type¶
Vectors are first-class citizens for embedding-based search:
Vector Characteristics: - Fixed dimension (immutable after schema creation) - Float32 elements (for storage efficiency) - Indexable with HNSW, IVF_PQ algorithms - Searchable via Cypher procedures
JSON Type¶
For semi-structured data with flexible schema:
JSON Capabilities: - Store arbitrary nested structures - Query with JSON path expressions - Index specific paths for performance
Schema Definition¶
Uni uses a strict schema for performance and storage efficiency. Schemas are defined in JSON.
Complete Schema Example¶
{
"schema_version": 1,
"labels": {
"Paper": {
"id": 1,
"is_document": true,
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
},
"Author": {
"id": 2,
"is_document": false,
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
},
"Venue": {
"id": 3,
"is_document": false,
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
}
},
"edge_types": {
"CITES": {
"id": 1,
"src_labels": ["Paper"],
"dst_labels": ["Paper"],
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
},
"AUTHORED_BY": {
"id": 2,
"src_labels": ["Paper"],
"dst_labels": ["Author"],
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
},
"PUBLISHED_IN": {
"id": 3,
"src_labels": ["Paper"],
"dst_labels": ["Venue"],
"created_at": "2024-01-01T00:00:00Z",
"state": "active"
}
},
"properties": {
"Paper": {
"title": { "type": "String", "nullable": false },
"abstract": { "type": "String", "nullable": true },
"year": { "type": "Int32", "nullable": true },
"doi": { "type": "String", "nullable": true },
"embedding": { "type": "Vector", "dimensions": 768 },
"metadata": { "type": "Json", "nullable": true }
},
"Author": {
"name": { "type": "String", "nullable": false },
"email": { "type": "String", "nullable": true },
"affiliation": { "type": "String", "nullable": true },
"h_index": { "type": "Int32", "nullable": true }
},
"Venue": {
"name": { "type": "String", "nullable": false },
"type": { "type": "String", "nullable": true },
"location": { "type": "String", "nullable": true }
},
"AUTHORED_BY": {
"position": { "type": "Int32", "nullable": true },
"contribution": { "type": "String", "nullable": true }
}
},
"indexes": {
"paper_embeddings": {
"type": "vector",
"label": "Paper",
"property": "embedding",
"config": {
"index_type": "hnsw",
"metric": "cosine"
}
},
"author_email": {
"type": "scalar",
"label": "Author",
"property": "email",
"config": {
"index_type": "btree"
}
}
}
}
Schema Element States¶
Labels and edge types can be in different lifecycle states:
| State | Description | Queryable | Writable |
|---|---|---|---|
active |
Normal operation | Yes | Yes |
deprecated |
Marked for removal | Yes | Yes |
hidden |
No longer queryable | No | No |
tombstone |
Deleted | No | No |
Document Mode¶
Labels marked as is_document: true enable document-specific features:
JSON Storage¶
Store complex nested data beyond the property schema:
CREATE (p:Paper {
title: "Research Paper",
_doc: {
figures: [
{ id: "fig1", caption: "Architecture diagram" },
{ id: "fig2", caption: "Results chart" }
],
supplementary: {
code_url: "https://github.com/...",
datasets: ["imagenet", "coco"]
}
}
})
JSON Path Indexing¶
Index specific paths within JSON documents:
Identity Model Summary¶
| ID Type | Bits | Purpose | Example |
|---|---|---|---|
| VID | 64 | Internal vertex identifier | 0x0001_0000_0000_002A |
| EID | 64 | Internal edge identifier | 0x0002_0000_0000_0015 |
| ext_id | String | User-provided external ID | "paper_abc123" |
| UniId | 256 | Content-addressed hash | bafkrei... |
Learn more about Identity Model →
Querying the Data Model¶
Pattern Matching¶
// Simple node match
MATCH (p:Paper)
// Node with properties
MATCH (p:Paper {year: 2023})
// Relationships
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)
// Multi-hop
MATCH (p1:Paper)-[:CITES]->(p2:Paper)-[:CITES]->(p3:Paper)
Filtering¶
// Property comparisons
WHERE p.year > 2020 AND p.year < 2025
// String operations
WHERE p.title CONTAINS 'Transformer'
// Null checks
WHERE a.email IS NOT NULL
// List membership
WHERE p.venue IN ['NeurIPS', 'ICML', 'ICLR']
Projections¶
// Select properties
RETURN p.title, p.year
// Aliases
RETURN p.title AS paper_title
// Aggregations
RETURN COUNT(p) AS total, AVG(p.year) AS avg_year
Best Practices¶
Schema Design¶
- Choose labels carefully — They're encoded in VIDs and can't change
- Keep properties typed — Avoid overusing JSON for queryable data
- Use edge types — Don't store relationships as vertex properties
- Plan for vectors — Dimension can't change after creation
Performance Considerations¶
- Index frequently queried properties — Especially for WHERE clauses
- Use vector indexes for embeddings — HNSW for quality, IVF_PQ for scale
- Avoid wide vertices — Many properties = larger Lance rows
- Leverage label partitioning — Queries on single label are faster
Next Steps¶
- Identity Model — Deep dive into VID/EID encoding
- Indexing — Vector, scalar, and full-text indexes
- Schema Design Guide — Best practices and patterns