Skip to content

Data Model

Uni combines Property Graph, Document, and Vector concepts into a unified data model. This document explains the core entities, their relationships, and how to define schemas.

Core Concepts

Uni's data model has three primary entity types:

flowchart LR subgraph Source["Source"] V1["VERTEX<br/>(Node)"] P1["Properties<br/>+ Label"] end subgraph Relationship["Relationship"] E["EDGE<br/>(Relationship)"] PE["Properties<br/>+ Type"] end subgraph Target["Target"] V2["VERTEX<br/>(Node)"] P2["Properties<br/>+ Label"] end V1 --> E --> V2 V1 -->|has| P1 E -->|has| PE V2 -->|has| P2

Vertices (Nodes)

Vertices represent entities in your domain. Each vertex has:

Component Description Example
VID Internal 64-bit identifier 0x0001_0000_0000_002A
Label(s) Type classification :Paper, :Author, :Venue
Properties Key-value attributes {title: "...", year: 2023}
External ID User-provided identifier "paper_abc123"

Labels

Labels categorize vertices and determine their schema. A vertex has exactly one primary label (stored in VID encoding).

// Create vertices with labels
CREATE (p:Paper {title: "Attention Is All You Need"})
CREATE (a:Author {name: "Ashish Vaswani"})
CREATE (v:Venue {name: "NeurIPS", year: 2017})

Label Best Practices: - Use singular nouns: :Paper not :Papers - Use PascalCase: :ResearchPaper not :research_paper - Be specific: :AcademicPaper vs generic :Document

Properties

Properties are strongly-typed key-value pairs defined in the schema.

{
  "Paper": {
    "title": { "type": "String", "nullable": false },
    "year": { "type": "Int32", "nullable": true },
    "abstract": { "type": "String", "nullable": true },
    "embedding": { "type": "Vector", "dimensions": 768 },
    "metadata": { "type": "Json", "nullable": true }
  }
}

Edges (Relationships)

Edges connect vertices with typed, directed relationships.

Component Description Example
EID Internal 64-bit identifier 0x0002_0000_0000_0015
Type Relationship classification :CITES, :AUTHORED_BY
Source Origin vertex (VID) Paper vertex
Destination Target vertex (VID) Author vertex
Properties Edge attributes {position: 1, role: "lead"}

Edge Direction

Edges are always directed (source → destination):

// Paper cites another Paper
(paper1:Paper)-[:CITES]->(paper2:Paper)

// Paper is authored by Author
(paper:Paper)-[:AUTHORED_BY]->(author:Author)

// Query in either direction
MATCH (a:Author)<-[:AUTHORED_BY]-(p:Paper)  // Incoming to Author
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)  // Outgoing from Paper

Edge Type Constraints

Edge types can constrain which label combinations are valid:

{
  "edge_types": {
    "CITES": {
      "id": 1,
      "src_labels": ["Paper"],
      "dst_labels": ["Paper"]
    },
    "AUTHORED_BY": {
      "id": 2,
      "src_labels": ["Paper"],
      "dst_labels": ["Author"]
    }
  }
}

Edge Properties

Edges can carry their own properties:

CREATE (p:Paper)-[:AUTHORED_BY {position: 1, contribution: "lead"}]->(a:Author)

MATCH (p:Paper)-[e:AUTHORED_BY]->(a:Author)
WHERE e.position = 1
RETURN p.title, a.name

Data Types

Uni supports a rich set of data types for properties:

Primitive Types

Type Size Range / Description Example
String Variable UTF-8 text "Hello, World"
Int32 4 bytes -2³¹ to 2³¹-1 42
Int64 8 bytes -2⁶³ to 2⁶³-1 9223372036854775807
Float32 4 bytes IEEE 754 single precision 3.14159
Float64 8 bytes IEEE 754 double precision 3.141592653589793
Bool 1 byte true / false true
Timestamp 8 bytes Microsecond precision UTC "2024-01-15T10:30:00Z"

Complex Types

Type Description Example
Json Structured JSON document {"nested": {"key": [1, 2, 3]}}
Vector Fixed-dimension float32 array [0.1, -0.2, 0.3, ...]
List<T> Variable-length array ["a", "b", "c"]

Vector Type

Vectors are first-class citizens for embedding-based search:

{
  "embedding": {
    "type": "Vector",
    "dimensions": 768,
    "nullable": false
  }
}

Vector Characteristics: - Fixed dimension (immutable after schema creation) - Float32 elements (for storage efficiency) - Indexable with HNSW, IVF_PQ algorithms - Searchable via Cypher procedures

JSON Type

For semi-structured data with flexible schema:

{
  "metadata": {
    "type": "Json",
    "nullable": true
  }
}

JSON Capabilities: - Store arbitrary nested structures - Query with JSON path expressions - Index specific paths for performance

// Access JSON fields
MATCH (p:Paper)
WHERE p.metadata.venue.location = 'Vancouver'
RETURN p.title

Schema Definition

Uni uses a strict schema for performance and storage efficiency. Schemas are defined in JSON.

Complete Schema Example

{
  "schema_version": 1,

  "labels": {
    "Paper": {
      "id": 1,
      "is_document": true,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "Author": {
      "id": 2,
      "is_document": false,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "Venue": {
      "id": 3,
      "is_document": false,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    }
  },

  "edge_types": {
    "CITES": {
      "id": 1,
      "src_labels": ["Paper"],
      "dst_labels": ["Paper"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "AUTHORED_BY": {
      "id": 2,
      "src_labels": ["Paper"],
      "dst_labels": ["Author"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "PUBLISHED_IN": {
      "id": 3,
      "src_labels": ["Paper"],
      "dst_labels": ["Venue"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    }
  },

  "properties": {
    "Paper": {
      "title": { "type": "String", "nullable": false },
      "abstract": { "type": "String", "nullable": true },
      "year": { "type": "Int32", "nullable": true },
      "doi": { "type": "String", "nullable": true },
      "embedding": { "type": "Vector", "dimensions": 768 },
      "metadata": { "type": "Json", "nullable": true }
    },
    "Author": {
      "name": { "type": "String", "nullable": false },
      "email": { "type": "String", "nullable": true },
      "affiliation": { "type": "String", "nullable": true },
      "h_index": { "type": "Int32", "nullable": true }
    },
    "Venue": {
      "name": { "type": "String", "nullable": false },
      "type": { "type": "String", "nullable": true },
      "location": { "type": "String", "nullable": true }
    },
    "AUTHORED_BY": {
      "position": { "type": "Int32", "nullable": true },
      "contribution": { "type": "String", "nullable": true }
    }
  },

  "indexes": {
    "paper_embeddings": {
      "type": "vector",
      "label": "Paper",
      "property": "embedding",
      "config": {
        "index_type": "hnsw",
        "metric": "cosine"
      }
    },
    "author_email": {
      "type": "scalar",
      "label": "Author",
      "property": "email",
      "config": {
        "index_type": "btree"
      }
    }
  }
}

Schema Element States

Labels and edge types can be in different lifecycle states:

State Description Queryable Writable
active Normal operation Yes Yes
deprecated Marked for removal Yes Yes
hidden No longer queryable No No
tombstone Deleted No No

Document Mode

Labels marked as is_document: true enable document-specific features:

JSON Storage

Store complex nested data beyond the property schema:

{
  "Paper": {
    "id": 1,
    "is_document": true
  }
}
CREATE (p:Paper {
  title: "Research Paper",
  _doc: {
    figures: [
      { id: "fig1", caption: "Architecture diagram" },
      { id: "fig2", caption: "Results chart" }
    ],
    supplementary: {
      code_url: "https://github.com/...",
      datasets: ["imagenet", "coco"]
    }
  }
})

JSON Path Indexing

Index specific paths within JSON documents:

{
  "json_indexes": {
    "Paper": [
      "$.supplementary.datasets[*]",
      "$.figures[*].id"
    ]
  }
}

Identity Model Summary

ID Type Bits Purpose Example
VID 64 Internal vertex identifier 0x0001_0000_0000_002A
EID 64 Internal edge identifier 0x0002_0000_0000_0015
ext_id String User-provided external ID "paper_abc123"
UniId 256 Content-addressed hash bafkrei...

Learn more about Identity Model →


Querying the Data Model

Pattern Matching

// Simple node match
MATCH (p:Paper)

// Node with properties
MATCH (p:Paper {year: 2023})

// Relationships
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)

// Multi-hop
MATCH (p1:Paper)-[:CITES]->(p2:Paper)-[:CITES]->(p3:Paper)

Filtering

// Property comparisons
WHERE p.year > 2020 AND p.year < 2025

// String operations
WHERE p.title CONTAINS 'Transformer'

// Null checks
WHERE a.email IS NOT NULL

// List membership
WHERE p.venue IN ['NeurIPS', 'ICML', 'ICLR']

Projections

// Select properties
RETURN p.title, p.year

// Aliases
RETURN p.title AS paper_title

// Aggregations
RETURN COUNT(p) AS total, AVG(p.year) AS avg_year

Best Practices

Schema Design

  1. Choose labels carefully — They're encoded in VIDs and can't change
  2. Keep properties typed — Avoid overusing JSON for queryable data
  3. Use edge types — Don't store relationships as vertex properties
  4. Plan for vectors — Dimension can't change after creation

Performance Considerations

  1. Index frequently queried properties — Especially for WHERE clauses
  2. Use vector indexes for embeddings — HNSW for quality, IVF_PQ for scale
  3. Avoid wide vertices — Many properties = larger Lance rows
  4. Leverage label partitioning — Queries on single label are faster

Next Steps