Data Model¶

Uni combines Property Graph, Document, and Vector concepts into a unified data model. This document explains the core entities, their relationships, and how to define schemas.

Core Concepts¶

Uni's data model has three primary entity types:

flowchart LR subgraph Source["Source"] V1["VERTEX (Node)"] P1["Properties + Label"] end subgraph Relationship["Relationship"] E["EDGE (Relationship)"] PE["Properties + Type"] end subgraph Target["Target"] V2["VERTEX (Node)"] P2["Properties + Label"] end V1 --> E --> V2 V1 -->|has| P1 E -->|has| PE V2 -->|has| P2

Vertices (Nodes)¶

Vertices represent entities in your domain. Each vertex has:

Component	Description	Example
VID	Internal 64-bit identifier	`0x0001_0000_0000_002A`
Label(s)	Type classification	`:Paper`, `:Author`, `:Venue`
Properties	Key-value attributes	`{title: "...", year: 2023}`
External ID	User-provided identifier	`"paper_abc123"`

Labels¶

Labels categorize vertices and determine their schema. A vertex has exactly one primary label (stored in VID encoding).

// Create vertices with labels
CREATE (p:Paper {title: "Attention Is All You Need"})
CREATE (a:Author {name: "Ashish Vaswani"})
CREATE (v:Venue {name: "NeurIPS", year: 2017})

Label Best Practices: - Use singular nouns: :Paper not :Papers - Use PascalCase: :ResearchPaper not :research_paper - Be specific: :AcademicPaper vs generic :Document

Properties¶

Properties are strongly-typed key-value pairs defined in the schema.

{
  "Paper": {
    "title": { "type": "String", "nullable": false },
    "year": { "type": "Int32", "nullable": true },
    "abstract": { "type": "String", "nullable": true },
    "embedding": { "type": "Vector", "dimensions": 768 },
    "metadata": { "type": "Json", "nullable": true }
  }
}

Edges (Relationships)¶

Edges connect vertices with typed, directed relationships.

Component	Description	Example
EID	Internal 64-bit identifier	`0x0002_0000_0000_0015`
Type	Relationship classification	`:CITES`, `:AUTHORED_BY`
Source	Origin vertex (VID)	Paper vertex
Destination	Target vertex (VID)	Author vertex
Properties	Edge attributes	`{position: 1, role: "lead"}`

Edge Direction¶

Edges are always directed (source → destination):

// Paper cites another Paper
(paper1:Paper)-[:CITES]->(paper2:Paper)

// Paper is authored by Author
(paper:Paper)-[:AUTHORED_BY]->(author:Author)

// Query in either direction
MATCH (a:Author)<-[:AUTHORED_BY]-(p:Paper)  // Incoming to Author
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)  // Outgoing from Paper

Edge Type Constraints¶

Edge types can constrain which label combinations are valid:

{
  "edge_types": {
    "CITES": {
      "id": 1,
      "src_labels": ["Paper"],
      "dst_labels": ["Paper"]
    },
    "AUTHORED_BY": {
      "id": 2,
      "src_labels": ["Paper"],
      "dst_labels": ["Author"]
    }
  }
}

Edge Properties¶

Edges can carry their own properties:

CREATE (p:Paper)-[:AUTHORED_BY {position: 1, contribution: "lead"}]->(a:Author)

MATCH (p:Paper)-[e:AUTHORED_BY]->(a:Author)
WHERE e.position = 1
RETURN p.title, a.name

Data Types¶

Uni supports a rich set of data types for properties:

Primitive Types¶

Type	Size	Range / Description	Example
`String`	Variable	UTF-8 text	`"Hello, World"`
`Int32`	4 bytes	-2³¹ to 2³¹-1	`42`
`Int64`	8 bytes	-2⁶³ to 2⁶³-1	`9223372036854775807`
`Float32`	4 bytes	IEEE 754 single precision	`3.14159`
`Float64`	8 bytes	IEEE 754 double precision	`3.141592653589793`
`Bool`	1 byte	true / false	`true`
`Timestamp`	8 bytes	Microsecond precision UTC	`"2024-01-15T10:30:00Z"`

Complex Types¶

Type	Description	Example
`Json`	Structured JSON document	`{"nested": {"key": [1, 2, 3]}}`
`Vector`	Fixed-dimension float32 array	`[0.1, -0.2, 0.3, ...]`
`List<T>`	Variable-length array	`["a", "b", "c"]`

Vector Type¶

Vectors are first-class citizens for embedding-based search:

{
  "embedding": {
    "type": "Vector",
    "dimensions": 768,
    "nullable": false
  }
}

Vector Characteristics: - Fixed dimension (immutable after schema creation) - Float32 elements (for storage efficiency) - Indexable with HNSW, IVF_PQ algorithms - Searchable via Cypher procedures

JSON Type¶

For semi-structured data with flexible schema:

{
  "metadata": {
    "type": "Json",
    "nullable": true
  }
}

JSON Capabilities: - Store arbitrary nested structures - Query with JSON path expressions - Index specific paths for performance

// Access JSON fields
MATCH (p:Paper)
WHERE p.metadata.venue.location = 'Vancouver'
RETURN p.title

Schema Definition¶

Uni uses a strict schema for performance and storage efficiency. Schemas are defined in JSON.

Complete Schema Example¶

{
  "schema_version": 1,

  "labels": {
    "Paper": {
      "id": 1,
      "is_document": true,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "Author": {
      "id": 2,
      "is_document": false,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "Venue": {
      "id": 3,
      "is_document": false,
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    }
  },

  "edge_types": {
    "CITES": {
      "id": 1,
      "src_labels": ["Paper"],
      "dst_labels": ["Paper"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "AUTHORED_BY": {
      "id": 2,
      "src_labels": ["Paper"],
      "dst_labels": ["Author"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    },
    "PUBLISHED_IN": {
      "id": 3,
      "src_labels": ["Paper"],
      "dst_labels": ["Venue"],
      "created_at": "2024-01-01T00:00:00Z",
      "state": "active"
    }
  },

  "properties": {
    "Paper": {
      "title": { "type": "String", "nullable": false },
      "abstract": { "type": "String", "nullable": true },
      "year": { "type": "Int32", "nullable": true },
      "doi": { "type": "String", "nullable": true },
      "embedding": { "type": "Vector", "dimensions": 768 },
      "metadata": { "type": "Json", "nullable": true }
    },
    "Author": {
      "name": { "type": "String", "nullable": false },
      "email": { "type": "String", "nullable": true },
      "affiliation": { "type": "String", "nullable": true },
      "h_index": { "type": "Int32", "nullable": true }
    },
    "Venue": {
      "name": { "type": "String", "nullable": false },
      "type": { "type": "String", "nullable": true },
      "location": { "type": "String", "nullable": true }
    },
    "AUTHORED_BY": {
      "position": { "type": "Int32", "nullable": true },
      "contribution": { "type": "String", "nullable": true }
    }
  },

  "indexes": {
    "paper_embeddings": {
      "type": "vector",
      "label": "Paper",
      "property": "embedding",
      "config": {
        "index_type": "hnsw",
        "metric": "cosine"
      }
    },
    "author_email": {
      "type": "scalar",
      "label": "Author",
      "property": "email",
      "config": {
        "index_type": "btree"
      }
    }
  }
}

Schema Element States¶

Labels and edge types can be in different lifecycle states:

State	Description	Queryable	Writable
`active`	Normal operation	Yes	Yes
`deprecated`	Marked for removal	Yes	Yes
`hidden`	No longer queryable	No	No
`tombstone`	Deleted	No	No

Document Mode¶

Labels marked as is_document: true enable document-specific features:

JSON Storage¶

Store complex nested data beyond the property schema:

{
  "Paper": {
    "id": 1,
    "is_document": true
  }
}

CREATE (p:Paper {
  title: "Research Paper",
  _doc: {
    figures: [
      { id: "fig1", caption: "Architecture diagram" },
      { id: "fig2", caption: "Results chart" }
    ],
    supplementary: {
      code_url: "https://github.com/...",
      datasets: ["imagenet", "coco"]
    }
  }
})

JSON Path Indexing¶

Index specific paths within JSON documents:

{
  "json_indexes": {
    "Paper": [
      "$.supplementary.datasets[*]",
      "$.figures[*].id"
    ]
  }
}

Identity Model Summary¶

ID Type	Bits	Purpose	Example
VID	64	Internal vertex identifier	`0x0001_0000_0000_002A`
EID	64	Internal edge identifier	`0x0002_0000_0000_0015`
ext_id	String	User-provided external ID	`"paper_abc123"`
UniId	256	Content-addressed hash	`bafkrei...`

Learn more about Identity Model →

Querying the Data Model¶

Pattern Matching¶

// Simple node match
MATCH (p:Paper)

// Node with properties
MATCH (p:Paper {year: 2023})

// Relationships
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)

// Multi-hop
MATCH (p1:Paper)-[:CITES]->(p2:Paper)-[:CITES]->(p3:Paper)

Filtering¶

// Property comparisons
WHERE p.year > 2020 AND p.year < 2025

// String operations
WHERE p.title CONTAINS 'Transformer'

// Null checks
WHERE a.email IS NOT NULL

// List membership
WHERE p.venue IN ['NeurIPS', 'ICML', 'ICLR']

Projections¶

// Select properties
RETURN p.title, p.year

// Aliases
RETURN p.title AS paper_title

// Aggregations
RETURN COUNT(p) AS total, AVG(p.year) AS avg_year

Best Practices¶

Schema Design¶

Choose labels carefully — They're encoded in VIDs and can't change
Keep properties typed — Avoid overusing JSON for queryable data
Use edge types — Don't store relationships as vertex properties
Plan for vectors — Dimension can't change after creation

Performance Considerations¶

Index frequently queried properties — Especially for WHERE clauses
Use vector indexes for embeddings — HNSW for quality, IVF_PQ for scale
Avoid wide vertices — Many properties = larger Lance rows
Leverage label partitioning — Queries on single label are faster

Next Steps¶

Identity Model — Deep dive into VID/EID encoding
Indexing — Vector, scalar, and full-text indexes
Schema Design Guide — Best practices and patterns