Skip to content

Glossary

A comprehensive glossary of terms used throughout the Uni documentation.


A

Adjacency Cache

An in-memory cache storing graph topology in Compressed Sparse Row (CSR) format. Enables O(1) neighbor lookups for fast graph traversal. See Vectorized Execution.

Aggregation

A Cypher operation that combines multiple rows into summary values. Supported functions include COUNT, SUM, AVG, MIN, MAX, and COLLECT. See Cypher Querying.

ANN (Approximate Nearest Neighbor)

A class of algorithms that find vectors similar to a query vector without exhaustive comparison. Uni supports HNSW and IVF_PQ for ANN search. See Vector Search.

Apache Arrow

A columnar memory format for flat and hierarchical data. Uni uses Arrow internally for zero-copy data processing and SIMD-accelerated operations. See Architecture.


B

Batch

A group of rows processed together in vectorized execution. Typical batch sizes are 1024-8192 rows, chosen to fit in CPU cache. See Vectorized Execution.

B-Tree Index

A balanced tree data structure for ordered data. Used for range queries (<, >, BETWEEN). See Indexing.


C

Compaction

The process of merging multiple data files into fewer, larger files. In Uni, L1 runs are compacted into L2. See Storage Engine.

CSR (Compressed Sparse Row)

A compact representation of sparse matrices (graphs). Stores vertex neighbors in contiguous arrays with offset pointers. Used in the adjacency cache for efficient traversal.

Cypher

A declarative graph query language using ASCII-art patterns. Originally developed for Neo4j, now standardized as OpenCypher. Uni implements a substantial subset. See Cypher Querying.


D

DataFusion

An Apache Arrow-native query engine. Uni uses DataFusion for columnar processing, aggregations, and some query operations.

Direction

The orientation of an edge traversal: - Outgoing (-[r]->): From source to target - Incoming (<-[r]-): From target to source - Both** (-[r]-): Either direction

Document Mode

A schema option (is_document: true) that enables flexible, semi-structured data storage. Vertices in document mode can have a _doc field containing arbitrary JSON.


E

Edge

A connection between two vertices in a graph. In Uni, edges have: - Type: Category of relationship (e.g., CITES, AUTHORED_BY) - Direction: From source to destination - Properties: Optional key-value attributes

EID (Edge ID)

A 64-bit identifier for edges. Encoded as edge_type_id (16 bits) | local_offset (48 bits). See Identity Model.

Embedding

A dense vector representation of data (text, images, etc.) in a high-dimensional space. Similar items have embeddings close together. See Vector Search.

EXPLAIN

A Cypher command prefix that shows the query plan without executing the query. Useful for understanding how queries will be processed.


F

FastEmbed

A Rust library for generating text embeddings locally. Uni integrates FastEmbed for on-device embedding generation without external API calls. See Vector Search.

Flush

The process of writing in-memory data (L0) to persistent storage (L1). Triggered by size thresholds or explicit calls.


G

Graph Database

A database optimized for storing and querying connected data. Models data as vertices (nodes) and edges (relationships) rather than tables and rows.

gryf

A Rust graph library providing in-memory graph structures and algorithms. Uni uses gryf for the L0 buffer and working graphs.


H

Hash Index

An index structure using hash tables for O(1) equality lookups. Best for exact match queries on high-cardinality columns.

HNSW (Hierarchical Navigable Small World)

A graph-based algorithm for approximate nearest neighbor search. Provides high recall and fast queries at the cost of memory. See Indexing.


I

IVF_PQ (Inverted File with Product Quantization)

A vector index that partitions vectors into clusters and compresses them. Lower memory than HNSW but typically lower recall.


J

JSONL (JSON Lines)

A text format with one JSON object per line. Uni's primary format for bulk data import.


K

KNN (K-Nearest Neighbors)

Finding the K vectors most similar to a query vector. Uni's vector search returns KNN results ordered by distance.


L

L0 Buffer

The in-memory write buffer that accepts all incoming mutations. Contains a gryf graph for topology and Arrow builders for properties. See Storage Engine.

L1 Layer

Immutable Lance datasets created by flushing L0. Contains sorted runs of data not yet compacted into L2.

L2 Layer

The base storage layer containing fully compacted, indexed data. Most data resides here after compaction.

Label

A type classifier for vertices (e.g., Paper, Author). Similar to a table name in relational databases. Encoded in the VID.

Lance

A columnar data format optimized for ML workloads. Features native vector indexing, versioning, and cloud storage support. Uni uses Lance as its primary storage format. See Storage Engine.

Late Materialization

An optimization that delays loading heavy properties until after filtering. Reduces I/O by only loading data for rows that survive filters. See Vectorized Execution.

LSM Tree (Log-Structured Merge Tree)

A storage architecture with tiered levels (L0, L1, L2). Writes go to memory first, then flush to immutable files that are periodically compacted.


M

Manifest

A JSON file describing the state of storage at a point in time. Contains dataset versions, index metadata, and L1 run information. Enables snapshot isolation.

MATCH

The primary Cypher clause for specifying graph patterns to find. Uses ASCII-art syntax like (a)-[r]->(b).

Morsel

A work unit in parallel execution. Source data is divided into morsels that workers process independently. See Vectorized Execution.


N

Node

See Vertex.


O

OpenCypher

An open standard for the Cypher query language. Uni implements a substantial subset of OpenCypher.


P

Predicate Pushdown

An optimization that pushes filter conditions down to the storage layer. Reduces I/O by filtering at scan time rather than after loading. See Query Planning.

Profile

A Cypher command prefix that executes the query and shows detailed timing for each operation. More informative than EXPLAIN.

Property

A key-value attribute on a vertex or edge. Properties have defined types (String, Int32, Vector, etc.) in the schema.

Property Graph

A data model where vertices and edges can have arbitrary properties. More flexible than simple labeled graphs.


Q

Query Plan

The sequence of operations that will execute a query. Includes logical plan (what to do) and physical plan (how to do it).


R

RecordBatch

An Apache Arrow data structure containing a batch of columnar data with a shared schema. The fundamental data unit in vectorized execution.

Relationship

See Edge.


S

Schema

The structure definition for a graph, including: - Labels and their properties - Edge types and constraints - Vector dimensions - Indexes

Selection Vector

A bitmap or index array marking which rows in a batch are "active" after filtering. Avoids copying data when filtering.

SIMD (Single Instruction, Multiple Data)

CPU instructions that operate on multiple data elements simultaneously. Arrow compute kernels use SIMD for fast filtering and arithmetic.

Snapshot

A consistent point-in-time view of the database. Readers see a stable snapshot even as writes occur.

Snapshot Isolation

A concurrency model where each reader sees a consistent snapshot. Readers don't block writers and vice versa.


T

Tombstone

A marker indicating deleted data. Soft deletes mark rows as deleted; compaction removes tombstoned data.

Traversal

Following edges from vertices to discover connected vertices. A fundamental graph operation.


U

UNWIND

A Cypher clause that expands a list into multiple rows. Useful for batch operations and working with array properties.


V

Vector Index

A data structure enabling fast similarity search on high-dimensional vectors. Uni supports HNSW and IVF_PQ vector indexes.

Vectorized Execution

A query processing model that operates on batches of rows rather than one row at a time. Improves performance through better cache utilization and SIMD operations. See Vectorized Execution.

Vertex

A node in the graph representing an entity. In Uni, vertices have: - VID: Unique identifier - Label: Type classification - Properties: Key-value attributes

VID (Vertex ID)

A 64-bit identifier for vertices. Encoded as label_id (16 bits) | local_offset (48 bits). See Identity Model.


W

WAL (Write-Ahead Log)

A durability mechanism that logs mutations before applying them. Enables recovery after crashes.

UniId

A content-addressed identifier using SHA3-256 hash (32 bytes). Used for provenance tracking and distributed synchronization with CRDT systems. See Identity Model.

WITH

A Cypher clause that pipes results from one query part to another. Enables subquery-like behavior and intermediate aggregations.

Working Graph

An in-memory graph materialized from storage for query execution. Backed by gryf.


Y

YIELD

A Cypher clause used with CALL to specify which columns to return from a procedure. Used with vector search and other built-in procedures.


Common Abbreviations

Abbreviation Meaning
ANN Approximate Nearest Neighbor
CSR Compressed Sparse Row
EID Edge ID
HNSW Hierarchical Navigable Small World
IVF Inverted File
KNN K-Nearest Neighbors
L0/L1/L2 Storage layer levels
LSM Log-Structured Merge
PQ Product Quantization
SIMD Single Instruction Multiple Data
VID Vertex ID
WAL Write-Ahead Log

Next Steps