Query Rewriting Framework¶

Uni includes a powerful query rewriting framework that transforms function calls into equivalent predicate expressions at compile time. This enables full predicate pushdown to storage, index utilization, and eliminates runtime function evaluation overhead.

Overview¶

The query rewriting framework is a general-purpose transformation system that operates on the Cypher AST before logical planning. It applies registered rewrite rules to function calls, converting them into simpler expressions that the storage layer can optimize.

Key Insight: Many functions can be expressed as simple predicate expressions. By rewriting at compile time, we eliminate function evaluation overhead and enable the storage layer (Lance/DataFusion) to filter data directly.

Example¶

// Original query with temporal function
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE uni.temporal.validAt(e, 'start', 'end', datetime('2021-06-15'))
RETURN c.name

// Automatically rewritten to
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE e.start <= datetime('2021-06-15')
  AND (e.end IS NULL OR e.end >= datetime('2021-06-15'))
RETURN c.name

The rewritten form enables: - Predicate pushdown to Lance/DataFusion - Index utilization on start and end columns - Native storage filtering instead of row-by-row evaluation

Architecture¶

Components¶

The framework consists of several key components located in crates/uni-query/src/query/rewrite/:

query/rewrite/
├── mod.rs          # Public API (rewrite_query, rewrite_statement)
├── rule.rs         # RewriteRule trait and constraint types
├── registry.rs     # Global rule registry
├── walker.rs       # Expression tree walker
├── context.rs      # Rewrite context and configuration
├── error.rs        # Error types
└── rules/          # Built-in rule implementations
    ├── mod.rs      # Rule registration
    ├── temporal.rs # Temporal function rewrites
    └── README.md   # Developer guide for adding rules

RewriteRule Trait¶

All rewrite rules implement the RewriteRule trait:

pub trait RewriteRule: Send + Sync {
    /// The fully-qualified function name to match
    fn function_name(&self) -> &str;

    /// Validate arguments before attempting rewrite
    fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError>;

    /// Perform the transformation
    fn rewrite(&self, args: Vec<Expr>, ctx: &RewriteContext)
        -> Result<Expr, RewriteError>;

    /// Check if rule is applicable in current context (optional)
    fn is_applicable(&self, ctx: &RewriteContext) -> bool {
        true
    }
}

Integration Point¶

The rewriting framework integrates into the query pipeline at the planning stage:

// In planner.rs
pub fn plan_with_scope(&self, query: Query, vars: Vec<String>) -> Result<LogicalPlan> {
    // Apply query rewrites before planning
    let rewritten_query = crate::query::rewrite::rewrite_query(query)?;

    match rewritten_query {
        Query::Single(stmt) => self.plan_single(stmt, vars),
        // ... rest of planning
    }
}

This ensures rewrites happen before logical planning, enabling all downstream optimizations.

Built-in Rewrites¶

Temporal Functions¶

Uni includes several temporal function rewrites that enable efficient time-based queries:

Function	Transformation
`uni.temporal.validAt(e, 'start', 'end', ts)`	`e.start <= ts AND (e.end IS NULL OR e.end >= ts)`
`uni.temporal.overlaps(e, 'start', 'end', rs, re)`	`e.start <= re AND (e.end IS NULL OR e.end >= rs)`
`uni.temporal.precedes(e, 'end', ts)`	`e.end < ts`
`uni.temporal.succeeds(e, 'start', ts)`	`e.start > ts`
`uni.temporal.isOngoing(e, 'end')`	`e.end IS NULL`
`uni.temporal.hasClosed(e, 'end')`	`e.end IS NOT NULL`

All temporal rewrites preserve three-valued logic and handle null values correctly (treating null end dates as "ongoing").

Adding New Rewrite Rules¶

Step 1: Implement the RewriteRule Trait¶

Create a new struct implementing RewriteRule:

use crate::query::rewrite::{RewriteRule, RewriteContext, RewriteError};
use crate::query::rewrite::rule::{Arity, ArgConstraints};
use uni_cypher::ast::{BinaryOp, Expr};
use serde_json::Value;

pub struct InRangeRule;

impl RewriteRule for InRangeRule {
    fn function_name(&self) -> &str {
        "uni.util.inRange"
    }

    fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError> {
        // Validate: inRange(entity, 'property', min, max)
        let constraints = ArgConstraints {
            arity: Arity::Exact(4),
            literal_args: vec![1],  // Property name must be literal
            entity_arg: Some(0),    // First arg is entity
        };
        constraints.validate(args)
    }

    fn rewrite(&self, args: Vec<Expr>, _ctx: &RewriteContext)
        -> Result<Expr, RewriteError>
    {
        let entity = args[0].clone();
        let prop = extract_string_literal(&args[1])?;
        let min = args[2].clone();
        let max = args[3].clone();

        // Rewrite to: entity.prop >= min AND entity.prop <= max
        Ok(Expr::BinaryOp {
            left: Box::new(Expr::BinaryOp {
                left: Box::new(Expr::Property(Box::new(entity.clone()), prop.clone())),
                op: BinaryOp::GtEq,
                right: Box::new(min),
            }),
            op: BinaryOp::And,
            right: Box::new(Expr::BinaryOp {
                left: Box::new(Expr::Property(Box::new(entity), prop)),
                op: BinaryOp::LtEq,
                right: Box::new(max),
            }),
        })
    }
}

Step 2: Register the Rule¶

Add to rules/mod.rs:

pub fn register_builtin_rules(registry: &mut RewriteRegistry) {
    // Existing rules
    registry.register(Arc::new(temporal::ValidAtRule));

    // Your new rule
    registry.register(Arc::new(util::InRangeRule));
}

Step 3: Write Tests¶

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_in_range_rewrite() {
        let rule = InRangeRule;
        let ctx = RewriteContext::default();

        let args = vec![
            Expr::Variable("p".into()),
            Expr::Literal(Value::String("age".into())),
            Expr::Literal(Value::Number(18.into())),
            Expr::Literal(Value::Number(65.into())),
        ];

        let result = rule.rewrite(args, &ctx).unwrap();

        // Should produce AND expression
        assert!(matches!(result, Expr::BinaryOp { op: BinaryOp::And, .. }));
    }
}

Design Principles¶

1. Semantic Preservation¶

Rewrites must preserve the exact semantics of the original function:

Three-valued logic: SQL/Cypher uses true/false/null
Null handling: Must match original function behavior
Type coercion: Preserve type semantics

Example of correct null handling:

// WRONG: Breaks when end is null
// e.start <= ts AND e.end >= ts

// CORRECT: Treats null end as "ongoing"
// e.start <= ts AND (e.end IS NULL OR e.end >= ts)

2. Declarative Validation¶

Use ArgConstraints for declarative argument validation:

let constraints = ArgConstraints {
    arity: Arity::Exact(4),           // Exact argument count
    literal_args: vec![1, 2],         // Indices that must be literals
    entity_arg: Some(0),              // Index of entity reference
};

constraints.validate(args)?;

3. Graceful Fallback¶

When rewriting cannot be applied, the framework falls back to scalar execution:

Dynamic property names (e.g., parameterized: $propName)
Complex expressions that can't be analyzed statically
Missing context information

The framework automatically handles fallback without manual intervention.

4. Observability¶

The framework tracks detailed statistics:

pub struct RewriteStats {
    pub functions_visited: usize,
    pub functions_rewritten: usize,
    pub functions_skipped: usize,
    pub errors: Vec<RewriteError>,
    pub rule_stats: HashMap<String, RuleStats>,
}

Enable verbose logging to see rewrite operations:

RUST_LOG=uni_query::rewrite=debug cargo test

Performance Impact¶

Rewriting provides significant performance benefits:

Before Rewriting¶

MATCH (p)-[e:EMPLOYED_BY]->()
WHERE uni.temporal.validAt(e, 'start', 'end', datetime('2021-06-15'))
RETURN count(*)

Function evaluated for every edge in the result set
No predicate pushdown to storage
Full table scan required

After Rewriting¶

MATCH (p)-[e:EMPLOYED_BY]->()
WHERE e.start <= datetime('2021-06-15')
  AND (e.end IS NULL OR e.end >= datetime('2021-06-15'))
RETURN count(*)

Predicates pushed down to Lance/DataFusion
Storage-level filtering before materialization
Can use indexes on start and end columns
Significantly reduced data transfer

Testing Strategy¶

Unit Tests¶

Test rules in isolation:

#[test]
fn test_rule_validation() {
    let rule = MyRule;

    // Valid arguments
    assert!(rule.validate_args(&valid_args).is_ok());

    // Invalid arguments
    assert!(rule.validate_args(&invalid_args).is_err());
}

Integration Tests¶

Test semantic equivalence:

#[tokio::test]
async fn test_semantic_equivalence() {
    let db = setup_test_db().await?;

    // Query with function (will be rewritten)
    let result_function = db.query(
        "MATCH (n) WHERE myFunc(n, 'prop', 42) RETURN count(*)"
    ).await?;

    // Query with explicit predicate (baseline)
    let result_predicate = db.query(
        "MATCH (n) WHERE n.prop >= 42 RETURN count(*)"
    ).await?;

    // Results must match
    assert_eq!(result_function, result_predicate);
}

Common Patterns¶

Pattern 1: Property Comparison¶

// Transform: hasValue(e, 'prop', val)
// Into: e.prop = val

let entity = args[0].clone();
let prop = extract_string_literal(&args[1])?;
let value = args[2].clone();

Ok(Expr::BinaryOp {
    left: Box::new(Expr::Property(Box::new(entity), prop)),
    op: BinaryOp::Eq,
    right: Box::new(value),
})

Pattern 2: Range Check¶

// Transform: inRange(e, 'prop', min, max)
// Into: e.prop >= min AND e.prop <= max

Ok(Expr::BinaryOp {
    left: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity.clone()), prop.clone())),
        op: BinaryOp::GtEq,
        right: Box::new(min),
    }),
    op: BinaryOp::And,
    right: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity), prop)),
        op: BinaryOp::LtEq,
        right: Box::new(max),
    }),
})

Pattern 3: Null Check¶

// Transform: hasProperty(e, 'prop')
// Into: e.prop IS NOT NULL

Ok(Expr::IsNotNull(Box::new(Expr::Property(
    Box::new(entity),
    prop,
))))

Pattern 4: OR with Null¶

// Transform: validOrNull(e, 'prop', val)
// Into: e.prop IS NULL OR e.prop = val

Ok(Expr::BinaryOp {
    left: Box::new(Expr::IsNull(Box::new(Expr::Property(
        Box::new(entity.clone()),
        prop.clone(),
    )))),
    op: BinaryOp::Or,
    right: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity), prop)),
        op: BinaryOp::Eq,
        right: Box::new(value),
    }),
})

Best Practices¶

1. Always Validate Arguments¶

Use ArgConstraints to ensure function calls can be safely rewritten:

fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError> {
    let constraints = ArgConstraints {
        arity: Arity::Exact(3),
        literal_args: vec![1],
        entity_arg: Some(0),
    };
    constraints.validate(args)
}

2. Preserve Null Semantics¶

Always consider how nulls should be handled:

// For temporal "valid at" queries:
// null end date means "ongoing" - should match!
e.start <= ts AND (e.end IS NULL OR e.end >= ts)

3. Keep Rewrites Simple¶

Rewritten expressions should be simple enough for storage layer optimization:

// Good: Simple comparisons
entity.start_date >= datetime('2021-01-01')

// Bad: Complex expressions that block pushdown
year(entity.start_date) >= 2021

4. Write Comprehensive Tests¶

Test all edge cases:

Null values
Boundary conditions
Type mismatches
Dynamic arguments (should fallback gracefully)

5. Document Transformations¶

Clearly document what each rule does:

/// Rewrite rule for uni.temporal.validAt
///
/// Transforms: uni.temporal.validAt(e, 'start', 'end', ts)
/// Into: e.start <= ts AND (e.end IS NULL OR e.end >= ts)
///
/// This preserves the semantics where a null end date means "ongoing".
pub struct ValidAtRule;

Future Extensions¶

The framework is designed to be extensible. Future rewrite rules could include:

Spatial Rewrites¶

// POINT.WITHINBBOX(p, ll, ur)
// → p.x >= ll.x AND p.x <= ur.x AND p.y >= ll.y AND p.y <= ur.y

Datetime Range Rewrites¶

// YEAR(e.created_at) = 2021
// → e.created_at >= datetime('2021-01-01')
//   AND e.created_at < datetime('2022-01-01')

Property Pattern Rewrites¶

// hasProperty(e, 'x')
// → e.x IS NOT NULL

References¶

Source code: crates/uni-query/src/query/rewrite/
Developer guide: crates/uni-query/src/query/rewrite/rules/README.md
Architecture document: docs/ARCH_QUERY_REWRITE.md
Temporal rules: crates/uni-query/src/query/rewrite/rules/temporal.rs

Summary¶

The query rewriting framework is a powerful optimization tool that:

✅ Transforms function calls into predicate expressions at compile time
✅ Enables full predicate pushdown to storage layer
✅ Provides extensible plugin architecture for new rules
✅ Preserves semantic correctness with graceful fallback
✅ Includes comprehensive temporal function rewrites

The framework significantly improves query performance while maintaining a clean separation from the core query engine.