Query Rewriting Framework¶
Uni includes a query rewriting framework that transforms function calls into equivalent predicate expressions at compile time. This enables full predicate pushdown to storage and index utilization, and it eliminates runtime function evaluation overhead.
Overview¶
The query rewriting framework is a general-purpose transformation system that operates on the Cypher AST before logical planning. It applies registered rewrite rules to function calls, converting them into simpler expressions that the storage layer can optimize.
Key Insight: Many functions can be expressed as simple predicate expressions. By rewriting at compile time, we eliminate function evaluation overhead and enable the storage layer (Lance/DataFusion) to filter data directly.
Example¶
// Original query with temporal function
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE uni.temporal.validAt(e, 'start', 'end', datetime('2021-06-15'))
RETURN c.name
// Automatically rewritten to
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE e.start <= datetime('2021-06-15')
AND (e.end IS NULL OR e.end >= datetime('2021-06-15'))
RETURN c.name
The rewritten form enables:
- Predicate pushdown to Lance/DataFusion
- Index utilization on start and end columns
- Native storage filtering instead of row-by-row evaluation
Architecture¶
Components¶
The framework consists of several key components located in crates/uni-query/src/query/rewrite/:
query/rewrite/
├── mod.rs # Public API (rewrite_query, rewrite_statement)
├── rule.rs # RewriteRule trait and constraint types
├── registry.rs # Global rule registry
├── walker.rs # Expression tree walker
├── context.rs # Rewrite context and configuration
├── error.rs # Error types
└── rules/ # Built-in rule implementations
    ├── mod.rs # Rule registration
    ├── temporal.rs # Temporal function rewrites
    └── README.md # Developer guide for adding rules
RewriteRule Trait¶
All rewrite rules implement the RewriteRule trait:
pub trait RewriteRule: Send + Sync {
    /// The fully-qualified function name to match
    fn function_name(&self) -> &str;
    /// Validate arguments before attempting rewrite
    fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError>;
    /// Perform the transformation
    fn rewrite(&self, args: Vec<Expr>, ctx: &RewriteContext)
        -> Result<Expr, RewriteError>;
    /// Check if rule is applicable in current context (optional).
    /// The default implementation ignores the context and always applies.
    fn is_applicable(&self, _ctx: &RewriteContext) -> bool {
        true
    }
}
Integration Point¶
The rewriting framework integrates into the query pipeline at the planning stage:
// In planner.rs
pub fn plan_with_scope(&self, query: Query, vars: Vec<String>) -> Result<LogicalPlan> {
    // Apply query rewrites before planning
    let rewritten_query = crate::query::rewrite::rewrite_query(query)?;
    match rewritten_query {
        Query::Single(stmt) => self.plan_single(stmt, vars),
        // ... rest of planning
    }
}
This ensures rewrites happen before logical planning, enabling all downstream optimizations.
Built-in Rewrites¶
Temporal Functions¶
Uni includes several temporal function rewrites that enable efficient time-based queries:
| Function | Transformation |
|---|---|
| uni.temporal.validAt(e, 'start', 'end', ts) | e.start <= ts AND (e.end IS NULL OR e.end >= ts) |
| uni.temporal.overlaps(e, 'start', 'end', rs, re) | e.start <= re AND (e.end IS NULL OR e.end >= rs) |
| uni.temporal.precedes(e, 'end', ts) | e.end < ts |
| uni.temporal.succeeds(e, 'start', ts) | e.start > ts |
| uni.temporal.isOngoing(e, 'end') | e.end IS NULL |
| uni.temporal.hasClosed(e, 'end') | e.end IS NOT NULL |
All temporal rewrites preserve three-valued logic and handle null values correctly (treating null end dates as "ongoing").
Adding New Rewrite Rules¶
Step 1: Implement the RewriteRule Trait¶
Create a new struct implementing RewriteRule:
use crate::query::rewrite::{RewriteRule, RewriteContext, RewriteError};
use crate::query::rewrite::rule::{Arity, ArgConstraints};
use uni_cypher::ast::{BinaryOp, Expr};
use serde_json::Value;
pub struct InRangeRule;
impl RewriteRule for InRangeRule {
    fn function_name(&self) -> &str {
        "uni.util.inRange"
    }
    fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError> {
        // Validate: inRange(entity, 'property', min, max)
        let constraints = ArgConstraints {
            arity: Arity::Exact(4),
            literal_args: vec![1], // Property name must be literal
            entity_arg: Some(0), // First arg is entity
        };
        constraints.validate(args)
    }
    fn rewrite(&self, args: Vec<Expr>, _ctx: &RewriteContext)
        -> Result<Expr, RewriteError>
    {
        let entity = args[0].clone();
        let prop = extract_string_literal(&args[1])?;
        let min = args[2].clone();
        let max = args[3].clone();
        // Rewrite to: entity.prop >= min AND entity.prop <= max
        Ok(Expr::BinaryOp {
            left: Box::new(Expr::BinaryOp {
                left: Box::new(Expr::Property(Box::new(entity.clone()), prop.clone())),
                op: BinaryOp::GtEq,
                right: Box::new(min),
            }),
            op: BinaryOp::And,
            right: Box::new(Expr::BinaryOp {
                left: Box::new(Expr::Property(Box::new(entity), prop)),
                op: BinaryOp::LtEq,
                right: Box::new(max),
            }),
        })
    }
}
Step 2: Register the Rule¶
Add to rules/mod.rs:
use std::sync::Arc;
pub fn register_builtin_rules(registry: &mut RewriteRegistry) {
    // Existing rules
    registry.register(Arc::new(temporal::ValidAtRule));
    // Your new rule (assumes a `util` module that defines InRangeRule)
    registry.register(Arc::new(util::InRangeRule));
}
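To make the registration step concrete, here is a minimal sketch of how a registry keyed by function name could work. The `RewriteRegistry` struct, its `lookup` method, and the trimmed-down `RewriteRule` trait below are stand-ins written for this example; the real types in registry.rs may store and resolve rules differently.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Trimmed-down stand-in for the real trait: only the lookup key is needed here.
trait RewriteRule: Send + Sync {
    fn function_name(&self) -> &str;
}

struct ValidAtRule;
impl RewriteRule for ValidAtRule {
    fn function_name(&self) -> &str {
        "uni.temporal.validAt"
    }
}

struct InRangeRule;
impl RewriteRule for InRangeRule {
    fn function_name(&self) -> &str {
        "uni.util.inRange"
    }
}

/// Hypothetical registry: maps a fully-qualified function name to its rule.
#[derive(Default)]
struct RewriteRegistry {
    rules: HashMap<String, Arc<dyn RewriteRule>>,
}

impl RewriteRegistry {
    /// Stores the rule under the name it reports; a later rule with the
    /// same name replaces the earlier one.
    fn register(&mut self, rule: Arc<dyn RewriteRule>) {
        self.rules.insert(rule.function_name().to_string(), rule);
    }

    /// Returns the rule registered for `name`, if any (hypothetical helper).
    fn lookup(&self, name: &str) -> Option<Arc<dyn RewriteRule>> {
        self.rules.get(name).cloned()
    }
}

fn register_builtin_rules(registry: &mut RewriteRegistry) {
    registry.register(Arc::new(ValidAtRule));
    registry.register(Arc::new(InRangeRule));
}
```

With this shape, a walker that meets a call to `uni.util.inRange` would call `lookup` with that name and get the rule back, while an unregistered function name yields `None` and is left untouched.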
Step 3: Write Tests¶
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_in_range_rewrite() {
        let rule = InRangeRule;
        let ctx = RewriteContext::default();
        let args = vec![
            Expr::Variable("p".into()),
            Expr::Literal(Value::String("age".into())),
            Expr::Literal(Value::Number(18.into())),
            Expr::Literal(Value::Number(65.into())),
        ];
        let result = rule.rewrite(args, &ctx).unwrap();
        // Should produce AND expression
        assert!(matches!(result, Expr::BinaryOp { op: BinaryOp::And, .. }));
    }
}
Design Principles¶
1. Semantic Preservation¶
Rewrites must preserve the exact semantics of the original function:
- Three-valued logic: SQL/Cypher uses true/false/null
- Null handling: Must match original function behavior
- Type coercion: Preserve type semantics
Example of correct null handling:
// WRONG: Breaks when end is null
// e.start <= ts AND e.end >= ts
// CORRECT: Treats null end as "ongoing"
// e.start <= ts AND (e.end IS NULL OR e.end >= ts)
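The difference between the two forms can be checked with a small model of three-valued logic. The sketch below represents a nullable boolean as `Option<bool>` (with `None` standing for null) and timestamps as plain `i64` values; both choices, and all function names, are assumptions made for this example rather than part of the framework.

```rust
/// Three-valued AND: false dominates, otherwise null propagates.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Three-valued OR: true dominates, otherwise null propagates.
fn or3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// `a <= b`, yielding null when either operand is null.
fn le(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? <= b?)
}

/// `a >= b`, yielding null when either operand is null.
fn ge(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? >= b?)
}

/// WRONG form: e.start <= ts AND e.end >= ts
fn naive_valid_at(start: Option<i64>, end: Option<i64>, ts: i64) -> Option<bool> {
    and3(le(start, Some(ts)), ge(end, Some(ts)))
}

/// CORRECT form: e.start <= ts AND (e.end IS NULL OR e.end >= ts)
fn valid_at(start: Option<i64>, end: Option<i64>, ts: i64) -> Option<bool> {
    let end_ok = or3(Some(end.is_none()), ge(end, Some(ts)));
    and3(le(start, Some(ts)), end_ok)
}
```

For an edge with `start = 10` and a null `end`, `naive_valid_at` evaluates to null at `ts = 15`, so a `WHERE` clause would drop the row, while `valid_at` evaluates to true and keeps it.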
2. Declarative Validation¶
Use ArgConstraints for declarative argument validation:
let constraints = ArgConstraints {
    arity: Arity::Exact(4), // Exact argument count
    literal_args: vec![1, 2], // Indices that must be literals
    entity_arg: Some(0), // Index of entity reference
};
constraints.validate(args)?;
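As an illustration of what such a declarative check might do internally, the following sketch implements `validate` over a toy expression type. The three-variant `Expr` enum, the single-variant `Arity`, and the `String` error type are simplifications assumed for this example; the real `ArgConstraints` in rule.rs may check more conditions and returns `RewriteError`.

```rust
/// Toy expression type, far smaller than the real Cypher AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Variable(String),
    Literal(String),
    Parameter(String),
}

enum Arity {
    Exact(usize),
}

struct ArgConstraints {
    arity: Arity,
    literal_args: Vec<usize>,
    entity_arg: Option<usize>,
}

impl ArgConstraints {
    /// Checks argument count, literal positions, and the entity position.
    fn validate(&self, args: &[Expr]) -> Result<(), String> {
        let expected = match self.arity {
            Arity::Exact(n) => n,
        };
        if args.len() != expected {
            return Err(format!("expected {} arguments, got {}", expected, args.len()));
        }
        for &i in &self.literal_args {
            if !matches!(args.get(i), Some(Expr::Literal(_))) {
                return Err(format!("argument {} must be a literal", i));
            }
        }
        if let Some(i) = self.entity_arg {
            if !matches!(args.get(i), Some(Expr::Variable(_))) {
                return Err(format!("argument {} must be an entity variable", i));
            }
        }
        Ok(())
    }
}
```

Under these assumptions, a call shaped like `validAt(e, 'start', 'end', $ts)` passes, while one whose property name is a parameter (`validAt(e, $prop, 'end', $ts)`) is rejected before any rewrite is attempted.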
3. Graceful Fallback¶
When rewriting cannot be applied, the framework falls back to scalar execution:
- Dynamic property names (e.g., parameterized: $propName)
- Complex expressions that can't be analyzed statically
- Missing context information
The framework automatically handles fallback without manual intervention.
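One way to picture the fallback is a walker that tries a rule and, if the rule declines, puts the original call back unchanged. The sketch below does this for `uni.temporal.isOngoing` over a toy AST; the `Expr` enum, `rewrite_is_ongoing`, and `rewrite_expr` are hypothetical names for this example and do not mirror the real walker.rs.

```rust
/// Toy AST with just enough variants to show the fallback path.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Variable(String),
    Literal(String),
    Parameter(String),
    Property(Box<Expr>, String),
    IsNull(Box<Expr>),
    Not(Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// Toy rule: isOngoing(e, 'end') becomes e.end IS NULL.
/// Fails when the property name is not a string literal.
fn rewrite_is_ongoing(args: &[Expr]) -> Result<Expr, String> {
    match args {
        [entity @ Expr::Variable(_), Expr::Literal(prop)] => Ok(Expr::IsNull(Box::new(
            Expr::Property(Box::new(entity.clone()), prop.clone()),
        ))),
        _ => Err("property name must be a string literal".to_string()),
    }
}

/// Walks the tree bottom-up; a failed rewrite leaves the call as it was,
/// so it is evaluated as an ordinary scalar function at runtime.
fn rewrite_expr(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, args } => {
            let args: Vec<Expr> = args.into_iter().map(rewrite_expr).collect();
            if name == "uni.temporal.isOngoing" {
                if let Ok(rewritten) = rewrite_is_ongoing(&args) {
                    return rewritten;
                }
            }
            // Fallback: keep the original function call.
            Expr::Call { name, args }
        }
        Expr::Not(inner) => Expr::Not(Box::new(rewrite_expr(*inner))),
        Expr::IsNull(inner) => Expr::IsNull(Box::new(rewrite_expr(*inner))),
        Expr::Property(base, prop) => Expr::Property(Box::new(rewrite_expr(*base)), prop),
        other => other,
    }
}
```

In this sketch, `isOngoing(e, 'end')` becomes `e.end IS NULL`, whereas `isOngoing(e, $propName)` comes back as the same `Call` node, which is the behavior the list above describes for dynamic property names.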
4. Observability¶
The framework tracks detailed statistics:
pub struct RewriteStats {
    pub functions_visited: usize,
    pub functions_rewritten: usize,
    pub functions_skipped: usize,
    pub errors: Vec<RewriteError>,
    pub rule_stats: HashMap<String, RuleStats>,
}
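As a rough illustration of how these counters could be maintained, the sketch below adds a `record` method that is called once per visited function. The method itself, the fields of `RuleStats`, and the use of `String` in place of `RewriteError` are assumptions for this example; the source does not show how the real statistics are updated.

```rust
use std::collections::HashMap;

/// Hypothetical per-rule counters; the real RuleStats fields are not shown in this document.
#[derive(Debug, Default, Clone, PartialEq)]
struct RuleStats {
    rewritten: usize,
    skipped: usize,
}

#[derive(Debug, Default)]
struct RewriteStats {
    functions_visited: usize,
    functions_rewritten: usize,
    functions_skipped: usize,
    errors: Vec<String>, // stand-in for Vec<RewriteError>
    rule_stats: HashMap<String, RuleStats>,
}

impl RewriteStats {
    /// Records the outcome of one rewrite attempt for `function`.
    fn record(&mut self, function: &str, outcome: Result<(), String>) {
        self.functions_visited += 1;
        let per_rule = self.rule_stats.entry(function.to_string()).or_default();
        match outcome {
            Ok(()) => {
                self.functions_rewritten += 1;
                per_rule.rewritten += 1;
            }
            Err(reason) => {
                self.functions_skipped += 1;
                per_rule.skipped += 1;
                self.errors.push(reason);
            }
        }
    }
}
```

Comparing `functions_rewritten` with `functions_visited` then gives a quick view of how many calls were pushed down versus left for scalar execution.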
Enable verbose logging to see rewrite operations.
Performance Impact¶
Rewriting moves the filter from row-by-row function evaluation to the storage layer. The same query is shown here before and after rewriting:
Before Rewriting¶
MATCH (p)-[e:EMPLOYED_BY]->()
WHERE uni.temporal.validAt(e, 'start', 'end', datetime('2021-06-15'))
RETURN count(*)
- Function evaluated for every edge in the result set
- No predicate pushdown to storage
- Full table scan required
After Rewriting¶
MATCH (p)-[e:EMPLOYED_BY]->()
WHERE e.start <= datetime('2021-06-15')
AND (e.end IS NULL OR e.end >= datetime('2021-06-15'))
RETURN count(*)
- Predicates pushed down to Lance/DataFusion
- Storage-level filtering before materialization
- Can use indexes on start and end columns
- Significantly reduced data transfer
Testing Strategy¶
Unit Tests¶
Test rules in isolation:
#[test]
fn test_rule_validation() {
    let rule = MyRule;
    // Valid arguments
    assert!(rule.validate_args(&valid_args).is_ok());
    // Invalid arguments
    assert!(rule.validate_args(&invalid_args).is_err());
}
Integration Tests¶
Test semantic equivalence:
#[tokio::test]
async fn test_semantic_equivalence() -> Result<(), Box<dyn std::error::Error>> {
    // The test returns a Result so that `?` can be used on fallible calls
    let db = setup_test_db().await?;
    // Query with function (will be rewritten)
    let result_function = db.query(
        "MATCH (n) WHERE myFunc(n, 'prop', 42) RETURN count(*)"
    ).await?;
    // Query with explicit predicate (baseline)
    let result_predicate = db.query(
        "MATCH (n) WHERE n.prop >= 42 RETURN count(*)"
    ).await?;
    // Results must match
    assert_eq!(result_function, result_predicate);
    Ok(())
}
Common Patterns¶
Pattern 1: Property Comparison¶
// Transform: hasValue(e, 'prop', val)
// Into: e.prop = val
let entity = args[0].clone();
let prop = extract_string_literal(&args[1])?;
let value = args[2].clone();
Ok(Expr::BinaryOp {
    left: Box::new(Expr::Property(Box::new(entity), prop)),
    op: BinaryOp::Eq,
    right: Box::new(value),
})
Pattern 2: Range Check¶
// Transform: inRange(e, 'prop', min, max)
// Into: e.prop >= min AND e.prop <= max
// (entity, prop, min, and max are bound from args as in Pattern 1)
Ok(Expr::BinaryOp {
    left: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity.clone()), prop.clone())),
        op: BinaryOp::GtEq,
        right: Box::new(min),
    }),
    op: BinaryOp::And,
    right: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity), prop)),
        op: BinaryOp::LtEq,
        right: Box::new(max),
    }),
})
Pattern 3: Null Check¶
// Transform: hasProperty(e, 'prop')
// Into: e.prop IS NOT NULL
Ok(Expr::IsNotNull(Box::new(Expr::Property(
    Box::new(entity),
    prop,
))))
Pattern 4: OR with Null¶
// Transform: validOrNull(e, 'prop', val)
// Into: e.prop IS NULL OR e.prop = val
Ok(Expr::BinaryOp {
    left: Box::new(Expr::IsNull(Box::new(Expr::Property(
        Box::new(entity.clone()),
        prop.clone(),
    )))),
    op: BinaryOp::Or,
    right: Box::new(Expr::BinaryOp {
        left: Box::new(Expr::Property(Box::new(entity), prop)),
        op: BinaryOp::Eq,
        right: Box::new(value),
    }),
})
Best Practices¶
1. Always Validate Arguments¶
Use ArgConstraints to ensure function calls can be safely rewritten:
fn validate_args(&self, args: &[Expr]) -> Result<(), RewriteError> {
    let constraints = ArgConstraints {
        arity: Arity::Exact(3),
        literal_args: vec![1],
        entity_arg: Some(0),
    };
    constraints.validate(args)
}
2. Preserve Null Semantics¶
Always consider how nulls should be handled:
// For temporal "valid at" queries:
// null end date means "ongoing" - should match!
e.start <= ts AND (e.end IS NULL OR e.end >= ts)
3. Keep Rewrites Simple¶
Rewritten expressions should be simple enough for storage layer optimization:
// Good: Simple comparisons
entity.start_date >= datetime('2021-01-01')
// Bad: Complex expressions that block pushdown
year(entity.start_date) >= 2021
4. Write Comprehensive Tests¶
Test all edge cases:
- Null values
- Boundary conditions
- Type mismatches
- Dynamic arguments (should fall back gracefully)
5. Document Transformations¶
Clearly document what each rule does:
/// Rewrite rule for uni.temporal.validAt
///
/// Transforms: uni.temporal.validAt(e, 'start', 'end', ts)
/// Into: e.start <= ts AND (e.end IS NULL OR e.end >= ts)
///
/// This preserves the semantics where a null end date means "ongoing".
pub struct ValidAtRule;
Future Extensions¶
The framework is designed to be extensible. Future rewrite rules could include:
Spatial Rewrites¶
Datetime Range Rewrites¶
// YEAR(e.created_at) = 2021
// → e.created_at >= datetime('2021-01-01')
// AND e.created_at < datetime('2022-01-01')
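A rule of this kind would need to turn a year into a half-open interval, so that every instant in the year matches and the first instant of the next year does not. The sketch below builds the predicate as text purely for illustration; `year_equals_to_range` is a hypothetical helper, and an actual rule would construct `Expr` nodes rather than strings.

```rust
/// Builds the half-open range predicate equivalent to `YEAR(<property>) = <year>`.
/// The lower bound is inclusive and the upper bound is exclusive.
fn year_equals_to_range(property: &str, year: i32) -> String {
    let start = format!("datetime('{:04}-01-01')", year);
    let end = format!("datetime('{:04}-01-01')", year + 1);
    format!(
        "{p} >= {start} AND {p} < {end}",
        p = property,
        start = start,
        end = end
    )
}
```

Calling `year_equals_to_range("e.created_at", 2021)` yields the same predicate as the comment above. Using `<` on the upper bound avoids having to name the last representable instant of the year.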
Property Pattern Rewrites¶
References¶
- Source code: crates/uni-query/src/query/rewrite/
- Developer guide: crates/uni-query/src/query/rewrite/rules/README.md
- Architecture document: docs/ARCH_QUERY_REWRITE.md
- Temporal rules: crates/uni-query/src/query/rewrite/rules/temporal.rs
Summary¶
The query rewriting framework is a powerful optimization tool that:
- ✅ Transforms function calls into predicate expressions at compile time
- ✅ Enables full predicate pushdown to storage layer
- ✅ Provides extensible plugin architecture for new rules
- ✅ Preserves semantic correctness with graceful fallback
- ✅ Includes comprehensive temporal function rewrites
The framework significantly improves query performance while maintaining a clean separation from the core query engine.