CodePrism Technical Architecture
This document describes the technical architecture and design principles of CodePrism's graph-based code intelligence system.
Looking for a general overview? See the Introduction or MCP Server Overview for user-focused information.
Core Design Philosophy
Graph-First Intelligence
Code relationships are stored and queried as a graph, not flat syntax trees. This enables:
- Cross-language linking - References between files in different languages
- Relationship analysis - Understanding dependencies, calls, and data flow
- Efficient queries - Fast traversal of code relationships
- Incremental updates - Adding/updating nodes without full rebuilds
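To make the graph-first idea concrete, here is a minimal sketch of a relationship query over a tiny adjacency-list graph. `MiniGraph`, `EdgeKind`, and the node names are hypothetical and only illustrate the approach; they are not CodePrism's actual types.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical edge kinds; the real set is not specified in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Calls,
    Imports,
}

/// A tiny adjacency-list graph keyed by node name.
#[derive(Default)]
pub struct MiniGraph {
    edges: HashMap<String, Vec<(EdgeKind, String)>>,
}

impl MiniGraph {
    /// Adds a directed edge of the given kind.
    pub fn add_edge(&mut self, from: &str, kind: EdgeKind, to: &str) {
        self.edges
            .entry(from.to_string())
            .or_default()
            .push((kind, to.to_string()));
    }

    /// Breadth-first search: every node reachable from `start`
    /// by following only edges of `kind`, in discovery order.
    pub fn reachable(&self, start: &str, kind: EdgeKind) -> Vec<String> {
        let mut seen: HashSet<String> = HashSet::new();
        let mut queue: VecDeque<String> = VecDeque::new();
        let mut order = Vec::new();
        seen.insert(start.to_string());
        queue.push_back(start.to_string());
        while let Some(current) = queue.pop_front() {
            if let Some(outgoing) = self.edges.get(&current) {
                for (edge_kind, target) in outgoing {
                    if *edge_kind == kind && seen.insert(target.clone()) {
                        order.push(target.clone());
                        queue.push_back(target.clone());
                    }
                }
            }
        }
        order
    }
}
```

Because nodes from different languages live in the same graph, a traversal like `reachable("app.ts::main", EdgeKind::Calls)` can cross file and language boundaries without any special handling.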
Language-Agnostic Universal AST
All language parsers convert to a unified node representation:
- Common node types across all languages (Function, Class, Variable, etc.)
- Consistent relationships regardless of source language
- Extensible design for easy addition of new languages
- Cross-language analysis capabilities
System Architecture
Core Components
MCP Protocol Layer
Purpose: Standard Model Context Protocol communication
Implementation: JSON-RPC 2.0 over stdin/stdout
Responsibilities:
- Capability negotiation with clients
- Tool and resource request routing
- Structured error handling with context
- Real-time notifications for resource changes
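As a rough illustration of request routing, the sketch below maps MCP method names to handler routes and returns the JSON-RPC 2.0 "Method not found" error code (-32601) for anything else. The `Route` and `RpcError` types are hypothetical, and a real server would also parse and serialize the JSON envelope, which is omitted here.

```rust
/// Hypothetical set of routes; a real server may support more methods.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Initialize,
    ListTools,
    CallTool,
    ListResources,
    ReadResource,
}

/// Minimal error shape carrying a JSON-RPC error code and message.
#[derive(Debug, PartialEq, Eq)]
pub struct RpcError {
    pub code: i32,
    pub message: String,
}

/// JSON-RPC 2.0 reserved code for an unknown method.
pub const METHOD_NOT_FOUND: i32 = -32601;

/// Picks a route from the `method` field of an incoming request.
pub fn route(method: &str) -> Result<Route, RpcError> {
    match method {
        "initialize" => Ok(Route::Initialize),
        "tools/list" => Ok(Route::ListTools),
        "tools/call" => Ok(Route::CallTool),
        "resources/list" => Ok(Route::ListResources),
        "resources/read" => Ok(Route::ReadResource),
        other => Err(RpcError {
            code: METHOD_NOT_FOUND,
            message: format!("Method not found: {}", other),
        }),
    }
}
```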
Analysis Tools Engine
Purpose: Provide 23 production-ready code analysis capabilities
Architecture: Plugin-based tool system
Key Features:
- Parallel execution for batch operations
- Caching for expensive computations
- Result aggregation and formatting
- Workflow optimization suggestions
Code Intelligence Engine
Purpose: Parse, analyze, and maintain the code graph
Components:
- Parser Framework: Pluggable language-specific parsers
- Universal AST Graph: In-memory graph structure using DashMap
- Symbol Resolution: Cross-file and cross-language linking
- Incremental Updates: Efficient re-parsing on file changes
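One way to picture incremental updates is to replace only the nodes that belong to the changed file and leave the rest of the graph alone. The sketch below assumes a hypothetical `FileIndex` that stores symbol names per file; the real engine tracks richer nodes and edges.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

pub type NodeId = u64;

/// Hypothetical, simplified store: node id -> symbol name, plus a per-file index.
#[derive(Default)]
pub struct FileIndex {
    nodes: HashMap<NodeId, String>,
    by_file: HashMap<PathBuf, Vec<NodeId>>,
    next_id: NodeId,
}

impl FileIndex {
    /// Replaces all nodes of `file` with `symbols`; other files are untouched.
    pub fn update_file(&mut self, file: &Path, symbols: &[&str]) {
        // Drop the stale nodes that came from the previous parse of this file.
        if let Some(old_ids) = self.by_file.remove(file) {
            for id in old_ids {
                self.nodes.remove(&id);
            }
        }
        // Insert the freshly parsed symbols under new ids.
        let mut ids = Vec::with_capacity(symbols.len());
        for symbol in symbols {
            let id = self.next_id;
            self.next_id += 1;
            self.nodes.insert(id, (*symbol).to_string());
            ids.push(id);
        }
        self.by_file.insert(file.to_path_buf(), ids);
    }

    /// Symbols currently recorded for `file`, in insertion order.
    pub fn symbols_in(&self, file: &Path) -> Vec<&str> {
        self.by_file
            .get(file)
            .map(|ids| {
                ids.iter()
                    .filter_map(|id| self.nodes.get(id).map(String::as_str))
                    .collect()
            })
            .unwrap_or_default()
    }

    /// Total number of nodes across all files.
    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }
}
```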
Data Flow Architecture
Repository Initialization Flow
Tool Execution Flow
Real-Time Update Flow
Storage Architecture
In-Memory Graph Structure
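// Simplified representation
use dashmap::DashMap;
use std::collections::HashMap;
use std::path::PathBuf;

pub struct CodeGraph {
    nodes: DashMap<NodeId, Node>,   // concurrent map of all graph nodes
    edges: DashMap<EdgeId, Edge>,   // concurrent map of all graph edges
    indexes: GraphIndexes,          // secondary lookup tables
}

pub struct GraphIndexes {
    by_file: HashMap<PathBuf, Vec<NodeId>>,       // nodes defined in each file
    by_symbol: HashMap<String, Vec<NodeId>>,      // nodes sharing a symbol name
    by_type: HashMap<NodeKind, Vec<NodeId>>,      // nodes of each kind
    dependencies: HashMap<NodeId, Vec<NodeId>>,   // dependency lists keyed by node
}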
Design Decisions:
- DashMap for concurrent access across threads
- Hierarchical indexes for fast queries by different criteria
- LRU caching for expensive analysis results
- Optional persistence for faster startup (future enhancement)
Memory Management
- Lazy loading - Parse files only when needed
- Smart caching - LRU eviction for parsed ASTs
- Reference counting - Automatic cleanup of unused nodes
- Memory limits - Configurable bounds with graceful degradation
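The LRU eviction mentioned above can be sketched with standard-library types only. This `LruCache` is a hypothetical, deliberately simple version (its recency bookkeeping is O(n) per access); a production cache would more likely use a dedicated crate or an intrusive linked list.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache keyed by string, e.g. a file path.
pub struct LruCache<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>, // front = least recently used
}

impl<V> LruCache<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Marks `key` as the most recently used entry.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Returns the cached value and refreshes its recency.
    pub fn get(&mut self, key: &str) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Inserts or overwrites a value, evicting the oldest entry when full.
    pub fn put(&mut self, key: &str, value: V) {
        if !self.map.contains_key(key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.to_string(), value);
        self.touch(key);
    }

    /// Number of entries currently cached.
    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Bounding the cache by entry count is the simplest policy; a memory-limit policy would track an estimated byte size per entry instead.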
Performance Characteristics
Target Performance Metrics
Operation | Target Latency | Measured Performance |
---|---|---|
Repository scan (1K files) | < 2 seconds | 1.2 seconds |
Simple tool query | < 100ms | 45ms average |
Complex analysis | < 500ms | 320ms average |
File change update | < 250ms | 180ms average |
Optimization Strategies
- Parallel parsing during initialization
- Incremental updates for file changes
- Multi-level caching (parse results, analysis results, formatted responses)
- Query optimization through graph indexes
- Batch operations for multiple related queries
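Parallel parsing during initialization could look roughly like the following, which splits the inputs into chunks and parses each chunk on a scoped thread. `parse_stub` is a placeholder that only counts non-empty lines, and `parse_all` is a hypothetical name; the actual engine may use a thread pool or a different scheduling strategy.

```rust
use std::thread;

/// Stand-in for a real parser: counts the non-empty lines of a source file.
fn parse_stub(source: &str) -> usize {
    source.lines().filter(|line| !line.trim().is_empty()).count()
}

/// Parses all sources using up to `workers` threads.
/// Results are returned in the same order as the inputs.
pub fn parse_all(sources: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk_size = ((sources.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|s| parse_stub(s)).collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```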
Security & Isolation Model
Sandboxed Execution
- Read-only access to specified repository directory
- Path validation prevents access outside repository
- Resource limits on memory and CPU usage
- Error isolation - parser failures don't crash server
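Path validation of this kind can be sketched as a purely lexical check that rejects absolute paths and any `..` that would climb above the repository root. `resolve_in_repo` is a hypothetical helper; it does not follow symlinks, so a real implementation would likely also canonicalize the result and re-check the prefix.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `requested` onto `root`, returning `None` if the result
/// would escape `root`. Lexical only: symlinks are not resolved.
pub fn resolve_in_repo(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    let mut depth = 0usize; // how many components we are below `root`
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would climb above the repository root
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```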
Data Protection
- No persistent storage of code content by default
- Local processing - no external network calls
- Minimal privileges - runs with user permissions only
- Input sanitization for all file paths and parameters
Language Support Architecture
Parser Framework Design
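pub trait LanguageParser: Send + Sync {
    // Parse one file into the universal representation.
    fn parse_file(&self, context: ParseContext) -> Result<ParseResult>;
    // File extensions handled by this parser.
    fn supported_extensions(&self) -> &[&str];
    // Human-readable language name.
    fn language_name(&self) -> &str;
    // Produce an updated tree from the previous tree and a single edit.
    fn incremental_update(&self, old_tree: &Tree, edit: &InputEdit) -> Result<Tree>;
}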
Universal AST Node Types
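use serde::{Deserialize, Serialize};

// PartialEq, Eq, and Hash let NodeKind serve as a HashMap key (see GraphIndexes::by_type).
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum NodeKind {
    Module,    // File/module level
    Function,  // Functions/methods
    Class,     // Classes/types
    Variable,  // Variables/constants
    Import,    // Import/include statements
    Call,      // Function calls
    Reference, // Symbol references
}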
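To illustrate how language-specific syntax nodes might collapse into these universal kinds, the sketch below maps a few Tree-sitter node type names for Python and JavaScript onto `NodeKind`. The `map_kind` function and the exact set of node names covered are assumptions for illustration; the real mapping in each parser is likely more extensive.

```rust
/// Local copy of the universal node kinds so the example is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NodeKind {
    Module,
    Function,
    Class,
    Variable,
    Import,
    Call,
    Reference,
}

/// Maps a (language, syntax node type) pair to a universal kind.
/// Returns `None` for node types that have no universal counterpart here.
pub fn map_kind(language: &str, ts_kind: &str) -> Option<NodeKind> {
    match (language, ts_kind) {
        ("python", "module") | ("javascript", "program") => Some(NodeKind::Module),
        ("python", "function_definition")
        | ("javascript", "function_declaration")
        | ("javascript", "method_definition") => Some(NodeKind::Function),
        ("python", "class_definition") | ("javascript", "class_declaration") => {
            Some(NodeKind::Class)
        }
        ("python", "import_statement")
        | ("python", "import_from_statement")
        | ("javascript", "import_statement") => Some(NodeKind::Import),
        ("python", "call") | ("javascript", "call_expression") => Some(NodeKind::Call),
        _ => None,
    }
}
```

Once both languages produce the same `NodeKind` values, queries such as "all functions" no longer need to know which parser created a node.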
Currently Supported Languages
Language | Parser | Status | AST Coverage |
---|---|---|---|
JavaScript/TypeScript | Tree-sitter | Complete | 95%+ |
Python | Tree-sitter | Complete | 90%+ |
Rust | Tree-sitter | In Progress | 70% |
Java | Tree-sitter | Planned | - |
Extensibility Points
Adding New Languages
- Implement the LanguageParser trait in a new crate
- Define the AST mapping from the Tree-sitter CST to the Universal AST
- Add language detection logic for file extensions
- Register the parser in the main engine
- Add comprehensive tests for language features
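The registration and extension-detection steps above can be sketched as a small registry. `SimpleParser` is a cut-down, hypothetical stand-in for `LanguageParser` (parsing methods omitted), and `ParserRegistry` and `PythonParser` are illustrative names rather than the engine's actual API.

```rust
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;

/// Simplified stand-in for the `LanguageParser` trait.
pub trait SimpleParser: Send + Sync {
    fn supported_extensions(&self) -> &[&str];
    fn language_name(&self) -> &str;
}

/// Example parser handling Python source and stub files.
pub struct PythonParser;

impl SimpleParser for PythonParser {
    fn supported_extensions(&self) -> &[&str] {
        &["py", "pyi"]
    }
    fn language_name(&self) -> &str {
        "python"
    }
}

/// Maps lowercase file extensions to the parser that handles them.
#[derive(Default)]
pub struct ParserRegistry {
    by_extension: HashMap<String, Arc<dyn SimpleParser>>,
}

impl ParserRegistry {
    /// Registers `parser` under every extension it declares.
    pub fn register(&mut self, parser: Arc<dyn SimpleParser>) {
        for ext in parser.supported_extensions() {
            self.by_extension
                .insert(ext.to_ascii_lowercase(), Arc::clone(&parser));
        }
    }

    /// Looks up a parser by the file's extension (case-insensitive).
    pub fn parser_for(&self, path: &Path) -> Option<Arc<dyn SimpleParser>> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        self.by_extension.get(&ext).cloned()
    }
}
```

Files with no extension or an unregistered one simply yield `None`, which lets the engine skip them instead of failing.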
Custom Analysis Tools
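pub trait AnalysisTool: Send + Sync {
    // Unique tool name used to route requests.
    fn name(&self) -> &str;
    // Short description of what the tool does.
    fn description(&self) -> &str;
    // Description of the accepted parameters, as a JSON value.
    fn parameters(&self) -> serde_json::Value;
    // Run the tool against the code graph with the given parameters.
    fn execute(&self, graph: &CodeGraph, params: &serde_json::Value) -> Result<ToolResult>;
}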
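A plugin-style tool system built on a trait like this might dispatch by name, as in the sketch below. To stay dependency-free it uses a simplified, hypothetical `Tool` trait with string parameters and results instead of `serde_json::Value` and `ToolResult`; `Graph`, `SymbolSearch`, and the tool name `search_symbols` are likewise illustrative.

```rust
use std::collections::HashMap;

/// Stand-in for `CodeGraph`: just a list of symbol names.
pub struct Graph {
    pub symbols: Vec<String>,
}

/// Simplified stand-in for the `AnalysisTool` trait.
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn execute(&self, graph: &Graph, query: &str) -> Result<String, String>;
}

/// Example tool: returns the symbols containing `query`, comma-separated.
pub struct SymbolSearch;

impl Tool for SymbolSearch {
    fn name(&self) -> &str {
        "search_symbols"
    }
    fn execute(&self, graph: &Graph, query: &str) -> Result<String, String> {
        let hits: Vec<&str> = graph
            .symbols
            .iter()
            .map(String::as_str)
            .filter(|symbol| symbol.contains(query))
            .collect();
        Ok(hits.join(","))
    }
}

/// Holds registered tools and dispatches calls by tool name.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Runs the named tool, or returns an error for an unknown name.
    pub fn run(&self, name: &str, graph: &Graph, query: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.execute(graph, query),
            None => Err(format!("unknown tool: {}", name)),
        }
    }
}
```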
Future Architecture Enhancements
Distributed Analysis (Planned)
- Cluster coordination for very large repositories
- Horizontal scaling with work distribution
- Shared cache across multiple instances
- Load balancing for concurrent clients
Persistent Storage (Optional)
- Database backend for enterprise deployments
- Incremental persistence for faster startup
- Query optimization through database indexes
- Backup and recovery capabilities
Next Steps: See Current Status for implementation details, or Roadmap for planned enhancements.