CodePrism Technical Architecture

This document describes the technical architecture and design principles behind CodePrism's graph-based code intelligence system.

Looking for a general overview? See the Introduction or MCP Server Overview for user-focused information.

Core Design Philosophy

Graph-First Intelligence

Code relationships are stored and queried as a graph rather than as flat syntax trees. This enables:

  • Cross-language linking - References between files in different languages
  • Relationship analysis - Understanding dependencies, calls, and data flow
  • Efficient queries - Fast traversal of code relationships
  • Incremental updates - Adding/updating nodes without full rebuilds
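
For illustration only, a relationship query over typed edges might look like the sketch below. The NodeId, EdgeKind, and Edge types here are stand-ins invented for the example (CodePrism's actual structures appear in the storage section), but they show why traversal does not require re-walking syntax trees.

// Illustrative stand-ins only; CodePrism's real node and edge types differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Calls,      // function -> function it invokes
    Imports,    // module -> module, possibly in another language
    References, // any node -> symbol it mentions
}

#[derive(Debug, Clone, Copy)]
struct Edge {
    from: NodeId,
    to: NodeId,
    kind: EdgeKind,
}

/// One-hop traversal: all nodes reachable from `start` over edges of `kind`.
fn neighbors(edges: &[Edge], start: NodeId, kind: EdgeKind) -> Vec<NodeId> {
    edges
        .iter()
        .filter(|e| e.from == start && e.kind == kind)
        .map(|e| e.to)
        .collect()
}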

Language-Agnostic Universal AST

All language parsers convert source code into a unified node representation:

  • Common node types across all languages (Function, Class, Variable, etc.)
  • Consistent relationships regardless of source language
  • Extensible design for easy addition of new languages
  • Cross-language analysis capabilities

System Architecture

Core Components

MCP Protocol Layer

Purpose: Standard Model Context Protocol communication
Implementation: JSON-RPC 2.0 over stdin/stdout
Responsibilities:

  • Capability negotiation with clients
  • Tool and resource request routing
  • Structured error handling with context
  • Real-time notifications for resource changes
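
A minimal sketch of the stdio transport, assuming serde_json for message handling; capability negotiation, routing, and notifications are omitted, and this is not the server's actual implementation.

use std::io::{self, BufRead, Write};

// Minimal JSON-RPC 2.0 read/respond loop over stdin/stdout (sketch only).
fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    for line in stdin.lock().lines() {
        let request: serde_json::Value = match serde_json::from_str(&line?) {
            Ok(value) => value,
            Err(e) => {
                // JSON-RPC parse error (-32700), returned with context.
                let err = serde_json::json!({
                    "jsonrpc": "2.0", "id": null,
                    "error": { "code": -32700, "message": e.to_string() }
                });
                writeln!(stdout, "{err}")?;
                continue;
            }
        };

        // A real server would dispatch on the "method" field to tools and
        // resources here; this sketch just echoes it back.
        let response = serde_json::json!({
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": { "echoed_method": request.get("method") }
        });
        writeln!(stdout, "{response}")?;
    }
    Ok(())
}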

Analysis Tools Engine

Purpose: Provide 23 production-ready code analysis capabilities
Architecture: Plugin-based tool system
Key Features:

  • Parallel execution for batch operations
  • Caching for expensive computations
  • Result aggregation and formatting
  • Workflow optimization suggestions
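
As a sketch of the batch path only: independent tool requests are fanned out with std::thread::scope and their results collected in order. The tool names and the run_tool signature are placeholders, not the engine's actual API, and caching and formatting are omitted.

use std::thread;

// Placeholder tool call: the real engine would consult the code graph.
fn run_tool(name: &str) -> String {
    format!("result of {name}")
}

// Execute a batch of independent tool requests in parallel (sketch only).
fn run_batch(requests: &[&str]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = requests
            .iter()
            .map(|name| scope.spawn(move || run_tool(name)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Hypothetical tool names, used only to demonstrate the batch shape.
    for result in run_batch(&["find_references", "trace_data_flow"]) {
        println!("{result}");
    }
}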

Code Intelligence Engine

Purpose: Parse, analyze, and maintain code graph
Components:

  • Parser Framework: Pluggable language-specific parsers
  • Universal AST Graph: In-memory graph structure using DashMap
  • Symbol Resolution: Cross-file and cross-language linking
  • Incremental Updates: Efficient re-parsing on file changes

Data Flow Architecture

Repository Initialization Flow

On startup, the server scans the repository, parses files in parallel, and builds the Universal AST graph and its indexes.

Tool Execution Flow

An incoming MCP tool request is routed to the matching analysis tool, which queries the in-memory graph, then aggregates and formats the result before returning it over JSON-RPC.

Real-Time Update Flow

When a file changes, only that file is re-parsed; the affected nodes and edges are updated incrementally and clients are notified of the resource change.

Storage Architecture

In-Memory Graph Structure

// Simplified representation
use dashmap::DashMap;
use std::collections::HashMap;
use std::path::PathBuf;

pub struct CodeGraph {
    nodes: DashMap<NodeId, Node>, // Universal AST nodes
    edges: DashMap<EdgeId, Edge>, // relationships between nodes
    indexes: GraphIndexes,        // secondary indexes for fast lookups
}

pub struct GraphIndexes {
    by_file: HashMap<PathBuf, Vec<NodeId>>,     // nodes per source file
    by_symbol: HashMap<String, Vec<NodeId>>,    // nodes per symbol name
    by_type: HashMap<NodeKind, Vec<NodeId>>,    // nodes per NodeKind
    dependencies: HashMap<NodeId, Vec<NodeId>>, // node-to-node dependencies
}

Design Decisions:

  • DashMap for concurrent access across threads
  • Hierarchical indexes for fast queries by different criteria
  • LRU caching for expensive analysis results
  • Optional persistence for faster startup (future enhancement)
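
As an example of how the hierarchical indexes combine, the sketch below builds on the simplified structs above and intersects by_file with by_type to answer "which functions does this file define" without scanning the node map. It assumes NodeId is hashable and cloneable and NodeKind is hashable, which the index definitions already imply.

use std::collections::HashSet;
use std::path::Path;

// Sketch only: list the function nodes defined in one file by intersecting
// two indexes from the simplified GraphIndexes above.
fn functions_in_file(indexes: &GraphIndexes, file: &Path) -> Vec<NodeId> {
    let in_file: HashSet<&NodeId> = indexes
        .by_file
        .get(file)
        .map(|ids| ids.iter().collect())
        .unwrap_or_default();

    indexes
        .by_type
        .get(&NodeKind::Function)
        .into_iter()
        .flatten()
        .filter(|id| in_file.contains(id))
        .cloned()
        .collect()
}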

Memory Management

  • Lazy loading - Parse files only when needed
  • Smart caching - LRU eviction for parsed ASTs
  • Reference counting - Automatic cleanup of unused nodes
  • Memory limits - Configurable bounds with graceful degradation
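
To illustrate the bounded-cache idea, here is a hypothetical LRU-style cache built only on the standard library; the engine's real cache is internal, and this sketch trades efficiency (linear-time recency updates) for brevity.

use std::collections::{HashMap, VecDeque};
use std::path::PathBuf;

// Hypothetical bounded AST cache illustrating LRU-style eviction.
struct AstCache<T> {
    capacity: usize,
    order: VecDeque<PathBuf>, // front = least recently used
    entries: HashMap<PathBuf, T>,
}

impl<T> AstCache<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    fn insert(&mut self, path: PathBuf, ast: T) {
        if self.entries.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.entries.remove(&evicted); // evict least recently used
            }
        }
        self.order.retain(|p| p != &path);
        self.order.push_back(path.clone());
        self.entries.insert(path, ast);
    }

    fn get(&mut self, path: &PathBuf) -> Option<&T> {
        if self.entries.contains_key(path) {
            // Refresh recency on access.
            self.order.retain(|p| p != path);
            self.order.push_back(path.clone());
        }
        self.entries.get(path)
    }
}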

Performance Characteristics

Target Performance Metrics

| Operation | Target Latency | Measured Performance |
| --- | --- | --- |
| Repository scan (1K files) | < 2 seconds | 1.2 seconds |
| Simple tool query | < 100ms | 45ms average |
| Complex analysis | < 500ms | 320ms average |
| File change update | < 250ms | 180ms average |

Optimization Strategies

  1. Parallel parsing during initialization
  2. Incremental updates for file changes
  3. Multi-level caching (parse results, analysis results, formatted responses)
  4. Query optimization through graph indexes
  5. Batch operations for multiple related queries

Security & Isolation Model

Sandboxed Execution

  • Read-only access to specified repository directory
  • Path validation prevents access outside repository
  • Resource limits on memory and CPU usage
  • Error isolation - parser failures don't crash server
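
A minimal sketch of the path validation step, using only the standard library and assuming a configured repository root: both paths are canonicalized so symlinks and ".." components cannot escape the sandbox. The actual checks in CodePrism may differ.

use std::io;
use std::path::{Path, PathBuf};

// Resolve a requested path against the repository root and reject anything
// that resolves outside it (sketch only).
fn validate_path(repo_root: &Path, requested: &Path) -> io::Result<PathBuf> {
    let root = repo_root.canonicalize()?;
    let resolved = root.join(requested).canonicalize()?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the repository root",
        ))
    }
}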

Data Protection

  • No persistent storage of code content by default
  • Local processing - no external network calls
  • Minimal privileges - runs with user permissions only
  • Input sanitization for all file paths and parameters

Language Support Architecture

Parser Framework Design

pub trait LanguageParser: Send + Sync {
    /// Parse one file into Universal AST nodes and relationships.
    fn parse_file(&self, context: ParseContext) -> Result<ParseResult>;
    /// File extensions (e.g. "py", "ts") this parser handles.
    fn supported_extensions(&self) -> &[&str];
    fn language_name(&self) -> &str;
    /// Re-parse only the edited region instead of the whole file.
    fn incremental_update(&self, old_tree: &Tree, edit: &InputEdit) -> Result<Tree>;
}

Universal AST Node Types

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum NodeKind {
    Module,    // File/module level
    Function,  // Functions/methods
    Class,     // Classes/types
    Variable,  // Variables/constants
    Import,    // Import/include statements
    Call,      // Function calls
    Reference, // Symbol references
}

Currently Supported Languages

| Language | Parser | Status | AST Coverage |
| --- | --- | --- | --- |
| JavaScript/TypeScript | Tree-sitter | ✅ Complete | 95%+ |
| Python | Tree-sitter | ✅ Complete | 90%+ |
| Rust | Tree-sitter | 🚧 In Progress | 70% |
| Java | Tree-sitter | 📋 Planned | - |

Extensibility Points

Adding New Languages

  1. Implement the LanguageParser trait in a new crate
  2. Define the AST mapping from the Tree-sitter CST to the Universal AST
  3. Add language detection logic for the new file extensions
  4. Register the parser with the main engine
  5. Add comprehensive tests for the language's features
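
A skeleton for step 1 might look like the following hypothetical crate for a Go parser (a language not in the table above, chosen so the example is clearly illustrative); the trait and types are the ones from the parser framework section, and the bodies are left as placeholders.

// Hypothetical skeleton for a new language crate; names are illustrative.
pub struct GoParser {
    // e.g. a Tree-sitter parser configured with the Go grammar
}

impl LanguageParser for GoParser {
    fn parse_file(&self, context: ParseContext) -> Result<ParseResult> {
        // 1. Run the Tree-sitter grammar over the file in `context`.
        // 2. Map each CST node to a Universal AST NodeKind and emit edges.
        todo!("map the Tree-sitter CST to Universal AST nodes and edges")
    }

    fn supported_extensions(&self) -> &[&str] {
        &["go"]
    }

    fn language_name(&self) -> &str {
        "go"
    }

    fn incremental_update(&self, old_tree: &Tree, edit: &InputEdit) -> Result<Tree> {
        todo!("re-parse only the edited range and splice it into the existing tree")
    }
}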

Custom Analysis Tools

pub trait AnalysisTool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    /// JSON description of the parameters the tool accepts.
    fn parameters(&self) -> serde_json::Value;
    fn execute(&self, graph: &CodeGraph, params: &serde_json::Value) -> Result<ToolResult>;
}
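
A custom tool implements the same trait; the example below is hypothetical (the tool name, schema, and behaviour are invented for illustration) and leaves the graph walk as a placeholder because the concrete Node and ToolResult types are internal to the engine.

// Hypothetical custom tool; the name and behaviour are illustrative only.
pub struct NodeCountTool;

impl AnalysisTool for NodeCountTool {
    fn name(&self) -> &str {
        "count_nodes"
    }

    fn description(&self) -> &str {
        "Counts Universal AST nodes, optionally restricted to a single file"
    }

    fn parameters(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": { "file": { "type": "string" } }
        })
    }

    fn execute(&self, graph: &CodeGraph, params: &serde_json::Value) -> Result<ToolResult> {
        let _file_filter = params.get("file").and_then(|v| v.as_str());
        // Walk `graph` (e.g. via its indexes) and aggregate per-kind counts
        // into a ToolResult here.
        todo!("aggregate node counts from the code graph")
    }
}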

Future Architecture Enhancements

Distributed Analysis (Planned)

  • Cluster coordination for very large repositories
  • Horizontal scaling with work distribution
  • Shared cache across multiple instances
  • Load balancing for concurrent clients

Persistent Storage (Optional)

  • Database backend for enterprise deployments
  • Incremental persistence for faster startup
  • Query optimization through database indexes
  • Backup and recovery capabilities

Next Steps: See Current Status for implementation details, or Roadmap for planned enhancements.