Rust Parser Implementation Plan
Overview
The Rust parser lets CodePrism analyze its own source code, the ultimate "dogfooding" capability. It will handle Rust-specific features such as ownership, traits, macros, and the language's complex type system.
🎯 Primary Goal: Self-Analysis
Use Case: Enable codeprism to analyze its own Rust codebase for:
- Code quality assessment
- Dependency analysis
- Refactoring opportunities
- Architecture understanding
- Performance optimization insights
Implementation Roadmap
Phase 1: Basic Structure (Week 1)
- Crate Setup (crates/codeprism-lang-rust/)
  - Cargo.toml with tree-sitter-rust dependency
  - Basic module structure following established pattern
  - Initial error handling and types
- Core Parser Implementation
  - Tree-sitter integration with Rust grammar
  - Language detection for .rs files
  - Basic incremental parsing support
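As an illustration of the language detection step in Phase 1, here is a minimal sketch that classifies a file by its extension. The `DetectedLanguage` enum and `detect_language` function are hypothetical names chosen for this example; they are not taken from the CodePrism codebase, which may detect languages differently.

```rust
use std::path::Path;

/// Languages this sketch can tell apart (hypothetical enum, not CodePrism's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectedLanguage {
    Rust,
    Unknown,
}

/// Classify a file purely by its extension; only `rs` maps to Rust.
pub fn detect_language(path: &Path) -> DetectedLanguage {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("rs") => DetectedLanguage::Rust,
        _ => DetectedLanguage::Unknown,
    }
}
```

Matching on the extension alone keeps detection cheap. A real implementation might also need to handle generated files or files without extensions.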
Phase 2: AST Mapping (Week 2-3)
- Basic Node Types
  - Functions (fn, async fn, const fn, unsafe fn)
  - Structs (struct, tuple struct, unit struct)
  - Enums (enum with variants)
  - Modules (mod, use declarations)
  - Constants and static variables
- Advanced Node Types
  - Traits (trait, impl blocks)
  - Generics and lifetime parameters
  - Pattern matching (match, if let, while let)
  - Macros (macro_rules!, procedural macros)
Phase 3: Relationship Analysis (Week 4)
- Basic Edges
  - Function calls
  - Module imports (use)
  - Struct field access
  - Method calls
- Advanced Edges
  - Trait implementations
  - Generic constraints
  - Lifetime relationships
  - Macro invocations
Phase 4: Rust-Specific Features (Week 5-6)
- Ownership Analysis
  - Borrow checker implications
  - Move semantics
  - Reference relationships
- Type System
  - Type aliases
  - Associated types
  - Where clauses
  - Complex generics
Detailed Implementation Guide
Crate Structure
crates/codeprism-lang-rust/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Public API
│   ├── parser.rs           # Main parser implementation
│   ├── ast_mapper.rs       # CST to U-AST conversion
│   ├── rust_nodes.rs       # Rust-specific node handling
│   ├── traits.rs           # Trait and impl analysis
│   ├── macros.rs           # Macro analysis
│   ├── types.rs            # Type system analysis
│   ├── patterns.rs         # Pattern matching analysis
│   └── error.rs            # Error handling
├── tests/
│   ├── fixtures/
│   │   ├── simple.rs             # Basic Rust features
│   │   ├── advanced.rs           # Complex generics and traits
│   │   ├── macros.rs             # Macro usage
│   │   ├── patterns.rs           # Pattern matching
│   │   └── codeprism_sample.rs   # Real codeprism code samples
│   └── integration_test.rs
└── benches/
    └── parse_benchmark.rs
Cargo.toml
[package]
name = "codeprism-lang-rust"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
rust-version.workspace = true
description = "Rust language support for codeprism - enables self-analysis"
[dependencies]
# Core dependencies
anyhow.workspace = true
thiserror.workspace = true
tracing.workspace = true
serde.workspace = true
serde_json.workspace = true
# Tree-sitter
tree-sitter.workspace = true
tree-sitter-rust.workspace = true
# CodePrism types
blake3.workspace = true
hex.workspace = true
[dev-dependencies]
insta.workspace = true
tempfile.workspace = true
tokio = { workspace = true, features = ["test-util"] }
[build-dependencies]
cc = "1.0"
Key Implementation Challenges
1. Macro Analysis
// Challenge: Analyze macro invocations and expansions
// Examples from codeprism codebase:
tracing::info!("Starting server");
serde_json::json!({ "key": value });
Approach:
- Extract macro name and arguments
- Track macro definition locations
- Analyze macro usage patterns
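The first step of this approach, extracting the macro name and arguments, can be sketched on raw source text. This is only an illustration: in the real parser the same information would come from tree-sitter's syntax tree rather than from string handling, and `MacroCall` and `split_macro_call` are hypothetical names introduced for the example.

```rust
/// A macro call split into its path and raw argument text (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct MacroCall<'a> {
    pub path: &'a str,
    pub args: &'a str,
}

/// Split text such as `tracing::info!("x");` into the macro path and its arguments.
/// Returns `None` when the text does not look like `path!(...)`, `path![...]`, or `path!{...}`.
pub fn split_macro_call(source: &str) -> Option<MacroCall<'_>> {
    // Drop surrounding whitespace and an optional trailing semicolon.
    let source = source.trim().trim_end_matches(';').trim_end();
    let bang = source.find('!')?;
    let path = source[..bang].trim();
    if path.is_empty() {
        return None;
    }
    let rest = source[bang + 1..].trim_start();
    // The argument list must open with one of the three macro delimiters.
    let close = match rest.chars().next()? {
        '(' => ')',
        '[' => ']',
        '{' => '}',
        _ => return None,
    };
    if !rest.ends_with(close) {
        return None;
    }
    // Both delimiters are single-byte ASCII, so these slice bounds are valid.
    let args = rest[1..rest.len() - 1].trim();
    Some(MacroCall { path, args })
}
```

Applied to the two invocations above, this yields the paths `tracing::info` and `serde_json::json`. Definitions such as `macro_rules! name { ... }` are deliberately rejected, because a name follows the `!` instead of a delimiter.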
2. Trait Implementation Analysis
// Challenge: Map trait bounds and implementations
impl<T: Clone + Debug> Display for Wrapper<T>
where
    T: Send + Sync,
{
    // Implementation
}
Approach:
- Extract trait names and bounds
- Map implementation relationships
- Track generic constraints
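To make the extraction step concrete, the sketch below pulls the generic parameters, trait name, and implementing type out of an `impl` header given as text. It is a simplified stand-in for walking tree-sitter nodes, and `ImplHeader` and `parse_impl_header` are hypothetical names. It assumes the `where` clause has already been stripped and does not handle `->` inside generic bounds.

```rust
/// The parts of an `impl` header (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct ImplHeader<'a> {
    pub generics: Option<&'a str>,
    pub trait_name: Option<&'a str>,
    pub self_type: &'a str,
}

/// Parse text like `impl<T: Clone> Display for Wrapper<T>` (without the `where` clause).
pub fn parse_impl_header(header: &str) -> Option<ImplHeader<'_>> {
    let rest = header.trim().strip_prefix("impl")?;
    // Reject identifiers that merely start with "impl", such as `implements`.
    if !(rest.starts_with('<') || rest.starts_with(char::is_whitespace)) {
        return None;
    }
    let (generics, rest) = if rest.starts_with('<') {
        let end = matching_angle(rest)?;
        (Some(rest[1..end].trim()), &rest[end + 1..])
    } else {
        (None, rest)
    };
    let rest = rest.trim();
    if rest.is_empty() {
        return None;
    }
    Some(match rest.split_once(" for ") {
        // Trait implementation: `Trait for Type`.
        Some((trait_name, self_type)) => ImplHeader {
            generics,
            trait_name: Some(trait_name.trim()),
            self_type: self_type.trim(),
        },
        // Inherent implementation: just `Type`.
        None => ImplHeader { generics, trait_name: None, self_type: rest },
    })
}

/// Byte index of the `>` that closes the `<` at the start of `text`.
fn matching_angle(text: &str) -> Option<usize> {
    let mut depth = 0usize;
    for (i, ch) in text.char_indices() {
        match ch {
            '<' => depth += 1,
            '>' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}
```

For the header in the example above, this produces the generics `T: Clone + Debug`, the trait `Display`, and the self type `Wrapper<T>`, which is enough to record an implementation relationship between the two.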
3. Pattern Matching
// Challenge: Analyze complex pattern matching
match result {
    Ok(ParseResult { nodes, edges, .. }) => {
        // Handle success
    }
    Err(Error::Parse { file, message }) => {
        // Handle parse error
    }
}
Approach:
- Extract pattern structures
- Map variable bindings
- Track control flow
4. Module System
// Challenge: Track complex module relationships
use codeprism::{
    ast::{Node, Edge},
    parser::ParserEngine,
};
Approach:
- Parse use declarations
- Track module hierarchy
- Map public/private visibility
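One way to turn a nested `use` declaration into import edges is to flatten it into full paths first. The sketch below does this on the text between `use` and `;`. The function names are hypothetical, and a real implementation would walk tree-sitter's use-tree nodes instead of scanning characters. Items such as `self`, globs, and `as` renames are passed through unchanged.

```rust
/// Expand a use tree such as `a::{b::{C, D}, e::F}` into full paths.
pub fn flatten_use_tree(tree: &str) -> Vec<String> {
    let mut paths = Vec::new();
    expand("", tree, &mut paths);
    paths
}

/// Recursively expand one subtree, prepending the path collected so far.
fn expand(prefix: &str, tree: &str, out: &mut Vec<String>) {
    let tree = tree.trim();
    if tree.is_empty() {
        return; // produced by a trailing comma inside a group
    }
    match tree.find('{') {
        Some(open) if tree.ends_with('}') => {
            // `head` is the path segment before the group, e.g. `codeprism::`.
            let head = tree[..open].trim();
            let new_prefix = format!("{prefix}{head}");
            let inner = &tree[open + 1..tree.len() - 1];
            for part in split_top_level(inner) {
                expand(&new_prefix, part, out);
            }
        }
        _ => out.push(format!("{prefix}{tree}")),
    }
}

/// Split on commas that are not nested inside another `{ ... }` group.
fn split_top_level(inner: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, ch) in inner.char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&inner[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&inner[start..]);
    parts
}
```

For the declaration shown above, the result is `codeprism::ast::Node`, `codeprism::ast::Edge`, and `codeprism::parser::ParserEngine`, each of which could become one import edge.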
Rust-Specific Node Types
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum RustNodeKind {
    // Basic items
    Function,
    Struct,
    Enum,
    Trait,
    Impl,
    Module,
    // Type system
    TypeAlias,
    AssociatedType,
    GenericParam,
    LifetimeParam,
    // Patterns
    MatchArm,
    Pattern,
    // Macros
    MacroDefinition,
    MacroInvocation,
    // Expressions
    MethodCall,
    FieldAccess,
    TupleAccess,
    // Statements
    LetBinding,
    UseDeclaration,
}
Rust-Specific Edge Types
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum RustEdgeKind {
    // Trait relationships
    Implements,  // impl Trait for Type
    TraitBound,  // T: Trait
    // Ownership
    Borrows,     // &value
    MutBorrows,  // &mut value
    Moves,       // ownership transfer
    // Type relationships
    HasType,     // variable: Type
    GenericArg,  // Vec<T>
    // Macro relationships
    Expands,     // macro expansion
    Invokes,     // macro call
    // Module system
    ReExports,   // pub use
    Imports,     // use path
}
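To show how such edge kinds might be queried, here is a small self-contained sketch that finds every type implementing a given trait. It uses a trimmed-down `EdgeKind` without the serde derives, plus a hypothetical `Edge` struct keyed by plain names. CodePrism's actual edge representation is not specified in this plan and may look different.

```rust
/// A reduced copy of the edge kinds above, enough for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Implements, // impl Trait for Type
    TraitBound, // T: Trait
    Imports,    // use path
}

/// A directed edge between two named items (hypothetical layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub source: String,
    pub target: String,
    pub kind: EdgeKind,
}

impl Edge {
    pub fn new(source: &str, target: &str, kind: EdgeKind) -> Self {
        Self { source: source.to_string(), target: target.to_string(), kind }
    }
}

/// Names of all types with an `Implements` edge pointing at `trait_name`.
pub fn implementors<'a>(edges: &'a [Edge], trait_name: &str) -> Vec<&'a str> {
    edges
        .iter()
        .filter(|edge| edge.kind == EdgeKind::Implements && edge.target == trait_name)
        .map(|edge| edge.source.as_str())
        .collect()
}
```

Keeping the relationship in the edge kind rather than in the node lets the same traversal answer other questions, such as which generic parameters are bounded by a trait, by filtering on `TraitBound` instead.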
🧪 Testing Strategy
Unit Tests
- Parser Tests
  - Basic Rust syntax parsing
  - Error recovery
  - Incremental updates
- AST Mapper Tests
  - Node extraction accuracy
  - Edge relationship correctness
  - Rust-specific feature handling
Integration Tests
- Real Code Analysis
  - Parse actual codeprism source files
  - Verify extracted relationships
  - Performance benchmarks
- Self-Analysis Tests
  - Analyze codeprism-lang-rust itself
  - Cross-reference with known structure
  - Validate completeness
Test Fixtures
tests/fixtures/simple.rs
// Basic Rust features for testing
use std::collections::HashMap;

pub struct User {
    pub name: String,
    age: u32,
}

impl User {
    pub fn new(name: String, age: u32) -> Self {
        Self { name, age }
    }

    pub fn greet(&self) -> String {
        format!("Hello, I'm {}", self.name)
    }
}

pub fn create_user(name: &str, age: u32) -> User {
    User::new(name.to_string(), age)
}
tests/fixtures/advanced.rs
// Advanced Rust features
// Note: Language, ParseError, and ParseResult are not defined in this fixture;
// tree-sitter only needs the file to be syntactically valid.
use std::marker::PhantomData;

pub trait Parser<T> {
    type Error;
    type Output;

    fn parse(&self, input: T) -> Result<Self::Output, Self::Error>;
}

pub struct LanguageParser<L>
where
    L: Language + Clone,
{
    language: L,
    _phantom: PhantomData<L>,
}

impl<L> Parser<&str> for LanguageParser<L>
where
    L: Language + Clone + Send + Sync,
{
    type Error = ParseError;
    type Output = ParseResult;

    fn parse(&self, input: &str) -> Result<Self::Output, Self::Error> {
        // Implementation
        todo!()
    }
}
tests/fixtures/codeprism_sample.rs
// Real codeprism code sample for testing
use anyhow::Result;
use std::collections::HashMap;
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Node {
    pub id: NodeId,
    pub kind: NodeKind,
    pub name: String,
    pub span: Span,
}

impl Node {
    pub fn new(
        repo_id: &str,
        kind: NodeKind,
        name: String,
        span: Span,
    ) -> Self {
        let id = NodeId::generate(repo_id, &span, &kind);
        Self { id, kind, name, span }
    }
}
Integration with Existing System
Registry Integration
// In crates/codeprism/src/parser/mod.rs
impl LanguageRegistry {
    pub fn new() -> Self {
        let mut registry = Self::default();

        // Register existing parsers
        #[cfg(feature = "javascript")]
        registry.register_javascript();
        #[cfg(feature = "python")]
        registry.register_python();

        // Register Rust parser
        #[cfg(feature = "rust")]
        registry.register_rust();

        registry
    }

    #[cfg(feature = "rust")]
    fn register_rust(&mut self) {
        use codeprism_lang_rust::RustLanguageParser;
        self.register(Box::new(RustLanguageParser::new()));
    }
}
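The registration pattern above can be illustrated with a self-contained sketch that uses only the standard library. Everything here is an assumption made for the example: the `LanguageParser` trait, its two methods, and the extension lookup are hypothetical, and the real `LanguageRegistry` in CodePrism may expose a different interface.

```rust
use std::collections::HashMap;

/// Hypothetical parser interface; CodePrism's real trait may differ.
pub trait LanguageParser {
    fn language_name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
}

/// Stand-in for the Rust parser, carrying no state in this sketch.
pub struct RustLanguageParser;

impl LanguageParser for RustLanguageParser {
    fn language_name(&self) -> &'static str {
        "rust"
    }

    fn extensions(&self) -> &'static [&'static str] {
        &["rs"]
    }
}

/// Registry that owns the parsers and indexes them by file extension.
#[derive(Default)]
pub struct LanguageRegistry {
    by_extension: HashMap<&'static str, usize>,
    parsers: Vec<Box<dyn LanguageParser>>,
}

impl LanguageRegistry {
    /// Store the parser and map each of its extensions to its index.
    pub fn register(&mut self, parser: Box<dyn LanguageParser>) {
        let index = self.parsers.len();
        for ext in parser.extensions() {
            self.by_extension.insert(*ext, index);
        }
        self.parsers.push(parser);
    }

    /// Look up the parser responsible for a file extension, if any.
    pub fn parser_for_extension(&self, ext: &str) -> Option<&dyn LanguageParser> {
        self.by_extension.get(ext).map(|&index| &*self.parsers[index])
    }
}
```

Storing parsers as boxed trait objects means the registry does not need to know about concrete language crates, which fits the feature-gated registration shown above.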
MCP Server Integration
The Rust parser will automatically be available through the MCP server for:
- Repository analysis including Rust files
- Cross-language dependency tracking
- Self-analysis capabilities
CLI Integration
# Analyze codeprism itself
export REPOSITORY_PATH=/path/to/codeprism && ./target/release/codeprism --mcp
# Focus on Rust files only
prism analyze --language rust /path/to/codeprism
Success Metrics
Functionality Metrics
- Parse 100% of codeprism Rust source files without errors
- Extract 95%+ of function/struct/trait definitions
- Correctly identify 90%+ of function calls and dependencies
- Handle complex generics and trait bounds
Performance Metrics
- Parse codeprism codebase (~50k LOC) in < 2 seconds
- Incremental updates < 10ms for typical file changes
- Memory usage < 100MB for full codeprism analysis
Self-Analysis Capabilities
- Generate accurate module dependency graph
- Identify circular dependencies
- Extract trait implementation hierarchy
- Analyze macro usage patterns
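Identifying circular dependencies amounts to cycle detection on the module dependency graph. The sketch below uses a depth-first search over an adjacency map. `ModuleGraph`, `add_dependency`, and `find_cycle` are hypothetical names for this example, and the graph is assumed to have been built from import edges beforehand.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Module name mapped to the set of modules it depends on (hypothetical representation).
pub type ModuleGraph = BTreeMap<String, BTreeSet<String>>;

/// Record that `from` depends on `to`, making sure both modules exist as keys.
pub fn add_dependency(graph: &mut ModuleGraph, from: &str, to: &str) {
    graph.entry(from.to_string()).or_default().insert(to.to_string());
    graph.entry(to.to_string()).or_default();
}

/// Return one dependency cycle as a list of module names, or `None` if the graph is acyclic.
pub fn find_cycle(graph: &ModuleGraph) -> Option<Vec<String>> {
    let mut done = BTreeSet::new();
    for start in graph.keys() {
        let mut stack = Vec::new();
        if let Some(cycle) = visit(graph, start, &mut stack, &mut done) {
            return Some(cycle);
        }
    }
    None
}

/// Depth-first search; `stack` holds the current path, `done` the fully explored modules.
fn visit<'a>(
    graph: &'a ModuleGraph,
    node: &'a str,
    stack: &mut Vec<&'a str>,
    done: &mut BTreeSet<&'a str>,
) -> Option<Vec<String>> {
    if done.contains(node) {
        return None;
    }
    // Meeting a module that is already on the current path closes a cycle.
    if let Some(pos) = stack.iter().position(|name| *name == node) {
        return Some(stack[pos..].iter().map(|name| name.to_string()).collect());
    }
    stack.push(node);
    if let Some(deps) = graph.get(node) {
        for dep in deps {
            if let Some(cycle) = visit(graph, dep, stack, done) {
                return Some(cycle);
            }
        }
    }
    stack.pop();
    done.insert(node);
    None
}
```

The ordered map makes the traversal, and therefore the reported cycle, deterministic, which is convenient for snapshot-style tests. The recursion depth equals the longest dependency chain, so very deep graphs might call for an explicit stack instead.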
🎯 Future Enhancements
Advanced Analysis
- Ownership Analysis
  - Track borrow checker implications
  - Identify potential memory issues
  - Suggest ownership optimizations
- Performance Analysis
  - Identify allocation patterns
  - Suggest performance improvements
  - Track async/await usage
- Architecture Analysis
  - Module cohesion metrics
  - Trait design patterns
  - API surface analysis
Integration Features
- IDE Integration
  - Real-time analysis in IDEs
  - Refactoring suggestions
  - Code quality metrics
- CI/CD Integration
  - Automated architecture checks
  - Dependency drift detection
  - Code quality gates
Benefits for codeprism Project
Immediate Benefits
- Self-Analysis: Understand codeprism's own architecture
- Quality Assurance: Automated code quality checks
- Refactoring Support: Safe restructuring with dependency awareness
Long-term Benefits
- Architecture Evolution: Track and guide architectural changes
- Performance Optimization: Data-driven performance improvements
- Educational Value: Demonstrate codeprism capabilities on complex Rust code
Community Benefits
- Reference Implementation: Example of advanced Rust parsing
- Open Source Contribution: Enhance tree-sitter-rust ecosystem
- Tool Validation: Real-world validation of codeprism capabilities
This implementation plan provides a comprehensive roadmap for adding Rust parser support to codeprism, enabling powerful self-analysis capabilities while following established patterns and maintaining high code quality standards.