
Large Repository Handling Guide

This guide explains how to handle large repositories with the CodePrism MCP server, including memory management, filtering options, and performance optimization.

New Improved Defaults ✨

Good news! The MCP server now has much better defaults based on real-world usage:

  • Memory limit: 4GB (was 1GB)
  • Batch size: 30 files (was 50, optimized for memory efficiency)
  • File filtering: Automatically includes common programming languages
  • Directory exclusion: Common build/dependency directories excluded by default

This means for most repositories, you can simply run:

export CODEPRISM_PROFILE=development
export REPOSITORY_PATH=/path/to/your/repository
./target/release/codeprism --mcp

Default Configuration

Memory & Performance

  • Memory limit: 4GB (suitable for modern development machines)
  • Batch size: 30 files (optimized balance of speed and memory usage)
  • Streaming mode: Automatically enabled for repositories with >10,000 files

File Filtering (New Defaults)

Excluded directories (automatic):

  • Version Control: .git
  • Package Management: node_modules, vendor
  • Build Artifacts: target, build, dist, coverage
  • Python Virtual Environments: .venv, venv, .tox, .env, env
  • Python Caches: __pycache__, .pytest_cache, .mypy_cache, .ruff_cache
  • Web Build Artifacts: .next, .nuxt
  • IDE/Editor: .vscode, .idea
  • OS Files: .DS_Store, Thumbs.db

Included file extensions (automatic):

  • Web: js, ts, jsx, tsx
  • Systems: rs, c, cpp, h, hpp, go
  • Enterprise: java, kt, swift
  • Scripting: py, rb, php
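To estimate whether a repository will cross the streaming-mode threshold (more than 10,000 files) or fall into the very-large range (more than 50,000 files), you can approximate the default filters above with `find`. This is a rough sketch under stated assumptions: `count_indexable_files` and `classify_repo_size` are hypothetical helper names, CodePrism's own file walker may apply extra rules (such as the maximum file size), and this guide does not say whether the threshold counts all files or only filtered ones, so treat the result as an estimate.

```shell
#!/usr/bin/env bash
# Rough preview of CodePrism's default file filtering using find(1).
# Assumption: this only mirrors the documented default exclusions and
# extensions; CodePrism's real walker may differ. Helper names are hypothetical.

# Print the number of files under $1 that match the default filters.
count_indexable_files() {
  local repo="$1"
  find "$repo" \
    \( -type d \( -name .git -o -name node_modules -o -name vendor \
        -o -name target -o -name build -o -name dist -o -name coverage \
        -o -name .venv -o -name venv -o -name .tox -o -name .env -o -name env \
        -o -name __pycache__ -o -name .pytest_cache -o -name .mypy_cache \
        -o -name .ruff_cache -o -name .next -o -name .nuxt \
        -o -name .vscode -o -name .idea \) -prune \) \
    -o \( -type f \( -name '*.js' -o -name '*.ts' -o -name '*.jsx' -o -name '*.tsx' \
        -o -name '*.rs' -o -name '*.c' -o -name '*.cpp' -o -name '*.h' \
        -o -name '*.hpp' -o -name '*.go' -o -name '*.java' -o -name '*.kt' \
        -o -name '*.swift' -o -name '*.py' -o -name '*.rb' -o -name '*.php' \) \
        -print \) \
    | wc -l | tr -d ' '
}

# Map a file count onto the size ranges mentioned in this guide.
classify_repo_size() {
  if [ "$1" -gt 50000 ]; then
    echo "very-large"   # consider the enterprise settings
  elif [ "$1" -gt 10000 ]; then
    echo "streaming"    # streaming mode is expected to switch on
  else
    echo "standard"
  fi
}
```

For example, `classify_repo_size "$(count_indexable_files /path/to/repo)"` prints `standard`, `streaming`, or `very-large`. Pruning excluded directories (rather than filtering their files afterwards) keeps the scan fast even when `node_modules` or `target` is huge.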

When to Override Defaults

Dependency Analysis Options

New feature! 🎯 You can now control how dependency code is indexed with CODEPRISM_DEPENDENCY_MODE:

Decision Tree: Choose Your Dependency Mode

1. Minimal (Default) - Fast but Limited

export CODEPRISM_PROFILE=development
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
  • Excludes: All dependency directories (.tox, venv, node_modules, etc.)
  • Pros: Fast indexing, low memory usage, focuses on your code
  • Cons: Can't follow imports into dependencies, missing external API intelligence

2. Smart Dependency Scanning - Balanced

export CODEPRISM_PROFILE=development
export CODEPRISM_DEPENDENCY_MODE=smart
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
  • Includes: Public APIs and commonly used dependency files only, such as __init__.py, index.js, lib.rs, and top-level modules
  • Excludes: Internal implementation details, tests, and documentation, such as */tests/, */internal/, */_private/, and deeply nested files
  • Best for: Following imports while keeping performance reasonable

3. Complete Analysis - Comprehensive but Slow

export CODEPRISM_PROFILE=production
export CODEPRISM_DEPENDENCY_MODE=include_all
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
  • Includes: Everything, including full dependency source code
  • Pros: Complete code intelligence, full import following, comprehensive analysis
  • Cons: Much slower indexing, high memory usage, may hit memory limits

Comparison Examples

Example: following imports from an Agent class

# In smart mode, CodePrism can follow these imports:
from rustic_ai.core.agents.commons.message_formats import ErrorMessage  # ✅
from pydantic import BaseModel  # ✅ public API
from some_lib.internal._private import InternalHelper  # ❌ excluded in smart mode

Recommended approach:

  1. Start with smart mode for most development:
export CODEPRISM_PROFILE=development
export CODEPRISM_MEMORY_LIMIT_MB=4096
export CODEPRISM_DEPENDENCY_MODE=smart
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
  2. Use complete analysis when you need full dependency intelligence:
export CODEPRISM_PROFILE=production
export CODEPRISM_MEMORY_LIMIT_MB=8192
export CODEPRISM_DEPENDENCY_MODE=include_all
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
  3. Use minimal mode for CI/CD or when performance is critical:
export CODEPRISM_PROFILE=development
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp  # minimal is the default dependency mode
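If you switch between the three recommended setups often, they can be wrapped in a small shell function. This is a sketch, not part of CodePrism: `codeprism_env` and its mode names `minimal`, `smart`, and `full` are hypothetical, and it only exports the variables listed in the recommendations above (unsetting the memory limit in minimal mode so the 4096 MB default applies).

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the environment for one of the three
# recommended dependency setups. Usage: codeprism_env <minimal|smart|full> <repo>
codeprism_env() {
  local mode="$1" repo="$2"
  if [ -z "$repo" ] || [ ! -d "$repo" ]; then
    echo "codeprism_env: '$repo' is not a directory" >&2
    return 1
  fi
  case "$mode" in
    minimal)
      export CODEPRISM_PROFILE=development
      unset CODEPRISM_MEMORY_LIMIT_MB          # fall back to the 4096 MB default
      export CODEPRISM_DEPENDENCY_MODE=minimal ;;
    smart)
      export CODEPRISM_PROFILE=development
      export CODEPRISM_MEMORY_LIMIT_MB=4096
      export CODEPRISM_DEPENDENCY_MODE=smart ;;
    full)
      export CODEPRISM_PROFILE=production
      export CODEPRISM_MEMORY_LIMIT_MB=8192
      export CODEPRISM_DEPENDENCY_MODE=include_all ;;
    *)
      echo "codeprism_env: unknown mode '$mode' (expected minimal, smart or full)" >&2
      return 1 ;;
  esac
  export REPOSITORY_PATH="$repo"
}
```

Usage: `codeprism_env smart /path/to/repo && ./target/release/codeprism --mcp`. Validating the path before exporting anything means a typo fails immediately instead of surfacing later as an indexing error.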

Very Large Repositories (>50,000 files)

export CODEPRISM_PROFILE=enterprise
export CODEPRISM_MEMORY_LIMIT_MB=8192
export CODEPRISM_BATCH_SIZE=20
export REPOSITORY_PATH=/path/to/huge/repo
./target/release/codeprism --mcp

Specific Language Focus

# Only Python, JavaScript, and TypeScript
export CODEPRISM_PROFILE=development
export CODEPRISM_INCLUDE_EXTENSIONS="py,js,ts"
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp

# Rust projects only (Rust sources plus TOML files such as Cargo.toml)
export CODEPRISM_PROFILE=development
export CODEPRISM_INCLUDE_EXTENSIONS="rs,toml"
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
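To decide which extensions to list in CODEPRISM_INCLUDE_EXTENSIONS, it can help to see which ones dominate the repository. The sketch below uses only standard tools; `top_extensions` is a hypothetical helper name, and it skips only a subset of the default-excluded directories, so counts are approximate.

```shell
#!/usr/bin/env bash
# Sketch: list the most common file extensions under a repository, skipping
# a few of the default-excluded directories. Helper name is hypothetical.
# Usage: top_extensions <repo> [limit]
top_extensions() {
  local repo="$1" limit="${2:-10}"
  find "$repo" \
    \( -type d \( -name .git -o -name node_modules -o -name target \
        -o -name .venv -o -name venv -o -name __pycache__ \
        -o -name build -o -name dist -o -name vendor \) -prune \) \
    -o \( -type f -name '*.*' -print \) \
    | sed 's/.*\.//' \
    | sort | uniq -c | sort -rn | head -n "$limit"
}
```

Each output line is a count followed by an extension. To turn the top three into a value for the variable, you could run `export CODEPRISM_INCLUDE_EXTENSIONS="$(top_extensions /path/to/repo 3 | awk '{print $2}' | paste -sd, -)"`.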

Memory-Constrained Systems

# For systems with limited RAM
export CODEPRISM_PROFILE=development
export CODEPRISM_MEMORY_LIMIT_MB=2048
export CODEPRISM_BATCH_SIZE=15
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
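Rather than hard-coding a limit, you can derive one from the machine's RAM. This is a sketch built on an assumption of ours, not a CodePrism rule: it suggests roughly half of physical memory, clamped to the 1024 to 16384 MB range used in this guide's examples. `suggest_memory_limit_mb` is a hypothetical name, and reading `/proc/meminfo` only works on Linux.

```shell
#!/usr/bin/env bash
# Sketch: suggest CODEPRISM_MEMORY_LIMIT_MB as about half of physical RAM,
# clamped to 1024-16384 MB. Assumption: the "half of RAM" rule is a heuristic.
suggest_memory_limit_mb() {
  # $1: total RAM in kB; when omitted, read MemTotal from /proc/meminfo (Linux).
  local total_kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
  local mb=$(( total_kb / 1024 / 2 ))
  if [ "$mb" -lt 1024 ]; then
    mb=1024
  elif [ "$mb" -gt 16384 ]; then
    mb=16384
  fi
  echo "$mb"
}
```

On Linux: `export CODEPRISM_MEMORY_LIMIT_MB="$(suggest_memory_limit_mb)"`. On macOS, pass the size explicitly, for example `suggest_memory_limit_mb "$(( $(sysctl -n hw.memsize) / 1024 ))"`.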

Include All File Types

# Override defaults to include all files (not recommended for large repos)
export CODEPRISM_PROFILE=development
export CODEPRISM_INCLUDE_EXTENSIONS="*"
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp

Environment Variables Reference

| Environment Variable         | Default                           | Description                                               |
|------------------------------|-----------------------------------|-----------------------------------------------------------|
| CODEPRISM_PROFILE            | development                       | Configuration profile (development/production/enterprise) |
| CODEPRISM_MEMORY_LIMIT_MB    | 4096                              | Memory limit in MB                                        |
| CODEPRISM_BATCH_SIZE         | 30                                | Files processed in parallel                               |
| CODEPRISM_MAX_FILE_SIZE_MB   | 10                                | Skip files larger than this (MB)                          |
| CODEPRISM_EXCLUDE_DIRS       | Smart defaults (see below)        | Comma-separated directories to exclude                    |
| CODEPRISM_INCLUDE_EXTENSIONS | Programming languages (see below) | Comma-separated file extensions                           |
| CODEPRISM_DEPENDENCY_MODE    | minimal                           | Dependency analysis mode (minimal/smart/include_all)      |
| REPOSITORY_PATH              | Required                          | Path to repository to analyze                             |
| RUST_LOG                     | info                              | Logging level (debug/info/warn/error)                     |
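Before launching the server, a small pre-flight check can show which values will take effect. This sketch assumes the defaults in the table above apply whenever a variable is unset; `codeprism_preflight` is a hypothetical helper, not something CodePrism ships, and it only validates the values the table lists.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print effective CodePrism settings, using
# the documented defaults for unset variables, and fail early on a missing
# REPOSITORY_PATH or an unknown profile/dependency mode.
codeprism_preflight() {
  if [ -z "${REPOSITORY_PATH:-}" ] || [ ! -d "$REPOSITORY_PATH" ]; then
    echo "REPOSITORY_PATH must point to an existing directory" >&2
    return 1
  fi
  case "${CODEPRISM_PROFILE:-development}" in
    development|production|enterprise) ;;
    *) echo "unknown CODEPRISM_PROFILE: $CODEPRISM_PROFILE" >&2; return 1 ;;
  esac
  case "${CODEPRISM_DEPENDENCY_MODE:-minimal}" in
    minimal|smart|include_all) ;;
    *) echo "unknown CODEPRISM_DEPENDENCY_MODE: $CODEPRISM_DEPENDENCY_MODE" >&2; return 1 ;;
  esac
  echo "profile=${CODEPRISM_PROFILE:-development}"
  echo "memory_limit_mb=${CODEPRISM_MEMORY_LIMIT_MB:-4096}"
  echo "batch_size=${CODEPRISM_BATCH_SIZE:-30}"
  echo "max_file_size_mb=${CODEPRISM_MAX_FILE_SIZE_MB:-10}"
  echo "dependency_mode=${CODEPRISM_DEPENDENCY_MODE:-minimal}"
  echo "repository_path=$REPOSITORY_PATH"
  echo "rust_log=${RUST_LOG:-info}"
}
```

A typical invocation is `codeprism_preflight && ./target/release/codeprism --mcp`, which stops before indexing starts if a value is misspelled.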

Default Excluded Directories

  • .git, node_modules, target, .venv, __pycache__, build, dist, vendor
  • .tox, venv, .env, env, .pytest_cache, .mypy_cache, .ruff_cache
  • .next, .nuxt, coverage, .coverage, .vscode, .idea, .DS_Store, Thumbs.db

Default Included Extensions

  • py, js, ts, jsx, tsx, rs, java, cpp, c, h, hpp, go, php, rb, kt, swift

Migration from Old Command-Line Interface

If you were using the old command-line interface, here's how to migrate:

# Old command-line approach
# ./target/release/codeprism-mcp --memory-limit 1024 --include-extensions "*" /path/to/repo

# New environment variable approach
export CODEPRISM_PROFILE=development
export CODEPRISM_MEMORY_LIMIT_MB=1024
export CODEPRISM_INCLUDE_EXTENSIONS="*"
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp
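If old invocations are scattered across scripts, a small translation function can ease the switch. This is a hypothetical sketch: `migrate_codeprism_args` handles only the two old flags shown above (`--memory-limit` and `--include-extensions`) plus the positional repository path, and it rejects any other flag rather than guessing its meaning.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate old codeprism-mcp arguments into the new
# environment variables. Only --memory-limit, --include-extensions and the
# positional repository path are supported.
migrate_codeprism_args() {
  export CODEPRISM_PROFILE="${CODEPRISM_PROFILE:-development}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --memory-limit|--include-extensions)
        if [ "$#" -lt 2 ]; then
          echo "migrate_codeprism_args: $1 needs a value" >&2
          return 1
        fi
        if [ "$1" = "--memory-limit" ]; then
          export CODEPRISM_MEMORY_LIMIT_MB="$2"
        else
          export CODEPRISM_INCLUDE_EXTENSIONS="$2"
        fi
        shift 2 ;;
      --*)
        echo "migrate_codeprism_args: unsupported flag '$1'" >&2
        return 1 ;;
      *)
        export REPOSITORY_PATH="$1"
        shift ;;
    esac
  done
}
```

For example, `migrate_codeprism_args --memory-limit 1024 --include-extensions "*" /path/to/repo && ./target/release/codeprism --mcp` reproduces the old command shown above.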

Examples

# Uses optimized defaults - works for most repositories
export CODEPRISM_PROFILE=development
export REPOSITORY_PATH=/path/to/your/repository
./target/release/codeprism --mcp

Custom Configuration Examples

# Minimal memory usage
export CODEPRISM_PROFILE=development
export CODEPRISM_MEMORY_LIMIT_MB=1024
export CODEPRISM_BATCH_SIZE=15
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp

# Maximum performance (if you have lots of RAM)
export CODEPRISM_PROFILE=enterprise
export CODEPRISM_MEMORY_LIMIT_MB=16384
export CODEPRISM_BATCH_SIZE=50
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp

# Use production profile for complete analysis
export CODEPRISM_PROFILE=production
export REPOSITORY_PATH=/path/to/repo
./target/release/codeprism --mcp

The new defaults should handle most real-world repositories with no extra configuration!