Troubleshooting Guide

Comprehensive guide for diagnosing and resolving common issues with the MCP Test Harness.

📋 Overview

This guide covers common problems, error messages, debugging techniques, and solutions to help you quickly resolve issues with the MCP Test Harness.

New to the Test Harness? Check the Installation Guide for setup instructions, or see the Quick Start Guide for basic usage.

🚨 Quick Diagnosis

Is your issue in this category?

⚙️ Configuration - YAML syntax, missing fields, invalid values
🖥️ Server Connection - Cannot connect to or start MCP server
🧪 Test Execution - Tests failing, timeouts, validation errors
📊 Performance - Slow execution, memory issues, resource limits
🔐 Security - Authentication, authorization, compliance problems

Emergency Checklist

If tests are completely failing, check these first:

Basic connectivity: Can you connect to your MCP server manually?
Configuration syntax: Is your YAML configuration valid?
Required fields: Are all required configuration fields present?
File paths: Do all referenced files and directories exist?
Permissions: Does the test harness have necessary file/network permissions?
Resource availability: Is there enough memory/disk/network capacity?
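
The file, path, and capacity items in this checklist can be scripted. The following is a minimal sketch, not part of the test harness: the function name `preflight` and the `min_free_mb` threshold are hypothetical, and it checks free space only on the filesystem holding the current directory.

```python
import os
import shutil


def preflight(config_path, referenced_paths, min_free_mb=500):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not os.path.isfile(config_path):
        problems.append(f"config file not found: {config_path}")
    elif not os.access(config_path, os.R_OK):
        problems.append(f"config file not readable: {config_path}")
    for path in referenced_paths:
        if not os.path.exists(path):
            problems.append(f"referenced path missing: {path}")
    # Free space on the filesystem that holds the current working directory.
    free_mb = shutil.disk_usage(os.getcwd()).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free, want at least {min_free_mb} MB")
    return problems
```

Call it with your configuration file and every script or directory the configuration refers to, for example `preflight("test-harness.yaml", ["scripts/setup.py"])`, and fix whatever it lists before running the harness again.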

⚙️ Configuration Issues

Issue: Invalid YAML Syntax

Symptoms:

❌ Configuration Error: Invalid YAML syntax in test-harness.yaml at line 15
   Error: mapping values are not allowed here

Solutions:

Check YAML Indentation:

# ❌ Incorrect indentation
global:
max_global_concurrency: 4
  global_timeout_seconds: 300

# ✅ Correct indentation
global:
  max_global_concurrency: 4
  global_timeout_seconds: 300

Validate YAML Online:
- Use an online YAML validator: https://yamlchecker.com/
- Copy your configuration and check for syntax errors

Use YAML Linter:

# Install yamllint
pip install yamllint

# Check configuration
yamllint test-harness.yaml

Issue: Missing Required Fields

Symptoms:

❌ Configuration Error: Missing required field 'server.start_command'

Solutions:

Add Missing Fields:

server:
  start_command: "cargo run --bin my-mcp-server"  # Add this line
  args: ["stdio"]

Use Configuration Template:

# Generate a valid template
mcp-test-harness template --minimal --output template.yaml
# Copy required fields from template

Check Configuration Reference:
- See Configuration Reference for all required fields

Issue: Invalid Field Values

Symptoms:

❌ Configuration Error: Invalid value for 'global.max_global_concurrency': 0
   Expected: Integer between 1 and 32

Solutions:

Fix Value Ranges:

global:
  max_global_concurrency: 4  # Must be 1-32
  global_timeout_seconds: 300  # Must be 1-3600

Check Data Types:

# ❌ Incorrect types
global:
  max_global_concurrency: "4"    # String instead of integer
  fail_fast: "false"             # String instead of boolean

# ✅ Correct types
global:
  max_global_concurrency: 4      # Integer
  fail_fast: false               # Boolean
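
To see why the quoted values above are rejected, the range and type rules can be sketched in a few lines. This is an illustration under stated assumptions, not the harness's validator: the `RULES` table is built only from the ranges quoted in this guide, and `check_global` is a hypothetical helper that receives the already-parsed `global` mapping.

```python
# Ranges taken from the error message and comments in this guide; the real
# harness may enforce different or additional rules.
RULES = {
    "max_global_concurrency": (int, 1, 32),
    "global_timeout_seconds": (int, 1, 3600),
    "fail_fast": (bool, None, None),
}


def check_global(section):
    """Return a list of problems found in the parsed `global` mapping."""
    problems = []
    for key, (expected, low, high) in RULES.items():
        if key not in section:
            continue  # this sketch only checks fields that are present
        value = section[key]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields.
        wrong_type = not isinstance(value, expected) or (expected is int and isinstance(value, bool))
        if wrong_type:
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
        elif low is not None and not low <= value <= high:
            problems.append(f"{key}: {value} is outside {low}-{high}")
    return problems
```

A YAML parser turns `"4"` into a string and `4` into an integer, which is why the quoted form fails the type check even though it looks numeric.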

Issue: File Path Not Found

Symptoms:

❌ Configuration Error: Script file not found: 'scripts/setup.py'

Solutions:

Check File Exists:

ls -la scripts/setup.py

Use Correct Relative Paths:

# ❌ Incorrect path
setup_script: "setup.py"

# ✅ Correct path
setup_script: "scripts/setup.py"
# or use absolute path
setup_script: "/home/user/project/scripts/setup.py"

Create Missing Files:

mkdir -p scripts
touch scripts/setup.py
chmod +x scripts/setup.py

🖥️ Server Connection Issues

Issue: Cannot Connect to MCP Server

Symptoms:

❌ Connection Error: Failed to connect to MCP server
   Transport: stdio
   Command: cargo run --bin my-mcp-server
   Error: No such file or directory

Solutions:

Verify Server Command:

# Test server command manually
cargo run --bin my-mcp-server stdio

# If command fails, check:
# - Is the binary name correct?
# - Is the project built?
# - Are you in the right directory?

Check Working Directory:

server:
  start_command: "cargo run --bin my-mcp-server"
  working_dir: "/path/to/server/project"  # Add this

Use Absolute Paths:

server:
  start_command: "/home/user/.cargo/bin/cargo"
  args: ["run", "--bin", "my-mcp-server", "stdio"]
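
A "No such file or directory" error usually means the first word of the start command cannot be found on `PATH`. The sketch below checks that before the harness runs; `resolve_command` is a hypothetical helper, and it assumes the command string is split the way a POSIX shell would split it.

```python
import shlex
import shutil


def resolve_command(start_command):
    """Return the full path of the executable in `start_command`, or None if it is not found."""
    parts = shlex.split(start_command)
    if not parts:
        return None
    # shutil.which searches PATH, or checks the path directly if one is given.
    return shutil.which(parts[0])
```

If `resolve_command("cargo run --bin my-mcp-server")` returns `None`, use the absolute path to the executable as shown above, or fix `PATH` in the environment the harness runs in.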

Issue: Server Startup Timeout

Symptoms:

❌ Server Error: Server startup timeout after 30 seconds
   Server command: cargo run --bin my-mcp-server
   Status: Still starting

Solutions:

Increase Startup Timeout:

server:
  startup_timeout_seconds: 60  # Increase from default 30

Check Server Logs:

# Run server manually to see error messages
cargo run --bin my-mcp-server stdio

Optimize Server Build:

# Build ahead of time so compilation does not count against the startup timeout
cargo build --release

server:
  start_command: "cargo run --release --bin my-mcp-server"

Issue: HTTP Connection Refused

Symptoms:

❌ Connection Error: Connection refused
   URL: http://localhost:3000
   Error: Cannot connect to server

Solutions:

Check Server is Running:

# Check if server is listening
netstat -tlnp | grep 3000
# or
ss -tlnp | grep 3000

# Test connection manually
curl http://localhost:3000/health

Verify URL and Port:

server:
  transport: "http"
  url: "http://localhost:3000"  # Check port is correct

Check Firewall Settings:

# Check firewall rules
sudo ufw status
sudo iptables -L

# Allow port if needed
sudo ufw allow 3000
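
The same "is anything listening?" check can be done from Python, which is handy inside a setup script. This is a minimal sketch; `port_open` is a hypothetical helper, and it only proves that a TCP connection succeeds, not that the server speaks MCP.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

`port_open("localhost", 3000)` returning `False` points at the server or firewall rather than at the harness configuration.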

Issue: WebSocket Connection Failed

Symptoms:

❌ WebSocket Error: Connection failed
   URL: ws://localhost:3000
   Error: HTTP 404 Not Found

Solutions:

Check WebSocket Endpoint:

server:
  transport: "websocket"
  url: "ws://localhost:3000/ws"  # Add correct path

Test WebSocket Manually:

# Use websocat to test the same URL as in the configuration
websocat ws://localhost:3000/ws

# Or use browser dev tools

Check Server WebSocket Support:
- Verify your MCP server supports WebSocket transport
- Check server documentation for correct WebSocket URL format

🧪 Test Execution Issues

Issue: Test Timeout

Symptoms:

❌ FAIL: repository_stats
   Duration: 30000ms (TIMEOUT)
   Error: Test exceeded maximum execution time

Solutions:

Increase Test Timeout:

test_cases:
  - id: "repository_stats"
    timeout_override_seconds: 60  # Increase timeout

Check Server Performance:

# Monitor server resource usage (pgrep -d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, my-mcp-server)"
htop

Optimize Test Data:

# Use smaller test datasets
input_params:
  path: "test-projects/small"  # Instead of large project

Issue: Validation Failure

Symptoms:

❌ FAIL: repository_stats
   Validation Failure:
   - Expected field 'result.total_files' not found
   - Response: {"error": "Path not found"}

Solutions:

Check Server Response:

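# Debug server response manually. MCP messages are JSON-RPC 2.0, and a server
# may reject tools/call until it has received an initialize request.
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "repository_stats", "arguments": {"path": "/test/project"}}}' | cargo run --bin my-mcp-server stdio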

Update Validation Patterns:

# Handle error responses
expected:
  patterns:
    - key: "result.total_files"
      validation: { type: "exists" }
      required: false  # Make optional if server might return errors

Fix Input Parameters:

input_params:
  path: "${default_project_path}"  # Use valid path
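
The failure above comes from looking up `result.total_files` in a response that only contains `error`. The lookup can be sketched as follows, assuming the key is a plain dotted path through nested objects; the harness may support a richer path syntax, and `lookup` and `exists` are hypothetical names.

```python
_MISSING = object()  # sentinel, so that a stored null/None still counts as "present"


def lookup(response, dotted_key):
    """Follow a dotted key such as 'result.total_files' through nested dicts."""
    current = response
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def exists(response, dotted_key):
    """Return True if every segment of `dotted_key` is present in `response`."""
    return lookup(response, dotted_key) is not _MISSING
```

Running `exists` against the raw server output shows quickly whether the pattern or the input parameters need to change.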

Issue: Script Execution Failed

Symptoms:

❌ Script Error: Custom validation script failed
   Script: scripts/validate_response.py
   Exit Code: 1
   Error: ModuleNotFoundError: No module named 'json'

Solutions:

Check Script Dependencies:

# Test script manually
python3 scripts/validate_response.py < test_input.json

# Install missing dependencies
pip3 install -r scripts/requirements.txt

Fix Script Permissions:

chmod +x scripts/validate_response.py

Use Correct Python Path:

custom_scripts:
  - script: "scripts/validate_response.py"
    engine: "python3"  # Specify interpreter
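
A validation script that uses only the standard library avoids most dependency errors. The skeleton below is a sketch under assumptions: it expects the response as JSON on standard input (as in the manual test above) and signals failure through a non-zero exit code; the exact contract between the harness and custom scripts is not described in this guide, so check it before relying on this shape.

```python
import json
import sys


def validate(response):
    """Return a list of problems; an empty list means the response is acceptable."""
    if not isinstance(response, dict):
        return ["response is not a JSON object"]
    if "error" in response:
        return [f"server returned an error: {response['error']}"]
    result = response.get("result")
    total = result.get("total_files") if isinstance(result, dict) else None
    if not isinstance(total, int):
        return ["result.total_files is missing or not an integer"]
    return []


def run(stream=sys.stdin):
    """Read one JSON document from `stream` and return a process exit code."""
    try:
        response = json.load(stream)
    except json.JSONDecodeError as exc:
        print(f"invalid JSON input: {exc}", file=sys.stderr)
        return 2
    problems = validate(response)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

In a real script, finish with `sys.exit(run())` so the exit code reaches the caller.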

Issue: Concurrent Test Failures

Symptoms:

❌ Multiple tests failing with resource exhaustion
   Error: Cannot create thread: Resource temporarily unavailable

Solutions:

Reduce Concurrency:

global:
  max_global_concurrency: 2  # Reduce from default 4

test_suites:
  - name: "my-suite"
    parallel_execution: false  # Disable parallel execution

Increase Resource Limits:

# Increase file descriptor limit
ulimit -n 4096

# Increase process limit
ulimit -u 2048

Add Resource Monitoring:

environment:
  limits:
    max_memory_mb: 512  # Set reasonable limits
    max_open_files: 256
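
Before changing limits, it helps to see what the harness process actually inherits. This sketch reads the same values that `ulimit -n` and `ulimit -u` report, using Python's `resource` module (Unix only); `current_limits` is a hypothetical helper.

```python
import resource


def current_limits():
    """Return {label: (soft, hard)} for the open-file and process limits."""
    names = {"open_files": "RLIMIT_NOFILE", "processes": "RLIMIT_NPROC"}
    limits = {}
    for label, attr in names.items():
        constant = getattr(resource, attr, None)  # RLIMIT_NPROC is missing on some platforms
        if constant is not None:
            limits[label] = resource.getrlimit(constant)
    return limits
```

A soft limit below what your concurrency setting needs is a likely cause of "Resource temporarily unavailable"; a hard limit of `resource.RLIM_INFINITY` means no cap is set.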

📊 Performance Issues

Issue: Slow Test Execution

Symptoms:

Performance Warning: Test suite taking longer than expected
Average response time: 15 seconds (baseline: 2 seconds)

Solutions:

Profile Server Performance:

# Use profiling tools
perf record -g cargo run --bin my-mcp-server
perf report

# Monitor system resources
iostat 1
vmstat 1

Optimize Test Configuration:

# Reduce test parallelism if causing contention
global:
  max_global_concurrency: 1

# Use smaller test datasets
input_params:
  max_results: 100  # Limit result size

Update Performance Baselines:

# Re-establish baselines if hardware changed
mcp-test-harness benchmark --establish-baseline --config prod.yaml
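
To decide whether a slowdown is worth investigating, compare the measured mean with the baseline. This is a minimal sketch of that arithmetic; the function names and the `tolerance` factor of 1.5 are hypothetical, not harness settings.

```python
from statistics import mean


def slowdown(durations_s, baseline_s):
    """Return how many times slower the mean duration is than the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return mean(durations_s) / baseline_s


def is_regression(durations_s, baseline_s, tolerance=1.5):
    """Flag a regression when the mean exceeds the baseline by more than `tolerance` times."""
    return slowdown(durations_s, baseline_s) > tolerance
```

With the figures from the warning above, a 15 second average against a 2 second baseline is a 7.5x slowdown, well past any reasonable tolerance.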

Issue: Memory Exhaustion

Symptoms:

❌ Resource Error: Memory limit exceeded
   Used: 2048 MB
   Limit: 1024 MB
   Test: large_repository_analysis

Solutions:

Increase Memory Limits:

environment:
  limits:
    max_memory_mb: 4096  # Increase limit

# Or on a per-test basis
performance:
  max_memory_usage_mb: 2048

Optimize Memory Usage:

# Process data in chunks
input_params:
  batch_size: 100
  stream_results: true

Monitor Memory Usage:

# Check system memory
free -h
htop

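# Check for memory leaks (run the built binary directly; valgrind does not
# follow cargo's child process unless --trace-children=yes is given)
valgrind --tool=memcheck --leak-check=full ./target/debug/my-mcp-server stdio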

Issue: Disk Space Exhaustion

Symptoms:

❌ Storage Error: No space left on device
   Available: 0 MB
   Required: 500 MB

Solutions:

Clean Up Temporary Files:

# Clean test artifacts
rm -rf test-reports/old-reports/
rm -rf /tmp/mcp-test-*

# Clean up Docker if using containers
# (removes all stopped containers and all unused images, not only test artifacts)
docker system prune -a

Configure Cleanup:

global:
  cleanup_on_success: true
  cleanup_on_failure: false  # Keep for debugging

reporting:
  retention_days: 30  # Automatically clean old reports

Use External Storage:

reporting:
  output_dir: "/mnt/external/test-reports"  # Use external disk
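
If you manage the report directory yourself, retention can be sketched as "delete files older than N days". The helper below is hypothetical and independent of the harness's own `retention_days` handling; it only looks at regular files directly inside the directory and uses their modification time.

```python
import os
import time


def prune_old_reports(report_dir, retention_days, now=None):
    """Delete regular files in `report_dir` older than `retention_days`; return their names."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    # Take a snapshot of the directory first so deletion does not disturb iteration.
    for entry in list(os.scandir(report_dir)):
        if entry.is_file(follow_symlinks=False) and entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run it on a copy of the directory first; the deletion is not reversible.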

🔐 Security Issues

Issue: Authentication Failed

Symptoms:

❌ Authentication Error: Invalid credentials
   Status: 401 Unauthorized
   Error: Authentication token invalid or expired

Solutions:

Check Token Configuration:

server:
  headers:
    Authorization: "Bearer ${API_TOKEN}"  # Verify token format

# Set environment variable
export API_TOKEN="your-actual-token"

Verify Token Validity:

# Test authentication manually
curl -H "Authorization: Bearer $API_TOKEN" http://localhost:3000/auth/verify

Update Authentication Method:

# Use different auth method if needed
server:
  transport: "http"
  headers:
    X-API-Key: "${API_KEY}"  # Alternative auth header
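
A common cause of 401 errors is a `${API_TOKEN}` placeholder whose environment variable is unset, so an empty or literal value is sent. The sketch below expands placeholders and fails loudly instead; it assumes the `${NAME}` syntax shown above, and `expand` is a hypothetical helper rather than the harness's own substitution code.

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, env=None):
    """Replace ${NAME} placeholders, raising KeyError if a variable is unset or empty."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if not env.get(name):
            raise KeyError(f"environment variable {name} is unset or empty")
        return env[name]

    return _VAR.sub(substitute, value)
```

Calling `expand("Bearer ${API_TOKEN}")` in the same shell session that launches the harness confirms whether the variable is visible to child processes.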

Issue: Certificate Verification Failed

Symptoms:

❌ TLS Error: Certificate verification failed
   Error: self signed certificate in certificate chain

Solutions:

Disable Certificate Verification (Development Only):

server:
  tls:
    verify_certificates: false  # ⚠️ Only for development

Add Custom CA Bundle:

server:
  tls:
    ca_bundle_path: "/path/to/custom-ca.pem"

Use Self-Signed Certificate:

# Create self-signed certificate for testing
openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt -days 365 -nodes
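
The two TLS options above correspond to two ways of building a client-side context. This sketch uses Python's `ssl` module to show the difference; `make_context` is a hypothetical helper and says nothing about how the harness implements `verify_certificates` or `ca_bundle_path` internally.

```python
import ssl


def make_context(ca_bundle_path=None, verify=True):
    """Build a client-side TLS context, optionally trusting a custom CA bundle."""
    # With cafile=None the system's default trust store is used.
    context = ssl.create_default_context(cafile=ca_bundle_path)
    if not verify:
        # Development only. Hostname checking must be turned off before
        # verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Prefer `make_context("/path/to/custom-ca.pem")` over `verify=False`: adding the self-signed CA keeps verification on while still accepting your test certificate.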

Issue: Audit Log Errors

Symptoms:

❌ Audit Error: Cannot write to audit log
   Path: /var/log/mcp-test-harness/audit.log
   Error: Permission denied

Solutions:

Fix Log Directory Permissions:

sudo mkdir -p /var/log/mcp-test-harness
sudo chown $USER:$USER /var/log/mcp-test-harness
chmod 755 /var/log/mcp-test-harness

Use Alternative Log Path:

security:
  audit_log_path: "./logs/audit.log"  # Use local directory

Disable Audit Logging (Temporary):

security:
  enable_audit_logging: false  # Temporary workaround
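
Whichever path you choose, you can confirm it is writable before starting a run. A minimal sketch, with `can_append` as a hypothetical helper; it checks permissions for the current user only and does not create the file.

```python
import os


def can_append(log_path):
    """Return True if `log_path` is writable, or can be created in its directory."""
    directory = os.path.dirname(os.path.abspath(log_path))
    if os.path.exists(log_path):
        return os.access(log_path, os.W_OK)
    # Creating a new file needs write and search permission on the directory.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

Run it as the same user that runs the harness; a check made with `sudo` will pass even when the harness itself cannot write.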

📈 Monitoring Issues

Issue: Metrics Collection Failed

Symptoms:

❌ Metrics Error: Cannot connect to Prometheus push gateway
   URL: http://localhost:9091
   Error: Connection refused

Solutions:

Start Prometheus Push Gateway:

# Start push gateway
docker run -d -p 9091:9091 prom/pushgateway

# Or install locally
prometheus-pushgateway --web.listen-address=:9091

Disable Metrics Collection:

monitoring:
  enabled: false  # Temporary workaround

Use Alternative Metrics Endpoint:

monitoring:
  prometheus:
    push_gateway:
      url: "http://monitoring-server:9091"  # Different server

Issue: Alert Rules Not Working

Symptoms:

⚠️ Alert Warning: No alerts triggered despite test failures
Expected: Failure rate alert
Actual: No alerts sent

Solutions:

Check Alert Configuration:

monitoring:
  alerting:
    enabled: true  # Ensure alerting is enabled
    rules:
      - name: "test_failure_rate"
        condition: "test_failure_rate > 0.1"  # Check condition logic

Test Alert Channels:

# Test email configuration
echo "Test email" | mail -s "Alert Test" team@company.com

# Test Slack webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert"}' \
  "$SLACK_WEBHOOK_URL"

Check Alert Thresholds:

# Lower thresholds for testing
monitoring:
  alerts:
    error_rate_threshold: 0.01  # 1% instead of 5%
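
An alert condition such as `test_failure_rate > 0.1` uses a strict comparison, so a run that fails exactly 10% of its tests does not trigger it. The sketch below reproduces that arithmetic; it assumes the rate is failed tests divided by total tests, which this guide does not confirm, and both function names are hypothetical.

```python
def failure_rate(results):
    """Fraction of failed tests; `results` is a list of booleans (True means passed)."""
    if not results:
        return 0.0
    return results.count(False) / len(results)


def should_alert(results, threshold=0.1):
    """Mirror the strict comparison in "test_failure_rate > 0.1"."""
    return failure_rate(results) > threshold
```

One failure in ten tests gives a rate of exactly 0.1 and no alert; two failures in ten give 0.2 and an alert. If you expect an alert at the boundary, lower the threshold slightly.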

🔧 Advanced Debugging

Enable Debug Logging

# Maximum debug output
RUST_LOG=debug mcp-test-harness test --verbose --config my-config.yaml

# Trace level logging
RUST_LOG=trace mcp-test-harness test -vvv --config my-config.yaml

Analyze Log Files

# View recent logs
tail -f ~/.mcp-test-harness/logs/test-harness.log

# Search for specific errors
grep "ERROR" ~/.mcp-test-harness/logs/test-harness.log

# Analyze performance
grep "PERF" ~/.mcp-test-harness/logs/test-harness.log | tail -20

Network Debugging

# Monitor network traffic
sudo tcpdump -i lo -A port 3000

# Check network connections
netstat -tulpn | grep mcp

# Test connectivity
telnet localhost 3000

Process Debugging

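# Monitor process tree (pstree takes a PID, not a process name)
pstree -p "$(pgrep -of mcp-test-harness)"

# Check file descriptors
# (-f matches the full command line; plain pgrep only sees the first 15 characters of the name)
lsof -p "$(pgrep -of mcp-test-harness)"

# Monitor system calls
strace -p "$(pgrep -of mcp-test-harness)"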

Configuration Debugging

# Validate configuration
mcp-test-harness validate --comprehensive --config my-config.yaml

# Dry run to see execution plan
mcp-test-harness test --dry-run --config my-config.yaml

# Export effective configuration
mcp-test-harness config export --config my-config.yaml

🆘 Getting Additional Help

Collecting Diagnostic Information

When reporting issues, collect this information:

#!/bin/bash
# diagnostic-info.sh

echo "=== System Information ==="
uname -a
cat /etc/os-release

echo "=== MCP Test Harness Version ==="
mcp-test-harness --version

echo "=== Configuration Validation ==="
mcp-test-harness validate --config test-harness.yaml

echo "=== Resource Usage ==="
free -h
df -h

echo "=== Network Connectivity ==="
netstat -tulpn | grep -E ":(3000|8080|9090)"

echo "=== Recent Errors ==="
grep ERROR ~/.mcp-test-harness/logs/test-harness.log | tail -20

echo "=== Environment Variables ==="
env | grep -E "MCP|RUST|PATH"

Creating Minimal Reproduction

Create a minimal configuration that reproduces the issue:

# minimal-repro.yaml
global:
  max_global_concurrency: 1
  log_level: "debug"

server:
  start_command: "echo"  # Simple command for testing
  args: ["hello"]
  transport: "stdio"

test_suites:
  - name: "simple-test"
    test_cases:
      - id: "basic-test"
        tool_name: "echo"
        input_params: {}
        expected:
          patterns:
            - key: "result"
              validation: { type: "exists" }

Support Channels

GitHub Issues: Report bugs and issues
GitHub Discussions: Community Q&A
Documentation: Complete reference
Email Support: Direct technical support

Issue Template

When reporting issues, use this template:

## Problem Description
Brief description of the issue

## Environment
- OS: Ubuntu 22.04
- MCP Test Harness Version: 1.0.0
- Server Type: HTTP/stdio/WebSocket
- Server Implementation: Custom/CodePrism/Other

## Configuration
```yaml
# Your configuration (sanitized)
```

## Expected Behavior
What you expected to happen

## Actual Behavior
What actually happened

## Error Messages
Complete error messages and stack traces

## Steps to Reproduce
1. Step 1
2. Step 2
3. Step 3

## Additional Context
Any other relevant information

📚 Additional Resources

- [User Guide](user-guide.md) - Complete user manual
- [Configuration Reference](configuration-reference.md) - Complete configuration documentation
- [Production Deployment](production-deployment.md) - Enterprise deployment guide
- [Developer Guide](developer-guide.md) - Extending the test harness

**Last Updated**: 2025-01-07  
**Version**: 1.0.0 
