MarvinAgent¶

The MarvinAgent is a specialized agent that provides text classification and data extraction capabilities using Prefect's Marvin library, enabling structured information retrieval from unstructured text.

Purpose¶

This agent serves as a bridge to Marvin's AI-powered classification and extraction capabilities. It can assign text to one of a set of predefined categories and extract structured data that conforms to a caller-supplied schema.

When to Use¶

Use the MarvinAgent when your application needs to:

Classify text into predefined categories
Extract structured information from unstructured text
Convert natural language to structured data
Implement intent recognition in conversational systems
Parse complex text into standardized formats
Extract entities, attributes, or key information from text

Message Types¶

Input Messages¶

ClassifyRequest¶

A request to classify text into one of several categories:

from typing import List, Optional
from pydantic import BaseModel

class ClassifyRequest(BaseModel):
    source_text: str  # The text to classify
    categories: List[str]  # List of possible categories
    instructions: Optional[str] = None  # Optional additional instructions

ExtractRequest¶

A request to extract structured data from text:

class ExtractRequest(BaseModel):
    source_text: str  # The text to extract data from
    extraction_spec: ExtractionSpec  # Specification for extraction

ExtractionSpec is defined as:

class ExtractionSpec(BaseModel):
    extraction_class: Type  # Pydantic model defining the data structure to extract
    extraction_instructions: Optional[str] = None  # Optional instructions

ClassifyAndExtractRequest¶

A request to both classify text and extract relevant data based on the classification:

class ClassifyAndExtractRequest(BaseModel):
    source_text: str  # The text to process
    categories: List[str]  # Possible categories for classification
    classification_instructions: Optional[str] = None  # Instructions for classification
    category_to_extraction_class: Dict[str, Type]  # Mapping categories to extraction models
    category_to_extraction_instructions: Dict[str, str] = {}  # Instructions for each category
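The request pairs a list of categories with a mapping from category to extraction model. This page does not say what happens when a category has no mapped model, so a caller may want to check for gaps before publishing the request. The helper below is a minimal sketch of such a check; `unmapped_categories` and the `Complaint` placeholder are hypothetical names, not part of rustic_ai or Marvin.

```python
from typing import Dict, List, Type


def unmapped_categories(
    categories: List[str],
    category_to_extraction_class: Dict[str, Type],
) -> List[str]:
    """Return the categories that have no extraction class mapped to them."""
    return [c for c in categories if c not in category_to_extraction_class]


class Complaint:
    """Hypothetical placeholder standing in for a Pydantic extraction model."""


# "inquiry" has no mapped model, so it is reported as missing.
missing = unmapped_categories(
    ["complaint", "inquiry"],
    {"complaint": Complaint},
)
print(missing)
```

An empty result means every category can be routed to an extraction model.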

Output Messages¶

ClassifyResponse¶

Response to a classification request:

class ClassifyResponse(BaseModel):
    source_text: str  # The original text
    category: str  # The identified category

ExtractResponse¶

Response to an extraction request:

class ExtractResponse(BaseModel):
    source_text: str  # The original text
    extracted_data: Any  # The extracted structured data

ClassifyAndExtractResponse¶

Response to a combined classify and extract request:

class ClassifyAndExtractResponse(BaseModel):
    source_text: str  # The original text
    category: str  # The identified category
    extracted_data: Any  # The extracted data
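Both extraction responses declare `extracted_data` as `Any`, so this page does not pin down whether a consumer receives a single object or a collection. A defensive consumer might normalize the value before iterating over it. The sketch below assumes nothing about the actual shape; `as_list` is a hypothetical helper, not part of rustic_ai or Marvin.

```python
from typing import Any, List


def as_list(extracted_data: Any) -> List[Any]:
    """Normalize extracted data to a list so callers can iterate uniformly."""
    if extracted_data is None:
        return []
    if isinstance(extracted_data, (list, tuple)):
        return list(extracted_data)
    # A single extracted object is wrapped in a one-element list.
    return [extracted_data]


for item in as_list({"problem_type": "login failure"}):
    print(item)
```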

Behavior¶

Classification¶

The agent receives a ClassifyRequest with text and possible categories
It calls Marvin's classification function to identify the most appropriate category
The agent returns a ClassifyResponse with the identified category

Extraction¶

The agent receives an ExtractRequest with text and an extraction specification
It calls Marvin's extraction function to parse the text into the specified structure
The agent returns an ExtractResponse with the extracted data

Combined Classification and Extraction¶

The agent receives a ClassifyAndExtractRequest with text, categories, and extraction specifications
It first classifies the text into one of the provided categories
Based on the classification, it selects the appropriate extraction model
It extracts structured data using the selected model
The agent returns a ClassifyAndExtractResponse with both the category and extracted data
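The combined flow above can be sketched as plain control flow. This is not the agent's actual implementation: the dataclasses are stand-ins that mirror the documented field names, and the `classify` and `extract` callables are hypothetical placeholders for whatever Marvin functions the agent invokes. Raising `KeyError` for an unmapped category is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Type


@dataclass
class CombinedRequest:
    """Stand-in mirroring the documented ClassifyAndExtractRequest fields."""
    source_text: str
    categories: List[str]
    category_to_extraction_class: Dict[str, Type]
    classification_instructions: Optional[str] = None
    category_to_extraction_instructions: Dict[str, str] = field(default_factory=dict)


@dataclass
class CombinedResponse:
    """Stand-in mirroring the documented ClassifyAndExtractResponse fields."""
    source_text: str
    category: str
    extracted_data: Any


def classify_and_extract(
    request: CombinedRequest,
    classify: Callable[[str, List[str], Optional[str]], str],
    extract: Callable[[str, Type, Optional[str]], Any],
) -> CombinedResponse:
    # Step 1: classify the text into one of the provided categories.
    category = classify(
        request.source_text, request.categories, request.classification_instructions
    )
    # Step 2: select the extraction model for that category.
    # An unmapped category raises KeyError in this sketch.
    extraction_class = request.category_to_extraction_class[category]
    instructions = request.category_to_extraction_instructions.get(category)
    # Step 3: extract structured data using the selected model.
    data = extract(request.source_text, extraction_class, instructions)
    # Step 4: return both the category and the extracted data.
    return CombinedResponse(request.source_text, category, data)


@dataclass
class Complaint:
    """Hypothetical extraction model used only for this demonstration."""
    summary: str


def fake_classify(text: str, categories: List[str], instructions: Optional[str]) -> str:
    # Stub: always picks the first category instead of calling an LLM.
    return categories[0]


def fake_extract(text: str, cls: Type, instructions: Optional[str]) -> Any:
    # Stub: builds the model from the raw text instead of calling an LLM.
    return cls(summary=text)


response = classify_and_extract(
    CombinedRequest(
        source_text="My order arrived damaged.",
        categories=["complaint", "inquiry"],
        category_to_extraction_class={"complaint": Complaint},
    ),
    fake_classify,
    fake_extract,
)
print(response.category, response.extracted_data)
```

Passing the classifier and extractor in as callables keeps the routing logic testable without any LLM calls.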

Sample Usage¶

from rustic_ai.core.guild.builders import AgentBuilder
from rustic_ai.marvin.classifier_agent import MarvinAgent

# Create the agent spec
marvin_agent_spec = (
    AgentBuilder(MarvinAgent)
    .set_id("classifier_extractor")
    .set_name("Text Analyzer")
    .set_description("Classifies text and extracts structured data")
    .build_spec()
)

# Add the spec to a guild (guild_builder is assumed to have been created earlier)
guild_builder.add_agent_spec(marvin_agent_spec)

Example Classification Request¶

from rustic_ai.core.agents.commons import ClassifyRequest

# Define some categories
categories = ["complaint", "inquiry", "feedback", "support_request"]

# Create a classification request
classify_request = ClassifyRequest(
    source_text="I've been trying to access my account for two days but keep getting an error message. Can someone help me?",
    categories=categories,
    instructions="Classify customer service messages"
)

# Send to the agent (client is assumed to have been set up elsewhere)
client.publish("default_topic", classify_request)

Example Extraction Request¶

from pydantic import BaseModel
from rustic_ai.core.agents.commons import ExtractRequest, ExtractionSpec

# Define an extraction model
class CustomerIssue(BaseModel):
    problem_type: str
    duration: str
    urgency_level: str
    requires_account_access: bool

# Create an extraction request
extract_request = ExtractRequest(
    source_text="I've been trying to access my account for two days but keep getting an error message. Can someone help me?",
    extraction_spec=ExtractionSpec(
        extraction_class=CustomerIssue,
        extraction_instructions="Extract details about customer service issues"
    )
)

# Send to the agent (client is assumed to have been set up elsewhere)
client.publish("default_topic", extract_request)

Technical Details¶

The agent uses:

Prefect's Marvin library for classification and extraction
Asynchronous processing for classification requests
Synchronous processing for combined classification and extraction

Notes and Limitations¶

Requires the Marvin library, which in turn uses LLMs for its functionality
Quality of classification and extraction depends on the clarity of instructions
Classification works best with well-defined, distinct categories
Extraction is more reliable with clearly structured data in the source text
More complex extraction schemas may require more detailed instructions
For best results, provide clear examples in extraction instructions
Marvin may use API calls to external LLMs, which could have rate limits or costs
Classification and extraction quality depend on the underlying LLM used by Marvin
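One way to follow the advice about clear examples is to assemble the instructions string from a task description plus a few worked examples. The sketch below is only an illustration: `build_extraction_instructions` is a hypothetical helper, and the example texts and the layout of the string are assumptions rather than a format that Marvin requires.

```python
from typing import List, Tuple


def build_extraction_instructions(task: str, examples: List[Tuple[str, str]]) -> str:
    """Combine a task description with (text, expected result) example pairs."""
    lines = [task, "", "Examples:"]
    for text, expected in examples:
        lines.append(f'- Text: "{text}" -> {expected}')
    return "\n".join(lines)


instructions = build_extraction_instructions(
    "Extract details about customer service issues",
    [
        ("I can't log in since Monday", "problem_type=login failure, duration=since Monday"),
        ("My invoice is wrong", "problem_type=billing error, duration=unknown"),
    ],
)
print(instructions)
```

The resulting string could then be passed as `extraction_instructions` in an ExtractionSpec.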