MarvinAgent¶
The MarvinAgent
is a specialized agent that provides text classification and data extraction capabilities using Prefect's Marvin library, enabling structured information retrieval from unstructured text.
Purpose¶
This agent serves as a bridge to Marvin's AI-powered classification and extraction capabilities. It can categorize text into predefined categories and extract structured data based on specified schemas, turning unstructured text into structured information.
When to Use¶
Use the MarvinAgent
when your application needs to:
- Classify text into predefined categories
- Extract structured information from unstructured text
- Convert natural language to structured data
- Implement intent recognition in conversational systems
- Parse complex text into standardized formats
- Extract entities, attributes, or key information from text
Message Types¶
Input Messages¶
ClassifyRequest¶
A request to classify text into one of several categories:
class ClassifyRequest(BaseModel):
source_text: str # The text to classify
categories: List[str] # List of possible categories
instructions: Optional[str] = None # Optional additional instructions
ExtractRequest¶
A request to extract structured data from text:
class ExtractRequest(BaseModel):
source_text: str # The text to extract data from
extraction_spec: ExtractionSpec # Specification for extraction
Where ExtractionSpec
defines:
class ExtractionSpec(BaseModel):
extraction_class: Type # Pydantic model defining the data structure to extract
extraction_instructions: Optional[str] = None # Optional instructions
ClassifyAndExtractRequest¶
A request to both classify text and extract relevant data based on the classification:
class ClassifyAndExtractRequest(BaseModel):
source_text: str # The text to process
categories: List[str] # Possible categories for classification
classification_instructions: Optional[str] = None # Instructions for classification
category_to_extraction_class: Dict[str, Type] # Mapping categories to extraction models
category_to_extraction_instructions: Dict[str, str] = {} # Instructions for each category
Output Messages¶
ClassifyResponse¶
Response to a classification request:
class ClassifyResponse(BaseModel):
source_text: str # The original text
category: str # The identified category
ExtractResponse¶
Response to an extraction request:
class ExtractResponse(BaseModel):
source_text: str # The original text
extracted_data: Any # The extracted structured data
ClassifyAndExtractResponse¶
Response to a combined classify and extract request:
class ClassifyAndExtractResponse(BaseModel):
source_text: str # The original text
category: str # The identified category
extracted_data: Any # The extracted data
Behavior¶
Classification¶
- The agent receives a
ClassifyRequest
with text and possible categories - It calls Marvin's classification function to identify the most appropriate category
- The agent returns a
ClassifyResponse
with the identified category
Extraction¶
- The agent receives an
ExtractRequest
with text and an extraction specification - It calls Marvin's extraction function to parse the text into the specified structure
- The agent returns an
ExtractResponse
with the extracted data
Combined Classification and Extraction¶
- The agent receives a
ClassifyAndExtractRequest
with text, categories, and extraction specifications - It first classifies the text into one of the provided categories
- Based on the classification, it selects the appropriate extraction model
- It extracts structured data using the selected model
- The agent returns a
ClassifyAndExtractResponse
with both the category and extracted data
Sample Usage¶
from rustic_ai.core.guild.builders import AgentBuilder
from rustic_ai.marvin.classifier_agent import MarvinAgent
# Create the agent spec
marvin_agent_spec = (
AgentBuilder(MarvinAgent)
.set_id("classifier_extractor")
.set_name("Text Analyzer")
.set_description("Classifies text and extracts structured data")
.build_spec()
)
# Add to guild
guild_builder.add_agent_spec(marvin_agent_spec)
Example Classification Request¶
from rustic_ai.core.agents.commons import ClassifyRequest
# Define some categories
categories = ["complaint", "inquiry", "feedback", "support_request"]
# Create a classification request
classify_request = ClassifyRequest(
source_text="I've been trying to access my account for two days but keep getting an error message. Can someone help me?",
categories=categories,
instructions="Classify customer service messages"
)
# Send to the agent
client.publish("default_topic", classify_request)
Example Extraction Request¶
from pydantic import BaseModel
from rustic_ai.core.agents.commons import ExtractRequest, ExtractionSpec
# Define an extraction model
class CustomerIssue(BaseModel):
problem_type: str
duration: str
urgency_level: str
requires_account_access: bool
# Create an extraction request
extract_request = ExtractRequest(
source_text="I've been trying to access my account for two days but keep getting an error message. Can someone help me?",
extraction_spec=ExtractionSpec(
extraction_class=CustomerIssue,
extraction_instructions="Extract details about customer service issues"
)
)
# Send to the agent
client.publish("default_topic", extract_request)
Technical Details¶
The agent uses: - Prefect's Marvin library for classification and extraction - Asynchronous processing for classification requests - Synchronous processing for combined classification and extraction
Notes and Limitations¶
- Requires the Marvin library, which in turn uses LLMs for its functionality
- Quality of classification and extraction depends on the clarity of instructions
- Classification works best with well-defined, distinct categories
- Extraction is more reliable with clearly structured data in the source text
- More complex extraction schemas may require more detailed instructions
- For best results, provide clear examples in extraction instructions
- Marvin may use API calls to external LLMs, which could have rate limits or costs
- Classification and extraction quality depend on the underlying LLM used by Marvin