VectorAgent¶
The VectorAgent
provides document indexing and vector search capabilities within a Rustic AI guild, enabling semantic retrieval of information from text documents.
Purpose¶
This agent serves as a bridge between your application and vector databases. It handles document processing, indexing, and vector similarity search, making it a core component for knowledge retrieval systems.
When to Use¶
Use the VectorAgent
when your application needs to:
- Store and retrieve documents based on semantic similarity
- Create a knowledge base for LLM context augmentation
- Implement retrieval-augmented generation (RAG) patterns
- Build search functionality based on meaning rather than keywords
- Process and index documents from various sources
Dependencies¶
The VectorAgent
requires:
- vectorstore: A vector database implementation for storing embeddings
- textsplitter: A text splitting implementation for breaking documents into chunks
- embeddings: An embeddings provider for converting text to vectors
Message Types¶
Input Messages¶
IngestDocuments¶
A request to index documents for later retrieval:
class IngestDocuments(BaseModel):
links: List[MediaLink] # List of documents to ingest
namespace: Optional[str] = None # Optional namespace for organizing documents
metadata: Optional[Dict[str, Any]] = None # Additional metadata for all documents
chunk_metadata: Optional[Dict[str, Any]] = None # Metadata for individual chunks
VectorSearchQuery¶
A request to search for documents by similarity:
class VectorSearchQuery(BaseModel):
query: str # The search query text
namespace: Optional[str] = None # Namespace to search within
limit: int = 5 # Maximum number of results to return
filter: Optional[Dict[str, Any]] = None # Filter criteria for search
Output Messages¶
VectorSearchResults¶
The results of a vector search:
class VectorSearchResults(BaseModel):
query: str # The original query
results: List[Document] # The matching documents
namespace: Optional[str] = None # The namespace that was searched
Each Document
includes:
- content: The text content
- metadata: Document metadata
- score: Similarity score to the query
IngestCompleted¶
Sent when document ingestion is completed:
class IngestCompleted(BaseModel):
namespace: Optional[str] = None # The namespace documents were added to
count: int # Number of documents ingested
chunk_count: int # Number of chunks created from documents
Behavior¶
Document Ingestion¶
- The agent receives an
IngestDocuments
message with document links - For each document:
- The content is loaded from the file or URL
- The text is split into chunks using the text splitter dependency
- Each chunk is converted to embeddings using the embeddings dependency
- The embeddings are stored in the vector store with associated metadata
- The agent emits an
IngestCompleted
message with statistics
Vector Search¶
- The agent receives a
VectorSearchQuery
message - The query text is converted to embeddings
- A similarity search is performed in the vector store
- The most similar documents are returned in a
VectorSearchResults
message
Sample Usage¶
from rustic_ai.core.guild.builders import AgentBuilder
from rustic_ai.core.guild.agent_ext.depends.dependency_resolver import DependencySpec
from rustic_ai.core.agents.indexing.vector_agent import VectorAgent
# Set up dependencies
vectorstore = DependencySpec(
class_name="rustic_ai.chroma.agent_ext.vectorstore.ChromaResolver",
properties={}
)
embeddings = DependencySpec(
class_name="rustic_ai.langchain.agent_ext.embeddings.openai.OpenAIEmbeddingsResolver",
properties={}
)
textsplitter = DependencySpec(
class_name="rustic_ai.langchain.agent_ext.text_splitter.recursive_splitter.RecursiveSplitterResolver",
properties={
"conf": {
"chunk_size": 1000,
"chunk_overlap": 200
}
}
)
# Create the agent spec
vector_agent_spec = (
AgentBuilder(VectorAgent)
.set_id("vector_agent")
.set_name("Vector Agent")
.set_description("Agent for document indexing and semantic search")
.build_spec()
)
# Add dependencies and agent to guild
guild_builder.add_dependency("vectorstore", vectorstore)
guild_builder.add_dependency("embeddings", embeddings)
guild_builder.add_dependency("textsplitter", textsplitter)
guild_builder.add_agent_spec(vector_agent_spec)
Example Document Ingestion¶
from rustic_ai.core.agents.commons.media import MediaLink
from rustic_ai.core.agents.indexing.vector_agent import IngestDocuments
# Create document links
documents = [
MediaLink(
url="docs/introduction.md",
name="Introduction",
on_filesystem=True,
mimetype="text/markdown"
),
MediaLink(
url="docs/api_reference.md",
name="API Reference",
on_filesystem=True,
mimetype="text/markdown"
)
]
# Create ingestion request
ingest_request = IngestDocuments(
links=documents,
namespace="documentation",
metadata={"source": "official_docs", "version": "1.0"}
)
# Send to the agent
client.publish("default_topic", ingest_request)
Example Vector Search¶
from rustic_ai.core.agents.indexing.vector_agent import VectorSearchQuery
# Create search query
search_query = VectorSearchQuery(
query="How do I create a custom agent?",
namespace="documentation",
limit=3
)
# Send to the agent
client.publish("default_topic", search_query)
Integration with RAG Pattern¶
The VectorAgent is commonly used as part of a Retrieval-Augmented Generation (RAG) pattern:
- Documents are ingested and indexed using the VectorAgent
- User queries are sent to a coordinator agent
- The coordinator agent:
- Sends a vector search query to find relevant context
- Combines the retrieved context with the user query
- Sends the augmented query to an LLM agent
- The LLM agent generates a response using the retrieved context
This pattern significantly improves the quality and factuality of LLM responses for domain-specific applications.
Notes and Limitations¶
- The quality of vector search depends on the embeddings model used
- Performance depends on the vector store implementation
- Document chunking strategies significantly impact search quality
- Large document collections require more storage and compute resources
- Consider metadata filtering to improve search efficiency in large collections