Image2ImageAgent¶
The Image2ImageAgent
provides image-to-image transformation capabilities within a Rustic AI guild using the Pix2Pix model, enabling conditional image generation and editing.
Purpose¶
This agent allows transformation of source images based on text instructions, performing tasks like style transfer, image modification, or feature addition. It uses the Pix2Pix model, which specializes in conditional image-to-image translation.
When to Use¶
Use the Image2ImageAgent
when your application needs to:
- Transform existing images based on text instructions
- Edit images in a controlled manner
- Apply specific modifications to visual content
- Generate variations of existing images
- Implement controlled image manipulation based on natural language
Configuration¶
The Image2ImageAgent
is configured through the Pix2PixProps
class, which allows setting:
class Pix2PixProps(PyTorchAgentProps):
model_id: str = "timbrooks/instruct-pix2pix" # The model ID to load from Hugging Face
The agent inherits from the PyTorchAgentProps
base class, which provides:
class PyTorchAgentProps(BaseAgentProps):
torch_device: str = "cuda" if torch.cuda.is_available() else "cpu" # Device to run the model on
Dependencies¶
The Image2ImageAgent
requires:
- filesystem (Guild-level dependency): A file system implementation for storing transformed images
Message Types¶
Input Messages¶
ImageEditRequest¶
A request to transform an image:
class ImageEditRequest(BaseModel):
source_image: MediaLink # The source image to transform
instruction: str # Text instruction describing the desired transformation
num_inference_steps: int = 20 # Number of inference steps (higher = better quality)
image_guidance_scale: float = 1.5 # How closely to follow the source image
guidance_scale: float = 7.5 # How closely to follow the instruction
strength: float = 0.8 # Amount of noise added (higher = more transformation)
image_format: str = "png" # Output format for the image
width: Optional[int] = None # Optional width to resize to
height: Optional[int] = None # Optional height to resize to
Output Messages¶
ImageEditResponse¶
Sent when image transformation is completed:
class ImageEditResponse(BaseModel):
files: List[MediaLink] # List of transformed image files
errors: List[str] # Any errors that occurred during processing
request: str # The original request (JSON string)
Each MediaLink
contains:
- url: Path to the generated image file
- name: Filename (UUID-based)
- mimetype: Content type (image/png)
- on_filesystem: Always True for generated images
- metadata: Contains information about the transformation
ErrorMessage¶
Sent when image transformation fails:
class ErrorMessage(BaseModel):
agent_type: str
error_type: str # "ImageTransformationError"
error_message: str
Behavior¶
- The agent receives an
ImageEditRequest
with a source image and instruction - The source image is loaded and preprocessed
- The image is transformed according to the instruction using the Pix2Pix model
- The transformed image is saved to a file with a generated UUID filename
- A
MediaLink
is created for the transformed image - An
ImageEditResponse
containing the image link is sent - If any errors occur, an
ErrorMessage
is sent
Sample Usage¶
from rustic_ai.core.guild.builders import AgentBuilder
from rustic_ai.core.guild.agent_ext.depends.dependency_resolver import DependencySpec
from rustic_ai.huggingface.agents.diffusion.pix2pix import Image2ImageAgent, Pix2PixProps
# Define a file system dependency
filesystem = DependencySpec(
class_name="rustic_ai.core.guild.agent_ext.depends.filesystem.FileSystemResolver",
properties={
"path_base": "/tmp",
"protocol": "file",
"storage_options": {
"auto_mkdir": True,
},
},
)
# Create the agent spec
pix2pix_agent_spec = (
AgentBuilder(Image2ImageAgent)
.set_id("image_editor")
.set_name("Image Editor")
.set_description("Transforms images based on text instructions using Pix2Pix")
.set_properties(
Pix2PixProps(
model_id="timbrooks/instruct-pix2pix", # Default model
torch_device="cuda:0" # Specify GPU device if needed
)
)
.build_spec()
)
# Add dependency to guild when launching
guild_builder.add_dependency("filesystem", filesystem)
guild_builder.add_agent_spec(pix2pix_agent_spec)
Example Request¶
from rustic_ai.core.agents.commons.media import MediaLink
from rustic_ai.huggingface.agents.models import ImageEditRequest
# Create an image edit request
edit_request = ImageEditRequest(
source_image=MediaLink(
url="/path/to/source/image.jpg",
name="source_image.jpg",
on_filesystem=True,
mimetype="image/jpeg"
),
instruction="Make it look like a winter scene with snow",
guidance_scale=9.0, # Higher guidance for more faithful adherence to instruction
strength=0.7 # Slightly lower strength to preserve more of the original
)
# Send to the agent
client.publish("default_topic", edit_request)
Technical Details¶
The agent uses:
- Hugging Face's diffusers
library with the StableDiffusionInstructPix2PixPipeline
- Tim Brooks' Instruct-Pix2Pix model
- PyTorch for tensor operations
- PIL for image manipulation
- Automatic hardware detection to use GPU when available
Notes and Limitations¶
- Requires significant VRAM to run efficiently (at least 8GB recommended)
- Performance is much better with a GPU
- Quality of transformations depends heavily on the clarity of instructions
- Works best with certain types of transformations (e.g., style changes, weather effects)
- First-time initialization may take longer as models are downloaded
- Larger images require more memory and processing time
- The
strength
parameter controls how much of the original image is preserved versus changed - The
image_guidance_scale
controls how closely the output adheres to the input image structure