Working with Images
Send images to your HUMA agent for visual understanding and context.
Overview
HUMA agents can process images using vision-capable AI models. There are three ways to send images:
Inline Message Images
Attach images directly to messages. The agent sees them in the conversation context, just like a human would see images shared in chat.
Context Images
Persistent images stored in agent state. Useful for reference images the agent should always have access to (e.g., game board, UI screenshot).
Tool Result Images
Attach images to tool results. Useful for tools that capture screenshots, generate images, or fetch visual content.
Image Format
HUMA accepts images as base64 data URLs. This is the simplest format - no file hosting required.
// Format: data:[mediaType];base64,[base64Data]
// Example PNG
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB..."
// Example JPEG
"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
// Example WebP
"data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAw..."
Supported Formats
PNG, JPEG, WebP, and GIF are supported. For best results, keep images under 4MB and use reasonable dimensions (under 4096px).
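The format and size guidance above can be checked on the client before anything is sent. The following is a minimal sketch, not part of the HUMA API: `checkImageDataUrl` is a hypothetical helper, the media type strings are assumed from the four format names, and "4MB" is read as 4 MiB. It only inspects the string; it does not decode the image or check pixel dimensions.

```javascript
// Assumption: the four supported formats map to these media types.
const SUPPORTED_MEDIA_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/gif'];
const MAX_BYTES = 4 * 1024 * 1024; // "under 4MB", read here as 4 MiB (assumption)

// Hypothetical helper: returns { ok, reason } on failure or { ok, mediaType, bytes } on success.
function checkImageDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(dataUrl);
  if (!match) return { ok: false, reason: 'not a base64 data URL' };

  const [, mediaType, base64] = match;
  if (!SUPPORTED_MEDIA_TYPES.includes(mediaType)) {
    return { ok: false, reason: `unsupported media type: ${mediaType}` };
  }

  // Every 4 base64 characters encode 3 bytes; each trailing '=' means one byte fewer.
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  const bytes = Math.floor((base64.length * 3) / 4) - padding;
  if (bytes >= MAX_BYTES) {
    return { ok: false, reason: `image is ${bytes} bytes, limit is ${MAX_BYTES}` };
  }
  return { ok: true, mediaType, bytes };
}
```

Running this check before emitting an event lets the client show a clear error instead of sending a payload the agent may not be able to use.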
Converting Files to Data URLs
Use the browser's FileReader API to convert uploaded files:
// Convert a File to data URL
function fileToDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}
// Usage with file input
const input = document.querySelector('input[type="file"]');
input.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (!file) return; // No file selected (e.g., the picker was cancelled)
  const dataUrl = await fileToDataUrl(file);
  // dataUrl is ready to send to HUMA
  console.log(dataUrl); // "data:image/png;base64,iVBOR..."
});
React Example
function ImageUploader({ onImageReady }) {
  const handleFileChange = async (e) => {
    const file = e.target.files?.[0];
    if (!file) return;
    const reader = new FileReader();
    reader.onload = () => {
      onImageReady(reader.result); // data URL string
    };
    reader.readAsDataURL(file);
  };
  return (
    <input
      type="file"
      accept="image/*"
      onChange={handleFileChange}
    />
  );
}
Node.js Example
import fs from 'fs';
import path from 'path';
function fileToDataUrl(filePath) {
  const buffer = fs.readFileSync(filePath);
  const base64 = buffer.toString('base64');
  // Derive the media type from the extension; ".jpg" maps to "image/jpeg"
  const ext = path.extname(filePath).slice(1).toLowerCase();
  const mimeType = ext === 'jpg' ? 'jpeg' : ext;
  return `data:image/${mimeType};base64,${base64}`;
}
const dataUrl = fileToDataUrl('./screenshot.png');
Inline Message Images
Attach images directly to new-message events. The agent sees these images alongside the message text.
socket.emit('message', {
  type: 'new-message',
  content: {
    id: 'msg-123',
    content: 'What do you see in this image?',
    images: [
      {
        url: 'data:image/png;base64,iVBORw0KGgo...',
        alt: 'Screenshot of game board' // Optional description
      }
    ],
    participant: {
      id: 'user-1',
      nickname: 'Alice'
    },
    createdAt: new Date().toISOString(),
    type: 'message'
  }
});
Image Attachment Schema
interface ImageAttachment {
  url: string; // Base64 data URL (required)
  alt?: string; // Description for accessibility/context (optional)
}
// Multiple images are supported
const message = {
  content: 'Compare these two screenshots',
  images: [
    { url: 'data:image/png;base64,...', alt: 'Before' },
    { url: 'data:image/png;base64,...', alt: 'After' }
  ]
};
Image Limit
The agent sees the 5 most recent images from conversation history. Older images are excluded to manage context size.
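The message shape, attachment schema, and image limit in this section can be combined into one small builder. This is a sketch under assumptions: `buildImageMessage` is a hypothetical helper, and it assumes that later entries in the `images` array count as more recent, so it keeps only the last five rather than sending attachments the agent would not see.

```javascript
const MAX_VISIBLE_IMAGES = 5; // from the image limit described above

// Hypothetical helper: builds a new-message event in the shape shown in this section.
function buildImageMessage(text, images, participant) {
  for (const image of images) {
    if (typeof image.url !== 'string' || !image.url.startsWith('data:image/')) {
      throw new TypeError('each image needs a base64 data URL in `url`');
    }
  }
  // Assumption: entries later in the array are the more recent ones.
  const visible = images.slice(-MAX_VISIBLE_IMAGES);
  return {
    type: 'new-message',
    content: {
      id: `msg-${Date.now()}`,
      content: text,
      // `alt` is optional, so it is only included when provided.
      images: visible.map(({ url, alt }) => (alt ? { url, alt } : { url })),
      participant,
      createdAt: new Date().toISOString(),
      type: 'message'
    }
  };
}

// Usage (with a connected socket):
// socket.emit('message', buildImageMessage('Compare these', images, { id: 'user-1', nickname: 'Alice' }));
```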
Context Images
Context images are persistent images stored in the agent's state. Use them for reference images the agent should always have access to.
Good candidates for context images:
- Game board state
- UI screenshot for reference
- Map or diagram
- Character appearance
Better sent as inline message images:
- User-shared photos
- One-time screenshots
- Conversation attachments
Adding Context Images
Send an add-context-image event:
socket.emit('message', {
  type: 'add-context-image',
  content: {
    imageId: 'board-state', // Unique identifier
    url: 'data:image/png;base64,...', // Base64 data URL
    description: 'Current game board state', // Required description
    category: 'game' // Optional category for organization
  }
});
Removing Context Images
Send a remove-context-image event:
socket.emit('message', {
  type: 'remove-context-image',
  content: {
    imageId: 'board-state' // Same ID used when adding
  }
});
Context Image Schema
interface ContextImage {
  imageId: string; // Unique identifier for updates/removal
  url: string; // Base64 data URL
  description: string; // Description shown to the agent
  category?: string; // Optional category (e.g., 'game', 'ui', 'reference')
  addedAt: string; // ISO timestamp (set automatically)
}
Updating Context Images
To update an existing context image (e.g., game board changed), simply add a new image with the same imageId:
// Initial board state
socket.emit('message', {
  type: 'add-context-image',
  content: {
    imageId: 'board-state',
    url: 'data:image/png;base64,...', // Initial screenshot
    description: 'Current game board'
  }
});
// Later: board changed, update the image
socket.emit('message', {
  type: 'add-context-image',
  content: {
    imageId: 'board-state', // Same ID = update
    url: 'data:image/png;base64,...', // New screenshot
    description: 'Current game board (updated)'
  }
});
Tool Result Images
Tools that capture screenshots, generate images, or fetch visual content can include images in their results. These images are persisted in the session and made available to the AI in subsequent interactions.
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Screenshot captured successfully',
    images: [
      {
        url: 'data:image/png;base64,iVBORw0KGgo...',
        alt: 'Current game board state' // Optional description
      }
    ]
  }
});
Use Cases
- Capture and return the current screen state so the agent can see what's happening.
- Return generated graphics, charts, or visualizations to the agent.
- Capture webpage screenshots as part of browsing or research tools.
- Return photos or visual data from cameras or visual sensors.
Session Image Limit
The agent sees up to the 5 most recent images from the session, regardless of source (events, context, or tool results). Older images are excluded to manage context size.
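To reason about which images the agent can currently see, a client can keep its own log of what it has sent and apply the same rule locally. The sketch below is only an estimate under assumptions: the record shape (`id`, `source`, `addedAt`) and the helper name `estimateVisibleImages` are hypothetical, and the way HUMA actually orders images across sources is not specified here.

```javascript
const SESSION_IMAGE_LIMIT = 5; // from the session image limit described above

// Hypothetical record shape kept by the client for every image it sends:
// { id: string, source: 'event' | 'context' | 'tool-result', addedAt: ISO timestamp }
function estimateVisibleImages(records, limit = SESSION_IMAGE_LIMIT) {
  return [...records] // copy so the caller's log is not reordered
    .sort((a, b) => Date.parse(b.addedAt) - Date.parse(a.addedAt)) // newest first
    .slice(0, limit);
}

// Example: two older inline images fall out once seven images have been sent.
const sent = [
  { id: 'img-0', source: 'event', addedAt: '2024-01-01T00:00:00Z' },
  { id: 'img-1', source: 'event', addedAt: '2024-01-01T00:00:01Z' },
  { id: 'img-2', source: 'context', addedAt: '2024-01-01T00:00:02Z' },
  { id: 'img-3', source: 'tool-result', addedAt: '2024-01-01T00:00:03Z' },
  { id: 'img-4', source: 'event', addedAt: '2024-01-01T00:00:04Z' },
  { id: 'img-5', source: 'tool-result', addedAt: '2024-01-01T00:00:05Z' },
  { id: 'img-6', source: 'context', addedAt: '2024-01-01T00:00:06Z' }
];
const visibleIds = estimateVisibleImages(sent).map((record) => record.id);
```

If an important reference image would drop out of this window, re-sending it (for context images, with the same imageId) is one way to make it recent again, assuming the update counts as a new image for ordering purposes.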
Complete Example
Here's a complete example showing both inline and context images:
import { io } from 'socket.io-client';
// Connect to HUMA
const socket = io('wss://api.humalike.com', {
  query: { agentId: 'your-agent-id' }
});
// Helper to convert file to data URL
async function fileToDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}
// Set up persistent context image (e.g., game board)
async function setGameBoard(boardScreenshot) {
  const dataUrl = await fileToDataUrl(boardScreenshot);
  socket.emit('message', {
    type: 'add-context-image',
    content: {
      imageId: 'game-board',
      url: dataUrl,
      description: 'Current game board showing all pieces',
      category: 'game'
    }
  });
}
// Send a message with an inline image
async function sendMessageWithImage(text, imageFile) {
  const dataUrl = await fileToDataUrl(imageFile);
  socket.emit('message', {
    type: 'new-message',
    content: {
      id: `msg-${Date.now()}`,
      content: text,
      images: [{
        url: dataUrl,
        alt: 'User uploaded image'
      }],
      participant: { id: 'user-1', nickname: 'Player' },
      createdAt: new Date().toISOString(),
      type: 'message'
    }
  });
}
// Clean up context image when no longer needed
function removeGameBoard() {
  socket.emit('message', {
    type: 'remove-context-image',
    content: { imageId: 'game-board' }
  });
}
Best Practices
1. Keep Images Reasonably Sized
Large images increase latency and token usage. Resize images to 1024px max dimension when possible. Consider quality/size tradeoffs.
2. Use Meaningful Descriptions
The description and alt fields help the agent understand image context before processing. Be specific: "Game board after Alice's move" is better than "Screenshot".
3. Use Consistent Image IDs
For context images that get updated (like game board state), always use the same imageId. This replaces the old image instead of accumulating duplicates.
4. Clean Up Unused Context Images
Remove context images when they're no longer relevant. Stale images waste context space and may confuse the agent.
5. Prefer Inline for User Messages
When users share images in conversation, use inline message images. Reserve context images for persistent references that the agent needs across multiple interactions.
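Two of these practices lend themselves to small helpers: keeping images reasonably sized, and using consistent image IDs with cleanup. The sketch below is illustrative only. `fitWithin` and `ContextImageTracker` are hypothetical names, not part of HUMA; the tracker assumes the caller supplies an `emit` function such as `(event) => socket.emit('message', event)`.

```javascript
// Sizing: scale dimensions so the longer side is at most maxDim, preserving
// aspect ratio. Images already within the limit are returned unchanged.
function fitWithin(width, height, maxDim = 1024) {
  const scale = Math.min(1, maxDim / Math.max(width, height));
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale))
  };
}

// IDs and cleanup: a hypothetical client-side tracker that mirrors which
// context images are currently set, keyed by imageId.
class ContextImageTracker {
  constructor(emit) {
    this.emit = emit; // function that sends one event to HUMA
    this.images = new Map();
  }

  // Setting an existing imageId replaces the previous entry instead of adding a duplicate.
  set(imageId, url, description, category) {
    const content = { imageId, url, description };
    if (category) content.category = category;
    this.images.set(imageId, content);
    this.emit({ type: 'add-context-image', content });
  }

  // Removes one image if it is currently tracked; returns whether anything was removed.
  remove(imageId) {
    if (!this.images.delete(imageId)) return false;
    this.emit({ type: 'remove-context-image', content: { imageId } });
    return true;
  }

  // Removes every tracked image, e.g. when a game ends.
  clear() {
    for (const imageId of [...this.images.keys()]) this.remove(imageId);
  }
}
```

In a browser, the dimensions from `fitWithin` could be used as the size of a canvas the image is drawn onto before it is exported as a data URL; that step is omitted here because it depends on the DOM.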