Integration Guide

Working with Images

Send images to your HUMA agent for visual understanding and context.

Overview

HUMA agents can process images using vision-capable AI models. There are three ways to send images:

Inline Message Images

Attach images directly to messages. The agent sees them in the conversation context, just like a human would see images shared in chat.

Context Images

Persistent images stored in agent state. Useful for reference images the agent should always have access to (e.g., game board, UI screenshot).

Tool Result Images

Attach images to tool results. Useful for tools that capture screenshots, generate images, or fetch visual content.

Image Format

HUMA accepts images as base64 data URLs. This is the simplest format: no file hosting is required.

Data URL Format

// Format: data:[mediaType];base64,[base64Data]

// Example PNG
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB..."

// Example JPEG
"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."

// Example WebP
"data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAw..."

Supported Formats

PNG, JPEG, WebP, and GIF are supported. For best results, keep images under 4 MB and keep their dimensions under 4096px.
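A client-side check before sending can catch unsupported or oversized images early. The sketch below is a minimal illustration, not part of the HUMA API: the function name `checkImageDataUrl` is hypothetical, and it assumes that "4 MB" refers to 4 MiB of decoded image data.

```javascript
// Hypothetical helper (not part of the HUMA API).
// Media types taken from the supported formats listed above.
const SUPPORTED_MEDIA_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/gif'];
// Assumption: the 4 MB guideline is measured on the decoded bytes (4 MiB).
const MAX_BYTES = 4 * 1024 * 1024;

function checkImageDataUrl(dataUrl) {
  // Expected shape: data:[mediaType];base64,[base64Data]
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(dataUrl);
  if (!match) {
    return { ok: false, reason: 'not a base64 data URL' };
  }
  const mediaType = match[1].toLowerCase();
  const base64 = match[2];
  if (!SUPPORTED_MEDIA_TYPES.includes(mediaType)) {
    return { ok: false, reason: `unsupported media type: ${mediaType}` };
  }
  // Every 4 base64 characters encode 3 bytes; each '=' of padding removes one byte.
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  const bytes = Math.floor((base64.length * 3) / 4) - padding;
  if (bytes > MAX_BYTES) {
    return { ok: false, reason: `image is ${bytes} bytes, over the ${MAX_BYTES} byte guideline` };
  }
  return { ok: true, mediaType, bytes };
}
```

Computing the size from the base64 length avoids decoding the whole image just to measure it.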

Converting Files to Data URLs

Use the browser's FileReader API to convert uploaded files:

Browser JavaScript

// Convert a File to data URL
function fileToDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}

// Usage with file input
const input = document.querySelector('input[type="file"]');
input.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const dataUrl = await fileToDataUrl(file);
  // dataUrl is ready to send to HUMA
  console.log(dataUrl); // "data:image/png;base64,iVBOR..."
});

React Example

React

function ImageUploader({ onImageReady }) {
  const handleFileChange = async (e) => {
    const file = e.target.files?.[0];
    if (!file) return;

    const reader = new FileReader();
    reader.onload = () => {
      onImageReady(reader.result); // data URL string
    };
    reader.readAsDataURL(file);
  };

  return (
    <input
      type="file"
      accept="image/*"
      onChange={handleFileChange}
    />
  );
}

Node.js Example

Node.js

import fs from 'fs';
import path from 'path';

function fileToDataUrl(filePath) {
  const buffer = fs.readFileSync(filePath);
  const base64 = buffer.toString('base64');
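  // Normalize case so that "photo.PNG" and "photo.png" produce the same media type
  const ext = path.extname(filePath).slice(1).toLowerCase();
  // The MIME subtype for .jpg files is "jpeg"; png, webp, and gif map directly
  const mimeType = ext === 'jpg' ? 'jpeg' : ext;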
  return `data:image/${mimeType};base64,${base64}`;
}

const dataUrl = fileToDataUrl('./screenshot.png');

Inline Message Images

Attach images to context-update events via the context.images array. The agent sees these images alongside the message text.

context-update with image

socket.emit('message', {
  type: 'context-update',
  triggering: true,
  name: 'new-message',
  description: 'What do you see in this image?',
  context: {
    images: [
      {
        url: 'data:image/png;base64,iVBORw0KGgo...',
        alt: 'Screenshot of game board'  // Optional description
      }
    ]
  }
});

Image Attachment Schema

TypeScript

interface ImageAttachment {
  url: string;    // Base64 data URL (required)
  alt?: string;   // Description for accessibility/context (optional)
}

// Multiple images are supported
socket.emit('message', {
  type: 'context-update',
  triggering: true,
  name: 'new-message',
  description: 'Compare these two screenshots',
  context: {
    images: [
      { url: 'data:image/png;base64,...', alt: 'Before' },
      { url: 'data:image/png;base64,...', alt: 'After' }
    ]
  }
});

Image Limit

The agent sees the 5 most recent images from conversation history. Older images are excluded to manage context size.
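To reason about which images the agent can still see, it can help to model the limit on the client. The sketch below only illustrates the windowing behavior described above; it is not how HUMA implements it, and `visibleImages` is a hypothetical name.

```javascript
// Limit taken from the documentation above.
const VISIBLE_IMAGE_LIMIT = 5;

// Given the images sent so far (oldest first), return the ones that
// fall inside the most-recent window the agent would still see.
function visibleImages(sentImages, limit = VISIBLE_IMAGE_LIMIT) {
  if (limit <= 0) return [];
  return sentImages.slice(-limit);
}

// Example: seven images were sent over the course of a conversation.
const sent = ['img1', 'img2', 'img3', 'img4', 'img5', 'img6', 'img7'];
const stillVisible = visibleImages(sent);
// stillVisible is ['img3', 'img4', 'img5', 'img6', 'img7']
```

If an older image still matters, resend it or describe it in the message text.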

Context Images

Context images are persistent images stored in the agent's state. Use them for reference images the agent should always have access to.

Good Use Cases

Game board state
UI screenshot for reference
Map or diagram
Character appearance

Use Inline Instead

User-shared photos
One-time screenshots
Conversation attachments

Adding Context Images

Send a context-update event with images in the context. Use a descriptive name to identify the update, and set triggering: false if the image shouldn't trigger agent processing on its own:

context-update with context image

socket.emit('message', {
  type: 'context-update',
  triggering: false,
  name: 'add-context-image',
  description: 'Updated game board state',
  context: {
    imageId: 'board-state',
    images: [
      {
        url: 'data:image/png;base64,...',
        alt: 'Current game board state'
      }
    ],
    category: 'game'
  }
});

Context Image Schema

TypeScript

// Images are sent as part of context-update events
interface ImageAttachment {
  url: string;    // Base64 data URL (required)
  alt?: string;   // Description for accessibility/context (optional)
}

// The context object carries the images array
interface ContextUpdateEvent {
  type: 'context-update';
  triggering: boolean;
  name: string;
  description: string;
  context: {
    images?: ImageAttachment[];
    [key: string]: unknown;
  };
}

Updating Context Images

To update an existing context image (e.g., game board changed), simply add a new image with the same imageId:

Update existing image

// Initial board state
socket.emit('message', {
  type: 'context-update',
  triggering: false,
  name: 'add-context-image',
  description: 'Current game board',
  context: {
    imageId: 'board-state',
    images: [{
      url: 'data:image/png;base64,...',
      alt: 'Current game board'
    }]
  }
});

// Later: board changed, send a new context-update
socket.emit('message', {
  type: 'context-update',
  triggering: false,
  name: 'add-context-image',
  description: 'Game board after move',
  context: {
    imageId: 'board-state',
    images: [{
      url: 'data:image/png;base64,...',
      alt: 'Game board after latest move'
    }]
  }
});
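Because both payloads above differ only in the image and its descriptions, a small builder can keep the imageId and event shape consistent. This is a sketch under assumptions: `buildContextImageUpdate` is a hypothetical helper, and the field names are copied from the examples above rather than from a formal schema.

```javascript
// Hypothetical helper (not part of the HUMA API). Builds the same
// context-update payload shape used in the examples above.
function buildContextImageUpdate({ imageId, url, alt, description, triggering = false }) {
  if (!imageId) {
    throw new Error('imageId is required so a later update can replace this image');
  }
  if (typeof url !== 'string' || !url.startsWith('data:image/')) {
    throw new Error('url must be an image data URL');
  }
  // alt is optional, so only include it when provided.
  const image = alt === undefined ? { url } : { url, alt };
  return {
    type: 'context-update',
    triggering,
    name: 'add-context-image',
    description,
    context: { imageId, images: [image] },
  };
}

// Usage, assuming an already connected socket as in the examples above:
// socket.emit('message', buildContextImageUpdate({
//   imageId: 'board-state',
//   url: dataUrl,
//   alt: 'Game board after latest move',
//   description: 'Game board after move',
// }));
```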

Tool Result Images

Tools that capture screenshots, generate images, or fetch visual content can return results via tool-result events. The result text is included in the conversation context for the agent to process.

tool-result from a screenshot tool

socket.emit('message', {
  type: 'tool-result',
  toolCallId: 'tc_abc123',
  toolName: 'take_screenshot',
  outcome: 'success',
  triggering: true,
  result: 'Screenshot captured successfully'
});

Use Cases

Screenshot Tools

Capture and return the current screen state so the agent can see what's happening.

Image Generation

Return generated graphics, charts, or visualizations to the agent.

Web Scraping

Capture webpage screenshots as part of browsing or research tools.

Camera/Sensors

Return photos or visual data from cameras or visual sensors.

Session Image Limit

The agent sees up to the 5 most recent images from the session, regardless of source (events, context, or tool results). Older images are excluded to manage context size.

Complete Example

Here's a complete example showing both inline and context images:

Full Integration

import { io } from 'socket.io-client';

// Connect to HUMA
const socket = io('wss://api.humalike.com', {
  query: { agentId: 'your-agent-id' }
});

// Helper to convert file to data URL
async function fileToDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}

// Set up persistent context image (e.g., game board)
async function setGameBoard(boardScreenshot) {
  const dataUrl = await fileToDataUrl(boardScreenshot);

  socket.emit('message', {
    type: 'context-update',
    triggering: false,
    name: 'add-context-image',
    description: 'Current game board showing all pieces',
    context: {
      imageId: 'game-board',
      images: [{ url: dataUrl, alt: 'Game board' }],
      category: 'game'
    }
  });
}

// Send a message with an inline image
async function sendMessageWithImage(text, imageFile) {
  const dataUrl = await fileToDataUrl(imageFile);

  socket.emit('message', {
    type: 'context-update',
    triggering: true,
    name: 'new-message',
    description: text,
    context: {
      images: [{
        url: dataUrl,
        alt: 'User uploaded image'
      }]
    }
  });
}

Best Practices

1. Keep Images Reasonably Sized

Large images increase latency and token usage. Resize images to 1024px max dimension when possible. Consider quality/size tradeoffs.
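The 1024px guideline can be applied by computing target dimensions that preserve the aspect ratio. The helper below is a minimal sketch; `fitWithin` is a hypothetical name, and it assumes "max dimension" means the longer of width and height.

```javascript
// Hypothetical helper: scale dimensions down so the longer side is at
// most maxDimension, preserving aspect ratio. Never upscales.
function fitWithin(width, height, maxDimension = 1024) {
  const longest = Math.max(width, height);
  if (longest <= maxDimension) {
    return { width, height };
  }
  const scale = maxDimension / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

const landscape = fitWithin(4096, 2048); // { width: 1024, height: 512 }
const small = fitWithin(800, 600);       // unchanged: { width: 800, height: 600 }
```

In a browser, one way to use the result is to draw the image onto a canvas of that size and call `canvas.toDataURL()` to get the resized data URL.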

2. Use Meaningful Descriptions

The description and alt fields help the agent understand image context before processing. Be specific: "Game board after Alice's move" is better than "Screenshot".

3. Use Consistent Image IDs

For context images that get updated (like game board state), always use the same imageId. This replaces the old image instead of accumulating duplicates.

4. Clean Up Unused Context Images

Remove context images when they're no longer relevant. Stale images waste context space and may confuse the agent.

5. Prefer Inline for User Messages

When users share images in conversation, use inline message images. Reserve context images for persistent references that the agent needs across multiple interactions.
