# Streaming

Stream the browser viewport over WebSocket for live preview or "pair browsing", where a human can watch and interact alongside an AI agent.

## Enable streaming

Set the `AGENT_BROWSER_STREAM_PORT` environment variable to start a WebSocket server:

```shell
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
```

The server streams viewport frames and accepts input events (mouse, keyboard, and touch).

## WebSocket protocol

Connect to `ws://localhost:9223` to receive frames and send input.
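As a sketch, a minimal client might look like the following. It assumes a runtime with a global `WebSocket` (Node.js 22+ or a browser); the message shapes follow the protocol described below.

```typescript
// Minimal streaming-client sketch. Only the fields used here are declared;
// the full message shapes are documented in the sections below.
type FrameMessage = {
  type: 'frame';
  data: string; // base64-encoded image
  metadata: { deviceWidth: number; deviceHeight: number };
};
type StatusMessage = { type: 'status'; connected: boolean; screencasting: boolean };
type ServerMessage = FrameMessage | StatusMessage;

// Parse one raw WebSocket message into a typed server message.
function parseMessage(raw: string): ServerMessage {
  return JSON.parse(raw) as ServerMessage;
}

// Connect and log incoming frames (call this once the server is running).
function startClient(url = 'ws://localhost:9223') {
  const ws = new WebSocket(url);
  ws.onmessage = (event) => {
    const msg = parseMessage(String(event.data));
    if (msg.type === 'frame') {
      console.log(`frame ${msg.metadata.deviceWidth}x${msg.metadata.deviceHeight}`);
    }
  };
  return ws;
}
```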
### Frame messages

The server sends frame messages with base64-encoded images:

```json
{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}
```

### Status messages

The server also reports connection and screencast status:

```json
{
  "type": "status",
  "connected": true,
  "screencasting": true,
  "viewportWidth": 1280,
  "viewportHeight": 720
}
```

## Input injection

Send input events to control the browser remotely.
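For example, a click is a `mousePressed` followed by a `mouseReleased` at the same point, and typing a string is a sequence of `char` events. A hedged sketch of helpers that build these messages (the helper names are illustrative; only the message shapes come from the protocol below):

```typescript
// Illustrative builders for the input messages documented below.
type MouseMsg = {
  type: 'input_mouse';
  eventType: 'mousePressed' | 'mouseReleased' | 'mouseMoved' | 'mouseWheel';
  x: number;
  y: number;
  button?: 'left' | 'middle' | 'right';
  clickCount?: number;
};
type KeyboardMsg = {
  type: 'input_keyboard';
  eventType: 'keyDown' | 'keyUp' | 'char';
  key?: string;
  code?: string;
  text?: string;
};

// A click is a press followed by a release at the same coordinates.
function clickMessages(x: number, y: number): MouseMsg[] {
  return [
    { type: 'input_mouse', eventType: 'mousePressed', x, y, button: 'left', clickCount: 1 },
    { type: 'input_mouse', eventType: 'mouseReleased', x, y, button: 'left' },
  ];
}

// Typing sends one 'char' event per character.
function typeMessages(text: string): KeyboardMsg[] {
  return [...text].map((ch): KeyboardMsg => ({
    type: 'input_keyboard',
    eventType: 'char',
    text: ch,
  }));
}
```

Each message is then JSON-encoded and sent over the open WebSocket, e.g. `for (const m of clickMessages(100, 200)) ws.send(JSON.stringify(m));`.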
### Mouse events

```jsonc
// Click
{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}

// Release
{
  "type": "input_mouse",
  "eventType": "mouseReleased",
  "x": 100,
  "y": 200,
  "button": "left"
}

// Move
{
  "type": "input_mouse",
  "eventType": "mouseMoved",
  "x": 150,
  "y": 250
}

// Scroll
{
  "type": "input_mouse",
  "eventType": "mouseWheel",
  "x": 100,
  "y": 200,
  "deltaX": 0,
  "deltaY": 100
}
```

### Keyboard events
```jsonc
// Key down
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}

// Key up
{
  "type": "input_keyboard",
  "eventType": "keyUp",
  "key": "Enter",
  "code": "Enter"
}

// Type a character
{
  "type": "input_keyboard",
  "eventType": "char",
  "text": "a"
}

// With modifiers (bitmask: 1 = Alt, 2 = Ctrl, 4 = Meta, 8 = Shift)
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "c",
  "code": "KeyC",
  "modifiers": 2
}
```

### Touch events
```jsonc
// Touch start
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}

// Touch move
{
  "type": "input_touch",
  "eventType": "touchMove",
  "touchPoints": [{ "x": 150, "y": 250 }]
}

// Touch end
{
  "type": "input_touch",
  "eventType": "touchEnd",
  "touchPoints": []
}

// Multi-touch (pinch zoom)
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [
    { "x": 100, "y": 200, "id": 0 },
    { "x": 200, "y": 200, "id": 1 }
  ]
}
```

## Programmatic API
For advanced use, control streaming directly via the TypeScript API:

```typescript
import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

// Start the screencast with a frame callback
await browser.startScreencast((frame) => {
  console.log('Frame:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
  // frame.data is a base64-encoded image
}, {
  format: 'jpeg',   // or 'png'
  quality: 80,      // 0-100, JPEG only
  maxWidth: 1280,
  maxHeight: 720,
  everyNthFrame: 1
});

// Inject a mouse event
await browser.injectMouseEvent({
  type: 'mousePressed',
  x: 100,
  y: 200,
  button: 'left',
  clickCount: 1
});

// Inject a keyboard event
await browser.injectKeyboardEvent({
  type: 'keyDown',
  key: 'Enter',
  code: 'Enter'
});

// Inject a touch event
await browser.injectTouchEvent({
  type: 'touchStart',
  touchPoints: [{ x: 100, y: 200 }]
});

// Check whether a screencast is active
console.log('Active:', browser.isScreencasting());

// Stop the screencast
await browser.stopScreencast();
```

## Use cases
- **Pair browsing** - a human watches and assists an AI agent in real time
- **Remote preview** - view browser output in a separate UI
- **Recording** - capture frames for video generation
- **Mobile testing** - inject touch events for mobile emulation
- **Accessibility testing** - manual interaction during automated tests
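As an illustration of the recording use case, the `startScreencast` callback can write each frame to disk for later assembly into a video (e.g. with ffmpeg). This is a hedged sketch: the frame shape follows the programmatic API above, while the helper names and output layout are our own.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';

// Frame shape as delivered to the startScreencast callback; only the
// fields used here are declared.
type Frame = { data: string; metadata: { deviceWidth: number; deviceHeight: number } };

// Zero-padded paths so frames sort correctly for video assembly.
function framePath(dir: string, n: number): string {
  return `${dir}/frame-${String(n).padStart(5, '0')}.jpg`;
}

// Build a callback suitable for browser.startScreencast(...): it decodes
// each base64 frame and writes it to the output directory.
function makeRecorder(dir: string): (frame: Frame) => void {
  mkdirSync(dir, { recursive: true });
  let n = 0;
  return (frame) => {
    writeFileSync(framePath(dir, n++), Buffer.from(frame.data, 'base64'));
  };
}
```

Pass the recorder to the screencast, e.g. `await browser.startScreencast(makeRecorder('frames'), { format: 'jpeg', quality: 80 })`.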