We Built a "Brain" That Controls Dozens of AI Agents From a Single Screen. Here's How.

Why single-gateway orchestration is yesterday's game, and why multi-gateway orchestration is how you should be running your AI company right now.


Most people using OpenClaw are still stuck in the old mental model: one gateway, a few agents, done.

They set up 3-4 agents in a single process, connect to Discord or Telegram, and say "I've got an AI team." And honestly — that's already decent. OpenClaw is genuinely powerful at that level.

But there's a layer above it that almost nobody talks about.

Multi-gateway orchestration.

Not just multi-agent in one gateway. Not just sub-agent spawning. But multiple independent gateways — different servers, different regions, different agent teams — all controlled from one dashboard. One screen. One brain.

We're building this. And once you see the architecture, you'll understand why this is a different level entirely.

This is the part OpenClaw docs don't cover. Not because they're hiding it — but because it's not their responsibility. It's yours. And almost nobody is doing it.


OpenClaw Has 3 Patterns. But Nobody's Talking About Pattern #4.

Before we get into what we're building, you need to understand the foundation. OpenClaw natively supports 3 multi-agent patterns:

1. Role-Based — "The Startup Team"

All agents persistent. They have names, personalities, their own memory. They talk via sessions_send. Like a small startup team that knows each other.

2. Orchestrator + Workers — "PM + Freelancers"

One smart brain (Orchestrator) that spawns workers on-demand via sessions_spawn. Workers are ephemeral — they live only for the duration of a task, report back, and die. Like a project manager who hires freelancers per project.

3. Hybrid — "Team + Freelancers"

Both combined. A persistent team that can spawn workers. This is the recommended pattern for production.

But all three patterns share one fundamental limitation: they all live inside one gateway process.

One gateway. One host. One single point of failure.

And the majority of OpenClaw users — even the advanced ones — stop here. They optimize inside the gateway. They never think about what exists outside it.


Now Imagine This.

You have an AI company. A real company — with departments, hierarchy, scheduled meetings, C-level coordination.

You have:

  • Gateway 1 in US East — CEO and CTO agents, handling strategic decisions and tech architecture
  • Gateway 2 in Singapore — CMO and Content team agents, handling marketing 24/7 for the Asian timezone
  • Gateway 3 on-premise — CFO and Finance agents, because financial data cannot leave your own server

Each gateway has its own agent team. Each agent has its own workspace, memory, personality, heartbeat schedule. They're autonomous.

But you need to see everything from one place.

How many total tasks are running across all gateways? Which agent is overloaded? Which gateway is responding slowly? Is anything down?

OpenClaw itself doesn't have an answer for this. It's not designed for cross-gateway communication. Each gateway is isolated by design.

And this isn't a bug — it's an architectural boundary. OpenClaw says: "I handle what's inside the gateway. What's outside? That's on you."

The problem? Almost nobody takes that responsibility.

Unless you build the layer above it.


Introducing: Multi-Gateway Orchestration Layer

This is what we call Orchestration. Not agent-to-agent orchestration (that already exists in OpenClaw). This is gateway-to-gateway orchestration — a dashboard that serves as a single pane of glass for your entire AI company.

┌─────────────────────────────────────────────────────────┐
│           HIVIN DASHBOARD — The Orchestrator             │
│           "Traffic Controller Across Gateways"           │
│                                                          │
│   GatewayManager                                         │
│   ├── ws://us-east.hivin.ai:18789    (CEO, CTO)         │
│   │   ├── Agent: CEO    [healthy]  tasks: 3 running      │
│   │   └── Agent: CTO    [healthy]  tasks: 7 running      │
│   │                                                      │
│   ├── ws://singapore.hivin.ai:18789  (CMO, Content)     │
│   │   ├── Agent: CMO    [warning]  tasks: 12 running     │
│   │   └── Agent: Writer [idle]     tasks: 0              │
│   │                                                      │
│   └── ws://onprem.internal:18789     (CFO, Finance)     │
│       ├── Agent: CFO    [healthy]  tasks: 2 running      │
│       └── Agent: Analyst [healthy] tasks: 5 running      │
│                                                          │
│   Unified View:                                          │
│   Total agents: 6  │  Active tasks: 29  │  Gateways: 3  │
└─────────────────────────────────────────────────────────┘

One screen. All gateways. All agents. Real-time.


Architecture: How It Works

WebSocket Per Gateway, Unified Event Stream

The core of this orchestration layer is GatewayManager — a class that manages multiple persistent WebSocket connections to every OpenClaw gateway.

Each gateway is an independent OpenClaw process running on its own. It has agents, sessions, tasks, heartbeat — all self-contained. The dashboard doesn't change how the gateway operates. It only observes and aggregates.

class GatewayManager {
  private connections = new Map<string, GatewayConnection>();
  private states = new Map<string, ConnectionState>();

  addGateway(config: GatewayConfig): void { /* ... */ }
  removeGateway(id: string): void { /* ... */ }
  async send(gatewayId: string, method: string, params?: unknown): Promise<unknown> { /* ... */ }
  getAllStates(): Map<string, ConnectionState> { return this.states; }
}

Each GatewayConnection is a full implementation of OpenClaw's Gateway Protocol v3 — including device authentication, auto-reconnect with exponential backoff, and proper frame handling (req/res/event).

Not HTTP polling. Persistent WebSocket connections. Every event, every state change, real-time.
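For the reconnect piece, here is a minimal sketch of what auto-reconnect with exponential backoff means in practice. The base delay, cap, and attempt limit are illustrative numbers, not OpenClaw defaults:

```typescript
// Sketch: exponential backoff with a cap, as used for gateway reconnects.
// 1 s base / 30 s cap / 8 attempts are illustrative, not OpenClaw defaults.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function reconnectLoop(
  connect: () => Promise<void>,
  maxAttempts = 8,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return true; // connected: caller resumes normal frame handling
    } catch {
      // wait 1 s, 2 s, 4 s, ... capped at 30 s, then retry
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
  return false; // give up and surface a disconnected state on the dashboard
}
```

Capping the delay matters: without it, a gateway that is down for an hour would push the next retry out so far that recovery looks like an outage.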

Device Authentication: No Casual Connections

Every gateway connection uses device auth — identical to the official OpenClaw Control UI. The dashboard generates a keypair (Ed25519), signs a nonce, and authenticates per gateway. Each gateway can have its own auth policy.
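The signing step itself is standard Ed25519, available in Node's built-in crypto module. A minimal sketch of the sign-a-nonce exchange (the framing is simplified; the real Gateway Protocol handshake carries more fields):

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// One keypair per dashboard install; the gateway pins the public key
// as a known device.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Dashboard side: the gateway sends a random nonce during the handshake,
// and the dashboard signs it to prove it holds the private key.
// (Ed25519 takes no digest algorithm, hence the `null` first argument.)
function signNonce(nonce: Buffer): Buffer {
  return sign(null, nonce, privateKey);
}

// Gateway side: verify the signature against the registered device key.
function verifyNonce(nonce: Buffer, signature: Buffer): boolean {
  return verify(null, nonce, publicKey, signature);
}
```

A round trip `verifyNonce(nonce, signNonce(nonce))` returns `true`; a signature over a different nonce fails, which is what prevents replaying a captured handshake.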

interface GatewayConfig {
  id: string;
  name: string;
  url: string; // ws://host:port
  token?: string; // explicit gateway token
  password?: string;
  role?: string;
  scopes?: string[];
}

Gateway configs are persisted to PostgreSQL — so when you restart the dashboard, all connections auto-reconnect.

Data Flow: From Raw Gateway to Unified Dashboard

Once connected, the dashboard fetches data from each gateway via RPC:

gateway.health   → agent list, heartbeat status, session counts
gateway.status   → runtime version, tasks summary, session details, audit
tasks.list       → background tasks (sub-agents, ACP, cron, CLI)

Data is polled every 30 seconds + immediate fetch when a connection is established. Raw responses from OpenClaw are mapped to unified dashboard types:

RawHealthResponse  →  AgentInfo[]     (per-agent health, model, load)
RawStatusResponse  →  DashboardStats  (aggregate metrics)
                   →  GatewayInfo     (per-gateway status)
RawTasksResponse   →  TaskEntry[]     (unified task ledger)

From the user's perspective, you don't think "which gateway is this data from." You see one unified view: total agents, total tasks, overall health. But you can drill down to per-gateway when needed.
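A sketch of that mapping-and-aggregation step. The Raw* shapes below are simplified stand-ins for illustration, not OpenClaw's actual wire format:

```typescript
// Simplified stand-ins for per-gateway responses; the real Gateway
// Protocol payloads carry more fields.
interface RawAgentHealth { id: string; ok: boolean; model: string }
interface RawHealthResponse { agents: RawAgentHealth[] }

interface AgentInfo { gatewayId: string; agentId: string; healthy: boolean; model: string }
interface DashboardStats { totalAgents: number; healthyAgents: number; gateways: number }

// Map one gateway's raw health response into unified dashboard rows,
// tagging each row with the gateway it came from.
function toAgentInfo(gatewayId: string, raw: RawHealthResponse): AgentInfo[] {
  return raw.agents.map((a) => ({ gatewayId, agentId: a.id, healthy: a.ok, model: a.model }));
}

// Aggregate across every connected gateway into one set of numbers.
function aggregate(perGateway: Map<string, RawHealthResponse>): DashboardStats {
  const agents = [...perGateway].flatMap(([id, raw]) => toAgentInfo(id, raw));
  return {
    totalAgents: agents.length,
    healthyAgents: agents.filter((a) => a.healthy).length,
    gateways: perGateway.size,
  };
}
```

The design choice worth noting: raw responses are normalized per gateway first, then aggregated, so the drill-down view and the unified view are computed from the same rows and can never disagree.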


Why This Is a Different Level

Level 1: Single Agent (Most people stop here)

One gateway, one agent, connected to Telegram. The agent can answer questions, run tools. Already useful, but this is just a smart chatbot.

Level 2: Multi-Agent, Single Gateway (OpenClaw's sweet spot)

Multiple agents in one gateway. Role-Based (persistent team) or Orchestrator+Workers (spawn on demand). Agents can communicate via sessions_send, spawn sub-agents via sessions_spawn. This is already powerful.

At this level you can:

  • Spawn workers in parallel — 10 coding agents at once
  • Run heartbeat automation — agents wake up every 4 hours, review tasks, spawn workers
  • Set up overnight pipelines — start working at 11 PM, brief the boss at 7 AM
  • Nested orchestration — main → sub-orchestrator → workers (depth 2)

But all of this is still one process, one host.

Level 3: Multi-Gateway Orchestration (Where we are)

Multiple independent gateways, each with its own agent team, all orchestrated from one dashboard. This is not an OpenClaw feature — it's a layer we built on top of it.

The analogy: OpenClaw is the operating system per server. Our dashboard is the mission control that monitors all servers.

If you're still at Level 2, you're playing the same game as everyone else. At Level 3, you're playing a game that doesn't have a rule book yet — because you're the one writing it.

Aspect              Level 2 (Single Gateway)     Level 3 (Multi-Gateway)
Gateway count       1                            N (unlimited)
Agent visibility    Within 1 gateway             Across all gateways
Task tracking       Per gateway                  Unified ledger
Failure domain      Single point of failure      Isolated per gateway
Geo distribution    Single location              Multi-region
Security boundary   Shared process               Per-gateway isolation
Monitoring          openclaw status CLI          Real-time dashboard
Scale model         Vertical (bigger server)     Horizontal (more gateways)

What OpenClaw Gives You vs. What We Built On Top

Important to understand: we don't replace OpenClaw. We extend it. Every native OpenClaw feature still runs 100% inside each gateway.

OpenClaw Native (Per-Gateway)

All of this still exists and is still used in every gateway:

Sub-agent spawning — sessions_spawn for parallel workers:

sessions_spawn({
  task: "Implement user registration endpoint",
  model: "anthropic/claude-sonnet-4-6"
})

Inter-agent communication — sessions_send for persistent teams:

sessions_send({ agentId: "bob", message: "Fix auth bug. Priority HIGH." })

ACP workers — spawn external coding agents (Codex, Claude Code, Cursor, Gemini CLI, and 10+ other harnesses):

sessions_spawn({
  task: "Create PR for rate limiting",
  runtime: "acp",
  agentId: "codex"
})

Background tasks — tracked lifecycle: queued → running → succeeded/failed/timed_out/cancelled/lost

Heartbeat & Cron — autonomous scheduling. Agents wake up on their own, review the task queue, spawn workers, report results.

Nested orchestration — maxSpawnDepth: 2 enables Main → Sub-orchestrator → Leaf workers. Announce chain flows bottom-up.

Memory — per-agent MEMORY.md, daily notes, cross-agent memory search.

Session tools — sessions_list, sessions_history, sessions_spawn, sessions_send — all scoped per visibility level (self, tree, agent, all).

Task Flow — durable multi-step pipeline orchestration on top of background tasks.

All of this still runs. Per gateway. Independent.

What We Built: The Orchestration Layer

What doesn't exist in OpenClaw — and what we added:

Multi-gateway connection management — GatewayManager that holds persistent WebSocket connections to every gateway. Add, remove, reconnect. Auto-recovery with exponential backoff.

Unified agent view — all agents from all gateways in a single table. Sort by health, filter by department, search by name. You don't need to SSH into every server to check who's healthy.

Aggregate metrics — total tasks across gateways, active agents count, average response time, failure rates. One number that represents your entire AI company's health.

Real-time event stream — every gateway emits events (agent starts, tasks complete, errors). The dashboard merges everything into a unified event log.

Gateway health monitoring — connection state per gateway: disconnected → connecting → authenticating → connected → reconnecting. Visual indicator on the dashboard. If a gateway goes down, you know instantly — no waiting for someone to complain.
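Those connection states are easy to get wrong without an explicit transition table. A sketch of how the dashboard can enforce them; the state names come from the list above, but the set of legal transitions is our own assumption:

```typescript
type ConnectionState =
  | "disconnected" | "connecting" | "authenticating" | "connected" | "reconnecting";

// Allowed moves per state; anything else is a bug worth surfacing loudly.
const TRANSITIONS: Record<ConnectionState, ConnectionState[]> = {
  disconnected:   ["connecting"],
  connecting:     ["authenticating", "reconnecting", "disconnected"],
  authenticating: ["connected", "reconnecting", "disconnected"],
  connected:      ["reconnecting", "disconnected"],
  reconnecting:   ["connecting", "disconnected"],
};

function transition(from: ConnectionState, to: ConnectionState): ConnectionState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal gateway state change: ${from} -> ${to}`);
  }
  return to;
}
```

Encoding the state machine this way means a skipped authentication step (say, jumping straight from connected to authenticating) throws immediately instead of silently showing a stale indicator.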

Persistent config — gateway configs in PostgreSQL. Dashboard restarts, all connections auto-reconnect. Add a new gateway, it's live immediately.

You won't find any of this at docs.openclaw.ai. Not because it's a secret — but because this is the layer they leave for the community to build. And right now, you're reading the blueprint.


Concrete Setup: How Hivin Uses This

Hivin isn't a concept — it's a running system. An AI company with corporate structure.

Gateway 1: Executive Suite

// Server: us-east, openclaw.json
{
  "agents": {
    "list": [
      {
        "id": "ceo",
        "name": "CEO",
        "default": true,
        "model": "anthropic/claude-opus-4-6",
        "workspace": "~/.openclaw/workspace-ceo"
      },
      {
        "id": "cto",
        "name": "CTO",
        "model": "anthropic/claude-opus-4-6",
        "workspace": "~/.openclaw/workspace-cto"
      }
    ]
  },
  "tools": {
    "agentToAgent": { "enabled": true, "allow": ["ceo", "cto"] },
    "subAgents": { "enabled": true, "maxConcurrent": 10 }
  },
  "heartbeat": {
    "agents": {
      "ceo": {
        "intervalHours": 6,
        "message": "Review company KPIs and strategic pipeline."
      },
      "cto": {
        "intervalHours": 4,
        "message": "Check tech stack health, review PRs, audit tasks."
      }
    }
  }
}

CEO and CTO on one gateway — they frequently communicate via sessions_send. CTO can spawn coding sub-agents via sessions_spawn. CEO reviews results and sets strategic direction.

Gateway 2: Marketing Division

// Server: singapore, openclaw.json
{
  "agents": {
    "list": [
      {
        "id": "cmo",
        "name": "CMO",
        "model": "google/gemini-2.5-pro",
        "workspace": "~/.openclaw/workspace-cmo"
      },
      {
        "id": "writer",
        "name": "Content Writer",
        "model": "anthropic/claude-sonnet-4-6",
        "workspace": "~/.openclaw/workspace-writer"
      }
    ]
  },
  "tools": {
    "agentToAgent": { "enabled": true, "allow": ["cmo", "writer"] },
    "subAgents": { "enabled": true, "maxConcurrent": 8 }
  },
  "cron": {
    "jobs": [
      {
        "name": "content-pipeline",
        "schedule": "0 9 * * 1,3,5",
        "agentId": "cmo",
        "message": "Content day. Spawn writers for: blog post, Twitter thread, LinkedIn post. Review all before publish."
      }
    ]
  }
}

CMO in Singapore — different timezone, so the content pipeline runs while the US sleeps. Writer agent as a persistent specialist, CMO can spawn additional ephemeral writers when it needs to scale.

Gateway 3: Finance (On-premise)

// Server: on-premise internal, openclaw.json
{
  "agents": {
    "list": [
      {
        "id": "cfo",
        "name": "CFO",
        "model": "anthropic/claude-opus-4-6",
        "workspace": "~/.openclaw/workspace-cfo"
      },
      {
        "id": "analyst",
        "name": "Financial Analyst",
        "model": "anthropic/claude-sonnet-4-6",
        "workspace": "~/.openclaw/workspace-analyst"
      }
    ]
  },
  "tools": {
    "agentToAgent": { "enabled": true, "allow": ["cfo", "analyst"] },
    "subAgents": { "enabled": true, "maxConcurrent": 4 }
  }
}

Financial data stays on-prem. Nothing leaves. But the dashboard can still monitor health and task status via WebSocket — because what's being sent is metadata, not actual financial data.

Dashboard: The Orchestrator

// Dashboard connects to all three
manager.addGateway({
  id: "exec",
  name: "Executive Suite",
  url: "ws://us-east:18789",
});
manager.addGateway({
  id: "marketing",
  name: "Marketing Div",
  url: "ws://singapore:18789",
});
manager.addGateway({
  id: "finance",
  name: "Finance",
  url: "ws://internal:18789",
});

One screen. 6 agents. 3 gateways. 3 regions. Real-time.


Overnight Automation — But Across Gateways

This is the part that makes people say "wait, this is actually running?"

The pattern the OpenClaw community already uses frequently at the single-gateway level:

11 PM: Orchestrator spawns workers
Every 2 hours: Heartbeat checks progress
7 AM: Compile overnight results, brief the boss

Now scale that to multi-gateway:

11 PM US  → Gateway 1: CTO spawns coding workers, overnight code review
9 AM SGP  → Gateway 2: CMO runs content pipeline, 3 writers parallel
6 AM US   → Gateway 3: CFO analyst runs financial reconciliation
7 AM US   → Dashboard: ALL overnight results from every gateway,
             compiled into one morning brief

Each gateway is autonomous. The dashboard just observes and aggregates. But you can see everything from one place.

While others are opening 3 terminals, SSHing into 3 servers, running openclaw status one by one — you open one tab. That's the difference.


Why Not Just Use 1 Big Gateway?

"Why not put all 6 agents in one gateway?"

Fair question. Here's the answer:

1. Failure isolation. Finance gateway crashes? Marketing and Executive are still running. In a single gateway, one crash = everything dies.

2. Security boundaries. Financial data on-prem, marketing in the cloud. Different compliance requirements, different network policies. Can't mix them.

3. Geo-distribution. Content team in Asia needs low latency to Asian social media APIs. Executive in the US needs low latency to GitHub and internal tools. Can't serve both from a single location.

4. Independent scaling. Marketing needs 8 concurrent sub-agents on content day. Finance only needs 4. In a single gateway, maxConcurrent is global — all agents compete for the same pool.

5. Team autonomy. Each gateway has its own config. CMO can tweak the cron schedule without affecting CTO's heartbeat config. No coordination needed.

6. Operational flexibility. Upgrade gateways one by one. Rolling deploys. No maintenance window that takes all agents offline simultaneously.

Single gateway is your MVP. Multi-gateway is a production mindset. And the only difference is whether you think of your AI agents as a side project or as infrastructure.


OpenClaw Agent Deep Dive: What Runs Inside Each Gateway

To appreciate why multi-gateway orchestration is powerful, you need to understand what runs inside each gateway first.

Sub-Agent Spawning (sessions_spawn)

The tool that lets agents delegate work to workers:

sessions_spawn({
  task: string,                  // instruction (REQUIRED)
  model?: string,                // model override
  agentId?: string,              // target agent
  runtime?: "subagent" | "acp", // native or external harness
  thread?: boolean,              // bind to channel thread
  mode?: "run" | "session",     // one-shot or persistent
  sandbox?: "inherit" | "require",
  runTimeoutSeconds?: number,
  thinking?: string
})

Always non-blocking. Returns { status: "accepted", runId, childSessionKey } immediately.

Workers run in isolated sessions (agent:<agentId>:subagent:<uuid>), get all tools except session tools, and announce results back when finished. Status, runtime stats, token usage, estimated cost — all included in the announce payload.

Spawning Patterns

5 ways agents manage workers — and all of these can run in each gateway independently:

Parallel — independent sub-tasks, spawn all at once:

├── Worker 1: Build registration endpoint
├── Worker 2: Build login endpoint
├── Worker 3: Build profile CRUD
└── All running simultaneously

Sequential — output A feeds input B:

Research → Outline → Draft → Review (each spawns next)

Fan-out / Fan-in — parallel research, then compile:

3 researchers parallel → 1 compiler merges → final report

Spawn + Self-Review — worker executes, agent QAs before delivering.

Pipeline — code → tests → docs, chained via worker outputs.
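Since sessions_spawn is non-blocking, the fan-out/fan-in pattern reduces to "start everything, then merge the announces." A toy sketch of that shape, with plain async functions standing in for spawned workers:

```typescript
// A worker here is just an async function standing in for a spawned
// sub-agent that eventually announces a result.
type Worker<T> = () => Promise<T>;

// Fan-out: start every worker at once.
// Fan-in: one compile step merges all results when the last one lands.
async function fanOutFanIn<T, R>(
  workers: Worker<T>[],
  compile: (results: T[]) => R,
): Promise<R> {
  const results = await Promise.all(workers.map((w) => w()));
  return compile(results);
}
```

In the research example above, three researcher workers run in parallel and one compiler merges their outputs; Promise.all preserves the order workers were listed in, so the compile step sees results in a stable order regardless of which worker finishes first.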

ACP: External Coding Agents as Workers

The most underrated feature. Agents can spawn external coding harnesses — not just OpenClaw native sub-agents:

Codex, Claude Code, Cursor, Gemini CLI, Copilot, Pi, OpenCode, Kiro, and a dozen others. Set runtime: "acp" and agentId to the target harness.

The CTO agent on Gateway 1 can spawn Codex to create a PR, Claude Code for deep debugging, Cursor for IDE-integrated refactoring — all at once, in parallel.

Nested Orchestration

maxSpawnDepth: 2 enables a three-level hierarchy:

Main Agent (depth 0) → Sub-orchestrator (depth 1) → Leaf workers (depth 2)

Depth-1 gets sessions_spawn + management tools. Depth-2 is a pure executor. Results flow bottom-up via the announce chain. /stop cascades down — no orphaned workers.

Heartbeat & Cron — Autonomous Operations

Agents don't need a human trigger:

Heartbeat: The agent wakes up periodically, reviews the task queue, spawns workers if needed, checks running workers, updates memory. Heartbeat turns don't create task records — but if a heartbeat triggers sessions_spawn, that creates a task record.

Cron: Scheduled pipelines. Morning research, midday code sweep, weekly reports, nightly maintenance. Every cron execution creates a task record.

Overnight pattern: 11PM spawn workers, heartbeat every 2 hours to check progress, 7AM compile brief. Agents work while you sleep.

Background Tasks & Task Flow

Every sessions_spawn and cron execution automatically creates a task record:

  • Lifecycle: queued → running → succeeded/failed/timed_out/cancelled/lost
  • Task audit: stale_queued, stale_running, lost, delivery_failed
  • Auto-maintenance: sweeper every 60 seconds, reconciliation, pruning after 7 days
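A sketch of what the stale_queued / stale_running audit check could look like. The lifecycle states come from the list above; the thresholds are illustrative assumptions, not OpenClaw's actual sweeper tuning:

```typescript
type TaskState =
  | "queued" | "running" | "succeeded" | "failed"
  | "timed_out" | "cancelled" | "lost";

interface TaskRecord { id: string; state: TaskState; updatedAt: number }

// Illustrative thresholds: a task queued > 5 min or running > 60 min
// without an update gets flagged for the audit.
const STALE_QUEUED_MS = 5 * 60_000;
const STALE_RUNNING_MS = 60 * 60_000;

function auditFlag(task: TaskRecord, now: number): "stale_queued" | "stale_running" | null {
  const age = now - task.updatedAt;
  if (task.state === "queued" && age > STALE_QUEUED_MS) return "stale_queued";
  if (task.state === "running" && age > STALE_RUNNING_MS) return "stale_running";
  return null; // terminal or recently-updated tasks need no flag
}
```

A sweeper then just maps this over the ledger on each 60-second pass and escalates anything that comes back non-null.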

Task Flow sits above tasks — durable multi-step pipelines with managed mode (Task Flow owns the lifecycle) or mirrored mode (observes external tasks).

All of this is just the foundation, and all of it runs inside a single gateway. Powerful? Absolutely. Now imagine five gateways each running all of it, with everything visible from one screen.

That's not an incremental improvement. That's a paradigm shift.


This Is Just the Beginning

What we're building right now is the foundation. Multi-gateway connection management, unified agent view, aggregate metrics, real-time events.

But imagine what can be built on top of it:

  • Cross-gateway task delegation — CEO on Gateway 1 assigns a task that auto-routes to CMO on Gateway 2
  • Unified meeting system — agents from different gateways can "meet" via coordinated scheduling
  • Company-wide analytics — cost tracking, productivity metrics, department comparisons — across all gateways
  • Automated failover — gateway goes down? Dashboard auto-migrates agents to another gateway
  • Public agent interfaces — visitors chat with agents from the dashboard, tasks route to the right gateway

Each gateway is an autonomous AI team. The dashboard is mission control that sees everything, coordinates across boundaries, and gives you — the human — one place to run your entire AI company.


TL;DR

OpenClaw gives you powerful multi-agent orchestration within a single gateway. Sub-agents, ACP workers, heartbeat, cron, nested spawning, Task Flow — all production-ready.

But if you want to run a real AI company — with departments across regions, security boundaries, independent scaling, and failure isolation — you need the layer above it.

Multi-gateway orchestration. Multiple independent OpenClaw gateways, each autonomous, but all visible and controllable from one dashboard.

One screen. All gateways. All agents. Real-time.

That's what we're building.

And if you've read this far but are still running all your agents on a single gateway — you already know what you need to do next.

The question is: do you want to stay a user, or become the one who builds the next layer?


Based on OpenClaw official documentation (docs.openclaw.ai), verified source code, community patterns, and the Hivin architecture currently in production. April 2026.