
How to Deploy Your AI Agent in Docker and Kubernetes

5 min read

Last month, a DevOps engineer in our community spent two days debugging why his AI agent kept becoming unresponsive in production. The container was running. The process was alive. But the agent had silently deadlocked after a bad model response, and nothing noticed.

His monitoring showed green across the board because the container was technically still "up." The agent just wasn't doing anything. Two days of missed Slack messages, ignored emails, and a very confused team wondering why their AI assistant had gone quiet.

This is the container health check problem, and it bites everyone eventually.

The old way: hope and prayer

Before OpenClaw 2026.3.1, deploying an agent in Docker meant writing your own health check script. Most people did something like this:

HEALTHCHECK --interval=30s CMD curl -f http://localhost:3000/ || exit 1

That checks if the web server responds. But the web server responding doesn't mean the agent is working. The gateway could be up while the model connection is broken, the message queue is stuck, or a skill has crashed.

Some people got creative and wrote custom health scripts that checked multiple subsystems. It worked, but it was fragile and different for every setup.
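A typical homegrown script looked something like this. Everything in it — the log path, the heartbeat string, the queue-depth file — is illustrative, not from any real OpenClaw install, which is exactly the problem: every deployment invented its own conventions, and every check was another thing that could silently rot.

```shell
#!/bin/sh
# Pre-2026.3.1 style custom health check: one fragile probe per subsystem.

# 1. Gateway responds at all.
curl -sf http://localhost:3000/ > /dev/null || exit 1

# 2. Model connectivity: grep recent logs for a heartbeat line.
#    Breaks silently the moment the log format changes.
tail -n 50 /var/log/agent.log | grep -q "model: ok" || exit 1

# 3. Queue depth: fail if the backlog file says we're stuck.
#    Assumes a metrics file the agent may simply stop writing.
backlog=$(cat /tmp/agent-queue-depth 2>/dev/null || echo 0)
[ "$backlog" -lt 100 ] || exit 1

exit 0
```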

Built-in health endpoints that actually work

OpenClaw 2026.3.1 added native HTTP health check endpoints to the gateway. Four endpoints, covering the standard patterns that Docker and Kubernetes expect:

The liveness endpoints confirm the gateway process is alive and responsive. The readiness endpoints go deeper — they verify the agent can actually process requests, including model connectivity and channel status.
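You can hit these endpoints by hand before wiring them into an orchestrator. Only /healthz and /readyz appear in the examples in this post; the full set of four paths is in the 2026.3.1 release notes, so treat any other path as an assumption. Assuming the gateway's default port 3000:

```shell
# Smoke-test a locally running gateway.
curl -sf http://localhost:3000/healthz && echo "liveness: ok"
curl -sf http://localhost:3000/readyz && echo "readiness: ok"
```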

Here's a proper Docker setup:

FROM node:22-slim
# slim images don't include curl, which the HEALTHCHECK below needs
RUN apt-get update && apt-get install -y --no-install-recommends curl \
  && rm -rf /var/lib/apt/lists/*
RUN npm install -g openclaw
COPY config/ /root/.openclaw/
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1
CMD ["openclaw", "gateway", "start"]

Now Docker knows the difference between "container is running" and "agent is actually working." If the agent deadlocks, the health check fails, and Docker restarts the container automatically.
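You can watch this from the outside, too. Docker records the health state and the last few probe results on the container object; the container name here is an example, so substitute your own:

```shell
# Current health state: starting, healthy, or unhealthy
docker inspect --format '{{.State.Health.Status}}' openclaw-agent

# The last few probe runs, with exit codes and captured output
docker inspect --format '{{json .State.Health.Log}}' openclaw-agent
```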

Kubernetes: production-grade deployment

Where this really shines is Kubernetes. Most teams running AI agents at scale use Kubernetes for orchestration, and proper health probes are table stakes.

Here's a deployment spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      containers:
      - name: agent
        image: your-registry/openclaw-agent:latest
        ports:
        - containerPort: 3000
        livenessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /readyz
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 15
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"

Kubernetes will wait 15 seconds for the agent to start, then check liveness every 30 seconds. If the liveness probe fails three times, it restarts the pod. The readiness probe ensures traffic only routes to the agent when it's actually ready to handle messages.
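When a probe does start failing, Kubernetes records why in the pod's events. A few commands worth keeping handy; the label selector matches the manifest above:

```shell
# Probe failures appear in the event stream with the HTTP status or error
kubectl describe pod -l app=openclaw

# A climbing RESTARTS count means the liveness probe is killing the pod
kubectl get pods -l app=openclaw

# Watch READY flip from 0/1 to 1/1 as the readiness probe starts passing
kubectl get pods -l app=openclaw -w
```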

No custom scripts. No guessing. Just standard Kubernetes health checks working with endpoints OpenClaw provides out of the box.

Why containers make sense for AI agents

Running an AI agent directly on a server works fine for personal use. But when you need reliability — when your team depends on the agent for daily workflows — containers add three things that matter:

Isolation. Your agent runs in its own environment. A system update or a conflicting package on the host doesn't break anything. The agent has exactly what it needs, nothing more.

Restartability. Agents crash. Models time out. Memory leaks happen. With containers and proper health checks, the system recovers automatically. The DevOps engineer from the opening of this post? His problem is now a non-event — the container restarts and the agent is back in 30 seconds.

Portability. Your Docker image runs the same way on a Mac Mini in your closet, a cloud VM, or a Kubernetes cluster. Same config, same behavior, different scale.

Docker Compose for the rest of us

Not everyone needs Kubernetes. For a single-agent deployment — which covers most personal and small-team use cases — Docker Compose hits the sweet spot:

version: '3.8'
services:
  openclaw:
    image: node:22-slim
    command: >
      sh -c "npm install -g openclaw && openclaw gateway start"
    volumes:
      - ./openclaw-config:/root/.openclaw
      - ./workspace:/root/.openclaw/workspace
    ports:
      - "3000:3000"
    healthcheck:
      # node:22-slim has no curl; use Node's built-in fetch (Node 18+) instead
      test: ["CMD", "node", "-e", "fetch('http://localhost:3000/healthz').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    environment:
      - NODE_ENV=production

Run docker compose up -d and you have a containerized AI agent with automatic health monitoring and restarts. Your config and workspace persist on the host through volume mounts.
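docker compose ps will show the service as healthy once the check starts passing. If a deploy script needs to block until that happens, a small polling helper does the job; this is a generic sketch, not an OpenClaw utility:

```shell
#!/bin/sh
# wait_healthy CMD TRIES DELAY: run CMD until it succeeds or gives up.
wait_healthy() {
  cmd=$1 tries=$2 delay=$3
  i=0
  while [ "$i" -lt "$tries" ]; do
    if $cmd >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: give the agent up to ~60s after `docker compose up -d`:
# wait_healthy "curl -sf http://localhost:3000/healthz" 20 3
```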

Monitoring beyond health checks

Health checks tell you if the agent is alive. But you probably also want to know what it's doing. A few patterns that work well with containerized OpenClaw deployments:

Log aggregation. OpenClaw writes structured logs to stdout when running in a container. Pipe them to whatever logging stack you use — ELK, Grafana Loki, even CloudWatch.

Uptime monitoring. Point an external uptime checker at your /healthz endpoint. Get alerted on your phone when the agent goes down, not two days later when someone notices it's quiet.

Resource metrics. AI agents can be memory-hungry, especially when processing large documents or running complex skills. Monitor container resource usage and set alerts before you hit limits.
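In container terms, those three patterns map onto a handful of commands. The service name, hostname, and label selector below match the examples earlier in this post; swap in your own:

```shell
# Logs: structured stdout, ready to pipe to any aggregator
docker compose logs -f openclaw

# External uptime check: anything that can issue an HTTP GET works
curl -sf https://your-agent.example.com/healthz

# Resource usage: Docker locally, metrics-server on Kubernetes
docker stats --no-stream
kubectl top pod -l app=openclaw
```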

The "it just works" version

If all of this sounds like more infrastructure than you want to manage, that's fair. Not everyone wants to write Dockerfiles and Kubernetes manifests to run an AI assistant.

That's exactly why OpenClaw Setup exists. We handle the deployment — whether that's directly on your hardware or in containers — with proper health monitoring, automatic restarts, and the configuration tuned for your specific use case. The agent runs on your machine, your data stays local, and you don't need to know what a readiness probe is.

If you want a containerized AI agent running by tonight, book a free 15-minute call. We'll figure out the right deployment for your setup and get it running.


Get Your AI Agent Running

We handle the entire setup — deploy, configure, and secure OpenClaw so you don't have to.

  • Fully deployed in 48 hours
  • All channels — Slack, Telegram, WhatsApp
  • Security hardened from day one
  • 14-day hypercare included

One-time setup

$999

Complete setup, no recurring fees