Run an agent on Windows

Windows uses Docker Desktop (WSL2 backend) for the agent container and Ollama for Windows for the model. Ollama runs natively and talks to your GPU directly (NVIDIA is best supported; AMD and CPU also work, see below), so there's no container-GPU setup to do.

Before you start

Minimum requirements. Check these first; the rest of the guide assumes them:

Windows 10 64-bit (version 22H2) or Windows 11 — both Docker Desktop and Ollama need this. Check yours: press Win+R, type winver, Enter.
A GPU for real speed. This guide documents NVIDIA end to end (GeForce GTX 900 series or newer, driver 452.39+) — that's the best-supported path, and step 1 checks it in one command. Recent AMD Radeon cards also work, and with no supported GPU it still runs on the CPU (just slower). See AMD, Intel and CPU-only for exactly what changes on each.
~20 GB free disk space — models are large (a 7B model is ~5 GB; bigger ones are tens of GB).
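
If you'd rather check the Windows version and disk space from PowerShell than by hand, here is a minimal sketch. It rests on two assumptions that are not stated in this guide: that build 19045 corresponds to Windows 10 22H2 (Windows 11 builds are higher), and that the 20 GB should be free on drive C:. Test-HostRequirements is a hypothetical helper name, not part of Keikaku, Docker or Ollama, and it does not check the GPU (step 1 covers that).

```powershell
# Sketch only: Test-HostRequirements is a hypothetical helper, not part of any tool in this guide.
# Assumptions: build 19045 = Windows 10 22H2, and the 20 GB should be free on drive C:.
function Test-HostRequirements {
    param(
        # Defaults read the live system; pass values explicitly to try other numbers.
        [int]$Build     = [System.Environment]::OSVersion.Version.Build,
        [double]$FreeGB = ((Get-PSDrive -Name C).Free / 1GB),
        [bool]$Is64Bit  = [System.Environment]::Is64BitOperatingSystem
    )
    [pscustomobject]@{
        Build     = $Build
        FreeGB    = [math]::Round($FreeGB, 1)
        WindowsOk = $Is64Bit -and ($Build -ge 19045)   # 64-bit and new enough
        DiskOk    = $FreeGB -ge 20                      # room for models
    }
}
```

Paste the function, then run it with no arguments:

Test-HostRequirements

WindowsOk and DiskOk should both read True.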

Commands in this guide run in a normal PowerShell window: press Start, type powershell, Enter. You don't need to run it as administrator.

Prerequisites

1. GPU driver

On an AMD Radeon? Skip to step 2 — you just need a current Adrenalin driver, no nvidia-smi check. See AMD, Intel and CPU-only for the details. The rest of this step is for NVIDIA.

If your PC has an NVIDIA card you almost certainly have a driver already — the question is whether it's new enough. You need 452.39 or newer (released 2020); you do not need the very latest. Check your version with this command — it prints just the number:

nvidia-smi --query-gpu=driver_version --format=csv,noheader

You'll see a single version like 566.36. If it's 452.39 or higher, you're done — go to step 2.

Got a "not recognized" error? There's no NVIDIA driver installed (or it's ancient). Download one from nvidia.com/drivers — pick your card from the dropdowns; "Game Ready" and "Studio" both work fine.
Want to update anyway? The easiest way is the NVIDIA App, which checks for and installs updates for you.

(If you ran plain nvidia-smi and got a screenful of numbers — that's the full GPU status table. The only part that matters here is Driver Version: in its top line; the command above prints just that.)

No CUDA toolkit needed — Ollama bundles what it uses.
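
If you want the "452.39 or higher" comparison done for you, here is a small sketch. Driver versions have to be compared number by number, not as text (as text, 99.10 would sort above 452.39), so it casts both sides to PowerShell's [version] type. Test-DriverVersion is a hypothetical helper name used only for illustration.

```powershell
# Sketch only: Test-DriverVersion is a hypothetical helper, not an NVIDIA or Ollama command.
function Test-DriverVersion {
    param(
        [Parameter(Mandatory)][string]$Installed,   # e.g. '566.36', as printed by nvidia-smi
        [string]$Minimum = '452.39'                 # minimum stated in this guide
    )
    # [version] compares major, then minor, numerically instead of character by character.
    $have = [version]($Installed.Trim())
    $need = [version]($Minimum.Trim())
    return ($have -ge $need)
}
```

On a PC with an NVIDIA driver you could feed it the output of the command from this step (the arguments are quoted here so PowerShell passes them through untouched; Select-Object -First 1 keeps one line if you have several GPUs):

$installed = nvidia-smi '--query-gpu=driver_version' '--format=csv,noheader' | Select-Object -First 1
Test-DriverVersion -Installed $installed

It returns True when the driver is new enough.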

2. Ollama for Windows

Download and install the latest from ollama.com/download/windows (any recent version works). It installs a background service on http://localhost:11434 and uses your GPU automatically. Confirm it's installed:

ollama --version

You should see a version like ollama version is 0.6.x. That's all you need here — you don't need to download a model yet, and you don't have to pick one by hand. Run the benchmark to get one recommended for your GPU, or see choosing a model for the sizing guide; the agent then pulls whatever model you chose the first time it runs.

Why isn't the model baked into the container? Models are large (several to tens of GB) and the agent image is deliberately small. Ollama holds the model on the host and serves it over HTTP, so every agent you run shares one copy loaded into VRAM once — rather than each container shipping its own. The download happens automatically on first use (the first task waits for it); you can also pull ahead of time with ollama pull <model> once you know which one you want.

3. Docker Desktop

Install the latest from docker.com/products/docker-desktop, keeping the default WSL2 backend option. Start Docker Desktop and wait until it says the engine is running, then verify:

docker --version

You should see something like Docker version 27.x — any current version is fine.

Create your agent

You create the agent in the portal first — that's where the connection token comes from. You never invent or copy a token by hand; the portal generates the whole command for you.

Open app.keikaku.ai → Agents → New agent.
Give it a name, choose Self-hosted and a model (or paste a benchmark code), then click Create.
The next screen shows a ready-to-run docker run command with your token already filled in, and a Copy button. Copy it.
Paste it into your PowerShell window and run it. That's the whole connection — the agent dials home on its own; nothing else to wire up.

For reference, the command looks like this (the portal fills in the real AGENT_TOKEN — you don't edit it yourself). Note there's no model in the command: the agent fetches the model you chose from the portal when it connects, and downloads it then — you'll watch that progress back in the portal's Agents list.

docker run -d --pull=always --name keikaku-agent --label com.docker.compose.project=keikaku --restart unless-stopped `
  -e API_BASE_URL=https://api.keikaku.ai `
  -e AGENT_TOKEN=<from the app> `
  -e OLLAMA_URL=http://host.docker.internal:11434 `
  -p 9170:9170 `
  ghcr.io/keikaku-ai/agent:latest

What these values are — the portal sets all of them; here's what they mean:

AGENT_TOKEN — the per-agent key from the create screen. It identifies this agent to Keikaku; keep it secret (you can rotate it later).
API_BASE_URL — the Keikaku server it connects to (https://api.keikaku.ai for Cloud).
OLLAMA_URL — where the agent finds the model runtime. That's the Ollama you installed in step 2, running on your PC at localhost:11434; from inside the container your PC is host.docker.internal, so this stays http://host.docker.internal:11434 (Docker Desktop maps it automatically). Verify Ollama is up by opening http://localhost:11434 on your PC — it should say "Ollama is running".
Model — not in the command. You pick it when you create the agent in the portal; the agent receives it on connect and pulls it into Ollama, reporting download progress to the portal. (Setting -e MODEL=… still works as a manual override if you ever want one.)

You don't pass any GPU flag to the agent — Ollama owns the GPU; the agent just talks to it over OLLAMA_URL. The backtick (`) ending each line is PowerShell's line-continuation; paste the whole block at once.
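
The browser check for Ollama described above can also be done from PowerShell. This is a minimal sketch, assuming Ollama answers on its root URL with the "Ollama is running" text mentioned earlier; Test-OllamaUp is a hypothetical helper name. It checks Ollama from your PC only, not from inside the container.

```powershell
# Sketch only: Test-OllamaUp is a hypothetical helper, not an Ollama command.
function Test-OllamaUp {
    param([string]$Uri = 'http://localhost:11434')   # host-side address from step 2
    try {
        $resp = Invoke-WebRequest -Uri $Uri -UseBasicParsing -TimeoutSec 5
        # The banner text is the one this guide says the page should show.
        $ok = ([string]$resp.Content) -match 'Ollama is running'
        return $ok
    } catch {
        # Connection refused, timeout, etc.: treat as "not up".
        return $false
    }
}
```

Run Test-OllamaUp with no arguments; False usually means the Ollama background service isn't running yet.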

Verify it connected

Two ways: the agent appears as online in the app under Agents, and the local dashboard is at http://localhost:9170 (localhost-only). Logs:

docker logs -f keikaku-agent
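
For a quick yes/no before reading logs, you can ask Docker for the container's state. A minimal sketch, assuming the container name keikaku-agent from the command above; Get-AgentState is a hypothetical helper name. It only reports whether the container is running, not whether the agent has connected to Keikaku.

```powershell
# Sketch only: Get-AgentState is a hypothetical helper wrapping `docker inspect`.
function Get-AgentState {
    param([string]$Name = 'keikaku-agent')
    try {
        # Prints Docker's state for the container: running, restarting, exited, ...
        $state = docker inspect --format '{{.State.Status}}' $Name 2>$null
        if ($LASTEXITCODE -ne 0 -or -not $state) { return 'not found' }
        return "$state".Trim()
    } catch {
        # The docker command itself is missing.
        return 'docker unavailable'
    }
}
```

Get-AgentState should print running. A container stuck in restarting is being restarted over and over by the --restart unless-stopped policy; the logs command above will show why.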

Not on NVIDIA? AMD, Intel and CPU-only

This guide is written and tested against NVIDIA — that's the path we support best. Here's how the picture changes on other hardware:

AMD Radeon: works on recent cards. Ollama supports roughly the RX 6000 / 7000 series and several Radeon PRO cards on Windows, using your normal AMD Adrenalin driver, with no extra toolkit. Only step 1 changes: update the AMD driver instead, and skip the nvidia-smi check. To confirm the GPU is actually being used, load a model first (let the agent run a task, or run ollama run <model> yourself), then in a second PowerShell window run ollama ps while the model is still loaded; the PROCESSOR column should say 100% GPU. The exact supported-card list is in Ollama's GPU docs; unsupported Radeon cards silently fall back to CPU.
Intel (Arc or integrated graphics): not supported. Ollama has no Intel GPU backend today, so models run on the CPU instead.
No dedicated GPU: works, slowly. Everything in this guide still functions on CPU with a small model (7B class). Fine for trying Keikaku out; expect responses to be many times slower than on a GPU.
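
The ollama ps check above can be reduced to one word per loaded model. This is a rough sketch that assumes the PROCESSOR column reads 100% GPU, 100% CPU, or a split such as 48%/52% CPU/GPU; if your Ollama version prints something else, it returns nothing for that line. Get-OllamaPlacement is a hypothetical helper name.

```powershell
# Sketch only: Get-OllamaPlacement is a hypothetical helper that reads `ollama ps` text.
# Assumption: PROCESSOR shows '100% GPU', '100% CPU', or a split like '48%/52% CPU/GPU'.
function Get-OllamaPlacement {
    param([string[]]$PsOutput)
    foreach ($line in $PsOutput) {
        if     ($line -match '100% GPU')        { 'GPU' }    # fully on the GPU
        elseif ($line -match '100% CPU')        { 'CPU' }    # fell back to CPU
        elseif ($line -match '%/\d+% CPU/GPU')  { 'split' }  # partly in system memory
        # The header line and anything unrecognised produce no output.
    }
}
```

With a model loaded, run:

Get-OllamaPlacement -PsOutput (ollama ps)

GPU is what you want; CPU on a Radeon card suggests the card isn't on Ollama's supported list.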

Run multiple agents

They all share the one Ollama (one model in VRAM). Use the Compose bundle from the app and scale it:

docker compose up -d --scale agent=3

Mind your VRAM — more agents means more concurrent requests queued against the same model.
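
To keep an eye on VRAM while you scale up on NVIDIA, nvidia-smi can report used and total memory as plain numbers. A minimal sketch that turns one such line into a percentage; Get-VramUsage is a hypothetical helper name, and the input format assumed is "used, total" in MiB as printed with the csv,noheader,nounits format.

```powershell
# Sketch only: Get-VramUsage is a hypothetical helper for one line of nvidia-smi CSV output.
function Get-VramUsage {
    param([Parameter(Mandatory)][string]$CsvLine)   # e.g. '6144, 8192' (MiB used, MiB total)
    $used, $total = $CsvLine.Split(',') | ForEach-Object { [int]($_.Trim()) }
    [pscustomobject]@{
        UsedMiB     = $used
        TotalMiB    = $total
        PercentUsed = [math]::Round(100 * $used / $total, 1)
    }
}
```

Feed it the first GPU's line (arguments quoted so PowerShell passes them through untouched):

$line = nvidia-smi '--query-gpu=memory.used,memory.total' '--format=csv,noheader,nounits' | Select-Object -First 1
Get-VramUsage -CsvLine $line

If PercentUsed is already close to 100 with one agent busy, adding more agents is unlikely to help.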

Update / stop

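docker pull ghcr.io/keikaku-ai/agent:latest   # download the latest image
docker restart keikaku-agent                  # restart the existing container (it keeps the image it was created from)
docker stop keikaku-agent                     # stop
docker rm -f keikaku-agent                    # remove; re-run the docker run command from the portal to start on the latest image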

What the agent does: it executes work generated by your models, writing files and running build/test commands inside its own container and workspace. It connects out to Keikaku over HTTPS and to Ollama on your PC over OLLAMA_URL. The only port it publishes is the local dashboard on 9170, so nothing needs to be opened to the internet.