← Docs

Run a self-hosted agent

Overview

A Keikaku agent is a small worker you run on your own machine. It connects out to your Keikaku cloud, claims tasks, and runs them against a local model on your GPU — your code and models never leave your hardware.

Architecture

Three pieces: your server (your own machine, with the GPU), the Docker agents you run on it, and Keikaku Cloud. Ollama runs natively on your server and is shared by every agent — the model loads into your GPU once. The agents make outbound HTTPS calls to Keikaku Cloud to claim tasks and report results; nothing reaches in, and your code and models never leave your server.

Most setups run a single agent against one large model — that's the sweet spot. Add more agents only if your GPU has headroom for the extra concurrent load; they all share the one model in VRAM.

Prerequisites

Three things on the machine that will run the agent:

macOS isn't supported yet.

Model sizing

Pick the largest model that still fits entirely in your GPU's VRAM — once Ollama has to offload part of a model to the CPU, throughput drops sharply. Two ways to choose:

Download options

Pick your platform, then size your model. Each guide covers install, run, and verify.

Setup steps

  1. Install the prerequisites — Docker + Ollama + GPU drivers (see your OS guide above).
  2. Pick a model — run the benchmark to measure your GPU and get a recommended model + a setup code (optional, but it takes the guesswork out).
  3. Create an agent in the appapp.keikaku.ai → Agents → New agent. Paste the setup code, and you get a ready-to-run command with your connection token and model baked in.
  4. Run it — paste the command; the agent connects and shows up as online.