Architecture
How PandaStack fits together — control plane, data plane, and the workload microVMs.
Components
+---------------- damroo-host (single-node prod) ----------------+
|                                                                |
|  caddy (TLS terminator)                                        |
|    |--> damroo-api (HTTP :8080, JWT/PAT, OpenAPI)              |
|    |      |--> /run/fcsandbox/agent.sock (unix domain socket)  |
|    +--> damroo-dashboard (Next.js :3000, SPA)                  |
|                                                                |
|  damroo-agent (single-node orchestrator)                       |
|    +- sqlite /var/lib/damroo/damroo.db                         |
|    +- /var/lib/damroo/                                         |
|    |    +- kernels/vmlinux-5.10.239                            |
|    |    +- templates/{name}/rootfs.ext4                        |
|    |    +- template-snaps/{name}/{snap.bin, memory}            |
|    |    +- vms/{sandbox-id}/                                   |
|    |    |    +- rootfs.ext4 (reflink CoW from template)        |
|    |    |    +- firecracker.sock                               |
|    |    |    +- vm.log                                         |
|    |    +- volumes/{vol-id}.ext4                               |
|    +- per-sandbox netns (10.200.X.Y veth pair, DNAT for SSH)   |
|                                                                |
+----------------------------------------------------------------+
Boot path (target: <250 ms)
- API receives POST /v1/sandboxes. Validates JWT, picks template.
- API → agent over unix socket: CreateSandbox(template, cpu, mem).
- Agent allocates netns + IPs (NATID picker → /run/damroo/natid.db).
- Agent reflink-clones templates/<name>/rootfs.ext4 → vms/<id>/rootfs.ext4 (~1 ms, no data copy).
- Agent boots Firecracker:
  - Snapshot restore (default): FC_SNAPSHOT_LOAD memory + vmstate → kernel/userspace already past init (~80–120 ms).
  - Cold boot (no snapshot): full kernel boot (~700 ms).
- Agent runs DNAT rules so <host>:<random-port> → <guest>:22.
- Agent SSHs to confirm reachability (~5 ms over loopback).
- API returns {id, status:"running", boot_ms: 285}.
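The per-step figures above can be added up to see how much of the 250 ms target they consume. This is a minimal sketch that uses only the timings quoted in the list. Steps with no quoted figure (netns/IP allocation, DNAT setup, API and socket overhead) are left out rather than estimated, and every name in the code is illustrative, not taken from the codebase.

```python
# Stage timings quoted in the boot path, in milliseconds as (low, high).
# Steps without a quoted figure are deliberately omitted, not guessed.
TARGET_MS = 250

COMMON_STAGES = {
    "reflink_clone": (1, 1),
    "ssh_reachability_check": (5, 5),
}
BOOT_MODES = {
    "snapshot_restore": (80, 120),
    "cold_boot": (700, 700),
}


def known_cost(mode: str) -> tuple[int, int]:
    """Sum the quoted (low, high) timings for one boot mode."""
    low, high = BOOT_MODES[mode]
    for stage_low, stage_high in COMMON_STAGES.values():
        low += stage_low
        high += stage_high
    return low, high


def headroom(mode: str) -> tuple[int, int]:
    """Milliseconds left under TARGET_MS for the unquoted steps: (worst, best)."""
    low, high = known_cost(mode)
    return TARGET_MS - high, TARGET_MS - low


if __name__ == "__main__":
    for mode in BOOT_MODES:
        print(mode, "known cost:", known_cost(mode), "headroom:", headroom(mode))
```

With these figures the snapshot path accounts for about 86 to 126 ms, which leaves roughly 124 to 164 ms for the unquoted steps. A cold boot exceeds the target on the kernel boot alone.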
Snapshot/fork plane
templates/code-interpreter/rootfs.ext4 <-- read-only "golden" image
|
| reflink clone (instant, CoW)
v
vms/{sandbox-id}/rootfs.ext4 <-- per-sandbox writable layer
template-snaps/code-interpreter/
+- vmstate.bin (firecracker CPU state, ~MB)
+- memory.bin (guest RAM, mmap'd)
|
| mmap + private-CoW
v
vms/{sandbox-id}/memory.bin <-- per-sandbox dirty RAM only
10 forks of one parent sandbox share 99% of their memory pages until they diverge.
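The "mmap + private-CoW" step relies on a general OS primitive: a private file mapping shares clean pages with the backing file and copies a page only when it is written. The sketch below demonstrates that primitive with Python's mmap module. It is not the agent's or Firecracker's code; the temporary file stands in for memory.bin, and fork_view and demo are hypothetical names.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def fork_view(snapshot_path: str) -> mmap.mmap:
    """Map a snapshot file copy-on-write.

    Reads are served from the file's pages; writes go to private pages
    and never reach the file.
    """
    with open(snapshot_path, "rb") as f:
        # ACCESS_COPY is Python's copy-on-write mode (MAP_PRIVATE on POSIX).
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def demo() -> tuple[bytes, bytes, bytes]:
    # Stand-in for memory.bin: four pages of zeros (hypothetical content).
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(4 * PAGE))
        os.close(fd)

        fork_a = fork_view(path)
        fork_b = fork_view(path)

        # fork_a dirties one byte; only the page holding it diverges.
        fork_a[0:1] = b"\x01"

        with open(path, "rb") as f:
            on_disk = f.read(1)
        result = (fork_a[0:1], fork_b[0:1], on_disk)
        fork_a.close()
        fork_b.close()
        return result
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # (b'\x01', b'\x00', b'\x00')
```

The write is visible only in fork_a. The second mapping and the file on disk still read zero, which is why forks cost memory only for the pages they dirty.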
Networking (NATID)
Each sandbox gets a unique /30 carved from 10.200.0.0/14:
Host             netns                      guest
veth-host <----> veth-ns <----> tap0 <----> eth0
10.200.X.A       10.200.X.B     (l2)        10.200.X.C
iptables -t nat -A PREROUTING -d 10.200.X.B -p tcp --dport <hostPort> -j DNAT --to 10.200.X.C:22
- Outbound MASQUERADE → host's default route.
- No cross-sandbox connectivity (each netns is isolated).
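To make the address arithmetic concrete, the sketch below carves /30 blocks out of 10.200.0.0/14 with Python's ipaddress module. It assumes blocks are handed out by a simple 0-based index. The NATID picker's real allocation order, and how the A/B/C addresses in the diagram map onto a block, are not described here, so sandbox_subnet and capacity are hypothetical helpers.

```python
import ipaddress

POOL = ipaddress.ip_network("10.200.0.0/14")
BLOCK_PREFIX = 30


def capacity() -> int:
    """Number of /30 blocks that fit in POOL."""
    return 2 ** (BLOCK_PREFIX - POOL.prefixlen)


def sandbox_subnet(index: int) -> ipaddress.IPv4Network:
    """Return the index-th /30 inside POOL (0-based; assumed allocation order)."""
    if not 0 <= index < capacity():
        raise ValueError(f"index must be in [0, {capacity()})")
    block_size = 2 ** (32 - BLOCK_PREFIX)  # 4 addresses per /30
    base = int(POOL.network_address) + index * block_size
    return ipaddress.IPv4Network((base, BLOCK_PREFIX))


if __name__ == "__main__":
    print(capacity())                   # 65536
    first = sandbox_subnet(0)
    print(first)                        # 10.200.0.0/30
    print([str(h) for h in first.hosts()])  # ['10.200.0.1', '10.200.0.2']
    print(sandbox_subnet(capacity() - 1))   # 10.203.255.252/30
```

A /14 holds 65,536 such blocks, which is the upper bound on concurrently addressed sandboxes per pool under this scheme. Note that a /30 has two usable host addresses.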
State store
Today: sqlite at /var/lib/damroo/damroo.db (single agent).
Tomorrow (Phase B): Postgres + NATS-JetStream for events. See the separation plan in the repo root.
Future: multi-node
api-edge (N) --gRPC--> scheduler (1) --gRPC--> agent-N
|                       |                         |
+----------- shared: pg + nats-jetstream ---------+
- api-edge: stateless, behind CF, terminates JWT, hijack-proxies streams to the chosen agent.
- scheduler: stateless picker (least-loaded, warm-pool for hot templates, quotas).
- agent-N: workload nodes, i.e. what damroo-agent is today, minus state.
Postgres holds sandboxes/leases/templates/audit. NATS streams events for fan-out. mTLS between all three planes (SPIFFE-style cert rotation).
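The scheduler's picking policy (least-loaded, with a preference for nodes holding a warm pool for the requested template) could look roughly like the sketch below. This is an assumption about one possible policy, not the planned implementation: the Agent fields, the load ratio, and pick_agent are all hypothetical, and quotas are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    running: int    # sandboxes currently on this node
    capacity: int   # maximum sandboxes the node accepts
    warm_templates: set[str] = field(default_factory=set)


def pick_agent(agents: list[Agent], template: str) -> Agent:
    """Prefer nodes with a warm pool for the template, then the lowest load ratio."""
    candidates = [a for a in agents if a.running < a.capacity]
    if not candidates:
        raise RuntimeError("no agent has spare capacity")
    return min(
        candidates,
        # False sorts before True, so warm nodes win; ties break on load.
        key=lambda a: (template not in a.warm_templates, a.running / a.capacity),
    )


if __name__ == "__main__":
    fleet = [
        Agent("agent-1", running=10, capacity=100),
        Agent("agent-2", running=40, capacity=100, warm_templates={"code-interpreter"}),
        Agent("agent-3", running=100, capacity=100, warm_templates={"code-interpreter"}),
    ]
    print(pick_agent(fleet, "code-interpreter").name)  # agent-2
    print(pick_agent(fleet, "other-template").name)    # agent-1
```

Because the function takes the fleet state as an argument and keeps nothing between calls, it fits the "stateless picker" description: any scheduler replica reading the same shared state would make the same choice.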