Sandbox lifecycle
Born, used, paused, hibernated, forked, snapshotted, deleted — every state a sandbox can be in.
A sandbox is a single Firecracker microVM with a dedicated kernel, rootfs, network namespace,
and lifecycle. Sandboxes are identified by a UUID (e.g. 278a4f42-3467-4424-98e6-a547646dd0fd)
and always belong to a workspace.
States
    create
      │
      ▼
┌─ creating ─┐
      │
      │ snapshot restored, SSH ready
      ▼
   running ◄────────────┐
      │                 │
pause │                 │ resume
      ▼                 │
   paused ──────────────┘
      │
      │ hibernate
      ▼
 hibernated (memory written to disk, VM stopped)
      │
      │ wake
      ▼
   running
any state ──► failed (orchestrator marks unhealthy sandboxes)
any state ──► deleted (rootfs purged, slot released)
Lifetime guarantees
- Cold create → running: P50 250 ms, P99 350 ms (snapshot mode on XFS+reflink).
- Pause → resume: < 5 ms (just firecracker pause).
- Hibernate → wake: ~150 ms (write memory state to disk, restore from it).
- Fork: ~50 ms per child (memory CoW + rootfs reflink).
Lease semantics
Every running sandbox holds a lease in the shared store. The agent that owns it
refreshes the lease every 10 s. If the lease expires (agent crash, network partition),
the scheduler marks the sandbox failed and releases its NATID slot.
This is what makes a multi-node cluster safe to run: a dead agent's sandboxes are marked failed and cleaned up instead of lingering as zombies.
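The lease rule above can be sketched with an in-memory stand-in for the shared store. This is an illustration only: the LeaseStore class, its method names, and the 30 s TTL are assumptions (the document states the 10 s refresh interval but not the TTL), and a real deployment would use the actual shared store rather than a Python dict.

```python
from dataclasses import dataclass, field

REFRESH_INTERVAL_S = 10.0  # from the text: the owning agent refreshes every 10 s
LEASE_TTL_S = 30.0         # assumption: the TTL is not stated in this document


@dataclass
class Lease:
    sandbox_id: str
    agent_id: str
    expires_at: float


@dataclass
class LeaseStore:
    """Hypothetical in-memory stand-in for the shared store."""
    leases: dict = field(default_factory=dict)
    states: dict = field(default_factory=dict)
    released_slots: list = field(default_factory=list)

    def refresh(self, sandbox_id: str, agent_id: str, now: float) -> None:
        # Called by the owning agent every REFRESH_INTERVAL_S seconds.
        self.leases[sandbox_id] = Lease(sandbox_id, agent_id, now + LEASE_TTL_S)
        self.states.setdefault(sandbox_id, "running")

    def reap_expired(self, now: float) -> list:
        # Called by the scheduler: fail sandboxes whose lease has lapsed
        # and release their slot.
        expired = [l for l in self.leases.values() if l.expires_at <= now]
        for lease in expired:
            self.states[lease.sandbox_id] = "failed"
            self.released_slots.append(lease.sandbox_id)
            del self.leases[lease.sandbox_id]
        return [l.sandbox_id for l in expired]


store = LeaseStore()
store.refresh("sb-1", "agent-a", now=0.0)
store.refresh("sb-2", "agent-b", now=0.0)

# agent-a keeps refreshing; agent-b crashes after its first refresh.
for t in (10.0, 20.0, 30.0):
    store.refresh("sb-1", "agent-a", now=t)

reaped = store.reap_expired(now=35.0)
print(reaped, store.states)
```

Time is passed in explicitly rather than read from a clock, so the crash scenario can be replayed deterministically.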
Health monitor
While a sandbox is running, the agent runs a background health monitor that:
- Pokes the SSH socket every 5 s.
- Checks Firecracker's /state endpoint every 30 s.
- Tracks RSS + vCPU usage for metrics.
- Marks the sandbox failed after 3 consecutive failed probes.
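The probe schedule and failure threshold above can be sketched as a small tick-driven monitor. The HealthMonitor class and the probe callables are hypothetical, and sharing one failure counter between the SSH and /state probes is an assumption, since the text does not say whether they are counted separately. Metrics collection (RSS, vCPU) is left out.

```python
from typing import Callable

SSH_PROBE_INTERVAL_S = 5       # from the text
STATE_PROBE_INTERVAL_S = 30    # from the text
MAX_CONSECUTIVE_FAILURES = 3   # from the text


class HealthMonitor:
    """Hypothetical monitor; probes are injected as callables returning bool."""

    def __init__(self, ssh_probe: Callable[[], bool], state_probe: Callable[[], bool]):
        self.ssh_probe = ssh_probe
        self.state_probe = state_probe
        self.consecutive_failures = 0  # assumption: one counter for both probes
        self.state = "running"

    def tick(self, now_s: int) -> str:
        """Run whichever probes are due at second `now_s` and return the state."""
        if self.state != "running":
            return self.state
        results = []
        if now_s % SSH_PROBE_INTERVAL_S == 0:
            results.append(self.ssh_probe())
        if now_s % STATE_PROBE_INTERVAL_S == 0:
            results.append(self.state_probe())
        for ok in results:
            # A success resets the streak; a failure extends it.
            self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
            if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                self.state = "failed"
                break
        return self.state


# Simulate a sandbox whose SSH socket stops answering at t = 10 s.
clock = {"t": 0}
monitor = HealthMonitor(
    ssh_probe=lambda: clock["t"] < 10,
    state_probe=lambda: True,
)

failed_at = None
for t in range(0, 61):
    clock["t"] = t
    if monitor.tick(t) == "failed":
        failed_at = t
        break

print(failed_at)
```

In this simulation the SSH probes at 10, 15, and 20 s fail in a row, so the sandbox is marked failed on the third of them.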
You can subscribe to lifecycle changes via the events stream.
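When consuming lifecycle changes, it can help to check each observed transition against the state diagram in the States section. The sketch below encodes that diagram as a transition table. The function names are hypothetical, the format of the events stream is not described here, and treating deleted as terminal is an assumption.

```python
from typing import Optional, Sequence, Tuple

STATES = {"creating", "running", "paused", "hibernated", "failed", "deleted"}

# Edges taken from the state diagram.
TRANSITIONS = {
    ("creating", "running"),    # snapshot restored, SSH ready
    ("running", "paused"),      # pause
    ("paused", "running"),      # resume
    ("paused", "hibernated"),   # hibernate
    ("hibernated", "running"),  # wake
}

# "any state" can move to these two.
ANY_STATE_TARGETS = {"failed", "deleted"}


def is_valid_transition(old: str, new: str) -> bool:
    if old not in STATES or new not in STATES:
        raise ValueError(f"unknown state in transition {old!r} -> {new!r}")
    if old == "deleted":
        return False  # assumption: nothing follows deletion
    if new in ANY_STATE_TARGETS:
        return True
    return (old, new) in TRANSITIONS


def validate_history(states: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the first invalid (old, new) pair, or None if all are valid."""
    for old, new in zip(states, states[1:]):
        if not is_valid_transition(old, new):
            return (old, new)
    return None


good = ["creating", "running", "paused", "hibernated", "running", "deleted"]
bad = ["creating", "running", "hibernated"]  # skips the paused state

print(validate_history(good))
print(validate_history(bad))
```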