The operating system
for AI agents.
Orchestrate, run, and govern multi-agent flows with contracts that don't drift. Self-host day one. Air-gap supported. Built on AgentsKit.
Desktop app coming soon · CLI alpha shipping in M1
Flow engine
DAG-native. Durable. Time-travel debuggable.
Compose agents into flows: compare, vote, debate, auction, blackboard. Pause for humans. Branch from any past step.
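Because a flow is a DAG, a runner can start a step only after every step it depends on has finished. Below is a minimal sketch of that ordering using Kahn's algorithm. It runs over a hypothetical `FlowStep` shape, not the real AgentsKit OS flow schema.

```typescript
// Hypothetical step shape for illustration; not the AgentsKit OS flow schema.
interface FlowStep {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit steps whose dependencies have all been emitted.
// Throws on an unknown dependency or a cycle (i.e. the graph is not a DAG).
function executionOrder(steps: FlowStep[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    indegree.set(s.id, s.dependsOn.length);
    dependents.set(s.id, []);
  }
  for (const s of steps) {
    for (const dep of s.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`unknown dependency: ${dep}`);
      list.push(s.id);
    }
  }
  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== steps.length) throw new Error("cycle detected: flow is not a DAG");
  return order;
}

// Two agents draft in parallel, a judge compares the drafts, a human approves.
const order = executionOrder([
  { id: "approve", dependsOn: ["judge"] },
  { id: "draftA", dependsOn: [] },
  { id: "draftB", dependsOn: [] },
  { id: "judge", dependsOn: ["draftA", "draftB"] },
]); // ["draftA", "draftB", "judge", "approve"]
```

This sketch only computes a valid order. A real engine would likely run ready steps concurrently and persist each result. Persisted results are what make pausing for a human and branching from a past step possible.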
Why it's different
Foundation over speed.
Existing agent platforms optimized for speed of shipping. The result: drift, lock-in, and abandoned plugins. We optimize for the opposite.
Stable contracts
Zod at every boundary. Strict SemVer. ADRs before architecture changes. RFCs before breaking changes. Backward compatibility within each major version.
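"Backward compatibility within a major version" reduces to a small comparison. Below is a sketch for plain `MAJOR.MINOR.PATCH` strings, with no pre-release or build tags. The `parse` and `isCompatible` names are hypothetical. The 0.x rule roughly follows npm's caret ranges (npm is stricter still for 0.0.x), and it is an assumption about how pre-1.0 packages are treated.

```typescript
// Plain MAJOR.MINOR.PATCH only; pre-release and build metadata are out of scope here.
type Version = [major: number, minor: number, patch: number];

function parse(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain SemVer version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True if `installed` can stand in for `required` under "backward-compatible within a major":
// same major, and not older. During 0.x, SemVer allows anything to change, so this sketch
// also requires the minor version to match (similar to npm's caret range).
function isCompatible(required: string, installed: string): boolean {
  const [rMaj, rMin, rPat] = parse(required);
  const [iMaj, iMin, iPat] = parse(installed);
  if (rMaj !== iMaj) return false;
  if (rMaj === 0 && rMin !== iMin) return false;
  if (iMin !== rMin) return iMin > rMin;
  return iPat >= rPat;
}
```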
Zero lock-in
30+ LLM adapters. Self-host day one. Air-gap supported. Workspace lockfile guarantees byte-reproducible runs across machines.
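Byte-reproducible runs depend on serializing the lockfile deterministically: the same logical contents must always produce the same bytes. Below is a sketch of one common approach, recursively sorted keys plus fixed formatting. The lockfile contents and the `serializeLockfile` name are hypothetical.

```typescript
// Generic JSON value type; the real workspace lockfile format is not specified here.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively sort object keys so logically equal data always serializes identically,
// regardless of key insertion order. Array order is preserved because it is meaningful.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) out[key] = canonicalize(value[key]);
    return out;
  }
  return value;
}

// Fixed two-space indentation and a trailing newline keep the file bytes stable across machines.
function serializeLockfile(lock: Json): string {
  return JSON.stringify(canonicalize(lock), null, 2) + "\n";
}
```

The default `sort()` compares UTF-16 code units, so the key order does not depend on the machine's locale.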
Enterprise-native
Signed audit log (Merkle chain). Capability-based RBAC. Egress default-deny. SOC 2 / HIPAA / GDPR aligned, not bolted on.
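Egress default-deny means the check starts from "no", and only an explicit rule changes the answer to "yes". Below is a minimal sketch of a host allowlist. It assumes a hypothetical `EgressPolicy` shape with exact hosts and `*.` wildcard entries. The real policy format is not described on this page.

```typescript
// Hypothetical policy shape for illustration; not the AgentsKit OS policy schema.
interface EgressPolicy {
  // Exact hostnames, or "*.example.com" to allow any subdomain (but not example.com itself).
  allow: string[];
}

// Default-deny: return true only when the host matches an explicit allow entry.
// An empty allowlist therefore blocks all outbound traffic.
function isEgressAllowed(host: string, policy: EgressPolicy): boolean {
  const h = host.toLowerCase().replace(/\.$/, ""); // normalize case and a trailing root dot
  return policy.allow.some((entry) => {
    const rule = entry.toLowerCase();
    // "*.example.com" becomes the suffix ".example.com"; the leading dot stops "evilexample.com" matching.
    return rule.startsWith("*.") ? h.endsWith(rule.slice(1)) : h === rule;
  });
}
```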
Capabilities
Everything serious teams need.
Signed audit log
Merkle-chained, HSM-ready. Tamper-evident trails for regulated workloads.
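The tamper evidence comes from chaining hashes. Each entry includes the hash of the entry before it, so changing any past entry breaks every link after it. Below is a minimal sketch of a plain SHA-256 hash chain using Node's `node:crypto`. The names `AuditEntry`, `append`, and `verify` are hypothetical. The signing step is omitted; it might, for example, sign the latest hash with an HSM-held key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; each entry commits to the previous entry's hash.
interface AuditEntry {
  seq: number;
  event: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

// Hash a JSON array so field boundaries are unambiguous.
function entryHash(seq: number, event: string, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify([seq, event, prevHash])).digest("hex");
}

function append(log: AuditEntry[], event: string): AuditEntry[] {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, event, prevHash, hash: entryHash(seq, event, prevHash) }];
}

// Recompute every link; returns the index of the first broken entry, or -1 if intact.
function verify(log: AuditEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    if (e.seq !== i || e.prevHash !== prev || e.hash !== entryHash(e.seq, e.event, e.prevHash)) return i;
    prev = e.hash;
  }
  return -1;
}
```

`verify` only checks links between entries, so it cannot detect truncation of the newest entries on its own. Signing or externally anchoring the latest hash closes that gap.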
OpenTelemetry gen_ai
Datadog, Honeycomb, Langfuse, New Relic, Grafana, PostHog — out of the box.
MCP bridge v2
Publish AgentsKit tools as MCP servers. Consume any MCP server. Bidirectional.
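MCP messages are JSON-RPC 2.0: a server advertises its tools through `tools/list` and runs them through `tools/call`. Below is a minimal, transport-free sketch of the server half. It assumes a hypothetical in-process registry. `Tool`, `handle`, and the `echo` tool are illustrative names, not AgentsKit APIs.

```typescript
// Hypothetical in-process tool registry; the AgentsKit tool interface is not shown on this page.
interface Tool {
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
  run(args: Record<string, unknown>): string;
}

type Id = number | string;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: Id;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: Id; result: unknown }
  | { jsonrpc: "2.0"; id: Id; error: { code: number; message: string } };

// Answers MCP's two tool methods. Transport (stdio or HTTP) and the initialize
// handshake are out of scope for this sketch.
function handle(tools: Record<string, Tool>, req: JsonRpcRequest): JsonRpcResponse {
  if (req.method === "tools/list") {
    const list = Object.entries(tools).map(([name, t]) => ({
      name,
      description: t.description,
      inputSchema: t.inputSchema,
    }));
    return { jsonrpc: "2.0", id: req.id, result: { tools: list } };
  }
  if (req.method === "tools/call") {
    const name = req.params?.name;
    // Own-property check so names like "toString" are not resolved from Object.prototype.
    if (name === undefined || !Object.prototype.hasOwnProperty.call(tools, name)) {
      return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
    }
    const text = tools[name].run(req.params?.arguments ?? {});
    return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
  }
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}

// Illustrative registry with a single tool.
const registry: Record<string, Tool> = {
  echo: {
    description: "Echo the given text back",
    inputSchema: { type: "object", properties: { text: { type: "string" } }, required: ["text"] },
    run: (args) => String(args.text ?? ""),
  },
};
```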
Generative OS
Natural language → agent, flow, trigger, or tool. Editable, never opaque.
Run modes
production · preview · dry-run · replay · simulate · deterministic. Pick the safety floor.
Multi-agent topologies
compare · vote · debate · auction · blackboard. ReAct loops. Speculative execution.
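The vote topology reduces several agents' answers to one. Below is a plurality-vote sketch that returns `null` on a tie so the flow can escalate. The trim-and-lowercase normalization and the tie policy are assumptions. Free-form answers would normally need a judge step to group equivalent responses.

```typescript
// One answer per agent. Returns the most common (normalized) answer and its vote count,
// or null on a tie or empty input, so a flow could escalate (e.g. to a debate round or a human).
function vote(answers: string[]): { answer: string; votes: number } | null {
  const counts = new Map<string, number>();
  for (const raw of answers) {
    const key = raw.trim().toLowerCase(); // naive normalization for illustration
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let best: { answer: string; votes: number } | null = null;
  let tied = false;
  for (const [answer, votes] of counts) {
    if (!best || votes > best.votes) {
      best = { answer, votes };
      tied = false; // a strictly higher count clears any earlier tie
    } else if (votes === best.votes) {
      tied = true;
    }
  }
  return tied ? null : best;
}
```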
Pre-flight cost estimate
Token + dollar projection before run. Live counter during. Per-tenant guardrails.
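The pre-flight projection is arithmetic over planned token counts. For each step, multiply input and output tokens by the model's per-million-token prices, then sum across steps and compare the total with the tenant's remaining budget. The `StepEstimate`, `estimateUsd`, and `preflight` names are hypothetical. Prices are parameters here, not real provider rates.

```typescript
// Prices are inputs, not real provider rates; look them up for the model you actually use.
interface ModelPrice {
  inputPerMTok: number; // USD per 1,000,000 input tokens
  outputPerMTok: number; // USD per 1,000,000 output tokens
}

interface StepEstimate {
  inputTokens: number;
  maxOutputTokens: number; // an upper bound, so the projection is conservative
  price: ModelPrice;
}

function estimateUsd(steps: StepEstimate[]): number {
  return steps.reduce(
    (sum, s) =>
      sum + (s.inputTokens * s.price.inputPerMTok + s.maxOutputTokens * s.price.outputPerMTok) / 1_000_000,
    0,
  );
}

// Per-tenant guardrail: refuse to start a run whose projected cost exceeds the remaining budget.
function preflight(steps: StepEstimate[], tenantBudgetUsd: number, tenantSpentUsd: number) {
  const projected = estimateUsd(steps);
  const remaining = tenantBudgetUsd - tenantSpentUsd;
  return { projected, remaining, allowed: projected <= remaining };
}
```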
Sandbox runtimes
Side-effect declarations + tiered isolation. e2b built-in. Bring your own runtime.
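One way to combine side-effect declarations with run modes is a lookup table. Each tool declares the strongest effect it can have, and each mode lists the effects it permits. The mode names come from the run modes listed above. The `SideEffect` levels, `ToolDecl`, and the policy table itself are assumptions for illustration, not the documented semantics of each mode.

```typescript
// Mode names from the page; everything else here is hypothetical.
type RunMode = "production" | "preview" | "dry-run" | "replay" | "simulate" | "deterministic";
type SideEffect = "none" | "read" | "write" | "external";

interface ToolDecl {
  name: string;
  sideEffects: SideEffect; // the strongest effect this tool can have
}

// Assumed policy: only production may write or call external systems;
// replay and simulate perform no effects at all (outputs come from recordings or mocks).
const ALLOWED: Record<RunMode, SideEffect[]> = {
  production: ["none", "read", "write", "external"],
  preview: ["none", "read"],
  "dry-run": ["none", "read"],
  replay: ["none"],
  simulate: ["none"],
  deterministic: ["none", "read"],
};

function mayExecute(tool: ToolDecl, mode: RunMode): boolean {
  return ALLOWED[mode].includes(tool.sideEffects);
}
```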
Built for
Four wedges. One platform.
Healthcare & clinical
Air-gap mode. Safe-Harbor PII redaction. Patient consent + break-glass. Determinism mode for regulated decisions.
Coding & dev tooling
Repo-aware agents. Multi-runtime sandbox. Diff primitives. Cost-per-PR. Local-model fallback for offline work.
Marketing agencies
Multi-client workspace isolation. BrandKit (tone, banned phrases, disclaimers). Approval HITL. Per-client cost reporting.
Ops & SRE
Durable flows. Cron + webhook + CDC triggers. Cost heat map. Anomaly detection on traces. PagerDuty + Slack native.
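Anomaly detection on traces can start as simply as comparing a new measurement with its recent baseline. Below is a sketch that applies a z-score to step latencies. The detector, the `isAnomalous` name, and the default threshold of 3 are assumptions, not a description of how AgentsKit OS does it.

```typescript
// Flags a new trace metric value (e.g. step latency in ms) that sits more than `threshold`
// sample standard deviations from the mean of recent history.
function isAnomalous(history: number[], value: number, threshold = 3): boolean {
  if (history.length < 2) return false; // not enough baseline yet
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance = history.reduce((a, b) => a + (b - mean) ** 2, 0) / (history.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return value !== mean; // flat baseline: any change is unusual
  return Math.abs(value - mean) / std > threshold;
}
```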
Architecture
Thin layer. Strong contracts.
@agentskit/os-core stays under 15 KB gzipped. Everything else is independently installable. Use any single piece without the desktop app.
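A size budget like this is usually enforced in CI by gzipping the built bundle and comparing byte counts. Below is a sketch using Node's `zlib.gzipSync`. The `withinGzipBudget` name is hypothetical, and reading "15 KB" as 15 × 1024 bytes is an assumption.

```typescript
import { gzipSync } from "node:zlib";

// Hypothetical CI guard: gzip the built bundle and compare against the budget.
// 15 KB is read here as 15 * 1024 bytes; adjust if the budget means 15,000.
function withinGzipBudget(bundle: string | Uint8Array, budgetBytes = 15 * 1024) {
  const gzippedBytes = gzipSync(bundle).length;
  return { gzippedBytes, ok: gzippedBytes <= budgetBytes };
}
```

In CI, the bundle contents would be read from the build output, for example with `readFileSync`, and the job would fail when `ok` is false. The output path depends on the build setup.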
Quick start
Six commands to your first agent.
Pre-flight cost estimates. Workspace lockfile. Docker deploy. All from the CLI.
- init: Scaffold a workspace with sane defaults
- doctor: Diagnose environment, providers, and sandbox
- run: Execute a flow with a run mode and cost estimate
- lock: Pin versions for reproducibility
- deploy: Ship to a Docker or cloud target
```shell
# install (once core + CLI ship)
pnpm add -D @agentskit/os-cli

# scaffold
pnpm agentskit-os init

# diagnose
pnpm agentskit-os doctor

# run, with a cost estimate first
pnpm agentskit-os run pr-review --mode preview --estimate

# lock + ship
pnpm agentskit-os lock
pnpm agentskit-os deploy --target docker
```
Roadmap
Eight milestones to 1.0.
Public process. Every milestone ships ADRs, RFCs, tests, docs. No surprises.
- M1: Core schemas + CLI alpha (In progress)
- M2: Desktop shell · FlowEditor · TraceViewer (Up next)
- M3: Flow engine · DAG · durable · HITL (Planned)
- M4: Triggers · MCP bridge v2 (Planned)
- M5: Marketplace · plugin host (Planned)
- M6: Observability · audit signing · vault (Planned)
- M7: Generative OS (NL → flow) (Planned)
- M8: Cloud sync · CRDT collab · 1.0 (Planned)
Coming soon
Desktop app. Marketplace. Cloud sync.
Get early access. Shape the contracts before 1.0. Enterprise pilots opening Q3 2026.
No spam. Updates roughly once per milestone.