The Working Layer

Frameworks — Pi, Superpowers, gstack, Hermes Agent, and Paperclip — represent the engineering discipline that makes autonomous systems usable in practice.

Apr 14, 2026

Series context. This article is part of a series. If you want to skip the guide and case study (to follow), read at least these three. The Loop is the Lab covers the architecture of eight autonomous research systems through a seven-primitive analytical framework plus governance. The Speciation of Intelligence covers eight evaluation harnesses (Loops A–H). This article covers the practitioner layer: some of the tools developers and teams are deploying in production.

Recommended reading order: Loop is the Lab → Speciation of Intelligence → This article (The Working Layer).

The Gap Nobody Named

Research-grade systems such as AlphaEvolve, Darwin Gödel Machine, and NVIDIA NemoClaw run at scales most practitioners will never touch. But the practitioner who wants the discipline of autonomous systems without the infrastructure of a national lab has, since March 2026, had a rapidly expanding toolkit. All frameworks reviewed here have grown substantially in the weeks since this series' first publication, in some cases beyond all expectations (mine at least). It remains hard to keep up while you test and spend tokens.

The irony is that the gap emerged from abundance, not scarcity. By early 2026, frontier coding agents became capable enough to produce working production code autonomously. The problem shifted from can the agent do this? to how do I manage an agent that can do almost anything? The practitioner frameworks answer that second question, and the velocity of adoption suggests developers recognise the answer as necessary, not optional.

"OpenClaw is an employee. Paperclip is the company."
— Paperclip GitHub documentation · February 2026

Framework 0 — Pi

Minimal Core - 4 Tools (Read, Write, Edit, Bash) - Session Trees - Self-Extending - OpenClaw Engine

Mario Zechner (badlogic), pi-mono

34,200+ stars, pi-skills: 1,100+ stars (MIT Licence)

Every other framework in this article is an addition — a layer of process, memory, or coordination placed on top of an existing agent. Pi is the opposite. It is a deliberate subtraction. Mario Zechner’s design principle is to give the model the absolute minimum it needs to be useful, then let it build everything else. The core is four tools: Read, Write, Edit, Bash. The system prompt is reputedly the shortest of any production coding agent. Nothing else is baked in.

This is a considered philosophical stance. Pi argues that every feature you add to the agent core is a feature the agent can no longer reason freely about. The moment you bake in MCP support, you have implicitly decided how the agent should communicate with external tools. Pi’s answer: if you want MCP, ask the agent to build an extension that does it, or use Peter Steinberger’s mcporter, which exposes MCP calls via a CLI the agent can invoke with Bash. The agent maintains its own capability surface.

The extension system is what makes this viable rather than merely minimal. Extensions are TypeScript files loaded at runtime via jiti (no compilation step). Each extension can register new tools, slash commands, keyboard shortcuts, and TUI widgets. Crucially, extension state persists into sessions — so an extension can accumulate context across interactions rather than resetting. The agent itself can write, hot-reload, and test new extensions within a single session loop, creating a software-builds-software dynamic that distinguishes Pi from every tool that relies on pre-packaged skill libraries.

The most architecturally distinctive property is the session tree. Where every other agent maintains a linear conversation thread, Pi sessions are trees. A branch is a full fork of the session state at any point: a sub-session with its own context, its own tool calls, its own reasoning. When the branch completes, Pi can rewind the main session and summarise what happened on the fork. The practical use Armin Ronacher describes: branch to run a code review, get findings, bring specific fixes back to the main session without contaminating the main context with review overhead. Or branch to fix a broken extension tool without wasting tokens in the primary session.
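
The fork-and-summarise pattern can be illustrated with a small data structure. This is a minimal sketch and not Pi's implementation: the SessionNode class, its fork and merge_summary methods, and the message strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionNode:
    """One point in a hypothetical session tree (not Pi's real session format)."""
    messages: List[str] = field(default_factory=list)
    parent: Optional["SessionNode"] = None
    children: List["SessionNode"] = field(default_factory=list)

    def fork(self) -> "SessionNode":
        # A branch starts with a copy of the context at the fork point,
        # so work on the branch never mutates the parent's messages.
        child = SessionNode(messages=list(self.messages), parent=self)
        self.children.append(child)
        return child

    def merge_summary(self, summary: str) -> None:
        # Report back a short summary only, not the branch's full transcript.
        if self.parent is None:
            raise ValueError("root session has no parent to report to")
        self.parent.messages.append(f"[branch summary] {summary}")


main = SessionNode(messages=["user: add retry logic", "agent: implemented retry()"])
review = main.fork()
review.messages.append("agent: reviewing diff...")
review.messages.append("agent: found missing backoff cap")
review.merge_summary("review found one issue: missing backoff cap")
```

In this sketch the main session grows by a single summary line, while the review branch keeps its full transcript for anyone who wants to inspect it later.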

Pi is also the engine of OpenClaw: the substrate on which Peter Steinberger built the messaging-connected agent that went viral in January 2026. The Pi SDK (@mariozechner/pi-coding-agent) is designed specifically so others can build their own agents on it. It offers RPC mode over stdin/stdout for non-Node integrations and a multi-provider AI abstraction that allows sessions to contain messages from different model providers. A companion tool, pi-share-hf, publishes work sessions to HuggingFace as a training data contribution.

Armin Ronacher’s community-contributed extensions illustrate the extensibility model concretely: /answer extracts questions from agent prose and reformats them into a clean input box; /todos manages local markdown task files that both agent and user can manipulate; /review branches the session to a fresh review context and reports findings back; /control lets one Pi instance send prompts to another; /files surfaces all session-referenced files with VS Code diff and Finder reveal.

Considerations/Issues Raised: Pi's minimalism is also its ceiling for practitioners who want structure immediately. Superpowers or gstack can be installed in minutes and impose disciplined workflows on any Claude Code session. Pi requires the practitioner to build their own discipline — or at least know to ask the agent to do so. The "agent builds its own extensions" loop is powerful but front-loads cognitive work: you must know what capability you want before you can ask the agent to create it. For developers who are already fluent with agentic workflows, this is a feature. For those still learning what good agentic practice looks like, the other frameworks in this article are better starting points. Additionally, Pi intentionally omits sub-agents and plan mode — legitimate limitations for workflows that depend on either.

Framework I — Superpowers

obra (Jesse Vincent / Prime Radiant), GitHub · Oct 2025 · 146,000+ stars (MIT Licence) · Global Rank #47

Jesse Vincent noticed a pattern that every developer using AI coding agents eventually hits: the agent is highly capable but constitutionally undisciplined. It rushes to produce output. It writes implementation before understanding requirements. It skips tests. Superpowers does not try to make the model smarter — it gives the model process guardrails. A library of SKILL.md files with explicit instructions, hard gates, and mandatory decision points activates automatically based on task context.

Jesse Vincent is now associated with Prime Radiant, the company built around Superpowers. The framework has expanded from primarily Claude Code to 7+ platforms: Claude Code, Cursor, Codex, OpenCode, Gemini CLI, GitHub Copilot, and Copilot Cloud Agent (PR pending). A Codex App compatibility design spec was added March 23, handling read-only environment detection, worktree-safe skill behaviour, and sandbox fallback patterns. The subagent plan-review loop was removed — it was adding ~25 min overhead without measurably improving plan quality. Bootstrap injection now prepends to the first user message instead of adding a system message, fixing compatibility with Qwen and models that break on multiple system prompts. Skills path logic was also corrected for consistency.

The core workflow remains unchanged: Brainstorm → Design approval → Plan (2–5 min tasks, complete context) → Subagent execution (each task gets fresh context) → Two-stage review → Ship. Each phase is a separate skill with explicit entry and exit criteria. The TDD enforcement is absolute: when an agent writes code before tests, Superpowers auto-deletes the code. No exceptions. Developers using the framework report 85–95% test coverage compared to 30–50% without it.
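
The test-first rule can be made concrete with a small check. This is a sketch under stated assumptions, not how Superpowers implements enforcement (Superpowers works through skill instructions to the agent). The tdd_violations function and the test_<name>.py naming convention are hypothetical.

```python
from typing import List


def tdd_violations(write_order: List[str]) -> List[str]:
    """Return implementation files that were written before their test file.

    Assumes a hypothetical convention: the test for `foo.py` is `test_foo.py`.
    """
    seen_tests = set()
    violations = []
    for path in write_order:
        name = path.rsplit("/", 1)[-1]
        if name.startswith("test_"):
            # Record which implementation file this test covers.
            seen_tests.add(name[len("test_"):])
        elif name not in seen_tests:
            # Implementation arrived with no earlier test: flag it.
            violations.append(path)
    return violations


order = ["tests/test_cart.py", "src/cart.py", "src/checkout.py", "tests/test_checkout.py"]
flagged = tdd_violations(order)
```

Here src/checkout.py is flagged because its test was written afterwards, which is the situation in which the framework would discard the code.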

The SKILL.md format used by Superpowers underpins the agentskills.io cross-platform standard. A correction was made in this period: the SKILL.md frontmatter documentation previously claimed the format supported “only two fields”; it now correctly states “two required fields” with a link to the full agentskills.io spec.

Considerations: Superpowers enforces process but cannot choose the right process. If the methodology itself is wrong for a given problem type — e.g. applying TDD to exploratory research code — the framework enforces the wrong discipline rigorously. The human must configure the methodology, not just run it. This has not changed.

Framework II — gstack

Garry Tan / Y Combinator, GitHub March 2026 · ~70,000 stars · 23 tools (MIT Licence)

gstack's answer to the agent management problem is to separate roles into distinct slash-command skills, each loaded into its own clean context with a specific persona, specific questions, and specific acceptance criteria. The CEO challenges product framing; the staff engineer finds production vulnerabilities; the QA lead tests the actual app in a real browser. When those roles run in the same context window, mediocrity across all dimensions is the result.

Key new capabilities: /pair-agent (cross-vendor browser coordination — the first time agents from different providers can share a browser with real security: scoped tokens, tab isolation, domain restrictions, activity attribution); /document-release (auto-updates all project docs on every /ship); /autoplan (chains CEO + design + eng reviews into one pipeline); /careful, /freeze, /guard, /investigate (safety guardrails). Review Readiness Dashboard added — shows which reviews have run before you ship. Smart routing: CEO review is not triggered for infrastructure bug fixes. Codex and Gemini CLI support added via --host flag. 600K+ lines of production code claimed by Tan across 60 days. The codebase is now primarily TypeScript (79.6%) with Go (18.3%), co-authored with Claude Opus 4.6.
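
To make the security properties listed for /pair-agent less abstract, here is a minimal sketch of how a scoped token might combine tab isolation, domain restriction, and activity attribution. It is not gstack's code: ScopedToken, authorize, the field names, and the example hosts are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class ScopedToken:
    agent: str                        # who is acting (for attribution)
    tab_id: int                       # the only tab this agent may touch
    allowed_domains: FrozenSet[str]   # hosts this agent may navigate to


def authorize(token: ScopedToken, tab_id: int, url: str,
              log: List[Tuple[str, str]]) -> bool:
    """Allow an action only on the token's own tab and an allowed host."""
    host = urlparse(url).hostname or ""
    ok = tab_id == token.tab_id and host in token.allowed_domains
    # Every decision is attributed to the agent, whether allowed or denied.
    log.append((token.agent, f"{'allow' if ok else 'deny'} tab={tab_id} {host}"))
    return ok


activity: List[Tuple[str, str]] = []
token = ScopedToken("hermes", 2, frozenset({"staging.example.com"}))
own_tab = authorize(token, 2, "https://staging.example.com/login", activity)
other_tab = authorize(token, 1, "https://staging.example.com/login", activity)
other_host = authorize(token, 2, "https://admin.example.com/", activity)
```

The design choice worth noting is that denials are logged as well as approvals, so the attribution record covers attempted actions too.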

The browser subsystem — a persistent Chromium daemon over localhost HTTP — remains the core non-obvious technical component. Cold starts have been eliminated (~100–200ms calls vs 3–5 second per-call launches). /pair-agent, the newest notable addition, extends this to cross-agent collaboration: another agent (OpenClaw, Hermes, Codex) opens its own tab in the same managed browser, coordinated via scoped tokens. Agents from different vendors can now visually coordinate through a shared browser without interfering with each other.
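
The value of the persistent daemon is easiest to see as arithmetic over a whole run. The sketch below uses only the per-call figures quoted above; the count of 50 browser actions is a hypothetical workload, and total_ms is an illustrative helper.

```python
from typing import Tuple


def total_ms(calls: int, low_ms: int, high_ms: int) -> Tuple[int, int]:
    """Return (best, worst) total latency in milliseconds for a run of calls."""
    return calls * low_ms, calls * high_ms


calls = 50  # hypothetical number of browser actions in one QA pass
cold = total_ms(calls, 3000, 5000)  # launching a browser on every call
warm = total_ms(calls, 100, 200)    # reusing the persistent daemon
```

For 50 calls this works out to 150 to 250 seconds of launch-bound time against 5 to 10 seconds with the daemon.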

Criticism of gstack as “just prompts in text files” remains ongoing. The balanced assessment holds: the individual slash commands are not revolutionary, but the persistent browser daemon and the coherent full-sprint workflow add genuine value beyond what ad-hoc prompts achieve. The /pair-agent feature in particular goes materially beyond what any pure-Markdown approach can provide.

Considerations: gstack encodes Garry Tan's epistemics. These are high-quality questions shaped by advising thousands of YC startups, but they are one person's priors — potentially wrong for domains outside B2B SaaS and consumer apps. The skills can be customised; the defaults carry assumptions. The /pair-agent feature introduces a new attack surface: scoped tokens and tab isolation are implemented, but cross-vendor agent coordination in a shared browser environment is genuinely novel territory with limited security track record as of April 2026.

Framework III — Hermes Agent

Nous Research, GitHub · February 2026 · 38,700+ stars · v0.8.0 (Apr 8) (MIT Licence)

Hermes Agent is built for the case where you are not present — where you want an agent that lives on a server, remembers everything it has ever learned, reaches you via Telegram while it works on a cloud VM, and grows more capable the longer it runs. The core innovation is a multi-level memory hierarchy that functions as procedural learning, synthesising completed tasks into agentskills.io Skill Documents that persist and compound across sessions.

The architecture remains three-tiered — user interfaces (CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, and a growing platform list), core ReAct loop (Observe → Reason → Act), and execution backends (Local, Docker, SSH, Singularity, Modal) — but the surface area of each tier has expanded substantially. The hermes-paperclip-adapter in particular closes a structural gap that the original article identified: Hermes and Paperclip can now be formally integrated, with Hermes running as an accountable agent within a Paperclip org chart, budget, and audit trail.

The agentskills.io standard, adopted across 11+ tools (Claude Code, Cursor, GitHub Copilot, Gemini CLI, VS Code, and others), remains a key differentiator. Skills written for Hermes work in Claude Code and vice versa.

Considerations: Hermes is the most capable persistent agent in this group, but "grows with you" is also its primary risk: Skill Documents accumulate without a formal audit mechanism. Incorrect workflows can persist and be retrieved as if they were validated. Memory depth multiplies errors as reliably as successes. The hermes-agent-self-evolution repo (DSPy + GEPA) introduces a further wrinkle: a system that evolves its own skills adds the possibility that a flawed skill improves along the wrong dimension. This was not present in the original article — it is new as of late March 2026.
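
The audit gap can be pictured with a toy skill store. This is a hypothetical sketch and not Hermes code: SkillDoc, SkillStore, and the reviewed flag are invented for illustration, and the flag models the kind of review step that the paragraph above says is missing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SkillDoc:
    name: str
    body: str
    reviewed: bool = False  # hypothetical audit flag, not a Hermes field


class SkillStore:
    def __init__(self) -> None:
        self._skills: Dict[str, SkillDoc] = {}

    def save(self, skill: SkillDoc) -> None:
        self._skills[skill.name] = skill

    def retrieve(self, require_review: bool = False) -> List[str]:
        # Without the gate, every stored skill is returned as if validated.
        return sorted(
            s.name for s in self._skills.values()
            if s.reviewed or not require_review
        )


store = SkillStore()
store.save(SkillDoc("deploy-staging", "steps...", reviewed=True))
store.save(SkillDoc("rotate-keys", "steps learned from one run"))
ungated = store.retrieve()
gated = store.retrieve(require_review=True)
```

With the gate off, the unreviewed rotate-keys skill is retrieved alongside the reviewed one, which is the failure mode described above.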

Framework IV — Paperclip

Multi-Agent Orchestration - Org Charts - Budget Governance

paperclipai, GitHub · February 2026 · 43,000+ stars (MIT Licence)

Paperclip was built by a founder who was running 20–30 Claude Code sessions simultaneously and could not remember what any of them were doing. The solution was not to reduce the number of agents — it was to give them an org chart. Paperclip models the management layer of a company: goals cascade from mission to department to task; agents have roles and reporting lines; every action is traced in an immutable audit log; each agent has a monthly budget with auto-pause at 100% utilisation.
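
The budget rule described above can be sketched in a few lines. This is an illustration under assumptions, not Paperclip's implementation: AgentBudget, charge, and the cents-based accounting are hypothetical, and the audit list is append-only by convention rather than truly immutable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AgentBudget:
    agent: str
    monthly_limit_cents: int
    spent_cents: int = 0
    paused: bool = False
    # Append-only record of (action, cents, outcome).
    audit: List[Tuple[str, int, str]] = field(default_factory=list)

    def charge(self, cents: int, action: str) -> bool:
        if self.paused:
            self.audit.append((action, 0, "rejected: paused"))
            return False
        self.spent_cents += cents
        self.audit.append((action, cents, "ok"))
        if self.spent_cents >= self.monthly_limit_cents:
            # Auto-pause once utilisation reaches 100% of the monthly budget.
            self.paused = True
            self.audit.append(("auto-pause", 0, "budget at 100%"))
        return True


budget = AgentBudget("qa-agent", monthly_limit_cents=1000)
first = budget.charge(600, "run test suite")
second = budget.charge(400, "open pull request")
third = budget.charge(1, "retry flaky test")
```

The second charge brings spending to the limit, so the agent is paused and the third action is rejected and recorded.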

The key distinction from every other framework in this article remains intact: Superpowers and gstack govern how one agent works; Hermes governs how one persistent agent remembers; Paperclip governs how multiple agents coordinate toward a shared business goal. If OpenClaw is an employee, Paperclip is the company that employs it — and Hermes Agent can now formally be one of those employees.

The heartbeat architecture (any system that can receive a heartbeat signal is hireable) and the goal propagation mechanism (tasks carry full goal ancestry so agents always know the “why”) remain the structural innovations that differentiate Paperclip from simpler multi-agent tools like CrewAI or AutoGen.
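
Goal ancestry is a simple idea to sketch: each task keeps a link to the goal above it, so the full chain can be read back at any time. The Goal class, its ancestry method, and the example titles are hypothetical and are not drawn from Paperclip's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Goal:
    title: str
    parent: Optional["Goal"] = None

    def ancestry(self) -> List[str]:
        """Return titles from the top-level mission down to this goal."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.title)
            node = node.parent
        return list(reversed(chain))


mission = Goal("Reach 1,000 paying customers")
department = Goal("Reduce checkout drop-off", parent=mission)
task = Goal("Fix coupon field validation", parent=department)
why = task.ancestry()
```

An agent handed the task can read the chain from mission to department to task, which is the “why” that the text says travels with the work.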

Considerations: Paperclip models a company but does not model trust. It enforces budget and goal constraints — but cannot verify that an agent's reported action matches its actual action. The audit trail shows what agents claimed they did; production monitoring must verify they actually did it. The hermes-paperclip-adapter integration adds a new consideration: Hermes's self-improving skill library is now operating inside Paperclip's governance structure, but Paperclip's approval gates do not inspect Hermes's skill documents. A skill that Hermes self-generates and stores is not subject to Paperclip review before it is used.

Does Model Choice Change Results?

Yes — but differently for each layer. The cross-cutting insight remains unchanged, with one addendum: the expanded model catalogue in Hermes (now 400+ via Nous Portal) makes model-tiering inside a Hermes deployment significantly more practical. The ability to switch models mid-session (/model in v0.8.0) without restarting means developers can now run expensive frontier models for complex reasoning tasks and drop to cheaper models for routine tool-calling within the same agent session.
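
Model tiering within a session amounts to a routing rule plus a cost tally. The sketch below is illustrative only: the model names, the prices, and the task categories are hypothetical, and it does not use any Hermes API.

```python
from typing import Dict, List, Tuple

# Hypothetical model names and prices in cents per 1K tokens;
# these are not real catalogue entries.
PRICE_CENTS_PER_1K: Dict[str, int] = {"frontier-large": 30, "small-fast": 2}


def pick_model(task_kind: str) -> str:
    # Route complex reasoning to the expensive tier, everything else to the cheap one.
    return "frontier-large" if task_kind in {"planning", "debugging"} else "small-fast"


def session_cost_cents(steps: List[Tuple[str, int]]) -> int:
    # Each step is (task_kind, tokens); tokens are billed per whole 1K block here.
    total = 0
    for kind, tokens in steps:
        total += PRICE_CENTS_PER_1K[pick_model(kind)] * (tokens // 1000)
    return total


steps = [("planning", 4000), ("tool_call", 2000), ("tool_call", 3000), ("debugging", 1000)]
tiered = session_cost_cents(steps)
all_frontier = sum(PRICE_CENTS_PER_1K["frontier-large"] * (t // 1000) for _, t in steps)
```

Under these made-up prices the tiered session costs 160 cents against 300 cents for running every step on the frontier model; the actual saving depends entirely on real prices and on how much of the session is routine tool-calling.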

How These Connect to the Research Layer

The full stack now has an additional foundational layer.

From infrastructure to orchestration: NemoClaw (governance/safety) → Pi (minimal execution engine; OpenClaw’s substrate) → OpenClaw/Claude Code/Codex (execution environments) → Hermes/gstack/Superpowers (workflow discipline and persistence) → Paperclip (multi-agent company operating system). Pi occupies the same architectural position as Claude Code and OpenClaw — it is an execution environment — but with an inverted design philosophy.

Where OpenClaw maximises capability out of the box, Pi minimises the core and delegates capability to the agent itself.

The pi-skills repository is interoperable with Claude Code and Codex CLI — a skill written for Pi works in Claude Code and vice versa, consistent with the agentskills.io standard that Hermes also uses. The session tree architecture has no equivalent elsewhere in this stack: it is the only execution environment in this article that supports forking, rewinding, and merging session contexts, a capability that becomes relevant when you want to run code reviews, tool repairs, or exploratory side-quests without contaminating the primary session’s context.

The hermes-paperclip-adapter and gstack’s /pair-agent (discussed in earlier sections) mean the methodology and persistence layers are now formally interconnected. Pi’s /control extension — which lets one Pi instance prompt another — is a lighter-weight analogue to Paperclip’s orchestration: experimental multi-agent coordination without the org chart overhead.

Which Should I Use?

