Agent Builder
Design, deploy, and evaluate autonomous agents. From single tool-use loops to production multi-agent systems.
An agent builder ships LLM systems that take actions, not just generate text. The art is knowing when to add another agent, when to add a tool, and when to take the human out of the loop.
Agent Basics
What an agent actually is, beyond the marketing.
- 01
What Counts as an Agent
core · Loop + tools + autonomy. Anything less is just a chatbot with extra steps.
- 02
The Agent Loop
core · Plan → act → observe → reflect → repeat. Every framework is some flavor of this.
- 03
When Not to Use an Agent
core · If a single prompt or a small pipeline works, use that. Agents add latency, cost, and failure modes.
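The plan → act → observe → reflect loop above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `model` stands in for any callable that returns a decision, and `tools` is a dict of named functions.

```python
# Minimal sketch of the plan → act → observe → reflect loop.
# `model` and `tools` are stand-ins: any callable model and a dict
# of named tool functions slot in the same way.

def run_agent(model, tools, task, max_steps=10):
    """Drive the loop until the model finishes or the step cap hits."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = model(history)               # plan: pick the next action
        if decision["type"] == "finish":
            return decision["answer"]
        tool = tools[decision["tool"]]          # act: call the chosen tool
        observation = tool(**decision["args"])  # observe: capture the result
        # reflect: feed the observation back so the next plan can use it
        history.append({"role": "tool", "content": str(observation)})
    raise RuntimeError("step budget exhausted")
```

The `max_steps` cap is the simplest guard against the runaway-loop failure mode covered later in this track.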
Tool Use
The thing that makes an agent more than a chatbot.
- 01
Designing Tools
core · Narrow schemas, clear names, honest error messages. Tools are an API for the model.
- 02
Tool Orchestration
core · Parallel tool calls, dependent calls, partial failures. The model can't see your queue.
- 03
MCP Servers
recommended · Package tools as MCP servers so any agent runtime can use them.
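What "narrow schema, honest errors" looks like in practice can be sketched as follows. The schema shape follows the JSON-Schema style common to function-calling APIs; the tool name, fields, and error texts are illustrative, not from any specific product.

```python
# A sketch of a narrowly-scoped tool: one clear name, a tight schema,
# and failures returned as honest messages the model can act on,
# rather than stack traces. All names here are illustrative.

SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": "Look up orders by customer email. Returns at most 5 results.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Exact customer email."},
        },
        "required": ["email"],
    },
}

def search_orders(email, db):
    # Validate before touching the backend; tell the model what went wrong.
    if "@" not in email:
        return {"error": "email must be a full address, e.g. a@b.com"}
    hits = [o for o in db if o["email"] == email][:5]
    if not hits:
        return {"error": f"no orders found for {email}; check the address"}
    return {"orders": hits}
```

Returning errors as structured data instead of raising keeps the loop alive: the model sees the message as an observation and can correct its next call.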
Memory & Context
Agents that forget the last thing they did aren't agents.
- 01
Short-Term (Context Window)
core · Compaction, summarization, sliding windows. Long sessions decay without help.
- 02
Long-Term Memory
recommended · File-based, vector-based, structured. Pick by query pattern, not by hype.
- 03
State Machines
recommended · Some workflows aren't free-form — model the state explicitly, let the agent fill in the steps.
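The compaction-plus-sliding-window idea can be sketched like this. It is one possible shape under simple assumptions: the system prompt stays verbatim, recent turns stay verbatim, and older turns collapse into a summary (here a placeholder; a real system would call an LLM summarizer).

```python
# Sketch of context compaction via a sliding window: keep the system
# prompt and the most recent turns verbatim, fold everything older
# into a single summary message. `summarize` is a stand-in for an
# LLM summarizer; the default just counts what was elided.

def compact(history, keep_recent=4, summarize=None):
    if len(history) <= keep_recent + 1:
        return history  # nothing worth folding yet
    system, old, recent = history[0], history[1:-keep_recent], history[-keep_recent:]
    summarize = summarize or (lambda msgs: f"[{len(msgs)} earlier turns elided]")
    summary = {"role": "system", "content": summarize(old)}
    return [system, summary] + recent
```

The query-pattern point from the long-term memory item applies here too: compaction decides what survives, so summarize with the downstream task in mind, not generically.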
Orchestration Patterns
Single-agent loops, planner/worker, fan-out, voting, human-in-the-loop.
- 01
Single-Agent Loops
core · The default. Master this before adding more agents.
- 02
Sequential Pipelines
core · Plan → Build → Review → Deploy. Each stage is a focused agent.
- 03
Fan-Out / Fan-In
recommended · Parallel subagents for independent work. Aggregate at the end.
- 04
Voting & Consensus
optional · When correctness matters more than latency — multiple agents, majority vote.
- 05
Human-in-the-Loop
core · Approval gates at the right points keep autonomy from becoming catastrophe.
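Two of the patterns above, fan-out/fan-in and voting, fit in a few lines when subagents are modeled as plain callables. This is a structural sketch, not a framework; a real `subagent` would wrap an LLM call.

```python
# Fan-out / fan-in sketch: run independent subagent calls in parallel
# threads, then aggregate results in the original order. `subagent`
# is any callable taking one subtask; a real one would call a model.

from concurrent.futures import ThreadPoolExecutor

def fan_out(subagent, subtasks, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(subagent, subtasks))  # fan-in, order preserved

def majority_vote(answers):
    # Voting pattern: the most common answer across agents wins.
    return max(set(answers), key=answers.count)
```

Threads suit I/O-bound model calls; the same shape works with async tasks or separate processes when the runtime demands it.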
Agent Evaluation
Agents fail in weirder ways than prompts. You need harnesses, not vibes.
- 01
Trajectory Evaluation
core · Score the path, not just the final answer. Wrong tools called early are silent failures.
- 02
Task Success Metrics
core · Define 'done' for each task type. Auto-graded where possible, human-graded otherwise.
- 03
Regression Suites
recommended · Lock in known-good traces. Block deploys when an agent regresses on them.
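A minimal trajectory scorer makes "score the path, not just the answer" concrete. The weighting below is illustrative, not a standard metric: it compares called tools against an expected sequence so a run that lucks into the right answer via the wrong tools still loses points.

```python
# Illustrative trajectory score: half the grade comes from the tool
# path matching the expected sequence position-by-position, half from
# whether the final answer was judged correct. Weights are arbitrary.

def score_trajectory(expected_tools, actual_calls, final_ok):
    matched = sum(1 for e, a in zip(expected_tools, actual_calls) if e == a)
    path_score = matched / max(len(expected_tools), 1)
    return 0.5 * path_score + 0.5 * (1.0 if final_ok else 0.0)
```

Scores like this are what a regression suite locks in: store the known-good trace, re-score new runs against it, and block the deploy when the number drops.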
Safety & Permissions
Autonomy without guardrails is a liability. Sandbox, scope, audit.
- 01
Sandboxing
core · Filesystem, network, shell — restrict the blast radius before you give an agent the keys.
- 02
Permission Models
core · Allowlists, approval gates, scoped credentials. Default deny.
- 03
Audit Logs
recommended · Every tool call, every input, every output. Future incidents need this trail.
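Default-deny plus approval gates can be sketched as a small check that sits between the agent loop and tool dispatch. The class and method names here are hypothetical; the point is the order of checks: unknown tools fail first, sensitive tools then require a human callback.

```python
# Default-deny permission gate: a tool call runs only if the tool is
# on an explicit allowlist, and listed-but-sensitive tools also need
# a human approval callback. Names and shapes are illustrative.

class PermissionGate:
    def __init__(self, allowlist, needs_approval=(), approve=None):
        self.allowlist = set(allowlist)
        self.needs_approval = set(needs_approval)
        # With no approver wired in, sensitive calls are denied outright.
        self.approve = approve or (lambda tool, args: False)

    def check(self, tool, args):
        if tool not in self.allowlist:
            return False                     # default deny
        if tool in self.needs_approval:
            return self.approve(tool, args)  # human-in-the-loop gate
        return True
```

Every `check` call is also a natural audit-log point: tool, args, and verdict are exactly the trail a future incident review needs.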
Production Agents
What it takes to run agents 24/7 without a human babysitting.
- 01
Cost Budgeting
core · Per-task budgets, model tiering, and the kill switch when an agent loops.
- 02
Agent Observability
core · Traces, span trees, replay. You'll want to scrub a failed run like a video.
- 03
Failure Modes
core · Loops, hallucinated tool calls, scope drift. Detect, retry, escalate.
- 04
SLAs & Reliability
recommended · What you can promise when the underlying model is non-deterministic.
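A per-task budget and a loop kill switch often live in the same small object, charged once per step. This is a sketch under simple assumptions: cost and step caps are hard limits, and an identical call signature repeating is treated as a cheap loop signal. All thresholds are illustrative.

```python
# Per-task budget with a kill switch: abort when spend or step count
# exceeds its cap, or when the same tool call repeats back-to-back,
# which is the cheapest loop detector. Thresholds are illustrative.

class Budget:
    def __init__(self, max_cost=1.00, max_steps=25, repeat_limit=3):
        self.cost, self.steps = 0.0, 0
        self.max_cost, self.max_steps = max_cost, max_steps
        self.repeat_limit = repeat_limit
        self._last_call, self._repeats = None, 0

    def charge(self, step_cost, call_signature):
        self.cost += step_cost
        self.steps += 1
        self._repeats = self._repeats + 1 if call_signature == self._last_call else 1
        self._last_call = call_signature
        if self.cost > self.max_cost or self.steps > self.max_steps:
            raise RuntimeError("budget exceeded")
        if self._repeats >= self.repeat_limit:
            raise RuntimeError("loop detected: identical call repeated")
```

The raised error is the escalation point: catch it in the runner, emit the trace, and hand the task to a human or a cheaper retry path instead of burning more spend.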
Real Agent Patterns
Battle-tested templates you can adapt.
- 01
Auto-PR Pipeline
recommended · Issue → plan → patch → review → merge. The canonical multi-agent workflow.
- 02
Triage Swarm
optional · Classify, label, and route incoming issues, tickets, and emails.