Agent Builder
Design, deploy, and evaluate autonomous agents. From single tool-use loops to production multi-agent systems.
An agent builder ships LLM systems that take actions, not just generate text. The art is knowing when to add another agent, when to add a tool, and when to take the human out of the loop.
Agent Basics
What an agent actually is, beyond the marketing.
- 01
What Counts as an Agent
core · Loop + tools + autonomy. Anything less is just a chatbot with extra steps.
- 02
The Agent Loop
core · Plan → act → observe → reflect → repeat. Every framework is some flavor of this.
- 03
When Not to Use an Agent
core · If a single prompt or a small pipeline works, use that. Agents add latency, cost, and failure modes.
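The plan → act → observe → reflect loop above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `model` stands in for any callable that returns a decision, and `tools` is a dict of named functions.

```python
# Minimal sketch of the plan → act → observe → reflect loop.
# `model` and `tools` are stand-ins: any callable model and a dict
# of named tool functions slot in the same way.

def run_agent(model, tools, task, max_steps=10):
    """Drive the loop until the model finishes or the step cap hits."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = model(history)               # plan: pick the next action
        if decision["type"] == "finish":
            return decision["answer"]
        tool = tools[decision["tool"]]          # act: call the chosen tool
        observation = tool(**decision["args"])  # observe: capture the result
        # reflect: feed the observation back so the next plan can use it
        history.append({"role": "tool", "content": str(observation)})
    raise RuntimeError("step budget exhausted")
```

The `max_steps` cap is the simplest guard against the runaway-loop failure mode covered later in this track.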
Tool Use
The thing that makes an agent more than a chatbot.
- 01
Designing Tools
core · Narrow schemas, clear names, honest error messages. Tools are an API for the model.
- 02
Tool Orchestration
core · Parallel tool calls, dependent calls, partial failures. The model can't see your queue.
- 03
MCP Servers
recommended · Package tools as MCP servers so any agent runtime can use them.
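What "narrow schema, honest errors" looks like in practice can be sketched as follows. The schema shape follows the JSON-Schema style common to function-calling APIs; the tool name, fields, and error texts are illustrative, not from any specific product.

```python
# A sketch of a narrowly-scoped tool: one clear name, a tight schema,
# and failures returned as honest messages the model can act on,
# rather than stack traces. All names here are illustrative.

SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": "Look up orders by customer email. Returns at most 5 results.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Exact customer email."},
        },
        "required": ["email"],
    },
}

def search_orders(email, db):
    # Validate before touching the backend; tell the model what went wrong.
    if "@" not in email:
        return {"error": "email must be a full address, e.g. a@b.com"}
    hits = [o for o in db if o["email"] == email][:5]
    if not hits:
        return {"error": f"no orders found for {email}; check the address"}
    return {"orders": hits}
```

Returning errors as structured data instead of raising keeps the loop alive: the model sees the message as an observation and can correct its next call.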
Memory & Context
Agents that forget the last thing they did aren't agents.
- 01
Short-Term (Context Window)
core · Compaction, summarization, sliding windows. Long sessions decay without help.
- 02
Long-Term Memory
recommended · File-based, vector-based, structured. Pick by query pattern, not by hype.
- 03
State Machines
recommended · Some workflows aren't free-form — model the state explicitly, let the agent fill in the steps.
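The compaction-plus-sliding-window idea can be sketched like this. It is one possible shape under simple assumptions: the system prompt stays verbatim, recent turns stay verbatim, and older turns collapse into a summary (here a placeholder; a real system would call an LLM summarizer).

```python
# Sketch of context compaction via a sliding window: keep the system
# prompt and the most recent turns verbatim, fold everything older
# into a single summary message. `summarize` is a stand-in for an
# LLM summarizer; the default just counts what was elided.

def compact(history, keep_recent=4, summarize=None):
    if len(history) <= keep_recent + 1:
        return history  # nothing worth folding yet
    system, old, recent = history[0], history[1:-keep_recent], history[-keep_recent:]
    summarize = summarize or (lambda msgs: f"[{len(msgs)} earlier turns elided]")
    summary = {"role": "system", "content": summarize(old)}
    return [system, summary] + recent
```

The query-pattern point from the long-term memory item applies here too: compaction decides what survives, so summarize with the downstream task in mind, not generically.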
Orchestration Patterns
Single-agent loops, planner/worker, fan-out, voting, human-in-the-loop.
- 01
Single-Agent Loops
core · The default. Master this before adding more agents.
- 02
Sequential Pipelines
core · Plan → Build → Review → Deploy. Each stage is a focused agent.
- 03
Fan-Out / Fan-In
recommended · Parallel subagents for independent work. Aggregate at the end.
- 04
Voting & Consensus
optional · When correctness matters more than latency — multiple agents, majority vote.
- 05
Human-in-the-Loop
core · Approval gates at the right points keep autonomy from becoming catastrophe.
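Two of the patterns above, fan-out/fan-in and voting, fit in a few lines when subagents are modeled as plain callables. This is a structural sketch, not a framework; a real `subagent` would wrap an LLM call.

```python
# Fan-out / fan-in sketch: run independent subagent calls in parallel
# threads, then aggregate results in the original order. `subagent`
# is any callable taking one subtask; a real one would call a model.

from concurrent.futures import ThreadPoolExecutor

def fan_out(subagent, subtasks, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(subagent, subtasks))  # fan-in, order preserved

def majority_vote(answers):
    # Voting pattern: the most common answer across agents wins.
    return max(set(answers), key=answers.count)
```

Threads suit I/O-bound model calls; the same shape works with async tasks or separate processes when the runtime demands it.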
Agent Evaluation
Agents fail in weirder ways than prompts. You need harnesses, not vibes.
- 01
Trajectory Evaluation
core · Score the path, not just the final answer. Wrong tools called early are silent failures.
- 02
Task Success Metrics
core · Define 'done' for each task type. Auto-graded where possible, human-graded otherwise.
- 03
Regression Suites
recommended · Lock in known-good traces. Block deploys when an agent regresses on them.
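A minimal trajectory scorer makes "score the path, not just the answer" concrete. The weighting below is illustrative, not a standard metric: it compares called tools against an expected sequence so a run that lucks into the right answer via the wrong tools still loses points.

```python
# Illustrative trajectory score: half the grade comes from the tool
# path matching the expected sequence position-by-position, half from
# whether the final answer was judged correct. Weights are arbitrary.

def score_trajectory(expected_tools, actual_calls, final_ok):
    matched = sum(1 for e, a in zip(expected_tools, actual_calls) if e == a)
    path_score = matched / max(len(expected_tools), 1)
    return 0.5 * path_score + 0.5 * (1.0 if final_ok else 0.0)
```

Scores like this are what a regression suite locks in: store the known-good trace, re-score new runs against it, and block the deploy when the number drops.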
Safety & Permissions
Autonomy without guardrails is a liability. Sandbox, scope, audit.
- 01
Sandboxing
core · Filesystem, network, shell — restrict the blast radius before you give an agent the keys.
- 02
Permission Models
core · Allowlists, approval gates, scoped credentials. Default deny.
- 03
Audit Logs
recommended · Every tool call, every input, every output. Future incidents need this trail.
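Default-deny plus approval gates can be sketched as a small check that sits between the agent loop and tool dispatch. The class and method names here are hypothetical; the point is the order of checks: unknown tools fail first, sensitive tools then require a human callback.

```python
# Default-deny permission gate: a tool call runs only if the tool is
# on an explicit allowlist, and listed-but-sensitive tools also need
# a human approval callback. Names and shapes are illustrative.

class PermissionGate:
    def __init__(self, allowlist, needs_approval=(), approve=None):
        self.allowlist = set(allowlist)
        self.needs_approval = set(needs_approval)
        # With no approver wired in, sensitive calls are denied outright.
        self.approve = approve or (lambda tool, args: False)

    def check(self, tool, args):
        if tool not in self.allowlist:
            return False                     # default deny
        if tool in self.needs_approval:
            return self.approve(tool, args)  # human-in-the-loop gate
        return True
```

Every `check` call is also a natural audit-log point: tool, args, and verdict are exactly the trail a future incident review needs.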
Production Agents
What it takes to run agents 24/7 without a human babysitting.
- 01
Cost Budgeting
core · Per-task budgets, model tiering, and the kill switch when an agent loops.
- 02
Agent Observability
core · Traces, span trees, replay. You'll want to scrub a failed run like a video.
- 03
Failure Modes
core · Loops, hallucinated tool calls, scope drift. Detect, retry, escalate.
- 04
SLAs & Reliability
recommended · What you can promise when the underlying model is non-deterministic.
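A per-task budget and a loop kill switch often live in the same small object, charged once per step. This is a sketch under simple assumptions: cost and step caps are hard limits, and an identical call signature repeating is treated as a cheap loop signal. All thresholds are illustrative.

```python
# Per-task budget with a kill switch: abort when spend or step count
# exceeds its cap, or when the same tool call repeats back-to-back,
# which is the cheapest loop detector. Thresholds are illustrative.

class Budget:
    def __init__(self, max_cost=1.00, max_steps=25, repeat_limit=3):
        self.cost, self.steps = 0.0, 0
        self.max_cost, self.max_steps = max_cost, max_steps
        self.repeat_limit = repeat_limit
        self._last_call, self._repeats = None, 0

    def charge(self, step_cost, call_signature):
        self.cost += step_cost
        self.steps += 1
        self._repeats = self._repeats + 1 if call_signature == self._last_call else 1
        self._last_call = call_signature
        if self.cost > self.max_cost or self.steps > self.max_steps:
            raise RuntimeError("budget exceeded")
        if self._repeats >= self.repeat_limit:
            raise RuntimeError("loop detected: identical call repeated")
```

The raised error is the escalation point: catch it in the runner, emit the trace, and hand the task to a human or a cheaper retry path instead of burning more spend.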
Real Agent Patterns
Battle-tested templates you can adapt.
- 01
Auto-PR Pipeline
recommended · Issue → plan → patch → review → merge. The canonical multi-agent workflow.
- 02
Triage Swarm
optional · Classify, label, and route incoming issues, tickets, and emails.