AI Engineer
Build production LLM systems: prompting, RAG, evals, agents, fine-tuning, and the ops layer underneath.
An AI engineer is the bridge between research-grade models and shipping product. You don't need to train models from scratch — you need to compose them, evaluate them, and run them reliably in production.
Foundations
Just enough ML and transformer intuition to reason about how LLMs actually behave.
- 01 ML Basics [core]
  Training vs. inference, loss, gradient descent. Skip the calculus; keep the intuition.
- 02 Transformer Intuition [core]
  Attention, embeddings, autoregressive generation. Read the paper ("Attention Is All You Need") once, then move on.
- 03 Tokenization & Context [core]
  Tokens, BPE, context windows, why your prompt got truncated.
- 04 Sampling Parameters [core]
  What temperature, top-p, top-k, and repetition penalty actually do to the output distribution. See the sketch after this list.
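For the sampling-parameters item above, a minimal sketch using the OpenAI Python SDK; the model name, prompt, and the `summarize` helper are illustrative placeholders, and other providers expose the same knobs under similar names.

```python
# Sampling parameters in practice. Assumes the OpenAI SDK ("pip install openai")
# and OPENAI_API_KEY in the environment; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def summarize(text: str, deterministic: bool = True) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarize the user's text in two sentences."},
            {"role": "user", "content": text},
        ],
        # Low temperature narrows sampling toward the most likely tokens; raise it
        # (and/or loosen top_p) when you want variety instead of consistency.
        temperature=0.1 if deterministic else 0.9,
        top_p=1.0 if deterministic else 0.95,   # nucleus sampling cutoff
        frequency_penalty=0.2,                  # discourages verbatim repetition
        max_tokens=200,
    )
    return response.choices[0].message.content

# Note: top-k is a common knob on open-weight inference servers (vLLM, llama.cpp)
# but is not exposed by every hosted API.
```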
Prompt Engineering
The patterns that consistently lift quality without finetuning anything.
- 01 Prompt Patterns [core]
  Zero-shot, few-shot, role prompting, chain-of-thought, and when each helps.
- 02 Structured Output [core]
  JSON modes, response schemas, tool-call coercion: making LLMs return clean data. See the sketch after this list.
- 03 Prompt Caching [recommended]
  Cache long system prompts and shared context. Pay once, reuse for hours.
- 04 Context Engineering [core]
  Treat the context window as a UX surface. Order matters; recency matters; relevance matters.
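For the structured-output item, one hedged pattern: ask for JSON in JSON mode, then validate before anything downstream touches it. Assumes the OpenAI SDK and Pydantic; the `Ticket` schema, model name, and prompt are hypothetical.

```python
import json
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    title: str
    priority: str        # e.g. "low" | "medium" | "high"
    tags: list[str]

client = OpenAI()

def extract_ticket(report: str) -> Ticket:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "Extract a ticket from the bug report. Respond with JSON only, "
                'matching {"title": str, "priority": str, "tags": [str]}.'
            )},
            {"role": "user", "content": report},
        ],
        response_format={"type": "json_object"},  # JSON mode: output parses as JSON
        temperature=0,
    )
    raw = response.choices[0].message.content
    try:
        return Ticket.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as err:
        # In production: retry with the validation error appended, or fall back.
        raise ValueError(f"Model returned unusable JSON: {err}") from err
```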
Retrieval-Augmented Generation
Give the model access to your data without retraining.
- 01 Embeddings [core]
  Vector representations of text. Pick a model, normalize, store.
- 02 Vector Databases [core]
  pgvector, Pinecone, Chroma, Weaviate. Pick by ops profile, not benchmarks.
- 03 Chunking Strategies [core]
  Fixed, semantic, hierarchical. Bad chunking ruins good retrieval. A chunk-embed-retrieve sketch follows this list.
- 04 Hybrid Search & Reranking [recommended]
  BM25 + dense retrieval + a cross-encoder reranker beats any single method.
- 05 RAG Evaluation [core]
  Faithfulness, answer relevance, context precision: measure before you tune.
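A toy end-to-end retrieval pass covering the chunking and embedding items above, plus a reciprocal-rank-fusion helper for the hybrid-search item. It keeps vectors in a numpy array instead of a vector database; the embedding model name and chunk sizes are placeholder choices, not recommendations.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # placeholder embedding model name

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character chunks with overlap so sentences aren't cut cold."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vectors = np.array([item.embedding for item in response.data], dtype=np.float32)
    # Normalize to unit length so cosine similarity reduces to a dot product.
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def top_k(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 4) -> list[str]:
    query_vec = embed([query])[0]
    scores = chunk_vecs @ query_vec
    return [chunks[i] for i in np.argsort(-scores)[:k]]

def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of chunk ids (e.g. one from BM25, one dense) into one."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: chunks = chunk(doc); vecs = embed(chunks); context = top_k(question, chunks, vecs)
```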
Agents & Tool Use
Let the model take actions in the real world — and survive when it goes off-script.
- 01 Tool Use (Function Calling) [core]
  Define tools with JSON schemas; let the model pick. The foundation of every agent. See the loop sketch after this list.
- 02 Agent Loops [core]
  ReAct, plan-then-execute, self-critique. Most production agents are simpler than you think.
- 03 Multi-Agent Systems [recommended]
  Planner/worker, supervisor/swarm. When more agents help vs. when they just add cost.
- 04 MCP (Model Context Protocol) [recommended]
  The emerging standard for letting agents talk to your tools.
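A minimal function-calling agent loop in the spirit of the first two items above: the model either answers or requests a tool; we run the tool and feed the result back until it answers. Assumes the OpenAI SDK; `get_weather` and the model name are stand-ins.

```python
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # stand-in for a real API call

def run_agent(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=messages,
            tools=TOOLS,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # model is done acting; return its answer
        messages.append(message)   # keep the assistant turn that requested the tools
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = get_weather(**args)  # dispatch on call.function.name in real code
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "Stopped: step budget exhausted."
```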
Evaluations
If you can't measure it, you can't ship it. Evals are the unsexy moat.
- 01 Eval Design [core]
  Start with 20 hand-graded examples. Scale from there. Don't skip this step.
- 02 LLM-as-Judge [recommended]
  Use a model to grade outputs. Calibrate against a human-labeled subset.
- 03 Regression Evals in CI [core]
  Run your eval set on every prompt change. Block merges on quality drops. See the harness sketch after this list.
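One way a CI regression eval can look, sketched with pytest-style assertions: run a small golden set on every prompt change and fail the build if the pass rate drops. The `golden_set.jsonl` path, model name, and substring grader are placeholders; a calibrated LLM-as-judge call would slot in where `grade` is.

```python
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
PASS_RATE_THRESHOLD = 0.9

def generate(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep sampling quiet so failures point at the prompt change
    )
    return response.choices[0].message.content

def grade(output: str, expected_substring: str) -> bool:
    # Simplest possible grader; replace with an LLM-as-judge once it's calibrated.
    return expected_substring.lower() in output.lower()

def test_prompt_regression():
    cases = [json.loads(line) for line in Path("golden_set.jsonl").read_text().splitlines()]
    passed = sum(grade(generate(c["prompt"]), c["expected"]) for c in cases)
    pass_rate = passed / len(cases)
    assert pass_rate >= PASS_RATE_THRESHOLD, f"Eval pass rate dropped to {pass_rate:.0%}"
```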
Fine-tuning
When prompting plateaus. Usually you can avoid it.
- 01 When to Fine-tune [core]
  Almost never first. Exhaust prompting + RAG + tool design before reaching for fine-tuning.
- 02 LoRA & Adapter Methods [recommended]
  Cheap, fast, reversible. The default if you actually need to fine-tune. A config sketch follows this list.
- 03 Dataset Curation [recommended]
  1,000 great examples beat 100,000 mediocre ones. Spend the time on data.
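A rough LoRA setup with Hugging Face `peft`, to make the "cheap, fast, reversible" claim concrete: only the small adapter matrices train while the base weights stay frozen. The model ID and `target_modules` are placeholders; the right module names depend on the architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # adapter rank: capacity vs. size trade-off
    lora_alpha=32,                        # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projection layers get adapters
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
# From here, train with your usual Trainer or training loop on the curated dataset;
# reverting is just loading the base model without the adapter.
```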
Production
Latency, cost, observability, safety — what makes a demo a product.
- 01 Inference & Streaming [core]
  TTFT (time to first token) vs. total latency, streaming tokens, batching, queueing. See the sketch after this list.
- 02 Cost Controls [core]
  Model tiering, caching, prompt compression, fallbacks. Costs compound fast.
- 03 Observability [core]
  Trace every call. Log inputs, outputs, latencies, token counts, costs.
- 04 Safety & Guardrails [recommended]
  Input validation, output filters, jailbreak resistance. Plan for adversarial users.
- 05 A/B Testing Prompts [optional]
  Roll prompt changes like code changes: feature-flagged, measured, reversible.
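A sketch tying together streaming, TTFT measurement, and model tiering with fallback, assuming the OpenAI SDK. The model names are placeholder tiers, and the broad `except` is a simplification; real code would catch specific API errors and ship the timings, token counts, and costs to a tracing backend.

```python
import time
from openai import OpenAI

client = OpenAI()
CHEAP_MODEL, STRONG_MODEL = "gpt-4o-mini", "gpt-4o"  # placeholder tier names

def stream_completion(prompt: str, model: str) -> tuple[str, float, float]:
    start = time.monotonic()
    ttft = None
    parts: list[str] = []
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            if ttft is None:
                ttft = time.monotonic() - start  # user-perceived latency lives here
            parts.append(delta)
    return "".join(parts), ttft or 0.0, time.monotonic() - start

def answer(prompt: str) -> str:
    for model in (CHEAP_MODEL, STRONG_MODEL):  # tiering: escalate only when needed
        try:
            text, ttft, total = stream_completion(prompt, model)
            print(f"model={model} ttft={ttft:.2f}s total={total:.2f}s")  # trace/log this
            return text
        except Exception as err:  # simplification; catch specific API errors in real code
            print(f"model={model} failed: {err}; falling back")
    raise RuntimeError("All model tiers failed")
```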