The Feedback Loop Series Part 3: AI Agents, Autonomous Systems, and the Architecture of Verification
AI agents that generate code need the same thing humans do: fast, objective feedback. From plan-driven execution to self-verification systems, the future of AI-assisted development is built on the same loops we've always used—just at unprecedented scale and speed.
This is Part 3 of a 3-part series on feedback loops in software engineering. Read the overview | Part 1: TDD | Part 2: UI Feedback Patterns
The Agent Without a Loop
Imagine this scenario:
You ask an AI agent: “Build me a REST API for user management.”
The agent:
- Generates 15 files
- Writes 2,000 lines of code
- Creates database schemas
- Sets up authentication
- Deploys to a server
Then asks: “Is this what you wanted?”
You review the code and discover:
- The authentication is broken
- The database schema has a critical flaw
- Half the endpoints are missing
- The code doesn’t follow your team’s conventions
The agent just spent 10 minutes generating code that’s 60% wrong.
The problem? No feedback loop.
The agent executed a plan without validation. It didn’t:
- Check if each step succeeded
- Verify the code compiles
- Run tests to validate correctness
- Compare output to expectations
It just generated and hoped.
This is the AI equivalent of writing code for 3 hours without running it once.
What AI Agents Actually Are
Strip away the hype, and AI agents are:
Systems that iteratively execute tasks by forming expectations and observing outcomes.
Sound familiar? It’s the exact same pattern as TDD and UI development.
The difference is scale and speed:
- Agents generate code 10-100x faster than humans
- Agents can parallelize work across multiple tasks
- Agents don’t get tired or bored
But speed without verification is just fast failure.
The Agent Loop: Plan, Execute, Verify, Iterate
Effective AI agents follow this cycle:
1. Understand the Goal
User: "Add user authentication to the API"
Agent: "I understand. You want:
- JWT-based authentication
- Login and register endpoints
- Protected routes
- Token validation middleware"
Checkpoint: Does the agent understand correctly?
If you don’t confirm, the agent might build OAuth when you wanted simple JWT.
2. Generate a Plan
Agent: "Here's my plan:
1. Install dependencies (jsonwebtoken, bcrypt)
2. Create User model with password hashing
3. Implement /register endpoint
4. Implement /login endpoint with JWT generation
5. Create authentication middleware
6. Protect existing routes
7. Write tests for all new endpoints"
Checkpoint: Is this plan correct?
The plan is the agent’s test suite. If the plan is wrong, everything after will be wrong.
3. Execute with Validation
Without feedback:
Agent:
- Writes User model
- Writes endpoints
- Writes middleware
- Writes tests
- "Done!"
(3 tests fail, 1 endpoint has a security flaw, middleware doesn't work)
With feedback:
Agent:
- Writes User model
- Runs model tests → All pass ✓
- Writes /register endpoint
- Runs endpoint test → Fails (password not hashing)
- Fixes password hashing
- Runs test again → Passes ✓
- Writes /login endpoint
- Runs endpoint test → Passes ✓
...
Each step has a verification checkpoint.
4. Iterate on Failures
When a test fails:
Test Output:
❌ POST /login should return JWT token
Expected: { token: 'jwt...', user: {...} }
Received: { error: 'User not found' }
Agent: "The login test is failing because the user
lookup is querying the wrong field. Fixing..."
[Adjusts code]
Test Output:
✓ POST /login should return JWT token
The agent uses test output as feedback to refine its approach.
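Here is a minimal sketch of that fix-and-retry loop, assuming pytest as the test runner; `apply_change` and `generate_fix` are hypothetical hooks standing in for however the agent edits code and proposes fixes:

```python
import subprocess

def run_step_with_feedback(apply_change, generate_fix, test_cmd=("pytest", "-q"), max_attempts=3):
    """Apply a change, run the tests, and feed failing output back until they pass."""
    apply_change()
    for _ in range(max_attempts):
        result = subprocess.run(list(test_cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass: the step is verified
        # The test output is the feedback signal for the next fix attempt
        generate_fix(result.stdout + result.stderr)
    return False  # still failing after max_attempts: escalate to a human
```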
Spec-Driven Development: The Agent’s Contract
In human development, we have:
- TDD: Tests define the spec
- BDD: Behavior descriptions define the spec
- Type systems: Types define the contract
In AI-first development, we add:
- Plan-driven: The plan defines the spec
- Context-driven: Context defines expectations
- Validation-driven: Checkpoints define success
Example: Detailed Specification
Vague request:
"Add a search feature"
The agent will guess at:
- What fields to search
- What type of search (exact, fuzzy, full-text)
- What to return
- How to handle pagination
50% chance it’s what you wanted.
Detailed specification:
"Add search to the products API:
Requirements:
- Search across: name, description, category
- Use fuzzy matching (Levenshtein distance)
- Return paginated results (20 per page)
- Include facets: category, price range
- Response time < 200ms for 10k products
Success criteria:
- All existing tests pass
- New search endpoint returns results in <200ms
- Fuzzy search finds 'iPhone' when searching 'iphne'
- Pagination works correctly
- Facets accurately reflect filtered results
Tests should cover:
- Exact match
- Fuzzy match
- No results
- Pagination edge cases
- Performance with large dataset"
This is context engineering. You’re giving the agent:
- What to build
- How it should work
- How to verify it
The agent now has a feedback mechanism.
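As a sketch of how part of that spec could become executable checks, here are pytest-style tests. The `search_products` client and the `items`/`total` response shape are assumptions for illustration, not a real API:

```python
import time
from products_api import search_products  # hypothetical client for the products API

def test_fuzzy_match_finds_misspelled_product():
    results = search_products("iphne")
    assert any("iPhone" in item["name"] for item in results["items"])

def test_results_are_paginated():
    results = search_products("phone", page=1, per_page=20)
    assert len(results["items"]) <= 20
    assert "total" in results

def test_no_results_returns_empty_list():
    results = search_products("zzz-not-a-product")
    assert results["items"] == []

def test_response_time_under_200ms():
    start = time.perf_counter()
    search_products("phone")
    assert time.perf_counter() - start < 0.2
```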
The Self-Verifying Agent
The most powerful pattern: agents that verify their own work.
class SelfVerifyingAgent:
    def execute_task(self, task):
        """Execute a task with built-in verification."""
        plan = self.create_plan(task)

        for step in plan.steps:
            # Execute the step
            result = self.execute_step(step)

            # Run the tests relevant to this step
            test_results = self.run_tests(step.test_suite)

            if not test_results.all_passed:
                # Analyze the failures
                analysis = self.analyze_failures(test_results)

                # Attempt a fix
                fix = self.generate_fix(analysis)
                result = self.apply_fix(fix)

                # Retry the tests
                test_results = self.run_tests(step.test_suite)

                if not test_results.all_passed:
                    # Escalate to a human
                    self.request_help(step, test_results)
                    return

            self.mark_complete(step)

        return self.verify_complete(task)
The agent:
- Executes code
- Runs tests automatically
- Analyzes failures
- Attempts fixes
- Re-validates
- Escalates if stuck
This is autonomous TDD.
Real-World Example: The Claude Code Workflow
I’m writing this post using Claude Code, an AI coding assistant. Here’s the actual workflow:
My request:
"Create a comprehensive blog post series on feedback loops
in software engineering. Include:
- Main article
- Newsletter version
- 3-part deep dive
- Code examples
- Diagrams"
Claude’s approach (simplified):
1. Create plan
✓ User approved plan
2. Research existing blog structure
- Read existing blog posts
- Understand frontmatter schema
- Check writing style
✓ Context gathered
3. Write main article
- Draft content
- Include code examples
- Follow style guide
✓ Article created
4. Write newsletter version
- Condense main points
- Keep core message
- Add links
✓ Newsletter created
5. Write Part 1: TDD
- Deep dive on TDD concepts
- Code examples
- Practical advice
✓ Part 1 created
[Currently here]
6. Write Part 2: UI Feedback
7. Write Part 3: AI Agents
8. Create diagrams
9. Create Storybook examples
10. Commit and push
Notice:
- Checkpoints after each step
- Validation that files are created correctly
- Plan visible to me (I can course-correct)
- Progress tracking (I know where we are)
This is observable AI execution.
The Plan is the Test Suite
In AI-first development, the plan serves the same purpose as tests in TDD.
TDD:
// Tests define what success looks like
test('user can log in', () => { /* ... */ });
test('invalid password is rejected', () => { /* ... */ });
test('JWT token is generated', () => { /* ... */ });
Plan-driven AI:
Plan:
1. Implement login endpoint
Success: Endpoint returns JWT for valid credentials
Success: Endpoint rejects invalid credentials
Success: All login tests pass
2. Implement authentication middleware
Success: Protected routes require valid JWT
Success: Invalid tokens are rejected
Success: All auth tests pass
Both define objective success criteria.
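One way to make that equivalence concrete is to give every plan step an objective check the agent can run. A minimal sketch, assuming each step carries a shell command whose exit code decides success (the test file paths are purely illustrative):

```python
import subprocess
from dataclasses import dataclass

@dataclass
class PlanStep:
    description: str
    check_command: list[str]  # the objective pass/fail check for this step

    def verify(self) -> bool:
        # The step counts as done only when its check exits cleanly
        return subprocess.run(self.check_command).returncode == 0

plan = [
    PlanStep("Implement login endpoint", ["pytest", "tests/test_login.py", "-q"]),
    PlanStep("Implement auth middleware", ["pytest", "tests/test_auth.py", "-q"]),
]
```

Encoding success as a command keeps the check objective: the agent can't claim a step is done, it can only demonstrate it.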
The Danger of Unchecked Agents
Agents without feedback loops are dangerous:
Problem 1: Cascading Errors
Agent:
- Creates User model (with a subtle bug)
- Creates login endpoint (depends on buggy model)
- Creates registration endpoint (depends on buggy model)
- Creates password reset (depends on buggy model)
- Creates profile endpoint (depends on buggy model)
Result: 5 broken features instead of 1
With feedback: The bug is caught at the model layer. Nothing cascades.
Problem 2: Unfalsifiable Execution
User: "Did you implement the feature correctly?"
Agent: "Yes, it's complete."
User: [Checks manually, finds 3 bugs]
With feedback: The agent proves correctness with passing tests.
Problem 3: Context Drift
Agent starts with:
- Task: Add authentication
- Context: Use JWT
Agent midway through:
- Task: Add authentication (forgets JWT requirement)
- Context: [Drifts to OAuth implementation]
Result: Wrong implementation
With feedback: Plan checkpoints prevent drift.
Measuring Agent Loop Health
You can measure the effectiveness of an agent’s feedback loops:
Metrics:
1. Validation Frequency
Validation frequency = Number of verification checkpoints per task
Good: 5-10 checkpoints per task
Bad: 1 checkpoint (only at the end)
2. Error Detection Speed
Detection time = Steps between error introduction and detection
Good: 1-2 steps (immediate feedback)
Bad: 10+ steps (cascading errors)
3. Fix Success Rate
Fix rate = Successful fixes / Total failures
Good: >80% (agent self-corrects)
Bad: <50% (frequent human intervention)
4. Plan Accuracy
Plan adherence = Completed steps matching plan / Total steps
Good: >90% (stable plan)
Bad: <60% (plan constantly changing)
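A rough sketch of computing these metrics from an execution log; the `StepRecord` structure is an assumption about what the agent records, and error detection speed is omitted because it needs the step indices where an error appeared and was caught:

```python
from dataclasses import dataclass

@dataclass
class StepRecord:
    checkpointed: bool        # did the step end with a verification checkpoint?
    matched_plan: bool        # did the step match the approved plan?
    failed: bool = False      # did a check fail at this step?
    self_fixed: bool = False  # did the agent fix the failure without help?

def loop_health(steps: list[StepRecord]) -> dict:
    failures = [s for s in steps if s.failed]
    return {
        "checkpoints_per_task": sum(s.checkpointed for s in steps),
        "fix_success_rate": sum(s.self_fixed for s in failures) / len(failures) if failures else 1.0,
        "plan_adherence": sum(s.matched_plan for s in steps) / len(steps) if steps else 0.0,
    }
```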
Practical Patterns: Working with AI Agents
Pattern 1: Plan-Review-Execute
1. User: Request feature
2. Agent: Generate plan
3. User: Review and approve plan ← CHECKPOINT
4. Agent: Execute step 1
5. Agent: Validate step 1 ← CHECKPOINT
6. Agent: Execute step 2
7. Agent: Validate step 2 ← CHECKPOINT
...
Never let an agent execute without plan approval.
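A compact sketch of Pattern 1, assuming a hypothetical agent interface with `create_plan`, `execute_step`, `validate_step`, and `request_help`; the key detail is that nothing executes until a human approves the plan:

```python
def plan_review_execute(agent, task):
    """No execution until a human approves the plan,
    and every step must validate before the next one starts."""
    plan = agent.create_plan(task)                 # hypothetical agent interface
    print(plan)
    if input("Approve this plan? (y/n) ").strip().lower() != "y":
        return                                     # plan rejected: stop before any code is written
    for step in plan.steps:
        agent.execute_step(step)
        if not agent.validate_step(step):          # checkpoint after each step
            agent.request_help(step)               # escalate instead of pushing on
            return
```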
Pattern 2: Test-First Agent Execution
User: "Implement user authentication"
Agent:
1. Write tests FIRST (based on spec)
2. Run tests (all should fail)
3. Implement features
4. Run tests (iterate until all pass)
5. Report completion with test evidence
The agent follows TDD principles.
Pattern 3: Incremental Delivery with Validation
Agent:
1. Implement smallest viable piece
2. Get it working completely
3. Validate with tests
4. Commit
5. Move to next piece
vs.
Agent:
1. Implement everything
2. Try to make it all work
3. Debug cascading failures
Incremental delivery = faster feedback.
Pattern 4: Explicit Success Criteria
User: "Add pagination to the API"
Agent: "Success criteria:
- ✓ API accepts page and limit parameters
- ✓ API returns total count
- ✓ API returns paginated results
- ✓ Edge cases handled (page > total, limit = 0)
- ✓ All tests pass
- ✓ Documentation updated
Proceed? (y/n)"
Agent makes success criteria explicit and verifiable.
The Human-in-the-Loop Pattern
Even with self-verification, humans play critical roles:
Role 1: Plan Approval
Agents generate plans, humans validate strategy.
Role 2: Checkpoint Validation
Agents report progress, humans confirm direction.
Role 3: Edge Case Detection
Agents implement happy paths, humans identify edge cases.
Role 4: Architectural Decisions
Agents execute tactics, humans own strategy.
Role 5: Quality Evaluation
Agents verify correctness, humans evaluate quality.
The agent accelerates execution. You maintain control.
Beyond Code: Agent Feedback in Other Domains
The feedback loop pattern applies to any agent task:
Content Generation
Agent:
1. Generate outline
2. Human reviews outline ← CHECKPOINT
3. Generate section 1
4. Validate against style guide
5. Human reviews section 1 ← CHECKPOINT
...
Data Analysis
Agent:
1. Load dataset
2. Validate data quality ← CHECKPOINT
3. Generate summary statistics
4. Check for anomalies ← CHECKPOINT
5. Create visualizations
6. Verify charts render correctly ← CHECKPOINT
Infrastructure Automation
Agent:
1. Generate Terraform config
2. Run terraform plan ← CHECKPOINT
3. Human reviews plan ← CHECKPOINT
4. Apply changes
5. Verify resources created ← CHECKPOINT
6. Run smoke tests ← CHECKPOINT
Every domain benefits from observable, verifiable agent execution.
The Future: Multi-Agent Feedback Systems
The next evolution: agents that verify each other.
Agent 1 (Builder): Generates code
Agent 2 (Tester): Writes tests for the code
Agent 3 (Reviewer): Reviews code quality
Agent 4 (Security): Scans for vulnerabilities
Agent 5 (Orchestrator): Coordinates all agents
Workflow:
1. Builder generates code
2. Tester validates with tests ← CHECKPOINT
3. Reviewer evaluates quality ← CHECKPOINT
4. Security scans for vulnerabilities ← CHECKPOINT
5. Orchestrator decides: ship, refine, or reject
Each agent specializes in one feedback mechanism.
The system has built-in verification at every layer.
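A sketch of what that orchestration decision could look like, with the agent objects and their methods assumed purely for illustration:

```python
def orchestrate(task, builder, tester, reviewer, security):
    """Each specialist agent supplies one feedback mechanism;
    the orchestrator only ships when every check passes."""
    code = builder.generate(task)
    checks = {
        "tests": tester.run_tests(code),
        "review": reviewer.evaluate(code),
        "security": security.scan(code),
    }
    failed = [name for name, passed in checks.items() if not passed]
    if not failed:
        return "ship", code
    if len(failed) == len(checks):
        return "reject", failed   # nothing passed: back to planning
    return "refine", failed       # targeted feedback goes back to the builder
```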
The Convergence: Skills That Transfer
If you’ve mastered:
- TDD (Part 1): You understand objective verification
- UI feedback loops (Part 2): You understand rapid iteration
- Code review: You understand quality evaluation
You’re already prepared for working with AI agents.
The skills are identical:
| Traditional Skill | AI-First Equivalent |
|---|---|
| Writing tests | Writing specs for agents |
| Reading test output | Reading agent execution logs |
| Debugging failures | Analyzing agent errors |
| Code review | Reviewing agent-generated code |
| Refactoring | Guiding agent improvements |
| Architecture | Designing agent workflows |
You’re not learning something new. You’re applying what you know at a different scale.
Organizational Feedback Loops
The principles scale beyond individual developers:
Team Level
Sprint Planning (plan) → Daily Standups (checkpoint) → Sprint Review (checkpoint) → Retrospective (checkpoint)
Product Level
Product Vision (strategy) → Feature Specs (checkpoint) → Implementation (execution) → User Testing (validation)
Business Level
OKRs (goal) → Quarterly Goals (milestone) → Weekly Metrics (tracking) → Quarterly Review (evaluation)
Feedback loops are fractal. They apply at every level of organization.
AI Accelerates the Loop, Not Just the Code
The fundamental shift with AI:
Before AI:
- Slow code generation
- Fast validation (tests run in seconds)
- Bottleneck: Writing code
With AI:
- Fast code generation
- Fast validation (still seconds)
- Bottleneck: Reviewing code
The bottleneck moved. But the loop is still there.
This means your review skills become more valuable than your typing skills.
Can you:
- Spot edge cases quickly?
- Evaluate code quality at a glance?
- Identify security vulnerabilities?
- Assess architectural coherence?
- Validate correctness efficiently?
These are the 10x skills in an AI-first world.
Building Review Superpowers
Practical ways to level up review skills:
1. Practice Code Review
Review 10+ PRs per week. Get fast at spotting issues.
2. Read More Code Than You Write
Spend 60% reading, 40% writing. (Or with AI: 80% reading, 20% writing)
3. Learn to Scan, Not Read
Train your eyes to spot patterns:
- Inconsistent naming
- Missing error handling
- Security vulnerabilities
- Performance issues
4. Build Mental Models
Understand why code is good or bad. Develop intuition.
5. Use Checklists
Formalize your review process:
- ✓ Tests pass
- ✓ Edge cases handled
- ✓ Security validated
- ✓ Performance acceptable
- ✓ Documentation updated
Make review a fast, systematic feedback loop.
The Engineer’s New Role: Architect of Loops
In the AI-first future, your job is:
1. Design Verification Mechanisms
Create systems that prove correctness:
- Tests
- Type systems
- Assertions
- Monitoring
- Validation rules
2. Create Feedback Checkpoints
Build verification into every step:
- Plan review
- Incremental validation
- Continuous testing
- Automated deployment checks
3. Teach Agents to Verify
Encode verification into agent workflows:
- Run tests after every change
- Validate output at each step
- Check success criteria
- Escalate on failures
4. Enable Stakeholder Visibility
Give non-engineers access to progress:
- Preview deployments
- Storybook builds
- Test dashboards
- Visual diffs
5. Maintain System Coherence
While agents execute tactics, you maintain:
- Architectural vision
- Code quality standards
- Security posture
- Performance requirements
You’re the architect of feedback systems.
Practical Advice: Starting with AI Agents Today
If you want to work effectively with AI agents:
Week 1: Learn to Prompt with Specs
Practice writing detailed specifications:
Bad: "Add authentication"
Good: "Add JWT authentication with:
- /register and /login endpoints
- Password hashing with bcrypt
- Token expiration: 24 hours
- Middleware to protect routes
- Tests covering all endpoints and edge cases"
Week 2: Practice Plan Review
Ask agents for plans before execution:
"Before implementing, provide a detailed plan with:
- Steps to execute
- Success criteria for each step
- Tests to validate each step"
Week 3: Implement Checkpoint Validation
Require agents to report after each step:
"After completing each step:
- Show what was implemented
- Show test results
- Wait for my approval before continuing"
Week 4: Build Verification Infrastructure
Add automated checks (a minimal gate script follows this list):
- Pre-commit hooks
- CI/CD pipelines
- Test automation
- Code quality gates
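For example, a minimal quality gate you could wire into a pre-commit hook or CI job might look like the sketch below; it assumes pytest and ruff are available, so swap in whatever your team actually runs:

```python
#!/usr/bin/env python3
"""Minimal quality gate: fail the commit or CI job unless every check passes.
Assumes pytest and ruff are installed; swap in your team's own tools."""
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],        # tests must pass
    ["ruff", "check", "."],  # lint must be clean
]

def main() -> int:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"Gate failed on: {' '.join(cmd)}")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```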
The Meta-Lesson: Loops All the Way Down
Here’s the profound realization:
Software engineering has always been about feedback loops.
- TDD: Code → Test → Iterate
- REPL: Expression → Evaluate → Print → Loop
- Debugging: Hypothesis → Test → Observe → Refine
- Compilation: Code → Compile → Fix Errors → Repeat
- Deployment: Build → Test → Deploy → Monitor
- User Research: Ship → Measure → Learn → Iterate
- Agile: Plan → Sprint → Review → Retrospect
AI agents are just another loop.
The pattern is universal. Once you see it, you can’t unsee it.
Every effective system has:
- Intent (what you want to achieve)
- Execution (action taken)
- Observation (measuring outcome)
- Evaluation (comparing to intent)
- Iteration (adjusting based on feedback)
This applies to:
- Individual functions
- Entire systems
- Teams
- Organizations
- AI agents
Conclusion: The Feedback-First Future
As AI transforms software development, one thing remains constant:
The quality and velocity of engineering are determined by the quality and speed of feedback loops.
The engineers who will thrive:
- Understand loops intuitively (from TDD, UI dev, debugging)
- Design verification mechanisms (tests, checks, validations)
- Build observable systems (logs, metrics, dashboards)
- Review faster than they code (scanning, pattern recognition)
- Teach agents to verify themselves (specs, tests, checkpoints)
From TDD to hot reload to Storybook to AI agents—it’s all the same pattern:
Expect → Execute → Observe → Evaluate → Iterate
Master this loop, and you’re ready for the future.
Read the rest of the series:
- Overview: The Software Feedback Loop
- Part 1: TDD and Observable Engineering
- Part 2: UI Feedback Patterns
This post was co-created with Claude Code—an AI agent that exemplifies the feedback-driven development it describes. Every section was validated, reviewed, and refined through continuous iteration.