The Feedback Loop Series Part 3: AI Agents, Autonomous Systems, and the Architecture of Verification
AI agents that generate code need the same thing humans do: fast, objective feedback. From plan-driven execution to self-verification systems, the future of AI-assisted development is built on the same loops we've always used—just at unprecedented scale and speed.
This is Part 3 of a 3-part series on feedback loops in software engineering. Read the overview | Part 1: TDD | Part 2: UI Feedback Patterns
The Agent Without a Loop
Imagine this scenario:
You ask an AI agent: “Build me a REST API for user management.”
The agent:
- Generates 15 files
- Writes 2,000 lines of code
- Creates database schemas
- Sets up authentication
- Deploys to a server
Then asks: “Is this what you wanted?”
You review the code and discover:
- The authentication is broken
- The database schema has a critical flaw
- Half the endpoints are missing
- The code doesn’t follow your team’s conventions
The agent just spent 10 minutes generating code that’s 60% wrong.
The problem? No feedback loop.
The agent executed a plan without validation. It didn’t:
- Check if each step succeeded
- Verify the code compiles
- Run tests to validate correctness
- Compare output to expectations
It just generated and hoped.
This is the AI equivalent of writing code for 3 hours without running it once.
What AI Agents Actually Are
Strip away the hype, and AI agents are:
Systems that iteratively execute tasks by forming expectations and observing outcomes.
Sound familiar? It’s the exact same pattern as TDD and UI development.
The difference is scale and speed:
- Agents generate code 10-100x faster than humans
- Agents can parallelize work across multiple tasks
- Agents don’t get tired or bored
But speed without verification is just fast failure.
The Agent Loop: Plan, Execute, Verify, Iterate
Effective AI agents follow this cycle:
1. Understand the Goal
User: "Add user authentication to the API"
Agent: "I understand. You want:
- JWT-based authentication
- Login and register endpoints
- Protected routes
- Token validation middleware"
Checkpoint: Does the agent understand correctly?
If you don’t confirm, the agent might build OAuth when you wanted simple JWT.
2. Generate a Plan
Agent: "Here's my plan:
1. Install dependencies (jsonwebtoken, bcrypt)
2. Create User model with password hashing
3. Implement /register endpoint
4. Implement /login endpoint with JWT generation
5. Create authentication middleware
6. Protect existing routes
7. Write tests for all new endpoints"
Checkpoint: Is this plan correct?
The plan is the agent’s test suite. If the plan is wrong, everything after will be wrong.
3. Execute with Validation
Without feedback:
Agent:
- Writes User model
- Writes endpoints
- Writes middleware
- Writes tests
- "Done!"
(3 tests fail, 1 endpoint has a security flaw, middleware doesn't work)
With feedback:
Agent:
- Writes User model
- Runs model tests → All pass ✓
- Writes /register endpoint
- Runs endpoint test → Fails (password not hashing)
- Fixes password hashing
- Runs test again → Passes ✓
- Writes /login endpoint
- Runs endpoint test → Passes ✓
...
Each step has a verification checkpoint.
4. Iterate on Failures
When a test fails:
Test Output:
❌ POST /login should return JWT token
Expected: { token: 'jwt...', user: {...} }
Received: { error: 'User not found' }
Agent: "The login test is failing because the user
lookup is querying the wrong field. Fixing..."
[Adjusts code]
Test Output:
✓ POST /login should return JWT token
The agent uses test output as feedback to refine its approach.
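Here is a minimal sketch of that fix-and-retry loop, assuming pytest as the test runner; `apply_change` and `generate_fix` are hypothetical hooks standing in for however the agent edits code and proposes fixes:

```python
import subprocess

def run_step_with_feedback(apply_change, generate_fix, test_cmd=("pytest", "-q"), max_attempts=3):
    """Apply a change, run the tests, and feed failing output back until they pass."""
    apply_change()
    for _ in range(max_attempts):
        result = subprocess.run(list(test_cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass: the step is verified
        # The test output is the feedback signal for the next fix attempt
        generate_fix(result.stdout + result.stderr)
    return False  # still failing after max_attempts: escalate to a human
```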
Spec-Driven Development: The Agent’s Contract
In human development, we have:
- TDD: Tests define the spec
- BDD: Behavior descriptions define the spec
- Type systems: Types define the contract
In AI-first development, we add:
- Plan-driven: The plan defines the spec
- Context-driven: Context defines expectations
- Validation-driven: Checkpoints define success
Example: Detailed Specification
Vague request:
"Add a search feature"
The agent will guess at:
- What fields to search
- What type of search (exact, fuzzy, full-text)
- What to return
- How to handle pagination
50% chance it’s what you wanted.
Detailed specification:
"Add search to the products API:
Requirements:
- Search across: name, description, category
- Use fuzzy matching (Levenshtein distance)
- Return paginated results (20 per page)
- Include facets: category, price range
- Response time < 200ms for 10k products
Success criteria:
- All existing tests pass
- New search endpoint returns results in <200ms
- Fuzzy search finds 'iPhone' when searching 'iphne'
- Pagination works correctly
- Facets accurately reflect filtered results
Tests should cover:
- Exact match
- Fuzzy match
- No results
- Pagination edge cases
- Performance with large dataset"
This is context engineering. You’re giving the agent:
- What to build
- How it should work
- How to verify it
The agent now has a feedback mechanism.
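As a sketch of how part of that spec could become executable checks, here are pytest-style tests. The `search_products` client and the `items`/`total` response shape are assumptions for illustration, not a real API:

```python
import time
from products_api import search_products  # hypothetical client for the products API

def test_fuzzy_match_finds_misspelled_product():
    results = search_products("iphne")
    assert any("iPhone" in item["name"] for item in results["items"])

def test_results_are_paginated():
    results = search_products("phone", page=1, per_page=20)
    assert len(results["items"]) <= 20
    assert "total" in results

def test_no_results_returns_empty_list():
    results = search_products("zzz-not-a-product")
    assert results["items"] == []

def test_response_time_under_200ms():
    start = time.perf_counter()
    search_products("phone")
    assert time.perf_counter() - start < 0.2
```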
The Self-Verifying Agent
The most powerful pattern: agents that verify their own work.
class SelfVerifyingAgent:
    def execute_task(self, task):
        """Execute a task with built-in verification."""
        plan = self.create_plan(task)

        for step in plan.steps:
            # Execute the step
            result = self.execute_step(step)

            # Run the tests relevant to this step
            test_results = self.run_tests(step.test_suite)

            if not test_results.all_passed:
                # Analyze the failures
                analysis = self.analyze_failures(test_results)

                # Attempt a fix
                fix = self.generate_fix(analysis)
                result = self.apply_fix(fix)

                # Retry the tests
                test_results = self.run_tests(step.test_suite)

                if not test_results.all_passed:
                    # Escalate to a human
                    self.request_help(step, test_results)
                    return

            self.mark_complete(step)

        return self.verify_complete(task)
The agent:
- Executes code
- Runs tests automatically
- Analyzes failures
- Attempts fixes
- Re-validates
- Escalates if stuck
This is autonomous TDD.
Real-World Example: The Claude Code Workflow
I’m writing this post using Claude Code, an AI coding assistant. Here’s the actual workflow:
My request:
"Create a comprehensive blog post series on feedback loops
in software engineering. Include:
- Main article
- Newsletter version
- 3-part deep dive
- Code examples
- Diagrams"
Claude’s approach (simplified):
1. Create plan
✓ User approved plan
2. Research existing blog structure
- Read existing blog posts
- Understand frontmatter schema
- Check writing style
✓ Context gathered
3. Write main article
- Draft content
- Include code examples
- Follow style guide
✓ Article created
4. Write newsletter version
- Condense main points
- Keep core message
- Add links
✓ Newsletter created
5. Write Part 1: TDD
- Deep dive on TDD concepts
- Code examples
- Practical advice
✓ Part 1 created
[Currently here]
6. Write Part 2: UI Feedback
7. Write Part 3: AI Agents
8. Create diagrams
9. Create Storybook examples
10. Commit and push
Notice:
- Checkpoints after each step
- Validation that files are created correctly
- Plan visible to me (I can course-correct)
- Progress tracking (I know where we are)
This is observable AI execution.
The Plan is the Test Suite
In AI-first development, the plan serves the same purpose as tests in TDD.
TDD:
// Tests define what success looks like
test('user can log in', () => { /* ... */ });
test('invalid password is rejected', () => { /* ... */ });
test('JWT token is generated', () => { /* ... */ });
Plan-driven AI:
Plan:
1. Implement login endpoint
Success: Endpoint returns JWT for valid credentials
Success: Endpoint rejects invalid credentials
Success: All login tests pass
2. Implement authentication middleware
Success: Protected routes require valid JWT
Success: Invalid tokens are rejected
Success: All auth tests pass
Both define objective success criteria.
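One way to make that equivalence concrete is to give every plan step an objective check the agent can run. A minimal sketch, assuming each step carries a shell command whose exit code decides success (the test file paths are purely illustrative):

```python
import subprocess
from dataclasses import dataclass

@dataclass
class PlanStep:
    description: str
    check_command: list[str]  # the objective pass/fail check for this step

    def verify(self) -> bool:
        # The step counts as done only when its check exits cleanly
        return subprocess.run(self.check_command).returncode == 0

plan = [
    PlanStep("Implement login endpoint", ["pytest", "tests/test_login.py", "-q"]),
    PlanStep("Implement auth middleware", ["pytest", "tests/test_auth.py", "-q"]),
]
```

Encoding success as a command keeps the check objective: the agent can't claim a step is done, it can only demonstrate it.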
The Danger of Unchecked Agents
Agents without feedback loops are dangerous:
Problem 1: Cascading Errors
Agent:
- Creates User model (with a subtle bug)
- Creates login endpoint (depends on buggy model)
- Creates registration endpoint (depends on buggy model)
- Creates password reset (depends on buggy model)
- Creates profile endpoint (depends on buggy model)
Result: 5 broken features instead of 1
With feedback: The bug is caught at the model layer. Nothing cascades.
Problem 2: Unfalsifiable Execution
User: "Did you implement the feature correctly?"
Agent: "Yes, it's complete."
User: [Checks manually, finds 3 bugs]
With feedback: The agent proves correctness with passing tests.
Problem 3: Context Drift
Agent starts with:
- Task: Add authentication
- Context: Use JWT
Agent midway through:
- Task: Add authentication (forgets JWT requirement)
- Context: [Drifts to OAuth implementation]
Result: Wrong implementation
With feedback: Plan checkpoints prevent drift.
Measuring Agent Loop Health
You can measure the effectiveness of an agent’s feedback loops:
Metrics:
1. Validation Frequency
Validation frequency = Number of verification checkpoints per task
Good: 5-10 checkpoints per task
Bad: 1 checkpoint (only at the end)
2. Error Detection Speed
Detection time = Steps between error introduction and detection
Good: 1-2 steps (immediate feedback)
Bad: 10+ steps (cascading errors)
3. Fix Success Rate
Fix rate = Successful fixes / Total failures
Good: >80% (agent self-corrects)
Bad: <50% (frequent human intervention)
4. Plan Accuracy
Plan adherence = Completed steps matching plan / Total steps
Good: >90% (stable plan)
Bad: <60% (plan constantly changing)
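A rough sketch of computing these metrics from an execution log; the `StepRecord` structure is an assumption about what the agent records, and error detection speed is omitted because it needs the step indices where an error appeared and was caught:

```python
from dataclasses import dataclass

@dataclass
class StepRecord:
    checkpointed: bool        # did the step end with a verification checkpoint?
    matched_plan: bool        # did the step match the approved plan?
    failed: bool = False      # did a check fail at this step?
    self_fixed: bool = False  # did the agent fix the failure without help?

def loop_health(steps: list[StepRecord]) -> dict:
    failures = [s for s in steps if s.failed]
    return {
        "checkpoints_per_task": sum(s.checkpointed for s in steps),
        "fix_success_rate": sum(s.self_fixed for s in failures) / len(failures) if failures else 1.0,
        "plan_adherence": sum(s.matched_plan for s in steps) / len(steps) if steps else 0.0,
    }
```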
Practical Patterns: Working with AI Agents
Pattern 1: Plan-Review-Execute
1. User: Request feature
2. Agent: Generate plan
3. User: Review and approve plan ← CHECKPOINT
4. Agent: Execute step 1
5. Agent: Validate step 1 ← CHECKPOINT
6. Agent: Execute step 2
7. Agent: Validate step 2 ← CHECKPOINT
...
Never let an agent execute without plan approval.
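A compact sketch of Pattern 1, assuming a hypothetical agent interface with `create_plan`, `execute_step`, `validate_step`, and `request_help`; the key detail is that nothing executes until a human approves the plan:

```python
def plan_review_execute(agent, task):
    """No execution until a human approves the plan,
    and every step must validate before the next one starts."""
    plan = agent.create_plan(task)                 # hypothetical agent interface
    print(plan)
    if input("Approve this plan? (y/n) ").strip().lower() != "y":
        return                                     # plan rejected: stop before any code is written
    for step in plan.steps:
        agent.execute_step(step)
        if not agent.validate_step(step):          # checkpoint after each step
            agent.request_help(step)               # escalate instead of pushing on
            return
```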
Pattern 2: Test-First Agent Execution
User: "Implement user authentication"
Agent:
1. Write tests FIRST (based on spec)
2. Run tests (all should fail)
3. Implement features
4. Run tests (iterate until all pass)
5. Report completion with test evidence
The agent follows TDD principles.
Pattern 3: Incremental Delivery with Validation
Agent:
1. Implement smallest viable piece
2. Get it working completely
3. Validate with tests
4. Commit
5. Move to next piece
vs.
Agent:
1. Implement everything
2. Try to make it all work
3. Debug cascading failures
Incremental delivery = faster feedback.
Pattern 4: Explicit Success Criteria
User: "Add pagination to the API"
Agent: "Success criteria:
- ✓ API accepts page and limit parameters
- ✓ API returns total count
- ✓ API returns paginated results
- ✓ Edge cases handled (page > total, limit = 0)
- ✓ All tests pass
- ✓ Documentation updated
Proceed? (y/n)"
Agent makes success criteria explicit and verifiable.
The Human-in-the-Loop Pattern
Even with self-verification, humans play critical roles:
Role 1: Plan Approval
Agents generate plans, humans validate strategy.
Role 2: Checkpoint Validation
Agents report progress, humans confirm direction.
Role 3: Edge Case Detection
Agents implement happy paths, humans identify edge cases.
Role 4: Architectural Decisions
Agents execute tactics, humans own strategy.
Role 5: Quality Evaluation
Agents verify correctness, humans evaluate quality.
The agent accelerates execution. You maintain control.
Beyond Code: Agent Feedback in Other Domains
The feedback loop pattern applies to any agent task:
Content Generation
Agent:
1. Generate outline
2. Human reviews outline ← CHECKPOINT
3. Generate section 1
4. Validate against style guide
5. Human reviews section 1 ← CHECKPOINT
...
Data Analysis
Agent:
1. Load dataset
2. Validate data quality ← CHECKPOINT
3. Generate summary statistics
4. Check for anomalies ← CHECKPOINT
5. Create visualizations
6. Verify charts render correctly ← CHECKPOINT
Infrastructure Automation
Agent:
1. Generate Terraform config
2. Run terraform plan ← CHECKPOINT
3. Human reviews plan ← CHECKPOINT
4. Apply changes
5. Verify resources created ← CHECKPOINT
6. Run smoke tests ← CHECKPOINT
Every domain benefits from observable, verifiable agent execution.
The Future: Multi-Agent Feedback Systems
The next evolution: agents that verify each other.
Agent 1 (Builder): Generates code
Agent 2 (Tester): Writes tests for the code
Agent 3 (Reviewer): Reviews code quality
Agent 4 (Security): Scans for vulnerabilities
Agent 5 (Orchestrator): Coordinates all agents
Workflow:
1. Builder generates code
2. Tester validates with tests ← CHECKPOINT
3. Reviewer evaluates quality ← CHECKPOINT
4. Security scans for vulnerabilities ← CHECKPOINT
5. Orchestrator decides: ship, refine, or reject
Each agent specializes in one feedback mechanism.
The system has built-in verification at every layer.
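A sketch of what that orchestration decision could look like, with the agent objects and their methods assumed purely for illustration:

```python
def orchestrate(task, builder, tester, reviewer, security):
    """Each specialist agent supplies one feedback mechanism;
    the orchestrator only ships when every check passes."""
    code = builder.generate(task)
    checks = {
        "tests": tester.run_tests(code),
        "review": reviewer.evaluate(code),
        "security": security.scan(code),
    }
    failed = [name for name, passed in checks.items() if not passed]
    if not failed:
        return "ship", code
    if len(failed) == len(checks):
        return "reject", failed   # nothing passed: back to planning
    return "refine", failed       # targeted feedback goes back to the builder
```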
The Convergence: Skills That Transfer
If you’ve mastered:
- TDD (Part 1): You understand objective verification
- UI feedback loops (Part 2): You understand rapid iteration
- Code review: You understand quality evaluation
You’re already prepared for working with AI agents.
The skills are identical:
| Traditional Skill | AI-First Equivalent |
|---|---|
| Writing tests | Writing specs for agents |
| Reading test output | Reading agent execution logs |
| Debugging failures | Analyzing agent errors |
| Code review | Reviewing agent-generated code |
| Refactoring | Guiding agent improvements |
| Architecture | Designing agent workflows |
You’re not learning something new. You’re applying what you know at a different scale.
Organizational Feedback Loops
The principles scale beyond individual developers:
Team Level
Sprint Planning (plan) → Daily Standups (checkpoint) → Sprint Review (checkpoint) → Retrospective (checkpoint)
Product Level
Product Vision (strategy) → Feature Specs (checkpoint) → Implementation (execution) → User Testing (validation)
Business Level
OKRs (goal) → Quarterly Goals (milestone) → Weekly Metrics (tracking) → Quarterly Review (evaluation)
Feedback loops are fractal. They apply at every level of organization.
AI Accelerates the Loop, Not Just the Code
The fundamental shift with AI:
Before AI:
- Slow code generation
- Fast validation (tests run in seconds)
- Bottleneck: Writing code
With AI:
- Fast code generation
- Fast validation (still seconds)
- Bottleneck: Reviewing code
The bottleneck moved. But the loop is still there.
This means your review skills become more valuable than your typing skills.
Can you:
- Spot edge cases quickly?
- Evaluate code quality at a glance?
- Identify security vulnerabilities?
- Assess architectural coherence?
- Validate correctness efficiently?
These are the 10x skills in an AI-first world.
Building Review Superpowers
Practical ways to level up review skills:
1. Practice Code Review
Review 10+ PRs per week. Get fast at spotting issues.
2. Read More Code Than You Write
Spend 60% reading, 40% writing. (Or with AI: 80% reading, 20% writing)
3. Learn to Scan, Not Read
Train your eyes to spot patterns:
- Inconsistent naming
- Missing error handling
- Security vulnerabilities
- Performance issues
4. Build Mental Models
Understand why code is good or bad. Develop intuition.
5. Use Checklists
Formalize your review process:
- ✓ Tests pass
- ✓ Edge cases handled
- ✓ Security validated
- ✓ Performance acceptable
- ✓ Documentation updated
Make review a fast, systematic feedback loop.
The Engineer’s New Role: Architect of Loops
In the AI-first future, your job is:
1. Design Verification Mechanisms
Create systems that prove correctness:
- Tests
- Type systems
- Assertions
- Monitoring
- Validation rules
2. Create Feedback Checkpoints
Build verification into every step:
- Plan review
- Incremental validation
- Continuous testing
- Automated deployment checks
3. Teach Agents to Verify
Encode verification into agent workflows:
- Run tests after every change
- Validate output at each step
- Check success criteria
- Escalate on failures
4. Enable Stakeholder Visibility
Give non-engineers access to progress:
- Preview deployments
- Storybook builds
- Test dashboards
- Visual diffs
5. Maintain System Coherence
While agents execute tactics, you maintain:
- Architectural vision
- Code quality standards
- Security posture
- Performance requirements
You’re the architect of feedback systems.
Practical Advice: Starting with AI Agents Today
If you want to work effectively with AI agents:
Week 1: Learn to Prompt with Specs
Practice writing detailed specifications:
Bad: "Add authentication"
Good: "Add JWT authentication with:
- /register and /login endpoints
- Password hashing with bcrypt
- Token expiration: 24 hours
- Middleware to protect routes
- Tests covering all endpoints and edge cases"
Week 2: Practice Plan Review
Ask agents for plans before execution:
"Before implementing, provide a detailed plan with:
- Steps to execute
- Success criteria for each step
- Tests to validate each step"
Week 3: Implement Checkpoint Validation
Require agents to report after each step:
"After completing each step:
- Show what was implemented
- Show test results
- Wait for my approval before continuing"
Week 4: Build Verification Infrastructure
Add automated checks (a minimal gate script follows this list):
- Pre-commit hooks
- CI/CD pipelines
- Test automation
- Code quality gates
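For example, a minimal quality gate you could wire into a pre-commit hook or CI job might look like the sketch below; it assumes pytest and ruff are available, so swap in whatever your team actually runs:

```python
#!/usr/bin/env python3
"""Minimal quality gate: fail the commit or CI job unless every check passes.
Assumes pytest and ruff are installed; swap in your team's own tools."""
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],        # tests must pass
    ["ruff", "check", "."],  # lint must be clean
]

def main() -> int:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"Gate failed on: {' '.join(cmd)}")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```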
The Meta-Lesson: Loops All the Way Down
Here’s the profound realization:
Software engineering has always been about feedback loops.
- TDD: Code → Test → Iterate
- REPL: Expression → Evaluate → Print → Loop
- Debugging: Hypothesis → Test → Observe → Refine
- Compilation: Code → Compile → Fix Errors → Repeat
- Deployment: Build → Test → Deploy → Monitor
- User Research: Ship → Measure → Learn → Iterate
- Agile: Plan → Sprint → Review → Retrospect
AI agents are just another loop.
The pattern is universal. Once you see it, you can’t unsee it.
Every effective system has:
- Intent (what you want to achieve)
- Execution (action taken)
- Observation (measuring outcome)
- Evaluation (comparing to intent)
- Iteration (adjusting based on feedback)
This applies to:
- Individual functions
- Entire systems
- Teams
- Organizations
- AI agents
Conclusion: The Feedback-First Future
As AI transforms software development, one thing remains constant:
The quality and velocity of engineering are determined by the quality and speed of feedback loops.
The engineers who will thrive:
- Understand loops intuitively (from TDD, UI dev, debugging)
- Design verification mechanisms (tests, checks, validations)
- Build observable systems (logs, metrics, dashboards)
- Review faster than they code (scanning, pattern recognition)
- Teach agents to verify themselves (specs, tests, checkpoints)
From TDD to hot reload to Storybook to AI agents—it’s all the same pattern:
Expect → Execute → Observe → Evaluate → Iterate
Master this loop, and you’re ready for the future.
Read the rest of the series:
- Overview: The Software Feedback Loop
- Part 1: TDD and Observable Engineering
- Part 2: UI Feedback Patterns
This post was co-created with Claude Code—an AI agent that exemplifies the feedback-driven development it describes. Every section was validated, reviewed, and refined through continuous iteration.