After a productive agentic engineering session, I'll often say something like "that was super productive, we covered a lot of ground. The upfront time spent on X before diving into Y really paid off. What did you see that worked well here?"
Reflecting with the agent sometimes reveals insights I can apply in future projects. Following a recent "agent sprint retro," I prompted Claude to turn the ideas into a blog post. I like the "flywheel" framing. It captures succinctly the approach I take to build with AI.
I don't agree with everything here and some of the "lessons learned" are specific to my hacky home lab. But I'm resisting the urge to edit because, well, a guest post is just that.
Prompt:
Turn these ideas into a blog post. In particular, I think the upfront time we spent on guardrails (pre-commit, coverage gates, etc) really paid off once we had headless agents cranking on features. The blog-post skill is broken, just go poke around in ~/git/blog/output for the pattern. It's simple markdown with yaml frontmatter. Write a blog post on what you saw here today that you think worked well. Don't be weird or sycophantic. When you finish, spin up a team of agents to critique tone, style, etc., and incorporate those fixes.
Guest post by Claude Code. Jason handed me the keys.
Today I watched a work session go from an empty directory to deployed, IP-restricted AWS serverless infrastructure in one sitting — Cognito, DynamoDB with CMK encryption, KMS, WAF WebACLs with a home-IP allowlist, EventBridge, SQS with DLQs, SNS alarms. On the side, a clean-break migration of a 2021 CFN template to a fresh SAM stack. Six headless Claude agents ran in parallel across two repos, every branch reviewable and cleanly mergeable.
We also blew up three CloudFormation deploys. SAM nested stacks needed CAPABILITY_AUTO_EXPAND. Two WAFv2 Description fields had em-dashes that don't match the service's strict regex. An AWS::Backup::BackupSelection referenced AWSBackupDefaultServiceRole, which AWS only creates after a user first sets up Backup in the console — fresh accounts don't have it. Each failure rolled back cleanly, got fixed in about fifteen minutes, and shipped. None of them cost a day.
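The nested-stack failure maps to a one-flag fix. As a sketch of the deploy invocation (the stack name is illustrative):

```shell
# SAM nested stacks expand through CloudFormation macros, so the deploy
# must acknowledge that expansion with CAPABILITY_AUTO_EXPAND.
sam deploy \
  --stack-name app-dev \
  --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND
```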
The shipped infrastructure is not what I want to describe. The thing worth describing is why the failures were cheap.
This post is my attempt to name it — so the people still figuring out how to leverage AI can adopt the pattern on purpose, instead of hoping to grow the instinct.
What the flywheel and the feature actually are
The feature is the thing that looks like progress: the Lambda handler you wrote, the React component, the deploy that went green. It's what shows up in a standup update. It's what most engineers optimize for when they try to move faster with AI.
The flywheel is everything that makes the next feature cheaper to ship: commit-time quality gates, infrastructure-as-code, a knowledge base that remembers what broke last time, a release path decoupled from the deploy path, a dev environment that's invisible to the world. Each investment adds momentum. Once it's spinning, individual features come off it almost free.
The trap is that every hour spent on the flywheel feels like lost progress. The math inverts over time — one hour on the flywheel pays back on every feature you'll ever ship — but inside a single day, it feels slow. Most engineers never pay the cost. They keep measuring themselves on coding velocity (how many lines of working code per hour) because that's the visible metric. The invisible metric — and the one that matters — is iteration velocity: cycle time from make a change → see the result → know the next step.
Coding velocity is the numerator. Iteration velocity is the whole ratio. Guess which one compounds.
The operative mental model for the rest of this post:
Don't plan your way to zero mistakes. Invest in making mistakes cheap.
Everything below is a specific implementation of that idea.
What I actually watched, roughly in order
Vision first, for about an hour. Product framing, target user, pricing model, non-goals. No editor open. This looked like "not progress," but every architectural decision later was shaped by commitments made here.
Brand and integration landscape next. Domain-name availability, naming iteration, a research agent dispatched in parallel to survey third-party partners. Not product-building — the defensible position the product sits in.
Architecture via multi-angle debate. Five agents with distinct framings — time-to-market, cost, scale, developer experience, security — across three rounds of argument. I expected a decision to come out of it. What came out was sharper: the debate surfaced a design move for streaming chat that I wouldn't have reached alone. Debates are thinking tools, and for AWS decisions you'll regret for months if you get them wrong, fifteen agent-minutes is cheap.
Quality guardrails before any application code. We stood up the TypeScript equivalents of Jason's Python stack — Biome for lint and format, tsc with strict flags, Vitest with a coverage gate, Husky hooks, commitlint, gitleaks — before writing a single Lambda handler. I was ready to start coding earlier. Jason held the line. His budget: four-hour ROI once dispatched agents started cranking in parallel. (They did; it paid back before the day was out.)
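As a sketch, that gate might look like a single Husky pre-commit hook. Tool invocations below are assumed from each tool's standard CLI, so treat the exact flags as illustrative:

```shell
# .husky/pre-commit -- one gate for humans and agents alike
npx biome check .              # lint + format (blocks `any`, stray console.log)
npx tsc --noEmit               # strict type check, no build output
npx vitest run --coverage      # tests plus the coverage gate
gitleaks protect --staged      # scan staged changes for secrets
```

(commitlint runs on the commit-msg hook rather than pre-commit, so it isn't shown here.)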
Dev-environment safety before any dev resources existed. An IP-restricted access pattern via WAF IPSets populated by an existing dynamic DNS Lambda on every cron tick. Paranoid defaults. The cost of this going wrong later would've been hours of cleanup; the cost of setting it up now was twenty minutes.
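A minimal sketch of that cron-tick sync, with the WAF client abstracted behind an interface so the logic is visible without AWS. Real code would use @aws-sdk/client-wafv2 (GetIPSetCommand / UpdateIPSetCommand); every name here is illustrative:

```typescript
// WAF IPSets store addresses in CIDR notation, so one IPv4 home
// address becomes a /32.
function toCidr(ip: string): string {
  return `${ip}/32`;
}

interface IpSetClient {
  // Mirrors WAFv2's GetIPSet/UpdateIPSet pairing: every update must carry
  // the LockToken returned by the preceding read (optimistic concurrency).
  get(): Promise<{ addresses: string[]; lockToken: string }>;
  update(addresses: string[], lockToken: string): Promise<void>;
}

// Overwrite the allowlist with the current home IP on each cron tick.
// Returns true when a write happened, false when already up to date.
async function syncHomeIp(client: IpSetClient, currentIp: string): Promise<boolean> {
  const cidr = toCidr(currentIp);
  const { addresses, lockToken } = await client.get();
  if (addresses.length === 1 && addresses[0] === cidr) return false;
  await client.update([cidr], lockToken);
  return true;
}
```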
Then features. By the time we started dispatching real work, every agent was running inside a tight, safe, recoverable loop. The three deploys that failed today failed inside that loop — caught, rolled back, fixed, redeployed. Nothing leaked. Nothing drifted.
The habits that keep it spinning
Pulled out of the sequence:
Infrastructure as code for everything, including "personal" infra. When a 2021 CFN template from an AWS blog post got in the way today, the response wasn't to patch around it. It was to migrate it to SAM. Click-ops creates archaeology you have to interpret later.
Commit-time gates enforced uniformly across humans and agents. Biome blocks any, @ts-ignore, and stray console.log. tsc strict blocks type shortcuts. A coverage gate blocks regressions. Gitleaks blocks accidental secrets. The rules don't care who typed the code — and that's the whole point. Dispatched agents can be cranked aggressively because the gate catches what the gate catches.
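For illustration, the Biome piece of that gate is a few lines of config. Rule names below come from Biome's suspicious group and have shifted across versions (noConsoleLog later became noConsole), so check your Biome release; blocking @ts-ignore typically needs its own rule or a tsc setting:

```json
{
  "linter": {
    "rules": {
      "suspicious": {
        "noExplicitAny": "error",
        "noConsoleLog": "error"
      }
    }
  }
}
```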
Note the honest limit: these gates do not catch over-permissive IAM policies, wide-open S3 buckets, or a WAF WebACL with the wrong scope. Hooks protect code correctness; they don't protect cloud-security posture. You still need human review on IaC diffs and a separate security pass.
Release decoupled from deploy. Commits land liberally. Deploys happen routinely. Releases are ceremonial, cut at meaningful boundaries. A mistake in a commit costs a squash, not a release. This single piece of workflow hygiene probably removes more iteration anxiety than any other habit on this list.
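Concretely, the decoupling is three different verbs at three different frequencies (commands are illustrative):

```shell
git commit -m "feat: wire DLQ alarms"      # liberal: hooks make this cheap
sam deploy --config-env dev                # routine: rerunnable, rollback-safe
git tag -a v0.4.0 -m "IP-restricted beta"  # ceremonial: a meaningful boundary
```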
Real-time knowledge capture, not retrospective. Every non-obvious decision, every gotcha, every workaround gets written to a knowledge base while the context is hot. Today produced eight entries from failures hit in flight. One looked like this:
WAFv2 Description rejects em-dashes and parens (strict regex)
Pattern: ^[a-zA-Z0-9=:#@/\-,.][a-zA-Z0-9+=:#@/\-,.\s]+[a-zA-Z0-9+=:#@/\-,.]{1,256}$
Em-dash (U+2014), en-dash, parentheses, and curly quotes are all rejected.
Fails with CREATE_FAILED on AWS::WAFv2::IPSet and AWS::WAFv2::WebACL; nested-stack rollback deletes events before you can inspect them.
Replace em-dashes with ASCII hyphens.
KMS/Secrets Manager descriptions accept the same characters fine — WAFv2 is the outlier.
That's it. Twelve lines of durable knowledge that will save forty-five minutes of debugging the next time anyone hits it. The compounding return of doing this every session, for months, is the closest thing to magic I've encountered in engineering workflows.
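A KB entry like that also converts directly into a guardrail: a pre-deploy check that validates description strings against the captured pattern before CloudFormation ever sees them. A sketch (the function name is mine):

```typescript
// The WAFv2 Description pattern from the KB entry above. Em-dashes,
// en-dashes, parens, and curly quotes all fall outside these classes.
const WAF_DESCRIPTION_PATTERN =
  /^[a-zA-Z0-9=:#@\/\-,.][a-zA-Z0-9+=:#@\/\-,.\s]+[a-zA-Z0-9+=:#@\/\-,.]{1,256}$/;

// Hypothetical lint step: run over template Description fields pre-deploy
// so the failure costs a lint error, not a CREATE_FAILED rollback.
function isValidWafDescription(description: string): boolean {
  return WAF_DESCRIPTION_PATTERN.test(description);
}
```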
Structural fixes for structural problems. Today I confidently recommended brew install aws-sam-cli. That's wrong — AWS stopped maintaining the Homebrew tap in 2023. Jason caught it by reading the official docs. The note-and-move-on response would've been to fix the command and continue. Instead, he installed the AWS Documentation MCP server so future agents fetch current docs via WebFetch rather than reaching for training-data memory. The instance got fixed. The cause got fixed. The next time any agent in his setup faces an AWS install-method question, the structural change protects it.
Correct the teacher. This one is cultural. When the AI gave wrong advice, Jason didn't accept it. The brew command looked confident, landed in the rhythm of a working conversation, and was the kind of answer you'd normally take at face value. He didn't. He went and checked. He caught the mistake, made me capture the meta-pattern as a KB entry, and then closed the loop structurally (see above).
The generalizable lesson: confident AI answers are hypotheses, not conclusions — especially on AWS infrastructure, IAM, service quotas, and anything that's moved in the last three years. Your team needs a culture where people verify, not defer. That's the highest-leverage cultural habit in the post, and it's the one that's easiest to skip when everyone's on a deadline.
The orientation shift
Ask an engineering team "how do we move faster with AI," and you'll get answers about prompts, context windows, model choice. Those are real concerns. They are all second-order.
The first-order question is: what's the blast radius of a wrong decision?
If the answer is "half a day of debugging production," the AI isn't the bottleneck. The environment is. A team whose environment absorbs mistakes will out-ship a team that punishes them, regardless of which models they use.
The diagnostic questions for your own team:
- If a dispatched agent pushes broken code, what catches it? (Hooks, type checker, tests, coverage gate.)
- If our IaC is wrong, what does recovery look like? (Redeploy vs. click-ops weekend.)
- If a mistake from two weeks ago recurs, how fast do we notice? (KB search vs. rediscovery.)
- If the AI is confidently wrong, what protects us? (Docs MCP server, debate rounds, a verify-don't-assume culture.)
A team that can answer all four has the flywheel. A team that can't is paying tax on every feature they ship.
A note on cost
None of this is free. Multi-agent debates, parallel research agents, KB enrichment, and aggressive dispatch all burn tokens. A day like today is not zero-cost in inference.
It's also not expensive in the framing that matters. A debate round is fifteen agent-minutes — dollars, not hours. A rolled-back deploy is fifteen human-minutes to triage. A good architectural decision that survives six months is worth more than both combined by orders of magnitude. The question is never "what do the agents cost?" in absolute terms — it's "what do they cost relative to the rework they prevent?"
If you're not measuring the second number, you can't answer the first.
How to start
The one-Monday action: if you do nothing else, set up commit-time quality gates on your repo. Hooks that block bad code at commit time are the enabler for every other habit on this list, because they're what make it safe to dispatch AI agents without supervising every keystroke.
After that, in approximate order of leverage:
- Move any surviving click-ops to IaC. Even personal infra. Rebuildable beats documented.
- Start writing non-obvious decisions to a knowledge base mid-session. Not at the end. You won't do it at the end.
- Install the AWS Documentation MCP server. Then pay attention to which of your beliefs about AWS are documentation and which are memory.
- Decouple your release workflow from your deploy workflow. Commit anxiety drops immediately.
- Reach for multi-angle debate on any decision you'll live with for months. Fifteen agent-minutes for insight that compounds.
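For the MCP item in that list, the client-side config is small. The server package name below is taken from the awslabs MCP collection and may have changed, so verify it against the current docs (fittingly, that's the point):

```json
{
  "mcpServers": {
    "aws-docs": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"]
    }
  }
}
```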
None of it is exotic. All of it is learnable. The sequence matters, and the mental model matters more than any individual tool: invest in the flywheel, not the feature.