How I Built This Blog with Claude Code and a Team of Arguing Agents

I built this blog in an afternoon. Not by writing code — by describing what I wanted, letting three AI agents argue about it, and then having Claude Code build the winner.

The interesting part wasn't the code. It was the debate.

The prompt

I started with a loose idea: markdown files with YAML frontmatter, a build command that does a smart rebuild of only changed content, static HTML hosted in S3. Clean and minimal — just words on a page.

I also had some ideas that felt clever at the time. Incremental builds with SHA256 manifest tracking. Client-side search powered by WASM SQLite with FTS5. A tag cloud. Breadcrumbs.

I typed this into Claude Code along with an unusual request: don't just build it. Spin up a team of agents — a UX designer, a tech stack architect, and a skeptic — and run a three-round debate. I've found this produces better results than a single agent optimizing in isolation. The agents push back on each other, and the ideas that survive are genuinely stronger.

This project will be a very simple, clean themed, blog of my thoughts. Here's the high-level idea:

  • I'll save markdown files with YAML frontmatter into ./posts
  • I'll run a build command
  • That build command will do a "smart rebuild" of the static HTML content for the site.
    • Only build the changed content (new posts, static nav, tag cloud, etc)
  • I'll run a publish command that pushes the content to the "site"
  • The blog is entirely static content, so it will be hosted in S3

Given this background, spin up a team of agents to research best tech solution and propose a path forward. We need:

  • UX designer. I want a super clean, minimalist design. The site is pretty much just words. No pictures, no snaz.
  • Tech stack designer. Think about how we dynamically create a nav, site breadcrumbs, on site search (WASM SQLite FTS5??), etc.
  • Skeptic. Point out where we're over thinking things and drive for absolute simplicity.

Run a 3-round debate, I feel like we get the best ideas this way. Use your discretion for how to structure the debate.

The debate

Claude Code launched three agents in parallel. Each one got the same brief but a different lens.

In Round 1, they worked independently. The UX designer proposed a reading-first philosophy: system fonts, 65ch line width, near-black on off-white, dark mode via prefers-color-scheme. No sidebar, no hero images, no share buttons. The tech stack architect laid out a thorough plan: markdown-it-py for rendering, Jinja2 for templates, Pygments for syntax highlighting, a SHA256 manifest for incremental builds, Pagefind for search, click for the CLI. Comprehensive. Professional. The skeptic came in swinging.

In Round 2, each agent received the other two agents' Round 1 output and responded directly. This is where it got interesting.

The skeptic wrings it out

The skeptic's core argument against my incremental build idea was painfully simple: how many posts will this blog realistically have? A hundred? Two hundred? Over years? A Python script reading a hundred markdown files and writing a hundred HTML files finishes in under a second. Probably under 200 milliseconds.

Meanwhile, dependency tracking — knowing which posts changed, invalidating derived pages like tag indexes and the home page, tracking template changes — is a mini build system. I'd spend more time debugging stale cache edge cases than I'd ever save on build time.

The tech stack architect, to their credit, conceded immediately: "The Skeptic is right that a full rebuild of hundreds of posts is probably under two seconds. I was optimizing for a problem that doesn't exist yet."

This is textbook YAGNI — "You Aren't Gonna Need It," a principle from Extreme Programming coined by Ron Jeffries — and I couldn't see it from the inside. I was so focused on doing the build "right" that I'd skipped the step where I checked whether it mattered.

The skeptic applied the same razor everywhere. WASM SQLite for search? "Wildly over-engineered for a personal blog." Tag clouds? "Fashionable in 2006." Breadcrumbs? "For hierarchical content — where would they even go? Home > Post? That's redundant." Click for a two-command CLI? "Two functions do not constitute a CLI that needs a framework."

The one place the skeptic reversed course was Jinja2. They'd initially argued for f-strings — it's just string interpolation, right? But the tech stack architect pointed out that post titles with quotes, ampersands, and angle brackets need escaping, and doing that by hand in f-strings is exactly where subtle XSS bugs live. Jinja2's autoescape is the right tool. The skeptic conceded, specifically and only on those grounds.

Why the debate worked

A single agent will optimize for the goal you give it. If you say "design a build pipeline," you get a thorough build pipeline. If you say "pick a search solution," you get a well-reasoned search solution. The problem is that nobody's asking whether you need a build pipeline or search at all.

The multi-agent debate forces that question. The skeptic's job isn't to be right about everything — it's to make the other agents justify their complexity. When the tech stack architect had to defend incremental builds against "just rebuild everything, it takes 200ms," the defense collapsed. When they had to defend Jinja2 against "just use f-strings," the defense held — because security is a real concern, not a hypothetical one.

Three rounds was the right number. Round 1 gets the ideas on the table. Round 2 is the real fight — agents respond to each other, positions shift, weak ideas die. Round 3 converges. By the end, all three agents had aligned on a plan that was simpler than any of them would have produced alone.

The build

With a converged plan in hand, I had Claude Code implement the whole thing. Five dependencies. A markdown renderer, a build pipeline, Jinja2 templates with autoescape, about 200 lines of CSS, and a CLI that fits in one file. Full rebuild always. No search. No tag cloud. No breadcrumbs.

The entire generator is under 200 lines of Python. Twenty-three tests, 96% coverage, mypy clean. blog build runs in a fraction of a second.

If I ever have enough posts to need search, I'll add Pagefind. If full rebuilds ever take more than five seconds, I'll add incremental builds. But I won't solve those problems today, because today they don't exist. The skeptic made sure of that.