Opus 4.6 shipped in February. That was the turning point — not the launch of Claude Code a year earlier, which was already good. I used it all through 2025 and got a ton done, but I was still steering constantly: hand-holding, really, and reviewing every line like it might bite me. With 4.6, that stopped. The best engineers I know now tell me they haven't written code in three or four months. These are the same people who told me, late last year, that it "still isn't good enough for what I'm doing."
But that's not the interesting part. The interesting part is what happens when someone who doesn't primarily write code picks up the same tool. Specifically: the engineering leaders who never let their technical judgment atrophy.
I want to be precise about who I mean, because "manager" is a slippery word. I do not mean the pure manager — the one who manages from on high, runs the staff meeting, owns the org chart, and couldn't tell you why the last outage happened. That role exists at a lot of big companies, and it's part of the rot. Once in a while one of those people is a genuinely great leader; more often they just learned to play the game. The leaders I'm talking about are the opposite. They stayed close enough to the work to remain dangerous. They got promoted up the chain and never stopped understanding how the systems actually work. They can still read a system design and tell you what's wrong with it.
Those people are getting something different from this tool than many ICs are. The ICs are getting linear improvement — a faster version of how they already work. The technical leaders are getting multiplicative improvement — a way to do the work they already did, but at a scale that wasn't previously available to anyone.
That asymmetry is going to reshape the next decade of engineering careers, and the people in the affected seats deserve to hear about it before the role guidelines catch up.
What the leader is bringing to the tool
The standard frame for "getting good at AI" is "AI fluency." Prompt engineering. Context windows. Tool use. The dominant intervention is training: workshops, courses, internal evangelism. The implicit theory is that AI is a new technology and engineers will be effective at it once they learn how to operate it.
That theory is almost entirely wrong. The skill that separates effective from ineffective use of agentic AI is not prompting. It's what competent engineering leaders have spent a decade or two doing every day (I've been running technical teams for twenty years):
- Decomposing ambiguous work into well-scoped pieces: taking "we should make checkout better" and turning it into a handful of crisp tasks, each with a clear boundary.
- Setting acceptance criteria up front: knowing what "done" looks like before anyone starts.
- Delegating effectively: handing off work with enough context that it comes back right. For a senior leader there's no urge to grab the keyboard; that urge belongs to the IC who just became a manager. Delegation is simply the shape of the job.
- Inspecting the work, not producing it, and inspecting in the broad sense rather than line-by-line PR nitpicking. Is the output right? Does the team that produced it have a review process you'd trust? Where's the systemic weakness that's going to manufacture the next bad output? That's systems thinking, not code review.
- Holding the bigger picture across many parallel work streams: knowing which thread can wait, which is on the critical path, and what depends on what.
These are the skills of running a team. And then a team of teams. It's layers on layers on layers — each one shielding the one above it from detail it shouldn't have to hold. (The Harness Is a Stack and It's Not the Model are both about exactly this: every layer's whole job is to protect the layer above it from toil.)
They are also, exactly and without modification, the skills of running a stack of agents. The agent doesn't care that it isn't human. It needs the same things from you: clear scope, clear acceptance criteria, and the willingness to let go of the work once you've handed it off. And it needs a competent reviewer waiting at the other end — which, if you're operating as a leader, probably isn't you. Your job isn't to be the reviewer. It's to make sure the work gets reviewed, in a loop that runs without you. Reviewing it yourself doesn't scale, and scaling yourself is the whole game.
A leader who has done this for decades walks up to Claude Code and immediately knows how to use it — not because they're "AI fluent" but because they've been operating at this layer of abstraction for their entire second or third decade in the industry.
The IC who has spent years optimizing personal throughput is doing something different and harder: learning to operate at this layer while still defaulting to the layer below it. AI fluency training won't fix that. Operating-model training might.
Why the chops matter
It would be tempting to read this as "leaders are better at AI than ICs." That isn't the claim. The claim is narrower, and the qualifier is load-bearing: it's leaders who kept their technical chops.
And by chops I don't mean the in-the-weeds, get-the-syntax-right kind. The model is better at that than you are now. I mean design judgment: what the system should look like, which features actually exist, how they fit together — and the instinct to notice when the model is confidently building against a world that no longer exists.
Here's one from last weekend. Opus 4.7 and 4.8 dropped temperature, top_p, and top_k entirely — set any of them to a non-default value and the API returns a 400. You're meant to steer the model with prompting now. I caught Opus 4.8 writing code that calls Opus 4.8 on Bedrock with temperature sprinkled all over it — the model confidently generating calls that its own API would reject. I caught it in seconds. Not because I'm a faster coder, but because I'd skimmed the Opus 4.8 "what's new" page, and "temperature isn't a thing anymore" is the kind of fact that jumps out and sticks. I made the model go read the page, fix the code, and write the gotcha into its knowledge base so the next agent wouldn't step on the same rake.
That's the whole skill in one anecdote. Skimming a "what's new" page costs five minutes. Knowing that it matters — and knowing to go check current reality instead of trusting the model's training-data priors — is the chops. A leader who kept them reads the model's confident answer, smells what's wrong, and overrules it. A leader who let them atrophy ships the wrong architecture, faster.
This is why the intersection is narrow. The orchestration skill doesn't help if you can't tell good work from bad. The chops don't help if you're still hand-rolling every PR. The combination is what's rare, and the combination is what wins.
Most leaders I work with aren't in that intersection, and the usual diagnosis gets it wrong. A lot of them were promoted out of writing code years ago, and that, by itself, is fine. Code is the cheapest input in the stack now; nobody needs you to hand-produce it. The problem isn't the leaders who stopped writing code. It's the ones who stopped understanding systems. That skill still matters, and it's the expensive one to get back once you've let it atrophy.
Career advice the puzzle palace will get to in two years
If you're an IC right now — at any level — you're sitting on a question your manager probably can't answer yet: what should I be building toward over the next five years? It's sharpest for early-career devs, because the skills that carried the generation above them up the ladder aren't the skills this moment rewards — not all of them, and not only them. The role guidelines will catch up eventually. Probably two years too late, after the labor market has already repriced the answer.
Here's the advice you deserve right now, not later.
Get out of your IDE. I haven't opened an IDE in months. It's built for an operator I no longer am: a human typing into a buffer, finding the file, reading the code, navigating to the next one. That loop isn't my bottleneck anymore. My bottleneck is deciding what to build, decomposing it, dispatching it, and inspecting what comes back, and the IDE helps with none of that. As long as you live in the editor, you'll keep defaulting to the layer of work the editor optimizes for: the layer that no longer matters.
Build the thing that decomposes; don't be the thing that decomposes. Decomposition is still the single most valuable skill in the stack, and it can't be faked. Writing the specs, though, is a machine's job now, and most people have figured that out. The subtler trap is the one the tools are walking people into. AWS shipped Kiro IDE last fall with spec-driven development built in: an agent generates a requirements doc, then a design doc, then a task list, with a human sign-off at each step. It's a real improvement over vibe-coding. It's also already too slow, because it gates every spec on a human reviewing it.
That bottleneck is about to get brutal. Anthropic says Mythos-class models are landing in the coming weeks. When systems take shape that fast, your mental model of the system can't expand quickly enough to review the spec that defines it. Writing the spec by hand is wrong — but reviewing every spec by hand is the next bottleneck. You'd be gating the whole factory on how fast one human can build understanding.
So take the human off the critical path of spec review — not by skipping review, but by building the review into the mechanism. Mine is an adversarial grooming workflow: a drafter writes a spec, two critics try to tear it apart against the actual code, and a synthesizer reconciles them, so every spec that reaches a build agent has already survived a fight. It produces better specs than I can — for systems evolving faster than I can build my own picture of them. I write intent; the factory writes the specs and vets them.
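The shape of that grooming loop fits in a few lines. This is a minimal sketch, not the workflow itself: it assumes each agent is just a function from a prompt string to a text reply, and the names `groom`, `GroomedSpec`, and the prompt wording are hypothetical. In a real setup each callable would wrap an agent run that can read the codebase.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an "agent" is any function that takes a prompt and returns text.
Agent = Callable[[str], str]


@dataclass
class GroomedSpec:
    intent: str
    draft: str
    critiques: list[str]
    final: str


def groom(intent: str, code_context: str, drafter: Agent,
          critics: list[Agent], synthesizer: Agent) -> GroomedSpec:
    """Draft a spec, let independent critics attack it, then reconcile."""
    if len(critics) < 2:
        raise ValueError("adversarial grooming wants at least two critics")

    # 1. The drafter turns human intent into a first spec.
    draft = drafter(f"Write a spec for: {intent}\n\nRelevant code:\n{code_context}")

    # 2. Each critic sees the draft and the actual code, never the other critiques.
    critiques = [
        critic(f"Find where this spec is wrong about the code.\n\n"
               f"Spec:\n{draft}\n\nCode:\n{code_context}")
        for critic in critics
    ]

    # 3. The synthesizer reconciles the draft with every critique.
    joined = "\n\n".join(f"Critique {i + 1}:\n{c}" for i, c in enumerate(critiques))
    final = synthesizer(f"Revise the spec to resolve these critiques.\n\n"
                        f"Spec:\n{draft}\n\n{joined}")
    return GroomedSpec(intent, draft, critiques, final)
```

Keeping the critics blind to each other is a design choice in this sketch: two independent attacks are more likely to surface different failure modes than one critic anchored on another's findings.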
Here's the part that actually matters. Dynamic workflows shipped in Claude Code, and a few days later I told it to build me that grooming workflow. I knew to ask for it. The agent didn't offer — it can't see the shape of the factory I'm building. That instinct — knowing what to construct at the edge of what the tools can already do — is the skill. And when a new pattern emerges, a failure I hit twice or a lesson that cost me an hour, I don't just remember it. I extract it and wire it into the infrastructure: a rule in the grooming workflow, a feature in Agent GTD, a new commit-time quality gate. This isn't (just) about smarter instructions in a CLAUDE.md. It's about codifying every hard-won lesson into the machine that builds, so the factory enforces it from then on. (Invest in the Flywheel, not the Feature is the long version: when the output is bad, you improve the factory, not just the code.)
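Wiring a lesson into the machine can be as small as a commit-time check. A minimal sketch, assuming Python sources and using the temperature gotcha above as the one recorded lesson: it walks each file's syntax tree and flags any call that passes a banned keyword argument. The `Lesson` type, `LESSONS` list, and `check_source` function are hypothetical, and the banned-parameter list encodes the anecdote rather than anything checked against the API documentation. It also misses parameters passed inside a dict or via `**kwargs`.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    rule_id: str
    banned_kwargs: frozenset[str]
    message: str


# Hypothetical lesson registry; each hard-won gotcha becomes one entry.
LESSONS = [
    Lesson(
        rule_id="no-sampling-params",
        banned_kwargs=frozenset({"temperature", "top_p", "top_k"}),
        message="sampling params are rejected by these models; steer with the prompt",
    ),
]


def check_source(source: str, filename: str = "<string>") -> list[str]:
    """Return one finding per banned keyword argument found in any call."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:  # kw.arg is None for **kwargs, so it never matches
            for lesson in LESSONS:
                if kw.arg in lesson.banned_kwargs:
                    findings.append(
                        f"{filename}:{kw.lineno}: [{lesson.rule_id}] "
                        f"{kw.arg}: {lesson.message}"
                    )
    return findings
```

Run from a pre-commit hook over staged files, any finding fails the commit; adding a lesson means appending to `LESSONS`, not re-explaining the gotcha to the next agent.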
Design a review process that doesn't depend on you. Yesterday Anthropic published When AI builds itself and reported that Claude wrote more than 80% of the code merged into its own codebase in May — and that human review has become the bottleneck, because people can't read code as fast as the model writes it. That's the whole future in one data point. If shipping depends on a human reading every diff, that human is the ceiling.
The way out isn't reviewing faster. It's trusting a system of signals instead of your own eyes on every line. Green linters, ninety-nine percent test coverage with every test passing, green integration tests, the app doing what I expect when I poke it: that combination is a powerful indicator the code is probably right. Stack adversarial critics and agents reviewing agents on top, and reserve your own judgment for the few calls that genuinely need a human. Could a bug slip through? Sure. Engineers have shipped bugs for as long as there's been software, and most of them aren't fatal. The risk nobody prices correctly is the other one: moving slow while your competitors move fast. Right now I see a lot of good engineers stuck in the fast lane going slow, hand-reviewing every plan and fact-checking every research result, because they've decided that shipping "bad code" is the worst outcome. In this moment, it isn't.
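That stack of signals can be expressed as a gate that routes work without a human reading the diff. A minimal sketch under stated assumptions: the `Signals` fields, the 99% coverage floor, and the `needs_human_call` flag are hypothetical stand-ins for whatever a real pipeline reports, and the manual "poke the app" check isn't modeled. Failing signals send work back to the agents; only changes explicitly flagged for judgment reach a person.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    MERGE = "merge"        # every signal green, nothing needs a human
    REWORK = "rework"      # send back to the agents; no human involved
    ESCALATE = "escalate"  # one of the few calls that genuinely needs a person


COVERAGE_FLOOR = 0.99  # assumption: mirrors the "ninety-nine percent" in the text


@dataclass
class Signals:
    lint_clean: bool
    tests_passed: bool
    coverage: float  # fraction of lines covered, 0.0 to 1.0
    integration_green: bool
    critic_objections: list[str] = field(default_factory=list)
    needs_human_call: bool = False  # hypothetical flag set when the task is decomposed


def gate(s: Signals) -> tuple[Decision, list[str]]:
    """Route a change based on automated signals instead of a line-by-line read."""
    reasons = []
    if not s.lint_clean:
        reasons.append("lint")
    if not s.tests_passed:
        reasons.append("unit tests")
    if s.coverage < COVERAGE_FLOOR:
        reasons.append(f"coverage {s.coverage:.0%} < {COVERAGE_FLOOR:.0%}")
    if not s.integration_green:
        reasons.append("integration tests")
    reasons += [f"critic: {o}" for o in s.critic_objections]
    if reasons:
        return Decision.REWORK, reasons
    if s.needs_human_call:
        return Decision.ESCALATE, ["flagged for human judgment"]
    return Decision.MERGE, []
```

Note the ordering in this sketch: a change flagged for a human still has to clear every automated signal first, so the person only ever sees work that is already green.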
Find the edge of the harness and build past it. A year ago, the advice I gave people was: use the tools, build intuition for what's now possible. That advice is stale. If you're still in the intuition-building phase, you're behind — and in more trouble than you realize. The work now is to manage teams of agents and to build past the end of the harness: notice where the tools stop, and construct the missing piece yourself, the way I did with the grooming workflow. Keep using the tools — use them like they're going out of style, because pushing them to their limit is how you find the edge.
The next decade
The next decade's most effective engineers will look a lot more like player-coaches than ICs — the ones who write a little, inspect a lot, decompose constantly, and run a fleet of agents with the same instincts a good leader runs an organization. And it won't stay a flat fleet for long. It'll be a stack: agents dispatching agents dispatching agents, all the way down, with the human operating only at the top, spending judgment on the decisions that can't be pushed any lower. The seat you're growing into isn't the senior-IC chair as currently defined. It's a seat the org chart hasn't drawn yet.
You can wait for the puzzle palace to publish new role guidelines. They'll get there — two years late, when the advantage is already priced in and gone. Don't. The skills that got the people above you promoted only partly transfer, and the seat you want isn't one anyone hands you with a title. You build it yourself, wherever you already are, by making the output impossible to ignore — until you're the person who defines it.
The seat is open.