Using AI to write better code more slowly

A lot of people seem convinced that the point of AI coding is to write low-quality code as fast as possible. Spew out barely-passable slop, open massive PRs, and merge them unvetted. Ship it!

But the thing is, LLMs are very flexible. And you can use them just as effectively to write high-quality code more slowly.

This statement seems completely obvious to me at this point, and I almost didn’t want to write this post for that reason. But there seem to be enough people convinced that LLMs are only good as slop cannons that it’s worth making the opposite case.

If Mythos taught us anything, it’s that LLM agents are really good at finding bugs. Throw them at a codebase enough times, and they will find so many bugs that you’ll barely know what to do with them.

Like many others, I’ve also found this is true of non-Mythos models – some may be better than others at finding subtle bugs or avoiding false positives, but the fact is that the latest public models from Anthropic and OpenAI are good enough to find plenty of bugs in an unscrutinized codebase.

The problem is not so much finding the bugs as prioritizing and validating them. For this reason I have a Claude skill I adapted from this article’s core insight, which is that the more different models you throw at a PR review, the less likely you are to get hallucinations or bogus bugs.

The skill says (paraphrasing):

Run a Claude sub-agent, Codex, and Cursor Bugbot to find bugs in this PR ranked by critical/high/medium/low. Once they’re all done, review their findings, do your own research to rule out false positives, and write a final report.

That’s basically it. You can add your own definition of “bug” if you want – mine has stipulations about the KISS and DRY principles, writing accessible HTML/JSX, using proper indexes for SQL queries, etc.
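
To make the "review their findings" step concrete, here is a minimal Python sketch of how reports from several reviewers could be merged and ranked. It is not the skill itself, which is a natural-language prompt. The `Finding` fields, the `merge_findings` function, and the choice to treat findings at the same file and line as duplicates are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Severity levels from the post, most urgent first.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass(frozen=True)
class Finding:
    reviewer: str   # e.g. "claude", "codex", "bugbot" (illustrative labels)
    file: str
    line: int
    severity: str   # one of SEVERITY_ORDER
    summary: str


def merge_findings(findings):
    """Group findings that point at the same file and line, then rank them.

    Hypothetical heuristic: a finding reported by more reviewers is less
    likely to be a hallucination, so it sorts earlier within its severity.
    """
    groups = defaultdict(list)
    for f in findings:
        groups[(f.file, f.line)].append(f)

    merged = []
    for (file, line), group in groups.items():
        # Keep the most severe rating any reviewer assigned.
        worst = min(group, key=lambda f: SEVERITY_ORDER.index(f.severity))
        merged.append({
            "file": file,
            "line": line,
            "severity": worst.severity,
            "reviewers": sorted({f.reviewer for f in group}),
            "summary": worst.summary,
        })

    # Sort by severity first, then by how many reviewers agreed.
    merged.sort(key=lambda m: (SEVERITY_ORDER.index(m["severity"]),
                               -len(m["reviewers"])))
    return merged
```

A ranked list like this is only the input to the last step. The skill still asks the main agent to do its own research on each finding, since agreement between reviewers does not prove a bug is real.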

In my experience, this skill always finds tons of bugs in a PR, and the false positive rate is near zero. It finds so many bugs that you’ll be bored senseless if you try to tackle them all. They’ll range from critical security or correctness bugs to the more mundane medium-level perf bugs to low-level “this comment is misleading”-type bugs.

My typical workflow is:

  • Have an agent fix all the criticals and highs (with my guidance on the proper solution), then repeat until no criticals/highs
  • Skip highs/mediums where the juice isn’t worth the squeeze (e.g. 100 lines of code to fix a narrow edge case)
  • Abandon the PR if it has so many criticals that I realize the whole approach is misguided
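
The workflow above can be sketched as a small decision function. The `max_fix_lines` and `abandon_after` thresholds and the `est_fix_lines` field are made-up parameters for illustration only. In practice these are judgment calls rather than numbers, and the post does not say what to do with lows, so this sketch simply leaves them out.

```python
def triage(findings, max_fix_lines=100, abandon_after=5):
    """Return ("abandon", []) or ("fix", [findings worth fixing now]).

    Each finding is a dict with "severity" and "est_fix_lines" keys
    (both hypothetical fields for this sketch).
    """
    criticals = [f for f in findings if f["severity"] == "critical"]
    if len(criticals) >= abandon_after:
        # So many criticals that the whole approach is probably misguided.
        return "abandon", []

    to_fix = []
    for f in findings:
        if f["severity"] == "critical":
            to_fix.append(f)
        elif f["severity"] in ("high", "medium"):
            # Skip when the juice isn't worth the squeeze.
            if f["est_fix_lines"] < max_fix_lines:
                to_fix.append(f)
        # Lows are not covered by the workflow, so they are left out here.
    return "fix", to_fix
```

After the chosen fixes land, the review would be run again and the result triaged again, until no criticals or highs remain.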

When I use this technique, I haven’t necessarily seen my velocity go up. If anything, the review process often finds pre-existing bugs, so I end up on a tangential side-quest where I’m writing unit tests and fixing subtle flaws that pre-date the PR. This is the opposite of the “10x productivity” slop-cannon style of development that most people imagine when they think of vibe coding, but I find it very satisfying.

It’s a great way to improve the overall health of the codebase while also teaching you about the odd corners of it. In my experience, the happy-path of a complex architecture is less interesting than its failure modes. And pre-LLMs, this is usually how I got familiar with a codebase anyway: understanding where the assumptions break down, and then getting my hands dirty to fix it.

If you’re the kind of person who is skeptical that AI coding is good for anything, then I doubt this post will persuade you. But if you’re the kind of developer who uses agents to write multi-hundred-line PRs that you barely understand yourself, I’d invite you to slow down a bit and try this other, slower style of “vibe coding.” Ask an agent how your PR works and how it might fail. Have it write Markdown docs with Mermaid charts if necessary. Use Matt Pocock’s /grill-me skill until you understand the entire PR front-to-back.

You might not be more “productive” in terms of raw lines of code. You might burn a ton of tokens just to find out that your entire plan was wrongheaded from the start. But I find this style of coding to be a more super-powered version of the kind of programming I was already trying to do before LLMs: careful, methodical, quality-obsessed, focused on making things better for the next coder.

So take a deep breath, slow down, try this technique, and see if you don’t enjoy writing better code more slowly.

20 responses to this post.

  1. heckj's avatar

    I’ve found the same technique (doing multiple sweeps) super effective for all kinds of review; I use the same for editorial review of grammar, punctuation, spelling, and so on. One thing I’ve realized is that wiping the context *between* sweeps also helps. And I’ve started to switch up my code reviews to “5-7 different lenses” running in parallel, each looking for different kinds of issues, and then collating the results and loosely ranking them.

    • Nolan Lawson's avatar

      You’re right, clearing context really seems to help. That’s one of the reasons my reviewer skill specifies that the main agent shouldn’t do original research until all 3 sub-agents have returned – otherwise there’s a tendency to be influenced by the first result.

      I haven’t tried splitting up reviewers into different archetypes, but maybe it helps when you have a PR that spans multiple domains (frontend, backend, infra, etc.).

    • Ashah's avatar

      Posted by Ashah on May 26, 2026 at 1:15 AM

      would you be able to share your skill?

      • Nolan Lawson's avatar

        Sure thing, here is the skill. I lightly edited it since it contained some specifics of my particular codebase. Note that you’ll need gh installed, and also that I use Claude with Opus 4.7 on xhigh thinking and Codex with GPT 5.5 on high thinking. (I’m happy to wait 20 minutes for a better review!) You’ll probably want to tweak it for your particular codebase or the kinds of bugs you want it to find.

  2. Spencer Karenbauer's avatar

    Posted by Spencer Karenbauer on May 25, 2026 at 7:03 PM

    I agree in a way. I think the issue now, more so with vibe coding, is that individuals do not know how to write code properly and take all of these advanced AI tools like Claude/Cursor/etc. as the end-all, be-all of solutions. They are great at baselines and can work through. But they should NOT be used as stand-alone tools. Enablement and governance need to be occurring simultaneously prior to implementing things like this in production.

    • Nolan Lawson's avatar

      Right, the way I think of it is that an LLM’s output is just the first draft. The real work starts with the code review. And there is a lot of scaffolding/documentation you can put in place to make this process way more effective.

  3. Giuseppe's avatar

    Posted by Giuseppe on May 26, 2026 at 12:30 AM

    Can you explain how you run multiple models in only one prompt and how to handle the different outputs?

  4. Ollie's avatar

    This matches my experience closely. I’ve been building a voting app with Next.js and Supabase, and the most valuable thing an agent did wasn’t writing features; it was flagging that my RLS policies had a gap I hadn’t considered. I would have shipped that. Fixing it took a few hours and sent me down a rabbit hole on Postgres row-level security I didn’t expect to go down that week. Not a productivity win by any typical metric. But now I actually understand that part of the stack. The “pre-existing bugs as a side quest” description is exactly right, and honestly it’s more satisfying than the feature work.

  5. Graham Wheeler's avatar

    In my team we built an adversarial code review tool with multiple personas each doing reviews (e.g. architect, test engineer, compliance PM,…) then a synthesizer to collate the results. And that tool goes back and forth with the “fixer” agent until “everyone” agrees the PR is good, at which point a human looks at it. It works well, but definitely takes time and burns a lot of tokens.

    So similar to what you are doing but with multiple personas versus multiple models.

  6. Marcelo Lima's avatar

    Posted by Marcelo Lima on May 26, 2026 at 9:11 AM

    Are we calling proprietary, almost-free rental models "open models"? I thought open was open, reproducible was reproducible, closed was closed, and exclusive was exclusive.

    My view is this: yes, workarounds can be used to TRY to review and improve the code. But that doesn't solve the context window problem for complex programs. It's just a trick that, like thousands of others, gets absorbed by the companies that own the coding agents. They receive the prompts and the acceptances and post-process them, checking which tricks will be incorporated and WHICH ARE NOT WORTH IT, because they would know that customers would complain about the high cost.

  7. Ashton Antony's avatar

    Posted by Ashton Antony on May 26, 2026 at 9:51 AM

    The “pre-existing bugs as a side quest” framing really clicks for me. I’d add that this workflow has an underrated onboarding use case. I’ve used it on unfamiliar codebases and it’s one of the fastest ways to build a mental model of where the bodies are buried. Better than reading docs, honestly.

    The one thing I’d push back on slightly: this approach still requires enough domain knowledge to triage what the agents surface. The false positive rate might be near zero for an experienced dev, but a junior who can’t distinguish a real race condition from a theoretical one is still going to get overwhelmed. The agents find the bugs; you still need to understand them.

    • Nolan Lawson's avatar

      That’s a good point. Sometimes the bugs it finds are along the lines of: “if a future author adds a new enum here…” or “if this job happens to run before this other job…”, and this is either very unlikely (just put a comment on the enum warning people!) or impossible (Job B can’t run before Job A). But even in those cases it’s a code smell, so often worth a comment at least.

  8. jrrembert's avatar

    Posted by jrrembert on May 26, 2026 at 10:19 AM

    Great to see you come over to the dark side (re: AI coding)!

    Only thing I would add is to occasionally consider reviewer archetypes in terms of both their seniority level and “primary” role. I find myself increasingly reviewing code created by product managers, designers, and other traditionally low/no-code roles (including lawyers and marketers). It’s often useful for me to review the code using a senior lens, but write the bug/explanation in language that they can understand and that encourages them to learn.

    A verbatim example I used last week to explain the Single Responsibility Principle:

    “Looks pretty good. You got it working, which is the hard part. One issue: this function is doing too many things, which may make it harder to change later. It’s kind of like mixing campaign strategy, copywriting, and reporting into one giant spreadsheet.

    There’s a software idea called SOLID that gives names to this kind of thing (Single Responsibility Principle).

    Next time, ask the AI: ‘Can you refactor this so each function or file has one clear responsibility?’”

  9. Unknown's avatar

    […] Using AI to write better code more slowly. I see a lot of folks writing about their angst and loss of joy because of what AI can now do. I […]

  10. handsome tong's avatar

    I’ve been using Claude on a side project and your description of drowning in bug reports is spot-on—Mythos might just be the sanity check I need. It’s easy to fall into the slop cannon mentality, so thanks for making the case for going slow and high-quality.

  11. Marek R's avatar

    Posted by Marek R on May 29, 2026 at 11:55 PM

    Good article. Totally agree with what you wrote.

  12. Unknown's avatar

    Posted by Konstantin Mihaltsov on May 31, 2026 at 8:19 AM

    The only thing Mythos taught me is how much marketing there is in tech right now.

    https://www.flyingpenguin.com/the-boy-that-cried-mythos-verification-is-collapsing-trust-in-anthropic/

  13. Will's avatar

    Posted by Will on June 1, 2026 at 8:27 AM

    One of the hopes of the early internet was that, because anyone could get news from anywhere, everyone would read more different views, and become better, more worldly people. And there were certainly people who did use the internet to that effect, but it ultimately became a system where the easiest, most enticing way to use it was actually contrary to our own best interests. The internet didn’t change people’s behavior, it just amplified it.

    Similarly, while AI can be used in the way that you describe, that takes discipline. The system, by default, makes it tantalizingly easy to ship unreviewed slop. AI doesn’t change people’s behavior, it just amplifies it. You said that your way of working seems too obvious to write about, but I don’t think that it is. You’re just a massive dork. (Which is really all I wanted to say.) 😘

  14. Dmitry Jum's avatar

    This is exactly how I work, thank you for this article. I’ve been working solo for myself lately and it’s hard to find anyone to compare myself to, although I’m trying to watch and read about what’s happening out there as much as I can.
    But I hear people yelling from every corner: “AI generates slop! Senior engineers will have to fix all that!” Or: “I build 10 features in parallel in 1 hour and ship an MVP in a weekend that makes me 50k MRR.”

    I generally discuss a feature idea, then I break it into a couple of chats to discuss architecture and user flow or UI. Then another chat plans out a list of tasks based on the conclusions of the previous 2-3 conversations. We come up with a style guide and UI design prompts, and I create mockups with Stitch and GPT images.
    Then I go task by task with Codex or Claude, or both. When Codex builds a task, I review it, leave a bunch of comments, and ask how and why things work when I don’t understand them. I have the Codex GitHub reviewer review it. I ask Claude Code to review the changes as well, and if I have a CodeRabbit subscription, I have that service review the code too. Then I have Codex or Claude respond to or address the dozen (or more) comments that have accumulated from the 2-3 AI reviewers and me. It may take a few iterations to get a clean final result for the task we’ve been working on. But by the end of it, multiple edge cases have been revealed and covered, and I actually understand and agree with 90% of the AI-written code and can describe it and advocate for it to another person who reads it or asks me about it.

    My imposter syndrome bashes me for still being slow, but with my approach I’ve recently released a couple of applications to a couple of clients that haven’t had a single bug in the months since their release (granted, they aren’t that big, but still).

    I’ve been hearing that junior and mid level devs have AI build and AI review the code and they ship without even looking or reading it and deliver exponentially more and faster than me. But if that had been the right approach, I probably wouldn’t have been correcting and leading the state of the art models to different patterns or solutions, or correcting architectural flaws, that would only reveal themselves at scale down the road.

    I love working with AI, the development is still tedious, slow and difficult, but I’m building bigger and more complex, more reliable and better looking things on my own than ever before.
