How I use AI agents to write code

Yes, this is the umpteenth article about AI and coding that you’ve seen this year. Welcome to 2025.

Some people really find LLMs distasteful, and if that’s you, then I would recommend that you skip this post. I’ve heard all the arguments, and they just don’t convince me anymore.

I used to be a fairly hard-line anti-AI zealot, but with the release of things like Claude Code, OpenAI Codex, Gemini CLI, etc., I just can’t stand athwart history and yell “Stop!” anymore. I’ve seen my colleagues make too much productive use of this technology to dismiss it as a fad or mirage. It writes code better than I can a lot of the time, and that’s saying something because I’ve been doing this for 20 years and I have a lot of grumpy, graybeard opinions about code quality and correctness.

But you have to know how to use AI agents correctly! Otherwise, they’re kind of like a finely-honed kitchen knife attached to a chainsaw: if you don’t know how to wield it properly, you’re gonna hurt yourself.

Basic setup

I use Claude Code. Mostly because I’m too lazy to explore all the other options. I have colleagues who swear by Gemini or Codex or open-source tools or whatever, but for me Claude is good enough.

First off, you need a good CLAUDE.md (or AGENTS.md). Preferably one for the project you’re working in (the lay of the land, overall project architecture, gotchas, etc.) and one for yourself (your local environment and coding quirks).

This seems like a skippable step, but it really isn’t. Think about your first few months at a new job – you don’t know anything about how the code works, you don’t know the overall vision or design, so you’re just fumbling around the code and breaking things left and right. Ideally you need someone from the old guard, who really knows the codebase’s dirty little secrets, to write a good CLAUDE.md that explains the overall structure, which parts are stable, which parts are still under development, which parts have dragons, etc. Otherwise the LLM is just coming in fresh to the project every time and it’s going to wreak havoc.

As for your own personal CLAUDE.md (i.e. in ~/.claude), this should just be for your own coding quirks. For example, I like the variable name _ in map() or filter() functions. It’s like my calling card; I just can’t do without it.
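
To make this concrete, here’s a rough sketch of what the two files might contain. Every specific in it is made up for illustration (the monorepo layout, the test command, the Postgres invocation); the point is the kind of information to include, not the exact contents.

```markdown
<!-- Project CLAUDE.md (sketch only; the layout below is invented for illustration) -->
# Project overview
- TypeScript monorepo: the web app lives in packages/app, shared code in packages/lib.
- Run `npm test` before declaring a task done; CI rejects anything with failing tests.
- packages/legacy-sync is fragile and slated for removal; do not refactor it.
- Database migrations live in migrations/ and are append-only.

<!-- Personal ~/.claude/CLAUDE.md (also a sketch) -->
- In short map()/filter() callbacks, name the parameter `_`.
- To query my local Postgres, run: `psql -h localhost -U dev myapp_dev`.
```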

Overall strategy

I’ve wasted a lot of time on LLMs. A lot of time. They are every bit as dumb as their critics claim. They will happily lead you down the garden path and tell you “Great insight!” until you slowly realize that they’ve built a monstrosity that barely works. I can see why some people try them out and then abandon them forever in disgust.

There are a few ways you can make them more useful, though:

  1. Give them a feedback loop, usually through automated tests. Automated tests are a good way for the agent to go from “I’ve fixed the problem!” to “Oh wait, no I didn’t…” and actually circle in on a working solution.
  2. Use the “plan mode” for more complicated tasks. Just getting the agent to “think” about what it’s doing before it executes is useful for anything more involved than a pure refactor or other rote task. (There’s a quick sketch of invoking plan mode below.)
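
If you’re invoking Claude Code from the command line, you can also start a session directly in plan mode. The flag below is an assumption based on recent versions of the CLI, so check claude --help if it doesn’t match yours.

```sh
# Sketch: start Claude Code so it proposes a plan before editing any files.
# The exact flag/value is an assumption; interactively, Shift+Tab also cycles modes.
claude --permission-mode plan
```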

For example, one time I asked an agent to implement a performance improvement to a SQL query. It immediately said “I’ve found a solution!” Then I told it to write a benchmark and use a SQL EXPLAIN, and it immediately realized that it actually made things slower. So the next step was to try 3 different variants of the solution, testing each against the benchmark, and only then deciding on the way forward. This is eerily similar to my own experience writing performance optimizations – the biggest danger is being seduced by your own “clever” solution without actually rigorously benchmarking it.
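
The query below is entirely made up, but it shows the shape of that feedback loop: have the agent wrap the original query and each candidate rewrite in EXPLAIN ANALYZE (PostgreSQL syntax) and compare the measured timings, instead of taking its first “I’ve found a solution!” at face value.

```sql
-- Hypothetical tables and query, for illustration only.
-- EXPLAIN ANALYZE actually executes the statement and reports real row counts
-- and timings, so the agent can't claim a speedup without evidence.
EXPLAIN ANALYZE
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU'
  AND o.created_at > now() - interval '30 days';
```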

This need for a tight feedback loop is also why I’ve found that coding agents are (currently) not very good at doing UI. You end up using something like the Playwright or Chrome DevTools MCP/skill, and this either slurps up way too many tokens, or it just slows things down considerably because the agent has to inspect the DOM (tokens galore) or write a Playwright script and take a screenshot to inspect it (slooooooow). I’ve watched Claude fumble over closing a modal dialog too often to have patience for this. It’s only worthwhile if you’re willing to let the agent run over your lunch break or something.

The AI made a mistake? Add more AI

This one should be obvious but it’s surprisingly not. AIs tend to make a few characteristic mistakes:

  1. Removing useful comments from previous developers – “this is a dumb hack that we plan to remove in version X” either gets deleted or becomes some Very Official Sounding Comment that obscures the original meaning.
  2. Duplicating code. Duplicating code. I don’t know why agents love duplicating code so much, but they do. It’s like they’ve never heard of the DRY principle.
  3. Making subtle “fixes” when refactoring code that actually break the original intent. (E.g. “I’ll just put an extra null check in here!”)

Luckily, there’s a pretty easy solution to this: you shut down Claude Code, start a brand-new session, and tell the agent “Hey, diff against origin/main. This is supposed to be a pure refactor. Is it really though? Check for functional bugs.” Inevitably, the agent will find some errors.

This seems to work better if you don’t tell the agent that the code is yours (presumably because it would just try to flatter you about how brilliant your code is). So you can lie and say you’re reviewing a colleague’s PR or something if you want.

After this “code review” agent runs, you can literally just shut down Claude Code and run the exact same prompt again. Run it a few times until you’re sure that all the bugs have been shaken out. This is shockingly effective.
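
If you get tired of restarting the session by hand, you can also script it. The sketch below leans on Claude Code’s non-interactive print mode (claude -p); each invocation is its own fresh session, which is exactly what you want here, and the prompt is just an example to adapt.

```sh
# Sketch: run the "fresh reviewer" prompt a few times, each in a brand-new session.
# Assumes Claude Code's print mode (`claude -p`); tweak the prompt and count to taste.
for i in 1 2 3; do
  claude -p "Diff this branch against origin/main. It is supposed to be a pure refactor of a colleague's code. Is it really? List any functional changes or likely bugs."
done
```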

Get extra work done while you sleep

One of the most addictive things about Claude Code is that, when I sign off from work, I can have it iterate on some problem while I’m off drinking a beer, enjoying time with my family, or hunkering down for a snooze. It doesn’t get tired, it doesn’t take holidays, and it doesn’t get annoyed at trying 10 different solutions to the same problem.

In a sense then, it’s like my virtual Jekyll-and-Hyde doppelganger, because it’s getting work done that I never would have done otherwise. Sometimes the work is a dud – I’ll wake up and realize that the LLM got off on some weird tangent that didn’t solve the real problem, so I’ll git reset --hard and start from scratch. (Often I’ll use my own human brain for this stuff, since this situation is a good hint that it’s not the right job for an LLM.)

I’ve found that the biggest limiting factor in these cases is not the LLM itself, but rather that Claude Code asks for permission on every little thing, to the point where I’ve developed an automation blindness: I just skim the command and type “yes.” This scares me, so I’ve started experimenting with running Claude Code in a Podman container in yolo mode. Due to the lethal trifecta, though, I’m currently only comfortable doing this with side projects where I don’t care if my entire codebase gets sent to the dark web (or whatever it is misbehaving agents might do).
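
For reference, the container setup is nothing fancy; the sketch below is roughly what I mean. The image, the mount layout, and the API-key handling are assumptions to adapt to your own environment, and --dangerously-skip-permissions is the flag that turns off the approval prompts (hence “yolo mode”). Note that the container only limits the blast radius on my machine; it does nothing about what the agent can do over the network.

```sh
# Sketch: run Claude Code in a throwaway Podman container with approval prompts disabled.
# Image, mounts, and key handling are illustrative; adapt to your setup.
podman run --rm -it \
  -v "$PWD":/workspace:Z \
  -w /workspace \
  -e ANTHROPIC_API_KEY \
  docker.io/library/node:22 \
  bash -lc "npm install -g @anthropic-ai/claude-code && claude --dangerously-skip-permissions"
```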

This unfortunately leads to a situation where the agent invades my off-work hours, and I’m tempted to periodically check on its progress and either approve it or point it in another direction. But this becomes more a problem of work-life balance than of human-agent interaction – I should probably just accept that I should enjoy my hobbies rather than supervising a finicky agent round-the-clock!

Conclusion

I still kind of hate AI agents and feel ambivalent toward them. But they work. When I read anti-AI diatribes nowadays, my eyes tend to glaze over and I think of the quote from Galileo: “And yet, it moves.” All your arguments make a lot of sense; they resonate with me a lot; and yet, the technology works. I write an insane amount of code these days in a very small number of hours, and this would have been impossible before LLMs.

I don’t use LLMs for everything. I’ve learned through bitter experience that they are just not very good at subtle, novel, or nebulous projects that touch a lot of disparate parts of the code. For that, I will just push Claude to the side and write everything myself like a Neanderthal. But those cases are becoming fewer and further between, and I find myself spending a lot of time writing specs, reviewing code, or having AIs write code to review other AIs’ code (like some bizarre sorcerer’s apprentice policing another sorcerer’s apprentice).

In some ways, I compare my new role to that of a software architect: the best architects I know still get their hands dirty sometimes and write code themselves, if for no other reason than to remember the ground truth of the grunts in the trenches. But they’re still mostly writing design documents and specs.

I also don’t use AI for my open-source work, because it just feels… ick. The code is “mine” in some sense, but ultimately, I don’t feel true ownership over it, because I didn’t write it. So it would feel weird to put my name on it and blast it out on the internet to share with others. I’m sure I’m swimming against the tide on this one, though.

If I could go back in time and make it so LLMs were never a thing… I might still do it. I really had a lot more fun writing all the code myself, although I am having a different sort of fun now, so I can’t completely disavow it.

I’m reminded of game design – if you create a mechanic that’s boring, but which players can exploit to consistently win the game (e.g. hopping on turtle shells for infinite 1-Ups), then they’ll choose that strategy, even if they end up hating the game and having less fun. LLMs are kind of like that – they’re the obvious optimal strategy, and although they’re less fun, I’ll keep choosing them.

Anyway, I may make a few enemies with this post, but I’ve long accepted that what I write on the internet will usually attract some haters. Meanwhile I think the vast majority of developers have made their peace with AI and are just moving on. For better or worse, I’m one of them.

9 responses to this post.

  1. Posted by Ralph Haygood on December 23, 2025 at 3:22 AM

    “I also don’t use AI for my open-source work, because it just feels … ick. The code is ‘mine’ in some sense, but ultimately, I don’t feel true ownership over it, because I didn’t write it … If I could go back in time and make it so LLMs were never a thing … I might still do it.”

    That isn’t exactly a ringing endorsement of LLMs for coding. To be blunt, it seems like you’re making a virtue of necessity. Maybe not, but that’s how it strikes me.

    As a physics student, I started programming in FORTRAN. In the years since, I’ve adopted many technical innovations – object-oriented-ness, dynamic typing, multithreading, and many others. (Similarly, in my work as an evolutionary geneticist, I’ve adopted many statistical innovations.) There isn’t a single one of which I’d say, “If I could go back in time and make it so X was never a thing … I might still do it.” There also isn’t a single one that required a hard “sell”; their benefits were evident and compelling to me.

    Over the same period, there have been technical innovations I didn’t embrace, such as XHTML, SOAP, and microformats. It seems to me subsequent developments have affirmed my choices.

    If you write more about this subject, I’d suggest saying more about the character of your job. It sounds like maybe you receive “design documents and specs” that you’re supposed to turn into code. That kind of job is probably more amenable to automation than what many people called “programmers” spend much of their time doing. Indeed, I suspect the diversity of what people called “programmers” actually do contributes significantly to the confusion and rancor around LLMs for coding.

    I’m a freelance software developer. I design, build, and maintain web applications for my clients. Naturally, I don’t write all or even most of the code myself. I depend on services such as Nginx and PostgreSQL and libraries such as Rails and Vue. I don’t treat them as black boxes, but I do depend on clean boundaries between them and my own code. At times in the past, I’ve worked alongside other programmers (e.g., I was one of the people who created SICStus Prolog), and I depended on clean boundaries between what they and I were doing. Those boundaries shifted over time, but only in a careful, explicit manner.

    Much of my resistance to LLMs for coding comes down to this: it seems impractical to maintain clean boundaries between code generated by an LLM and code written by me within the same codebase. Without such boundaries, I expect my understanding of the codebase would slowly sublime, which would make maintaining and extending the codebase more difficult, eventually much more difficult.

    To put the matter metaphorically, working in one of my codebases feels something like living in my apartment. I know where things are in my apartment, including the organizing logic of each drawer, cupboard, and closet. Turning an LLM loose in one of my codebases would seem akin to inviting an energetic stranger to have at my apartment – rearrange the furniture, reorganize the drawers, cupboards, and closets, etc. Even if the new arrangements made sense to the stranger, they wouldn’t necessarily make sense to me, and in any case, the cognitive labor of adjusting to them would be significant.

    Because for all the talk about “best practices”, practically every aspect of software development, from high-level architecture to low-level conventions, is legitimately subject to disagreement or, to use an increasingly unfashionable term, creativity; different people, equally competent, may well make different choices. In the absence of clean boundaries between their responsibilities, the result is apt to be chaos.

    I’m not a “hater” or inclined to be argumentative, so I probably won’t comment here again. In conclusion, thanks for your work here. Over the years, I’ve learned a number of useful things from your posts.

    • Nolan Lawson:

      Thanks for the thoughtful comment. I agree, my feelings are mixed – hence “AI ambivalence” which still captures a lot of my feelings. Basically I feel the technology works, but it’s kind of insulting since it trivializes so much of what I spent my career trying to become adept at.

      The “rearranging the furniture” metaphor is spot-on. I think if you turn LLMs totally loose on a codebase, don’t read their output, don’t force them to refactor, etc., then yes you will quickly become lost and forget what the code does. However, in a sense this is what it’s like working on a big multi-person project anyway – there are constantly other people working on the code, and you wake up one day and something is architected totally differently. There are a few ways to adapt to this:

      • Have the LLM maintain a set of documents (README.md, ARCHITECTURE.md, etc.) and ensure it updates them as it goes
      • Take a break from LLMs every so often and write the code yourself
      • Only use an LLM when you’re sure you understand the architecture well enough yourself to not mess it up

      Honestly though I think this is kind of an unsolved problem in this space, and it’s one of the strongest arguments against using LLMs. I worry about the future where people work on code where they lack the depth of knowledge about how it works that they’d normally have. Maybe all the recent headline-grabbing outages from cloud providers are a sign of people over-using LLMs.

      • Posted by Neil L on December 28, 2025 at 12:20 AM

        Thank you for sharing your perspective! I completely understand the ambivalence you’re expressing about AI in coding. It’s valid to feel that while these tools can streamline processes, they also risk underappreciating the skill and expertise that developers have honed over their careers.

        Your ‘rearranging the furniture’ metaphor resonates deeply. Much like navigating a large team project, AI can introduce complexity if not managed carefully. I agree that maintaining documentation like README.md and ARCHITECTURE.md is crucial for preserving context and knowledge, especially when using LLMs. Taking periodic breaks to engage directly with the code yourself is a wise strategy to ensure you retain a deep understanding of the architecture.

        The concern about over-reliance on AI is an important one, especially given the recent incidents in cloud computing – they highlight the risks of delegating too much responsibility to tools that may not fully grasp the nuances of a codebase. Finding a balance between leveraging AI capabilities and retaining a solid grasp of the underlying code is indeed an ongoing challenge that many in our field will face. It’s an evolving conversation, and I appreciate your insights on this matter!

  2. Posted by George Crawford on December 23, 2025 at 5:59 AM

    Thanks for this timely post, Nolan. I’ve been following your writing for years, and I’m just about to embark on a deep investigation into the use of LLMs and agents for our engineers at work.

    I don’t suppose you’d consider sharing any of your AGENTS.md files would you? Either project-specific guidelines and rules, or personal preferences? I’d be extremely interested to see the patterns you’ve found to work through trial and error.

    Thanks again!

    • Nolan Lawson:

      Thanks! Unfortunately I can’t share them because they’re for private code, but I’d say:

      • It should provide a high-level overview of the system and its components, possibly separated into multiple files if it gets too long
      • You can actually have an LLM generate the first draft of this, or even inspect code changes in CI to ensure it stays up-to-date

      (Note I can’t claim credit for these ideas; my colleagues who are much more adept with LLMs than me came up with this!)

      For the personal AGENTS file, mine mostly contains stuff like “if you want to access my local Postgres, use this exact command” – i.e. things I got tired of seeing the agent mess up on. Note it’s apparently better to say “do this” than “don’t do this” – I guess it’s like how saying “don’t think of an elephant” immediately makes someone think of an elephant.

  3. Posted by Manuel Jasso on December 25, 2025 at 10:25 AM

    ople ask me about my thoughts about AI, I will point them to yours!

    I like what you said here:

    In some ways, I compare my new role to that of a software architect: the best architects I know still get their hands dirty sometimes and write code themselves, if for no other reason than to remember the ground truth of the grunts in the trenches. But they’re still mostly writing design documents and specs.

    And yes, I’ve also made peace and am trying to move on… But I’ve still drawn a line…

    • Posted by Manuel Jasso on December 25, 2025 at 10:28 AM

      Ugh… Copy/paste truncated my first paragraph, I said:

      Hi Nolan! Great article again! I am going to shamelessly keep links to them and when people ask me about my thoughts about AI, I will point them to yours! 😆

  4. Posted by Neil L on December 28, 2025 at 12:19 AM

    I really appreciate your insights on using AI agents for coding! Leveraging AI not only enhances productivity but also allows developers to focus more on creative problem-solving rather than repetitive tasks. It’s exciting to see how these tools can assist in streamlining workflows and potentially improve code quality. I’d love to hear more about specific challenges you’ve faced when integrating AI into your coding practices and how you overcame them!
