
How I use AI agents to write code

Yes, this is the umpteenth article about AI and coding that you’ve seen this year. Welcome to 2025.

Some people really find LLMs distasteful, and if that’s you, then I would recommend that you skip this post. I’ve heard all the arguments, and I’m not convinced anymore.

I used to be a fairly hard-line anti-AI zealot, but with the release of things like Claude Code, OpenAI Codex, Gemini CLI, etc., I just can’t stand athwart history and yell “Stop!” anymore. I’ve seen my colleagues make too much productive use of this technology to dismiss it as a fad or mirage. It writes code better than I can a lot of the time, and that’s saying something because I’ve been doing this for 20 years and I have a lot of grumpy, graybeard opinions about code quality and correctness.

But you have to know how to use AI agents correctly! Otherwise, they’re kind of like a finely-honed kitchen knife attached to a chainsaw: if you don’t know how to wield it properly, you’re gonna hurt yourself.

Basic setup

I use Claude Code. Mostly because I’m too lazy to explore all the other options. I have colleagues who swear by Gemini or Codex or open-source tools or whatever, but for me Claude is good enough.

First off, you need a good CLAUDE.md (or AGENTS.md). Preferably one for the project you’re working in (the lay of the land, overall project architecture, gotchas, etc.) and one for yourself (your local environment and coding quirks).

This seems like a skippable step, but it really isn’t. Think about your first few months at a new job – you don’t know anything about how the code works, you don’t know the overall vision or design, so you’re just fumbling around the code and breaking things left and right. Ideally you need someone from the old guard, who really knows the codebase’s dirty little secrets, to write a good CLAUDE.md that explains the overall structure, which parts are stable, which parts are still under development, which parts have dragons, etc. Otherwise the LLM is just coming in fresh to the project every time and it’s going to wreak havoc.
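For illustration, a project-level CLAUDE.md might look something like this (the structure and every specific below are hypothetical, not from any real project):

# Project overview
- Monorepo: packages/server (Node API) and packages/web (React frontend)
- Run npm test after every change; run npm run lint before committing

# Here be dragons
- packages/server/src/legacy/ is frozen; do not refactor or "clean up" this code
- Migrations in db/migrations/ are append-only; never edit an existing one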

As for your own personal CLAUDE.md (i.e. in ~/.claude), this should just be for your own coding quirks. For example, I like the variable name _ in map() or filter() functions. It’s like my calling card; I just can’t do without it.
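In practice that just means a line in that file like "Use _ as the parameter name in single-argument map() and filter() callbacks," so the agent writes:

const doubled = [1, 2, 3].map((_) => _ * 2); // [2, 4, 6]

instead of inventing a new parameter name every time.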

Overall strategy

I’ve wasted a lot of time on LLMs. A lot of time. They are every bit as dumb as their critics claim. They will happily lead you down the garden path and tell you “Great insight!” until you slowly realize that they’ve built a monstrosity that barely works. I can see why some people try them out and then abandon them forever in disgust.

There are a few ways you can make them more useful, though:

  1. Give them a feedback loop, usually through automated tests. Automated tests are a good way for the agent to go from “I’ve fixed the problem!” to “Oh wait, no I didn’t…” and actually circle in on a working solution.
  2. Use the “plan mode” for more complicated tasks. Just getting the agent to “think” about what it’s doing before it executes is useful for anything more complex than a pure refactor or other rote task.

For example, one time I asked an agent to implement a performance improvement to a SQL query. It immediately said “I’ve found a solution!” Then I told it to write a benchmark and use a SQL EXPLAIN, and it quickly realized that it had actually made things slower! So the next step was to try 3 different variants of the solution, testing each against the benchmark, and only then deciding on the way forward. This is eerily similar to my own experience writing performance optimizations – the biggest danger is being seduced by your own “clever” solution without actually rigorously benchmarking it.
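For a sense of what that feedback loop looks like, here’s a rough sketch of the kind of benchmark involved – the queries, table names, and parameters are all hypothetical stand-ins, and it assumes Postgres with the pg client:

import { Client } from 'pg';

// Two logically equivalent ways of writing the same (hypothetical) query.
const VARIANTS: Record<string, string> = {
  original:
    'SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region = $1)',
  candidate:
    'SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id WHERE c.region = $1',
};

async function main() {
  const client = new Client(); // connection details come from the PG* env vars
  await client.connect();
  for (const [name, sql] of Object.entries(VARIANTS)) {
    // Step 1: look at the actual query plan instead of trusting vibes.
    const plan = await client.query(`EXPLAIN ANALYZE ${sql}`, ['emea']);
    console.log(name, plan.rows.map((r) => r['QUERY PLAN']).join('\n'));

    // Step 2: measure wall-clock time over repeated runs.
    const start = performance.now();
    for (let i = 0; i < 50; i++) {
      await client.query(sql, ['emea']);
    }
    console.log(`${name}: ${((performance.now() - start) / 50).toFixed(2)} ms/query`);
  }
  await client.end();
}

main().catch(console.error);

The point isn’t the harness itself – it’s that the agent now has a number to be wrong about, which is what lets it discover that its “solution” made things slower.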

This is why I’ve found that coding agents are (currently) not very good at doing UI. You end up using something like the Playwright or Chrome DevTools MCP/skill, and this either slurps up way too many tokens, or it just slows things down considerably because the agent has to inspect the DOM (tokens galore) or write a Playwright script and take a screenshot to inspect it (slooooooow). I’ve watched Claude fumble over closing a modal dialog too often to have patience for this. It’s only worthwhile if you’re willing to let the agent run over your lunch break or something.
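For reference, the throwaway script the agent ends up writing is typically something like this (the URL and selector are hypothetical) – and every round trip through it burns time and tokens:

import { chromium } from 'playwright';

// Spin up a browser, poke at the UI, and capture a screenshot for the agent to inspect.
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('http://localhost:3000'); // hypothetical dev server
await page.click('#open-settings'); // hypothetical selector
await page.screenshot({ path: 'after-click.png' }); // the agent then "looks at" this file
await browser.close();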

The AI made a mistake? Add more AI

This one should be obvious but it’s surprisingly not. AIs tend to make singular, characteristic mistakes:

  1. Removing useful comments from previous developers – “this is a dumb hack that we plan to remove in version X” either gets deleted or becomes some Very Official Sounding Comment that obscures the original meaning.
  2. Duplicating code. Duplicating code. I don’t know why agents love duplicating code so much, but they do. It’s like they’ve never heard of the DRY principle.
  3. Making subtle “fixes” when refactoring code that actually break the original intent. (E.g. “I’ll just put an extra null check in here!” – see the sketch below.)
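To make that third one concrete, here’s a hypothetical before/after (all names invented for illustration):

interface User {
  profile: { name: string };
}

// Before: a missing profile throws, surfacing bad data early (the intended behavior).
function getUserName(user: User): string {
  return user.profile.name;
}

// After the "pure refactor": the extra null check silently turns the error case into
// an empty string, and bad data now propagates downstream unnoticed.
function getUserNameRefactored(user: User): string {
  return user.profile?.name ?? '';
}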

Luckily, there’s a pretty easy solution to this: you shut down Claude Code, start a brand-new session, and tell the agent “Hey, diff against origin/main. This is supposed to be a pure refactor. Is it really though? Check for functional bugs.” Inevitably, the agent will find some errors.

This seems to work better if you don’t tell the agent that the code is yours (presumably because it would just try to flatter you about how brilliant your code is). So you can lie and say you’re reviewing a colleague’s PR or something if you want.

After this “code review” agent runs, you can literally just shut down Claude Code and run the exact same prompt again. Run it a few times until you’re sure that all the bugs have been shaken out. This is shockingly effective.

Get extra work done while you sleep

One of the most addictive things about Claude Code is that, when I sign off from work, I can have it iterate on some problem while I’m off drinking a beer, enjoying time with my family, or hunkering down for a snooze. It doesn’t get tired, it doesn’t take holidays, and it doesn’t get annoyed at trying 10 different solutions to the same problem.

In a sense then, it’s like my virtual Jekyll-and-Hyde doppelganger, because it’s getting work done that I never would have done otherwise. Sometimes the work is a dud – I’ll wake up and realize that the LLM got off on some weird tangent that didn’t solve the real problem, so I’ll git reset --hard and start from scratch. (Often I’ll use my own human brain for this stuff, since this situation is a good hint that it’s not the right job for an LLM.)

I’ve found that the biggest limiting factor in these cases is not the LLM itself, but rather that Claude Code asks for permission on every little thing, to the point where I’ve developed automation blindness: I just skim the command and type “yes.” This scares me, so I’ve started experimenting with running Claude Code in a Podman container in yolo mode. Due to the lethal trifecta, though, I’m currently only comfortable doing this with side projects where I don’t care if my entire codebase gets sent to the dark web (or whatever it is misbehaving agents might do).
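For the curious, the experiment amounts to something like this – the image name is a stand-in for whatever sandbox image you build yourself, while --dangerously-skip-permissions is the real flag behind “yolo mode”:

podman run --rm -it \
  --volume "$PWD":/workspace --workdir /workspace \
  my-agent-sandbox \
  claude --dangerously-skip-permissions

The container limits what the agent can touch on my machine; it does nothing about exfiltration, which is exactly why I only do this with low-stakes side projects.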

This unfortunately leads to a situation where the agent invades my off-work hours, and I’m tempted to periodically check on its progress and either approve it or point it in another direction. But this becomes more a problem of work-life balance than of human-agent interaction – I should probably just accept that I should enjoy my hobbies rather than supervising a finicky agent round-the-clock!

Conclusion

I still kind of hate AI agents and feel ambivalent toward them. But they work. When I read anti-AI diatribes nowadays, my eyes tend to glaze over and I think of the quote attributed to Galileo: “And yet, it moves.” All your arguments make a lot of sense, they resonate with me, and yet, the technology works. I write an insane amount of code these days in very few hours, and this would have been impossible before LLMs.

I don’t use LLMs for everything. I’ve learned through bitter experience that they are just not very good at subtle, novel, or nebulous projects that touch a lot of disparate parts of the code. For that, I will just push Claude to the side and write everything myself like a Neanderthal. But those cases are becoming fewer and further between, and I find myself spending a lot of time writing specs, reviewing code, or having AIs write code to review other AIs’ code (like some bizarre sorcerer’s apprentice policing another sorcerer’s apprentice).

In some ways, I compare my new role to that of a software architect: the best architects I know still get their hands dirty sometimes and write code themselves, if for no other reason than to remember the ground truth of the grunts in the trenches. But they’re still mostly writing design documents and specs.

I also don’t use AI for my open-source work, because it just feels… ick. The code is “mine” in some sense, but ultimately, I don’t feel true ownership over it, because I didn’t write it. So it would feel weird to put my name on it and blast it out on the internet to share with others. I’m sure I’m swimming against the tide on this one, though.

If I could go back in time and make it so LLMs were never a thing… I might still do it. I really had a lot more fun writing all the code myself, although I am having a different sort of fun now, so I can’t completely disavow it.

I’m reminded of game design – if you create a mechanic that’s boring, but which players can exploit to consistently win the game (e.g. hopping on turtle shells for infinite 1-Ups), then they’ll choose that strategy, even if they end up hating the game and having less fun. LLMs are kind of like that – they’re the obvious optimal strategy, and although they’re less fun, I’ll keep choosing them.

Anyway, I may make a few enemies writing this post, but I’ve long accepted that what I write on the internet will usually attract some haters. Meanwhile I think the vast majority of developers have made their peace with AI and are just moving on. For better or worse, I’m one of them.

The fate of “small” open source

By far the most popular npm package I’ve ever written is blob-util, which is ~10 years old and still gets 5+ million weekly downloads.

It’s a small collection of utilities for working with Blobs in JavaScript. I wrote it because I found that PouchDB users were endlessly confused about how to work with Blobs and how to convert them to strings, ArrayBuffers, etc.

Given that some 80% of developers are now using AI in their regular work, blob-util is almost certainly the kind of thing that most developers would just happily have an LLM generate for them. Sure, you could use blob-util, but then you’d be taking on an extra dependency, with unknown performance, maintenance, and supply-chain risks.

And sure enough, Claude will happily spit out whatever Blob utilities you need when prompted:

> Write me a utility function in TypeScript to convert a Blob to an ArrayBuffer. It should return a Promise.

function blobToArrayBuffer(blob: Blob): Promise<ArrayBuffer> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    reader.onload = () => {
      if (reader.result instanceof ArrayBuffer) {
        resolve(reader.result);
      } else {
        reject(new Error('Failed to read Blob as ArrayBuffer'));
      }
    };

    reader.onerror = () => {
      reject(reader.error);
    };

    reader.readAsArrayBuffer(blob);
  });
}

Claude’s version is pretty close to the blob-util version (unsurprising, since it was probably trained on it!). It’s much more verbose, though, unnecessarily checking whether readAsArrayBuffer actually gives you an ArrayBuffer (although this does make TypeScript happy). To be fair, it also improves on my implementation by directly rejecting with an error rather than the more awkward onerror event.

Note: for anyone wondering, yes Claude did suggest the new Blob.arrayBuffer() method, but it also generated the above for “older environments.”
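For completeness, the modern version really is a one-liner:

// Blob.arrayBuffer() is built in to modern browsers and Node 18+.
const blob = new Blob(['hello world']);
const buffer = await blob.arrayBuffer();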

I suppose some people would see this as progress: fewer dependencies, more robust code (even if it’s a bit more verbose), quicker turnaround time than the old “search npm, find a package, read the docs, install it” approach.

I don’t have any excessive pride in this library, and I don’t particularly care if the download numbers go up or down. But I do think something is lost with the AI approach. When I wrote blob-util, I took a teacher’s mentality: the README has a cutesy and whimsical tutorial featuring Kirby, in all his blobby glory. (I had a thing for putting Nintendo characters in all my stuff at the time.)

The goal wasn’t just to give you a utility to solve your problem (although it does that) – the goal was also to teach people how to use JavaScript effectively, so that you’d have an understanding of how to solve other problems in the future.

I don’t know which direction we’re going in with AI (well, ~80% of us; to the remaining holdouts, I salute you and wish you godspeed!), but I do think it’s a future where we prize instant answers over teaching and understanding. There’s less reason to use something like blob-util, which means there’s less reason to write it in the first place, and therefore less reason to educate people about the problem space.

Even now there’s a movement toward putting documentation in an llms.txt file, so you can just point an agent at it and save your brain cells the effort of deciphering English prose. (Is this even documentation anymore? What is documentation?)

Conclusion

I still believe in open source, and I’m still doing it (in fits and starts). But one thing has become clear to me: the era of small, low-value libraries like blob-util is over. They were already on their way out thanks to Node.js and the browser taking on more and more of their functionality (see Node’s built-in fs.glob, structuredClone, etc.), but LLMs are the final nail in the coffin.

This does mean that there’s less opportunity to use these libraries as a springboard for user education (Underscore.js also had this philosophy), but maybe that’s okay. If there’s no need to find a library to, say, group the items in an array, then maybe learning about the mechanics of such libraries is unnecessary. Many software developers will argue that asking a candidate to reverse a binary tree is pointless, since it never comes up in the day-to-day job, so maybe the same can be said for utility libraries.
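Case in point: grouping the items in an array, once a canonical reason to reach for a utility library, is now built into the platform itself via ES2024’s Object.groupBy:

// No library needed – group an array by a derived key (Node 21+, modern browsers).
const files = [
  { name: 'a.ts', ext: 'ts' },
  { name: 'b.css', ext: 'css' },
  { name: 'c.ts', ext: 'ts' },
];
const byExt = Object.groupBy(files, (f) => f.ext);
// => { ts: [a.ts entry, c.ts entry], css: [b.css entry] }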

I’m still trying to figure out what kinds of open source are worth writing in this new era (hint: ones that an LLM can’t just spit out on command), and where education is the most lacking. My current thinking is that the most value is in bigger projects, more inventive projects, or in more niche topics not covered in an LLM’s training data. For example, I look back on my work on fuite and various memory-leak-hunting blog posts, and I’m pretty satisfied that an LLM couldn’t reproduce this, because it requires novel research and creative techniques. (Although who knows: maybe someday an agent will be able to just bang its head against Chrome heap snapshots until it finds the leak. I’ll believe it when I see it.)

There’s been a lot of hand-wringing lately about where open source fits into a world of LLMs, but I still see people pushing the boundaries. For example, a lot of naysayers think there’s no point in writing a new JavaScript framework, since LLMs are so heavily trained on React, but then there goes the indefatigable Dominic Gannaway writing Ripple.js, yet another JavaScript framework (and with some new ideas, to boot!). This is the kind of thing I like to see: humans laughing in the face of the machine, going on with their human thing.

So if there’s a conclusion to this meandering blog post (excuse my squishy human brain; I didn’t use an LLM to write this), it’s just that: yes, LLMs have made some kinds of open source obsolete, but there’s still plenty of open source left to write. I’m excited to see what kinds of novel and unexpected things you all come up with.

The collapse of complex software

In 1988, the anthropologist Joseph Tainter published a book called The Collapse of Complex Societies. In it, he described the rise and fall of great civilizations such as the Romans, the Mayans, and the Chacoans. His goal was to answer a question that had vexed thinkers over the centuries: why did such mighty societies collapse?

In his analysis, Tainter found the primary enemy of these societies to be complexity. As civilizations grow, they add more and more complexity: more hierarchies, more bureaucracies, deeper intertwinings of social structures. Early on, this makes sense: each new level of complexity brings rewards, in terms of increased economic output, tax revenue, etc. But at a certain point, the law of diminishing returns sets in, and each new level of complexity brings fewer and fewer net benefits, dwindling down to zero and beyond.

But since complexity has worked so well for so long, societies are unable to adapt. Even when each new layer of complexity starts to bring zero or even negative returns on investment, people continue trying to do what worked in the past. At some point, the morass they’ve built becomes so dysfunctional and unwieldy that the only solution is collapse: i.e., a rapid decrease in complexity, usually by abolishing the old system and starting from scratch.

What I find fascinating about this (besides the obvious implications for modern civilization) is that Tainter could have been writing about software.

Anyone who’s worked in the tech industry for long enough, especially at larger organizations, has seen it before. A legacy system exists: it’s big, it’s complex, and no one fully understands how it works. Architects are brought in to “fix” the system. They might wheel out a big whiteboard showing a lot of boxes and arrows pointing at other boxes, and inevitably, their solution is… to add more boxes and arrows. Nobody can subtract from the system; everyone just adds.

[Image: a man standing in front of a whiteboard covered in boxes, arrows, and text]

“EKS is being deprecated at the end of the month for Omega Star, but Omega Star still doesn’t support ISO timestamps.” We’ve all been there. (Via Krazam)

This might go on for several years. At some point, though, an organizational shakeup probably occurs – a merger, a reorg, the polite release of some senior executive to go focus on their painting hobby for a while. A new band of architects is brought in, and their solution to the “big diagram of boxes and arrows” problem is much simpler: draw a big red X through the whole thing. The old system is sunset or deprecated, the haggard veterans who worked on it either leave or are reshuffled to other projects, and a fresh-faced team is brought in to, blessedly, design a new system from scratch.

As disappointing as it may be for those of us who might aspire to write the kind of software that is timeless and enduring, you have to admit that this system works. For all its wastefulness, inefficiency, and pure mendacity (“The old code works fine!” “No wait, the old code is terrible!”), this is the model that has sustained a lot of software companies over the past few decades.

Will this cycle go on forever, though? I’m not so sure. Right now, the software industry has been in a nearly two-decade economic boom (with some fits and starts), but the one sure thing in economics is that booms eventually turn to busts. During the boom, software companies can keep hiring new headcount to manage their existing software (i.e. more engineers to understand more boxes and arrows), but if their labor force is forced to contract, then that same system may become unmaintainable. A rapid and permanent reduction in complexity may be the only long-term solution.

One thing working in complexity’s favor, though, is that engineers like complexity. Admit it: as much as we complain about other people’s complexity, we love our own. We love sitting around and dreaming up new architectural diagrams that can comfortably sit inside our own heads – it’s only when these diagrams leave our heads, take shape in the real world, and outgrow the size of any one person’s head that the problems begin.

It takes a lot of discipline to resist complexity, to say “no” to new boxes and arrows. To say, “No, we won’t solve that problem, because that will just introduce 10 new problems that we haven’t imagined yet.” Or to say, “Let’s go with a much simpler design, even if it seems amateurish, because at least we can understand it.” Or to just say, “Let’s do less instead of more.”

Simplicity of design sounds great in theory, but it might not win you many plaudits from your peers. A complex design means more teams to manage more parts of the system, more for the engineers to do, more meetings and planning sessions, maybe some more patents to file. A simple design might make it seem like you’re not really doing your job. “That’s it? We’re done? We can clock out?” And when promotion season comes around, it might be easier to make a case for yourself with a dazzling new design than a boring, well-understood solution.

Ultimately, I think whether software follows the boom-and-bust model, or a more sustainable model, will depend on the economic pressures of the organization that is producing the software. A software company that values growth at all cost, like the Romans eagerly gobbling up more and more of Gaul, will likely fall into the “add-complexity-and-collapse” cycle. A software company with more modest aims, that has a stable customer base and doesn’t change much over time (does such a thing exist?) will be more like the humble tribe that follows the yearly migration of the antelope and focuses on sustainable, tried-and-true techniques. (Whether such companies will end up like the hapless Gauls, overrun by Caesar and his armies, is another question.)

Personally, I try to maintain a good sense of humor about this situation, and to avoid giving in to cynicism or despair. Software is fun to write, but it’s also very impermanent in the current industry. If the code you wrote 10 years ago is still in use, then you have a lot to crow about. If not, then hey, at least you’re in good company with the rest of us, who probably make up the majority of software developers. Just keep doing the best you can, and try to have a healthy degree of skepticism when some wild-eyed architect wheels out a big diagram with a lot of boxes and arrows.