The collapse of complex software

In 1988, the anthropologist Joseph Tainter published a book called The Collapse of Complex Societies. In it, he described the rise and fall of great civilizations such as the Romans, the Mayans, and the Chacoans. His goal was to answer a question that had vexed thinkers over the centuries: why did such mighty societies collapse?

In his analysis, Tainter found the primary enemy of these societies to be complexity. As civilizations grow, they add more and more complexity: more hierarchies, more bureaucracies, deeper intertwinings of social structures. Early on, this makes sense: each new level of complexity brings rewards, in terms of increased economic output, tax revenue, etc. But at a certain point, the law of diminishing returns sets in, and each new level of complexity brings fewer and fewer net benefits, dwindling down to zero and beyond.

But since complexity has worked so well for so long, societies are unable to adapt. Even when each new layer of complexity starts to bring zero or even negative returns on investment, people continue trying to do what worked in the past. At some point, the morass they’ve built becomes so dysfunctional and unwieldy that the only solution is collapse: i.e., a rapid decrease in complexity, usually by abolishing the old system and starting from scratch.

What I find fascinating about this (besides the obvious implications for modern civilization) is that Tainter could have been writing about software.

Anyone who’s worked in the tech industry for long enough, especially at larger organizations, has seen it before. A legacy system exists: it’s big, it’s complex, and no one fully understands how it works. Architects are brought in to “fix” the system. They might wheel out a big whiteboard showing a lot of boxes and arrows pointing at other boxes, and inevitably, their solution is… to add more boxes and arrows. Nobody can subtract from the system; everyone just adds.

[Photo: a man standing in front of a whiteboard covered in labeled boxes and arrows]

“EKS is being deprecated at the end of the month for Omega Star, but Omega Star still doesn’t support ISO timestamps.” We’ve all been there. (Via Krazam)

This might go on for several years. At some point, though, an organizational shakeup probably occurs – a merger, a reorg, the polite release of some senior executive to go focus on their painting hobby for a while. A new band of architects is brought in, and their solution to the “big diagram of boxes and arrows” problem is much simpler: draw a big red X through the whole thing. The old system is sunset or deprecated, the haggard veterans who worked on it either leave or are reshuffled to other projects, and a fresh-faced team is brought in to, blessedly, design a new system from scratch.

As disappointing as it may be for those of us who might aspire to write the kind of software that is timeless and enduring, you have to admit that this system works. For all its wastefulness, inefficiency, and pure mendacity (“The old code works fine!” “No wait, the old code is terrible!”), this is the model that has sustained a lot of software companies over the past few decades.

Will this cycle go on forever, though? I’m not so sure. The software industry has been in a nearly two-decade economic boom (with some fits and starts), but the one sure thing in economics is that booms eventually turn to busts. During the boom, software companies can keep hiring new headcount to manage their existing software (i.e. more engineers to understand more boxes and arrows), but if their labor force is forced to contract, then that same system may become unmaintainable. A rapid and permanent reduction in complexity may be the only long-term solution.

One thing working in complexity’s favor, though, is that engineers like complexity. Admit it: as much as we complain about other people’s complexity, we love our own. We love sitting around and dreaming up new architectural diagrams that can comfortably sit inside our own heads – it’s only when these diagrams leave our heads, take shape in the real world, and outgrow the size of any one person’s head that the problems begin.

It takes a lot of discipline to resist complexity, to say “no” to new boxes and arrows. To say, “No, we won’t solve that problem, because that will just introduce 10 new problems that we haven’t imagined yet.” Or to say, “Let’s go with a much simpler design, even if it seems amateurish, because at least we can understand it.” Or to just say, “Let’s do less instead of more.”

Simplicity of design sounds great in theory, but it might not win you many plaudits from your peers. A complex design means more teams to manage more parts of the system, more for the engineers to do, more meetings and planning sessions, maybe some more patents to file. A simple design might make it seem like you’re not really doing your job. “That’s it? We’re done? We can clock out?” And when promotion season comes around, it might be easier to make a case for yourself with a dazzling new design than a boring, well-understood solution.

Ultimately, I think whether software follows the boom-and-bust model, or a more sustainable model, will depend on the economic pressures of the organization that is producing the software. A software company that values growth at all costs, like the Romans eagerly gobbling up more and more of Gaul, will likely fall into the “add-complexity-and-collapse” cycle. A software company with more modest aims, that has a stable customer base and doesn’t change much over time (does such a thing exist?) will be more like the humble tribe that follows the yearly migration of the antelope and focuses on sustainable, tried-and-true techniques. (Whether such companies will end up like the hapless Gauls, overrun by Caesar and his armies, is another question.)

Personally, I try to maintain a good sense of humor about this situation, and to avoid giving in to cynicism or despair. Software is fun to write, but it’s also very impermanent in the current industry. If the code you wrote 10 years ago is still in use, then you have a lot to crow about. If not, then hey, at least you’re in good company with the rest of us, who probably make up the majority of software developers. Just keep doing the best you can, and try to have a healthy degree of skepticism when some wild-eyed architect wheels out a big diagram with a lot of boxes and arrows.

40 responses to this post.

  1. This article hits the nail on the head, well done.

  2. I do not agree with this premodern boom-and-bust cyclical take; something else much more interesting and profound is going on (think of immanent antagonisms between Kant’s contingent pathological objects and universalities; the OOP vs FP debate probably fits well here). For a guy who worked in the software industry and perceived it as a mess, the real question to ask is “how does anything work at all?”

    • Posted by Philip Oakley on June 17, 2022 at 12:24 AM

      Isn’t it a product of its environment? Just thinking of the societal complexity of individualism, with all the software logic focussing onto the lead developer. Then too many amendments to the base code.

    • Any chance of you expanding on what you mean by something more important?

      I agree it is amazing anything works, but with testing at all stages the problem is going back to the business requirements, which may be complex and produced by people who cannot handle complexity. The problems may also be “wicked”.

  3. Some great food for thought and an example of the value of applying knowledge of the humanities (History) to the world of business and technology. I’m sharing on LinkedIn if that’s ok. I’m very appreciative of the insights from folks like you in the HN community.

  4. Posted by James Parker on June 9, 2022 at 6:16 PM

    Not all engineers embrace complexity, even in their own work. I often quote Turing award winner CAR Hoare, who said a system has either “no obvious defects” (due to over-complexity) or “obviously no defects”. It is typically those outside of engineering (marketing or management) that push for complexity, even if they don’t intend to.

    • That’s fair. OTOH I do think it’s easy to be seduced by your own design, at least while it fits in your own head, and to forget about the potential unintended consequences.

      You’re right that business requirements often drive complexity. I think this is something I was trying to get at: sometimes software complexity is a symptom, not a cause, e.g. because the organization that produced it is complex (think: Conway’s Law).

      • And I think sometimes organizations see a complex system, think “whoa, this is really complex”, and then try to replace it — without understanding why it was complex to begin with. Then, sometimes the result is… just as complicated.

        Software models a business domain. The better it models it, the better the complexity of reality is mirrored into the software. Complex domains produce complex software — or incomplete software.

    • Posted by Bebu sa Ware on June 11, 2022 at 8:03 PM

      In the same vein as the CAR Hoare quote, St-Exupéry’s «La perfection est atteinte, non pas lorsqu’il n’y a plus rien à ajouter, mais lorsqu’il n’y a plus rien à retirer»
      (“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away”)
      Perfection is a questionable concept but the plea for minimalism in design and implementation has to be the voice of sanity.

      Software models something – usually something in the real world, and often the thing modelled is insanely complex and probably not self-consistent.

      Human activities, institutions and organisations at best border on insanity and evolve in less than rational ways, so it’s no surprise to me that software designed for business etc. is insanely complex and necessarily becomes more so.

      It pays the rent I suppose :)

  5. Posted by tetsuoii on June 10, 2022 at 10:09 AM

    Yes, this is all true and for this exact reason I only write software in C and avoid modern paradigms. It’s better to write small programs with low complexity and high performance. That requires skill and craftsmanship. Programs should be minimalist and have as few dependencies as possible. It’s often better to cannibalize a program than to extend it. The so-called modularity and extensibility of object orientation often causes the exact opposite result. Data is hidden behind abstractions and unnecessary layers of complexity, then copied and handled, adding performance loss to obscurity. Then this is iterated upon across components, creating a tangled mess.

    • I agree with this 100%

    • Yes, this sounds nice, until you are given the job of building a “fire and forget” anti-air missile that must use data from a passive radar and IR sensor to track an enemy airplane that is actively trying to avoid being hit by using signal jamming, decoys and aggressive maneuvering. That was 15 years ago. Today, my software needs to control a walking robot that uses stereo vision to see rough outdoor ground and decide the best places to place its feet so it can walk on uneven ground. The robot uses the same stereo camera for navigation. It sees objects and looks up their location.

      My point is that some tasks are not simple and require not-simple software.

      We manage complexity with layers and abstractions. For example, stereo vision is hidden in some lower layers, and upper layers do not need to “know” the data is from a pair of video cameras; it might be a LIDAR. We hide this complexity so that each component is “simple”.
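
As a rough illustration of the layering described above (a sketch, not the commenter's actual code), the upper layer can depend only on a small interface while the sensor details stay below it. Every name here (`TerrainPoint`, `TerrainSensor`, `StereoCameraSensor`, `LidarSensor`, `choose_foothold`) and all of the sample readings are hypothetical, invented purely for the example:

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class TerrainPoint:
    """One sampled ground point (hypothetical units: metres)."""
    x: float       # distance ahead of the robot
    y: float       # lateral offset
    height: float  # height relative to the current foot level


class TerrainSensor(Protocol):
    """The only thing the upper layer knows about any sensor."""
    def scan(self) -> List[TerrainPoint]: ...


class StereoCameraSensor:
    """Stand-in for a stereo-vision layer; real code would triangulate
    depth from two images. Here it returns canned sample data."""
    def scan(self) -> List[TerrainPoint]:
        return [
            TerrainPoint(0.3, 0.0, 0.02),
            TerrainPoint(0.3, 0.1, 0.15),
            TerrainPoint(0.4, -0.1, 0.04),
        ]


class LidarSensor:
    """Stand-in for a LIDAR layer, also returning canned sample data."""
    def scan(self) -> List[TerrainPoint]:
        return [
            TerrainPoint(0.5, 0.0, 0.20),
            TerrainPoint(0.5, 0.2, 0.01),
        ]


def choose_foothold(sensor: TerrainSensor,
                    max_height: float = 0.05) -> Optional[TerrainPoint]:
    """Upper layer: pick the flattest reachable point.

    It never asks where the points came from, so swapping cameras
    for LIDAR does not touch this function.
    """
    candidates = [p for p in sensor.scan() if abs(p.height) <= max_height]
    return min(candidates, key=lambda p: abs(p.height), default=None)


if __name__ == "__main__":
    for sensor in (StereoCameraSensor(), LidarSensor()):
        print(type(sensor).__name__, "->", choose_foothold(sensor))
```

The complexity has not gone away; it has only been moved behind `scan()`, which is arguably both the appeal of this approach and the objection raised in the parent comment.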

  6. Posted by Louis on June 10, 2022 at 6:45 PM

    Lookin at you, Dependency Injection!

  7. Posted by Cliff on June 11, 2022 at 11:22 AM

    …and yet Chesterton’s Fence applies as well. One man’s “blessedly…design[ed] new system from scratch” is another man’s Jacobin destruction of dozens of man years of value.

  8. Posted by Geoff on June 15, 2022 at 3:00 AM

  9. Posted by Josh Maxwell on June 15, 2022 at 5:06 AM

    To me this feels like we are entering a new phase where we need to have better tools for quickly modeling and building complex systems so when the inevitable happens we can quickly pivot to a new complex solution. I think we’re seeing a bit of this evolution with low/no-code tools and their adjacent theories. Just imagine if, rather than web apps and simple automation, entire systems could be pieced together rapidly in this way (or simply rearranged).

    • ENIGMA is a tool for piecing together applications rapidly. Document what the application does (Analysis and Design). Then let the software do the rest. ENIGMA needs simplicity and small programs. Rather like a wall, it builds with small bricks. Check it out.

  10. Posted by Ghent the slicer on June 16, 2022 at 6:59 PM

    Good Sir,

    Let me disagree with a couple of points you are making:

    Not every organization has the appetite or budget for system replacement cycles – just ask any bank still maintaining a legacy COBOL system. These cycles may be typical for Microsoft, Amazon, Google, Meta, Salesforce and others, but I think Boeing, NASA or Bank of America have a much more conservative approach to their software systems. I’m sure they still replace stuff, but at a much slower rate.

    The software industry has been going through an expansion since the 1960s. It has gone through quite a few boom-bust cycles of the economy, but that does not affect the system replacement cycles as much as you imagine. What happens in a recession is that IT team size shrinks and projects are cut or reduced in scope. Where you had a team of 100 engineers to develop system X version 4.7, now you have 30 engineers and the task is to “keep it limping”. This naturally restricts the speed at which new boxes and arrows are added to the system. Also some systems survive and some collapse – the ones important to the organization are replaced, the rest are left to die.

    To your last point about engineers making complex systems to serve their promotion goals: yes, it is true, but the reasons are not as self-serving as you might imagine. To get a promotion one has to solve a “big business problem”, so naturally one picks the largest problem scope one can get people behind – why solve a small problem when you can solve a big problem and demonstrate how smart you are? You also have to factor in that managers love this stuff – after all, their leadership was the cause for solving the big bad problem – hence they tend to approve large splashy projects, not simple ones. Managers also need promotions, after all. If you are a cynical person you can blame the bureaucratic machinery of large software companies, which promotes building complex solutions.

    Small software companies are a different beast. There are simply not enough resources to embark on system replacement cycles. Although someone from the gaming industry may disagree – they do love re-writing their game engines.

  11. Suggest study of the Belady and Lehman model (1972). Their paper from IBM Research so accurately forecast the fate of IBM’s flagship OS/360 that IBM classified it.

  12. Posted by Giorgio M on June 16, 2022 at 11:48 PM

    “A new band of architects is brought in, and their solution to the “big diagram of boxes and arrows” problem is much simpler”
    This is mainly because, most often, the original development team had to make do with half-baked requirements, probably changing every x months, while also doing demos for interested customers, perhaps implementing some hack in the product just to be able to give those demos, and handling whatever else popped up in the course of the initial development.
    It indeed is a lot easier for a fresh team to see what’s there (the specs and the source code) and dream up a much simpler, compacted/summarized version of that code.
    And if there’s a budget for that (which in most companies there never will be), great, let’s reset the counters to zero and have that rebuilt version.
    If it can keep stable like that for years: wonderful.
    But most products need to cope with framework upgrades, OS upgrades, deprecated functionality in the libraries they use, and new requirements, so there will never be a status quo, and it will start all over again.
    So in the long run, I think that gradual refactoring of the product to get rid of that historical debt makes more sense.
    It’s like building a house where you start off with something simple and gradually, as you make more money growing older, do some home improvements, like installing that cool new kitchen, enlarging the living room (which comes with more structural work), or renewing the heating system.
    And when all that is done, you’re hiring a construction firm for a new floor in the living room and they say: well, let’s start from scratch, we’ll tear the current house down, we’ll rebuild everything there was in your house just as it was before, only with some better finishing, and also lay that new floor you asked for. Well…

    • Posted by ChickenDinner on June 17, 2022 at 2:47 AM

      Except the first release of your new house will be an MVP. Your MVP house will have a front door, a hallway, a sink tap, and nothing else but the finishing on this will be great. You’ll be expected to move back in immediately and be extremely happy and grateful and not miss any of those lame features like bedrooms and kitchens!

  13. Very inspirational.

  14. Posted by Robert Cohn on June 17, 2022 at 7:50 AM

    Unfortunately, there’s no avoiding complexity when writing most production-level software. The real question is what do we do about it? What tools, methodologies and processes help to manage the complexity?

    I, for one, do not like complexity, per se. But I know that it is a fact of software development and I have to live with it. I like the results that come from complex programs, but the minutiae of languages, platforms, system configuration and questionable requirements all contribute to the complexity “stew”.

    Since writing software is an evolutionary process, we either learn to deal with the complexity and morph it into an understandable and manageable beast, or we’re continuously chopping the heads off of the snakes coming out of the medusa head (and getting bitten in the process).

    Thanks for the interesting viewpoint.

  15. It is far too simplistic to claim that societies follow a boom-and-bust cycle. What gets replaced is the government, church, or leading powerful organization, while the society itself goes on gradually improving. We don’t revert to simple hunter-gatherers or climb back into the trees with each fallen empire. It’s the elites, not the peasants, who take it in the shorts.

    As far as organizations and software are concerned, the most we can say is that organizations and software systems that expand recklessly eventually reach a point where their complexity becomes unmanageable. There are organizations like the Roman Catholic Church that have evolved and remained strong for a thousand years, and software systems like the public switched telephone network and the SABRE airline reservation system, that have been evolving successfully for 50 years, which is forever in internet time.

  16. Posted by Justin on June 17, 2022 at 11:40 AM

    The idea of rip and replace ignores one crucial factor. Complex software is often complex because it has been optimized for the complexity of the organization over many years. A new implementation will likely perform worse because it will not take all the underlying complexity into account. Therefore the new software may take years to get to the level of functionality of the old software. In fact, the new software may be such a bad match that it is a total failure.

    While every piece of software reaches a rip-and-replace state someday, those days are few and far between.

    • Posted by Philip Oakley on June 17, 2022 at 12:31 PM

      While this is very true, the alternate approach is to also re-work, simplify and refactor the external organisation to eliminate some of those foibles.

      I had that at a company I worked for where, for example, extra copies of documents were forwarded to departments that weren’t included in the formal process, or where documents couldn’t be issued until the receiving department had checked and updated the content (Catch 22).

      Reworking the process simplifies the software. Sometimes the processes were ripped up just to fit the industry standard software’s approach!

  17. Thoughts on why software ends up failing:

    Compilers do not automatically generate code that prevents known coding problems such as those highlighted by OWASP (e.g., reuse of memory via pointers; division by zero).
    Language designers attempt to create failsafe language constructs that work well as long as everything is generated every time for any software component, which is unrealistic as computers never are designed with identical hardware/firmware or logical components or using the identical version of the same compiler.
    Debugging-friendly built-in tools for developers and users are usually ignored, or grafted onto a language design or developed software as an afterthought. Execution errors that indicate failures detected by the hardware or operating system, or especially crashes, are a total developer fail.
    Debugging support tends not to be nuanced (imagine 100 levels of debugging, with less intensive debugging based on actual execution experience with external modules and internal logic). Every software unit and system needs to default to being ultra-dubious about data it gets from something else, especially the OS, hardware, and software components developed by others. Assertion checks on all data, records, etc. that are received or delivered!
    Managers in charge of software development tend to be unaware of how different components (network, operating system, file system, CPUs, libraries, subroutine/function calling conventions, etc.) function separately and together. For an analogy, would you fly with a pilot who had no idea how the plane components work individually and together?
    Programmers tend to have terrible technical backgrounds and little experience in the subject areas they are writing code for. A programmer who has little or no understanding of the OS, file system, compilers, network, etc. easily sandbags any system they work on. Programmers are not hired for documentation quality capabilities.
    Programmers tend to way over-estimate the progress at the start of development (hitting the 33% milestone of deliverable milestones is not the same as being 33% of the total effort/cost involved).
    Developers and their managers become married to a failed design and are reluctant to start over (multiple times if needed) to learn from a failed design.
    Automatically generated documentation for independent V&V or maintainers is rarely produced, which leaves even current developers guessing about what is really working, let alone the unfortunate persons who inherit a failed system. Developers are rewarded for speed of delivery, not for long-term viability and minimal total (economic and financial) cost of ownership for whoever really pays the bills.
    Developers are financially rewarded up front whether or not their design and coding work or cost big time in software lifetime rework, maintenance, and patching (there is no financial claw back in software development).
    Software design is never able to be specified in advance with confidence. A famous saying about military campaigns is that plans are useless once the first bullets fly. The same applies to software. Attempting to design everything up front results in universal failures, whose acknowledgment tends to be delayed so that the blame can be dumped on someone else or a successor. This is why Boeing is spending hundreds of millions fixing their Starliner spacecraft.
    Too many bid software designs (the federal government being a primo example) are designed to win a contract rather than be implementable and usable.
    Too many contracts feature an all-star set of key developers who magically have another job by the time development actually starts (another primo example found in the federal government).
    Too many contracts stick most of the risks on one party or sub-group so that they get none of the benefits and all of the risks. Not a good motivator for good coding.

  18. Posted by Mark on June 19, 2022 at 5:58 AM

    A couple of reactions to this excellent post I had while reading it:
    1) I’ve long thought higher levels of problem complexity necessitate a certain amount of obscurity and performance loss. Kind of like junk DNA. The more complex the organism, the longer the strand gets and the more sections of “what are those for?”

    2) The current management fad I refer to as “fungible monkey management” implicitly pushes back on the maintenance of complex systems. The assertion that programmers are essentially (or should be) interchangeable cogs and can be reassigned with a moment’s notice I think is fundamentally flawed. You end up with a revolving door where nobody is really good at what they’re doing. Leaving the differences of stacks and languages aside, any software that’s worthwhile is doing something and a management style that rotates in someone new every few months isn’t letting anyone really understand what that something is.

    • Mark, your “fungible monkey” illustration is spot-on. There’s two of us left who get the end-to-end. Time to get a big raise or make a change.

  19. Posted by Brian Reich on June 20, 2022 at 6:10 AM

    Great article. Are you speaking about a particular “kind” of complexity in software? To me there are a few kinds of complexity.

    Too many systems, abstractions, concepts that too few people truly understand.
    Too many third party dependencies that are too tightly coupled.
    Too few abstractions, which leads to “legacy code that just works” but cannot be easily understood or refactored or tested, and breaks if you squint at it.

  20. “In 1988, the anthropologist Joseph Tainter published a book called The Collapse of Complex Societies. In it, he described the rise and fall of great civilizations such as the Romans, the Mayans, and the Chacoans.”

    A well-researched similar series of analyses on how civilizations end up self-destructing (with many parallels to activities such as software development) is the “Fall of Civilizations” (available for free on YouTube in audio and video versions). See https://www.youtube.com/channel/UCT6Y5JJPKe_JDMivpKgVXew. Failures tend to come from failure to plan for changes/modifications in externalities and due to hubris.

  21. Posted by Ed Munster on June 20, 2022 at 10:17 AM

    I worked for more than 20 years on military and aerospace embedded software. There were detailed standards and procedures that supposedly had to be followed. If they had been followed, the software would have been easy to maintain. Unfortunately, while some people were very conscientious (I like to think that I was one of them), a lot of people weren’t, and the rules were never really enforced.

    The customers insisted on the rules, but did nothing to enforce them. Managers in the companies I was working for really didn’t give a damn unless the customers complained, which they never did, because they had no idea if the rules were being followed or not. The priority for managers was: get it done quick and cheap. They charged the customers an arm and a leg for “high-quality software” but they were delivering crap.

  22. […] recently read this interesting article on the topic, and at a certain point […]

  23. I like this a lot. It seems obvious to me, but most engineers ignore the problem. I always start my designs planning for maintainability. It just seems reasonable. That’s why I’m not thrilled with where C# and software in general is going. My software tells a story. It’s hard to do that now. The idea of abbreviated code is like a story without enough words to connect it together in a person’s mind. (Hey, need a C# dev?)

  24. […] https://nolanlawson.com/2022/06/09/the-collapse-of-complex-software/ – The danger of complexity of solutions, ideas and plans. Sometimes less is better than more. I really recommend the discussion under the article. […]

  25. […] Seneca Rock is usually mentioned in connection with the fall of civilizations or social institutions. But we see it in many other things: business, investments, marriage, health, reputation, economic development, etc. Unfortunately, a crash almost never goes as smoothly as the rise that preceded it. Often everything collapses like a house of cards. This also applies to complex software systems. […]
