The limitations of semantic versioning

1 Sep

The limitations of semantic versioning

Posted September 1, 2014 by Nolan Lawson in Webapps. Tagged: node, npm, semver. Leave a Comment

With the recent underscore 1.7.0 brouhaha, there’s been a lot of discussion about the value of semantic versioning. Most of the JavaScript community seems to take the side of Semver, with Dominic Tarr even offering a satirical Sentimental Versioning spec.

Semver is so deeply entrenched in the Node community, that it’s hard to question it without making yourself an easy target for ridicule. Plus, much of the value of Semver comes from everybody collectively agreeing on it, so as with vaccines, dissenters risk being labeled as a danger to the community at large.

To me, though, most of this discussion is missing the point. The issue is not semantic versioning, but rather the build systems we’ve created that assume and promote automatic updates based on semantic versioning – i.e. npm. We wouldn’t be so worried about a breaking change in underscore 1.7.0 if thousands of projects weren’t primed to auto-update their underscore dependencies.

As a developer, I divide my time pretty evenly between Java and JavaScript, so I may have unique perspective here. I love the npm and Node communities, and I’ve been happily using and publishing modules for the past year or so. But as a community, I think it’s time we started being honest with ourselves about what Semver and auto-updating are actually buying us.

Recap of auto-updating

Recently, npm changed its default settings to automatically add dependencies like this:

"some-dependency" : "^1.2.1"

instead of like this:

"some-dependency" : "~1.2.1"

The caret ^ means the dependency will automatically update to the latest minor version relative to 1.2.1 (e.g. 1.3.5), whereas the tilde ~ means the dependency will update to the latest patch version releative to 1.2.1 (e.g. 1.2.7).

So that means that when you do npm install some-dependency, by default your package.json will be modified to do caret-updating rather than the more humble tilde-updating. But in any case, the default has traditionally been some flavor of auto-updating.

Life in Javaland

A comparison with Java is useful. The standard packaging/dependency system in the Java world is Maven, where it’s always been recommended to nail down your dependencies to a very precise version in your pom.xml.

<dependency>
  <groupId>com.somecompany</groupId>
  <artifactId>somepackage</artifactId>
  <version>4.2.0</version>
</dependency>

This is partly a reflection of Java’s enterprisey-ness, where the thought of auto-updating would probably horrify the local Chief Security Officer (or whatever). But it’s also a reflection of the fact that Maven predates Semver’s current boom in popularity, back in the bad old days when there were few expectations about what might change between versions.

However, if you actually want to use auto-updating, Maven does have so-called “snapshot” dependencies (e.g. 1.2.1-SNAPSHOT), which can change anytime, according to the whims of the publisher. Usually these are only used for internal development, with the big uppercase letters SNAPSHOT designed to warn you in a stern voice that what you’re doing is dangerous.

Google actually flirted with auto-updating for awhile with the + modifier in Android dependencies (e.g. 1.2.1+), but now they’ve shied away from it, and if you try to add a + dependency in Android Studio, it’ll throw a big warning at you to let you know you should nail down your dependencies.

So okay, we have one community where auto-updating is a dirty word, and another where it’s the default. They can’t both be right, so as Node developers, let’s consider why we might want to be suspicious of auto-updating.

Drawback 1: minor/patch versions can still break something

Today I was the asshole who pushed a backward breaking change in a patch version. It happens. Use exact versions in your package.json deps

— RIP van t-witter (@antiserf) June 4, 2014

Since Node was one of the the first communities to really embrace Semver, it’s tempting to say “it’s different this time,” and that that’s why we can get away with auto-updating but nobody else can. However, humans make mistakes, and Semver isn’t as airtight of a guarantee as we’d like to believe.

When we publish a patch or a minor release, we try our darnedest not to include breaking changes, but sometimes a bug slips through. Or sometimes a change that we consider to be non-breaking actually turns out to be breaking for somebody else further downstream.

I help maintain a fairly large open-source project (PouchDB), so I’ve seen this play out plenty of times. It’s not common, but it happens. One day I push a new pull request, and suddenly the Travis tests are failing due to some mysterious error. My first instinct is to assume the bug is in the pull request, but after some digging I realize that the master branch is failing too. What gives?

Well, the author of the foo module, which is depended upon by the bar module, which is depended upon by PouchDB, decided to change something in a patch release, but we weren’t prepared for it upstream, so now our tests are broken. This is an annoying situations to debug, because a git bisect is not enough. The same exact code that worked yesterday is broken today.

Typically what I do in these cases is step through the code to identify the offending module, check to see when the last master branch was pushed, try to figure out what versions of that module were published in the interim, and then either file a bug on that module or write new code to work around it.

It’s a tedious process. And it can be especially irritating, because when you’re writing a PR, you’re usually trying to fix some unrelated issue. Hunting down bugs in somebody else’s module is just an unwelcome distraction.

Drawback 2: it’s a security problem

This is such a big issue, I’m surprised nobody else in the Node community has mentioned it, to my knowledge.

npm has made it trivially easy to publish modules, which is awesome. I love that when I want to publish a new Node module, it’s just an npm publish away. Whereas if I want to publish a Java project to Maven Central, there’s a lot of ceremony in configuring my Maven credentials, doing a gradle uploadArchives, and then clicking around in the Sonatype Nexus interface. It’s a pain.

npm’s ease-of-use has a weakness, though. Given that the majority of Node projects use caret- or tilde-versioning, and given that it’s so easy to npm publish, what’s to stop some nogoodnik from stealing a prolific Node developer’s laptop (let’s say Substack or Mikeal Rogers), and then publishing some malware as a patch release to all their popular libraries? Bam, suddenly everybody’s continuous integration systems are downloading malware from npm and pushing it out to thousands of running systems.

You may trust Substack, but do you trust that he’s secured his laptop?

Of course, if you avoid caret- and tilde-versioning in your package.json, then this isn’t a problem. You can already inspect the code you’re running, and make sure you trust it. One might argue that this is the “more secure” approach, but that would negate one of auto-updating’s main selling points, which is that patch releases can supposedly contain security patches.

Drawback 3: auto-updating has a limited shelf life

This is the point I would really like to get across to other Node developers.

Right now it’s mostly fine for dependencies to break upstream, because we can remain pretty confident that if we file a bug on a project, the author will respond quickly and fix it.

For instance, a month ago I found a bug in Express, and not only did the maintainer (the awesome Doug Wilson) fix it in a matter of minutes, he also took it upon himself to come into the express-pouchdb project and submit a bunch of PRs. Experiences like that really exemplify what’s great about OSS development.

However, right now Node and npm are in their heyday. Changes are coming fast and furious, the community is active and engaged, and so of course caret- and tilde-versioning are pretty low-risk. Even if a bug is introduced in a minor or patch version upstream, it’ll probably get resolved quickly.

Imagine a future after the current boom, though, where npm occupies a position more like CPAN – still useful, but long in the tooth. Popular modules have fallen into disrepair, GitHub issues go unresolved. Maybe everyone’s moved on to Go.

In this post-apocalpytic future for Node, I can easily imagine developers saying, “Oh yeah, npm? That’s that thing where whenever you require() something, you have to immediately go in and remove all the tildes and carets.” Or worse, maybe someone will have to write a proxy in front of npm to act as a sort of Wayback Machine, shrinkwrapping each module to the dependencies it had when it was last published.

Don’t kid yourself, Noders – someday this future will be upon us. Project maintainers will eventually lose interest, move on to other projects, or maybe find that the obligations of family/work/whatever have reduced their ability to respond to bugs on GitHub. Maintainers will even die – yes, young coder, you too are mortal – and ideally whatever software we write should remain useful even after we’re gone. Ideally.

I don’t have the answers, but I do know that we as a community need to start preparing for eventualities like this. Right now it may feel like the party’s never going to end, but eventually the booze will run out, the music will stop, and we will have to make a sober evaluation of our software’s legacy. Recognizing the limitations of Semver and caret- and tilde-versioning are a step in that direction.

Postscript: why do people rebel against Semver?

I have a hunch about this: I think it’s because the larger culture hasn’t adjusted to Semver yet.

For someone steeped in Node practices, it may be obvious that version 3.0.0 of a module has introduced breaking changes since 2.0.0. To the average layman, though, a major version change indicates some big overhaul of the software, along with a slew of new features. This is a holdover from the shrinkwrap era, when a new major version meant a shiny new box in the store, and it’s still the prevailing view in popular understanding: “web 2.0,” “government 2.0,” etc.

What Semver ignores is that bumping a major version has marketing value.

We definitely experienced that recently in PouchDB. I found it funny that after we released version 3.0.0, we suddenly got a lot more traffic and stargazers, and we were even featured in JavaScript Weekly. However, the biggest change in 3.0.0 was subtracting features – that’s why we incremented the major version!

By contrast, the previous version, 2.2.3, constituted a huge internal restructuring that brought better stability, but you wouldn’t really know it, since it was just a patch version. And it got much less attention than 3.0.0.

I suppose the payoffs from incrementing a major version may dwindle once you get into Chrome-like territory with version 36, 37, etc., but for the low versions, it definitely seems to help boost your project’s public visibility.

Read the Tea Leaves Software and other dark arts, by Nolan Lawson