npm | Read the Tea Leaves

Posts Tagged ‘npm’

19 Oct

The struggles of publishing a JavaScript library

Posted by Nolan Lawson in Webapps. Tagged: javascript, modules, npm.

If you’ve done any web development in the past few years, then you’ve probably typed something like this:

$ bower install jquery

Or maybe even:

$ npm install --save lodash

For anyone who remembers the dark days of combing Github for jQuery plugins, this is a miracle. But as with all software, somebody had to write that code in order for you to be able to download it. And in the case of tools like Bower and npm, somebody also had to do the legwork to publish it. This is one of their stories.

The Babelification of JavaScript

I tweeted this recently:

https://twitter.com/nolanlawson/status/653610332989059072

I got some positive feedback, but I also saw some incredulous responses from people telling me I only need to support npm and CommonJS, or more snarkily, that supporting “just JavaScript” is good enough. As a fairly active open-source JavaScript author, though, I’d like to share my thoughts on why it’s not so simple.

The JavaScript module ecosystem is a mess these days. For module definitions, we have AMD, UMD, CommonJS, globals, and ES6 modules ¹. For distribution, we have npm, Bower, and jspm, as well as CDNs like cdnjs, jsDelivr, and Github itself. For translating between Node and browser code, we have Browserify, Webpack, and Rollup.

Supporting each of these categories comes with its own headaches, but before I delve into that, here’s my take on how we got into this morass in the first place.

What is a JS module?

For the longest time, JavaScript didn’t have any commonly-accepted module system, so the most straightforward way to distribute your code was as a global variable. jQuery plugins also worked this way – they would just look for the global window.$ or window.jQuery and hook themselves onto that.

But thanks largely to Node and the influx of people who care about highfalutin computer-sciencey stuff like “not polluting the global namespace,” we now have a lot more ways of modularizing our code. npm is famous for using CommonJS, with its module.exports and require(), whereas other tools like RequireJS use an alternative format called AMD, known for its define() and asynchronous loading. (It’s never ceased to confuse me that RequireJS is the one that doesn’t use require().) There’s also UMD, which seeks to harmonize all of them (the “U” stands for “universal”).

In practice, though, there’s no good “universal” way to distribute your code. Many libraries try to dynamically determine at runtime what kind of environment they’re in (here’s a pretty gnarly example), but this makes modularizing your own code a headache, because you have to repeat that boilerplate anywhere you want to split up your code into separate files.

More recently, I’ve seen a lot of modules migrate to just using CommonJS everywhere, and then bundling it up for distribution with Browserify. This can be fraught with its own difficulties though, if you aren’t aware of the subtleties of how your code gets consumed. For instance, if you use Browserify’s --standalone flag (-s), then your code will get built as an AMD-ready, UMD-ready, and globals-ready bundle file, but you might not think to add it as a build step, because the stated use of the --standalone flag is to create a global variable ².

However, my new personal policy is to use this flag everywhere, even when I can’t think of a good global variable name, because that way I don’t get issues filed on me asking for AMD support or UMD support. (Speaking of which, it still tickles me that someone had to actually open an issue asking me to support a supposedly “universal” module system. Not so universal after all, is it!)

Package managers and pseudo-package managers

So let’s say you go the CommonJS + Browserify route: now you have an interesting problem, which is that you have both a “source” version and a “distributed” version of your code. (Commonly these are organized into a src/lib folder and a dist folder, but those are just conventions.) How do you make sure your users get the right one?

npm is a package manager that expects CommonJS modules, so typically in your package.json, you set the "main" key to point to whatever your source "src/index.js" file is. Bower, however, expects a bundle file that can be directly included as a <script> tag, so in that case you’ll want to set the "main" inside the bower.json to point instead to your "dist/mypackage.js" or "dist/mypackage.min.js" file. jspm complicates things further by defaulting to npm’s package.json file while actually expecting non-CommonJS modules, but you can override that behavior by including a {"jspm": {"main": "dist/mypackage.js"}} in your package.json. Whew! We’re all done, right?
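Put together, a package.json for this setup might look roughly like the sketch below. The package name, version, file paths, and the MyPackage global are placeholders, and the "build" script assumes Browserify is installed as a dependency of the project.

```json
{
  "name": "mypackage",
  "version": "1.0.0",
  "main": "src/index.js",
  "jspm": {
    "main": "dist/mypackage.js"
  },
  "scripts": {
    "build": "browserify src/index.js --standalone MyPackage -o dist/mypackage.js"
  }
}
```

The bower.json would then carry its own "main" pointing at the dist/mypackage.js bundle that the build script produces.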

Not so fast. As it turns out, Bower isn’t really a package manager so much as a CLI over Github. What that means is that you actually need to check your bundle files into Git, to ensure that those dist/ files are available to Bower users. At the same time, you’ll have to be very cognizant not to check in anything you don’t want people to download, because Bower’s "ignore" list doesn’t actually avoid downloading anything; it just deletes the ignored files after they’re downloaded, which can lead to some enormous Bower downloads. Couple this with the fact that you’re probably also juggling .gitignore files and .npmignore files, and you can end up with some fairly complicated release scripts!

Of course, many users will also just download your bundle file from Github. So it’s important to be consistent with your Git tags, so that you can have a nice tidy Github releases page. As it turns out, Bower will also depend on those Git tags to determine what a “release” is – actually, it flat-out ignores the "version" field in bower.json. To make sense of all this complexity, our policy with PouchDB is to just do an explicit commit with the version tag that isn’t even a part of the project’s main master branch, purely as a “release commit” for Bower and Github.

What about CDNs?

Github discourages using their hosted JavaScript files directly from <script> tags (in fact their HTTP headers make it impossible), so often users will ask if they can consume your library via a CDN. CDNs are also great for code snippets, because you can just include a <script> tag pointing to the latest CDN release. So lots of libraries (including PouchDB) also support jsDelivr and cdnjs.

You can add your library manually, but in my experience this is a pain, because it usually involves checking out the entire source for the CDN (which can be many gigabytes) and then opening a pull request with your library’s code. So it’s better to follow their automated instructions so that they can automatically update whenever your code updates. Note that both jsDelivr and cdnjs rely on Git tags, so the above comments about Github/Bower also apply.

Correction: Both jsDelivr and cdnjs can be configured to point to npm instead of Github; my mistake! The same applies to jspm.

Browser vs Node

For anyone who’s written a popular JavaScript library, the situation inevitably arises that someone tries to use your Node-optimized library in the browser, or your browser-optimized library in Node, and invariably they run into issues.

The first trick you might employ, if you’re working with Browserify, is to add if/else switches anytime you want to do something differently in Node or the browser:

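function md5(str) {
  // Browserify's process shim sets process.browser to true; it is undefined in Node
  if (process.browser) {
    return require('spark-md5').hash(str);
  } else {
    // Node: use the built-in crypto module
    return require('crypto').createHash('md5').update(str).digest('hex');
  }
}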

This is convenient at first, but it causes some unexpected problems down the line.

First off, you end up sending unnecessary Node code to the browser. And especially if the Browserified version of your dependencies is very large, this can add up to a lot of bytes. In the example above, Browserifying the entire crypto library comes out to 93KB (after uglify+gzip!), whereas spark-md5 is only 2.6KB.

The second issue is that, if you are using a tool like Istanbul to measure your code coverage, then properly measuring your coverage in Node can lead to a lot of /* istanbul ignore next */ comments all over the place, so that you can avoid getting penalized for browser code that never runs.

My personal method to avoid this conundrum is to prefer the "browser" field in package.json to tell Browserify/Webpack which modules to swap out when building. This can get pretty complicated (here’s an example from PouchDB), but I prefer to complicate my configuration code rather than my JavaScript code. Another option is to use Calvin Metcalf’s inline-process-browser, which can automatically strip out process.browser switches ³.
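As a rough sketch of what that looks like, suppose the md5 function above were split into two files, one for each environment. The file names here are hypothetical; the "browser" field maps the Node version to the browser version, and a bundler that honors the field swaps one for the other at build time.

```json
{
  "name": "mypackage",
  "main": "lib/index.js",
  "browser": {
    "./lib/md5.js": "./lib/md5-browser.js"
  }
}
```

With this in place, lib/md5.js can require('crypto') unconditionally and lib/md5-browser.js can require('spark-md5') unconditionally, so neither file needs a process.browser switch.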

You’ll also want to be careful when using Browserify transforms in your code; any transforms need to be a regular dependency rather than a devDependency, or else they can cause problems for library users.

Wait, you tried to run my code where?

After you’ve solved Node/browser switching in your library, the next hurdle you’ll likely encounter is that there is some unexpected bug in an exotic environment, often due to globals.

One way this might manifest itself is that you expect a global window variable to exist in the browser – but oh no, it’s not there in a web worker! So you check for the web worker’s self as well. Aha, but NW.js has both a Node-style global and browser-style window as global variables, so you can’t know in advance which other globals (such as Promise or console) are attached to which! Then you can get into even stranger environments like iOS’s JSCore (which is used by React Native), or Electron, or Qt WebKit, or Rhino/Nashorn, or Java FXWebView, or Adobe Air…

If you want to see what kind of a mess this can create, check out these lines of code from Lodash, and weep for poor John-David Dalton!

My own solution to this issue is to never ever check for window or global or anything like that if I can avoid it, and instead use typeof whatever === 'undefined' to check. For instance, here’s my typical Promise shim:

function PromiseShim() {
  if (typeof Promise !== 'undefined') {
    return Promise;
  }
  return require('lie');
}

Trying to access a global variable that doesn’t exist is a runtime error in most JavaScript environments, but using the typeof check will prevent the error.
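Here is a small sketch of the difference. The name definitelyNotDefinedAnywhere is a hypothetical global chosen because it shouldn't exist in any environment.

```javascript
// `definitelyNotDefinedAnywhere` is a hypothetical, undeclared global.

// Referencing an undeclared global directly throws, so we have to catch it.
function probeDirectly() {
  try {
    return definitelyNotDefinedAnywhere !== undefined;
  } catch (err) {
    return err.name; // the name of the error that was thrown
  }
}

// typeof never throws for an undeclared identifier; it just yields 'undefined'.
function probeWithTypeof() {
  return typeof definitelyNotDefinedAnywhere !== 'undefined';
}

console.log(probeDirectly());   // 'ReferenceError'
console.log(probeWithTypeof()); // false
```

The typeof version also doesn't care whether the global hangs off window, self, global, or something else entirely, which is the whole point.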

Browserify vs Webpack

100% of my experience with webpack is users opening issues on my libraries because it does something subtly different then @browserify

— Professional Calvin Architect (@CWMma) October 12, 2015

Most library authors I know tend to prefer Browserify for building JavaScript modules, but especially with the rise of React and Flux, Webpack is increasingly becoming a popular option.

Webpack is mostly consistent with Browserify, but there are points of divergence that can lead to unexpected errors when people try to require() your library from Webpack. The best way to test is to simply run webpack on your source CommonJS file and see if you get any errors.

In the worst case, if you have a dependency that doesn’t build with Webpack, you can always tell users to specify a custom loader to work around the issue. Webpack tends to give more control to the end-user than Browserify does, so the best strategy is to just let them build up your library and dependencies however they need to.

Enter ES6

This whole situation I’ve described above is bad enough, but once you add ES6 to the mix, it gets even more complicated. ES6 modules are the “future-proof” way of authoring JavaScript, but as it stands, there are very few tools that can consume ES6 directly, including most versions of Node.

(Yes, even if you are using Node 4.x with its many lovely ES6 features like Promises and arrow functions, there are still some missing features, like spread arguments and destructuring, that are not supported by V8 yet.)

So, what many ES6 authors will do is add a "prepublish" script to build the ES6 source into a version consumable by Node/npm (here’s an example). (Note that your "main" field in package.json must point to the Node-ready version, not the ES6 version!) Of course, this adds a huge amount of additional complexity to your build script, because now you have three versions of your code: 1) source, 2) Node version, and 3) browser version.

When you add an ES6 module bundler like Rollup, it gets even hairier. Rollup is a really cool bundler that offers some big benefits over Browserify and Webpack (such as smaller bundle sizes), but to use it, it expects your library’s dependencies to be exported in the ES6 format.

Now, because npm normally expects CommonJS, not ES6 modules, there is an informal “jsnext:main” field that some libraries use to point to their ES6 source. Usage is not very widespread, though, so if any of your dependencies don’t use ES6 or don’t have a "jsnext:main", then you’ll need to use Rollup’s --external flag when bundling them so that it knows to ignore them.
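For illustration, a package.json that ships both versions might look something like this. The directory names are placeholders, and the "prepublish" script assumes the Babel command-line tool is installed as a devDependency; any ES6-to-ES5 build step would do.

```json
{
  "name": "mypackage",
  "main": "lib/index.js",
  "jsnext:main": "src/index.js",
  "scripts": {
    "prepublish": "babel src --out-dir lib"
  }
}
```

Here "main" points at the transpiled, Node-ready output in lib/, while "jsnext:main" points at the original ES6 source in src/ for bundlers that know to look for it.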

"jsnext:main" is a nice hack, but it also brings up a host of unanswered questions, such as: which features of ES6 are supported? Is it a particular stage of recommendation for the spec, à la Babel? What about popular ES7 features that are already starting to creep into codebases that use Babel, such as async/await? It’s not clear, and I don’t think this problem will be resolved until npm takes a stance one way or the other.

Making sense of this mess

At the end of the day, if your users want your code bad enough, then they will find a way to consume it. In the worst case scenario, they can just copy-paste your code from Github, which is how JavaScript was consumed for many years anyway. (StackOverflow was a decent package manager long before cooler kids like npm and Bower came along!)

Many folks have advised me to just support npm and CommonJS, and honestly, for my smaller modules I’m doing just that. It’s simply too much work to try to support everything at once. As an example of how complicated it is, I’ve created a hello-javascript module that only contains the code you need to support all the environments above. Hopefully it will help someone trying to figure out how to publish to multiple targets.

If you happen to be thinking about hopping into the world of JavaScript library authorship, though, I recommend starting with npm’s publishing guide and working your way up from there. Trying to support every JavaScript user on the planet is an ambitious proposition, and you don’t want to wear yourself out when you’re having enough trouble testing, writing documentation, checking code coverage, triaging issues, and hey – at some point, you’ll also need to write some code!

But as with everything in software, the best advice is to focus on the user and all else will follow. Don’t listen to the naysayers who tell you that Bower users are “wrong” and you’re doing them a favor by “educating” them ⁴. Work with your users to try to support their use case, and give them alternatives if they’re unsatisfied with your current publishing approach. (I really like wzrd.in for on-demand Browserification.)

To me, this is somewhat like accessibility. Some users only know Bower, not npm, or maybe they don’t even understand the difference between the two! Others might be unfamiliar with the command line, and in that case, a big reassuring “Download” button on a github.io page might be the best way to accommodate them. Still others might be power users who will try to include your ES6 code directly and then Browserify it themselves. (Ask those users for a pull request!)

At the end of the day, you are giving away your labor for free, so you shouldn’t feel obligated to bend over backwards for anybody. But if your driving motivation is to make your code as usable as possible for other people, then I’d say you can’t go wrong by supporting the two most popular options: direct downloads for casual users, and npm/CommonJS for power users. If your library grows in popularity, you can always worry about the thousand and one other methods later. ⁵

Thanks to Calvin Metcalf, Nick Colley, and Colin Skow for providing feedback on a draft of this post.

Footnotes

1. I’ve seen no compelling reason to call it “ES2015,” except to signal my own status as a smarty-pants. So I don’t.

2. Another handy tool is derequire, which can remove all require()s from your bundle to ensure it doesn’t get re-interpreted as a CommonJS module.

3. Calvin Metcalf pointed out to me that you can also work around this issue by using crypto sub-modules, e.g. require('crypto-hash'), or by fooling Browserify via require('cryp' + 'to').

4. With npm 3, many developers are starting to declare Bower to be obsolete. I think this is mostly right, but there are still a few areas where Bower beats npm. First off, for isomorphic libraries like PouchDB, an npm install can be more time-consuming and error-prone than a bower install, due to native LevelDB dependencies that you’ll never need if you’re only using PouchDB on the frontend. Second, not all libraries are publishing their dist/ code to npm, meaning that former Bower users would have to learn the whole Browserify/Webpack stack rather than just include a <script> tag. Third, not all Bower modules are even on npm – Ionic framework is a popular one that springs to mind. Fourth, there’s the social cost of migrating folks from Bower to npm, throwing away a wealth of tutorials and accumulated knowledge in the process. It’s not so simple to just tell people, “Okay, now start using npm instead of Bower.”

5. I’ve ragged a lot on the JavaScript community in this post, but I still find authoring for JavaScript to be a very pleasurable experience. I’ve been a consumer of Python, Java, and Perl modules, as well as a publisher of Java modules, and I still find npm to be the nicest to work with. The fact that my publish process is as simple as npm version patch|minor|major plus an npm publish is a real dream compared to the somewhat bureaucratic process for asking permission to publish to Maven Central. (If I ever have to see the Sonatype Nexus web UI again, I swear I’m going to hurl.)

1 Sep

The limitations of semantic versioning

Posted by Nolan Lawson in Webapps. Tagged: node, npm, semver.

With the recent underscore 1.7.0 brouhaha, there’s been a lot of discussion about the value of semantic versioning. Most of the JavaScript community seems to take the side of Semver, with Dominic Tarr even offering a satirical Sentimental Versioning spec.

Semver is so deeply entrenched in the Node community that it’s hard to question it without making yourself an easy target for ridicule. Plus, much of the value of Semver comes from everybody collectively agreeing on it, so as with vaccines, dissenters risk being labeled as a danger to the community at large.

To me, though, most of this discussion is missing the point. The issue is not semantic versioning, but rather the build systems we’ve created that assume and promote automatic updates based on semantic versioning – i.e. npm. We wouldn’t be so worried about a breaking change in underscore 1.7.0 if thousands of projects weren’t primed to auto-update their underscore dependencies.

As a developer, I divide my time pretty evenly between Java and JavaScript, so I may have a unique perspective here. I love the npm and Node communities, and I’ve been happily using and publishing modules for the past year or so. But as a community, I think it’s time we started being honest with ourselves about what Semver and auto-updating are actually buying us.

Recap of auto-updating

Recently, npm changed its default settings to automatically add dependencies like this:

"some-dependency" : "^1.2.1"

instead of like this:

"some-dependency" : "~1.2.1"

The caret ^ means the dependency will automatically update to the latest minor version relative to 1.2.1 (e.g. 1.3.5), whereas the tilde ~ means the dependency will update to the latest patch version relative to 1.2.1 (e.g. 1.2.7).

So that means that when you do npm install some-dependency, by default your package.json will be modified to do caret-updating rather than the more humble tilde-updating. But in any case, the default has traditionally been some flavor of auto-updating.
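To make the difference concrete, here is a simplified sketch of how the two operators behave. It is not npm’s actual implementation: it only handles plain x.y.z versions with a single leading ^ or ~, ignores pre-release tags, and the function names are my own.

```javascript
// Simplified sketch of caret and tilde matching (not npm's real implementation).
// Handles only plain "x.y.z" versions; pre-release tags are out of scope.

function parse(version) {
  return version.split('.').map(Number);
}

// Returns -1, 0, or 1, comparing major, then minor, then patch.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] < b[i] ? -1 : 1;
    }
  }
  return 0;
}

function satisfies(version, range) {
  const operator = range[0];
  const base = parse(range.slice(1));
  const candidate = parse(version);
  if (compare(candidate, base) < 0) {
    return false; // older than the stated version
  }
  let upper; // exclusive upper bound
  if (operator === '~') {
    upper = [base[0], base[1] + 1, 0]; // patch updates only
  } else if (operator === '^') {
    // The caret allows changes that leave the left-most non-zero number alone.
    if (base[0] > 0) {
      upper = [base[0] + 1, 0, 0];
    } else if (base[1] > 0) {
      upper = [0, base[1] + 1, 0];
    } else {
      upper = [0, 0, base[2] + 1];
    }
  } else {
    throw new Error('Only ^ and ~ ranges are handled in this sketch');
  }
  return compare(candidate, upper) < 0;
}

console.log(satisfies('1.3.5', '^1.2.1')); // true
console.log(satisfies('1.3.5', '~1.2.1')); // false
console.log(satisfies('1.2.7', '~1.2.1')); // true
```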

Life in Javaland

A comparison with Java is useful. The standard packaging/dependency system in the Java world is Maven, where it’s always been recommended to nail down your dependencies to a very precise version in your pom.xml.

<dependency>
  <groupId>com.somecompany</groupId>
  <artifactId>somepackage</artifactId>
  <version>4.2.0</version>
</dependency>

This is partly a reflection of Java’s enterprisey-ness, where the thought of auto-updating would probably horrify the local Chief Security Officer (or whatever). But it’s also a reflection of the fact that Maven predates Semver’s current boom in popularity, back in the bad old days when there were few expectations about what might change between versions.

However, if you actually want to use auto-updating, Maven does have so-called “snapshot” dependencies (e.g. 1.2.1-SNAPSHOT), which can change anytime, according to the whims of the publisher. Usually these are only used for internal development, with the big uppercase letters SNAPSHOT designed to warn you in a stern voice that what you’re doing is dangerous.

Google actually flirted with auto-updating for a while with the + modifier in Android dependencies (e.g. 1.2.1+), but now they’ve shied away from it, and if you try to add a + dependency in Android Studio, it’ll throw a big warning at you to let you know you should nail down your dependencies.

So okay, we have one community where auto-updating is a dirty word, and another where it’s the default. They can’t both be right, so as Node developers, let’s consider why we might want to be suspicious of auto-updating.

Drawback 1: minor/patch versions can still break something

Today I was the asshole who pushed a backward breaking change in a patch version. It happens. Use exact versions in your package.json deps

— RIP van t-witter (@antiserf) June 4, 2014

Since Node was one of the first communities to really embrace Semver, it’s tempting to say “it’s different this time,” and that that’s why we can get away with auto-updating but nobody else can. However, humans make mistakes, and Semver isn’t as airtight of a guarantee as we’d like to believe.

When we publish a patch or a minor release, we try our darnedest not to include breaking changes, but sometimes a bug slips through. Or sometimes a change that we consider to be non-breaking actually turns out to be breaking for somebody else further downstream.

I help maintain a fairly large open-source project (PouchDB), so I’ve seen this play out plenty of times. It’s not common, but it happens. One day I push a new pull request, and suddenly the Travis tests are failing due to some mysterious error. My first instinct is to assume the bug is in the pull request, but after some digging I realize that the master branch is failing too. What gives?

Well, the author of the foo module, which is depended upon by the bar module, which is depended upon by PouchDB, decided to change something in a patch release, but we weren’t prepared for it downstream, so now our tests are broken. This is an annoying situation to debug, because a git bisect is not enough. The same exact code that worked yesterday is broken today.

Typically what I do in these cases is step through the code to identify the offending module, check to see when the last master branch was pushed, try to figure out what versions of that module were published in the interim, and then either file a bug on that module or write new code to work around it.

It’s a tedious process. And it can be especially irritating, because when you’re writing a PR, you’re usually trying to fix some unrelated issue. Hunting down bugs in somebody else’s module is just an unwelcome distraction.

Drawback 2: it’s a security problem

This is such a big issue, I’m surprised nobody else in the Node community has mentioned it, to my knowledge.

npm has made it trivially easy to publish modules, which is awesome. I love that when I want to publish a new Node module, it’s just an npm publish away. Whereas if I want to publish a Java project to Maven Central, there’s a lot of ceremony in configuring my Maven credentials, doing a gradle uploadArchives, and then clicking around in the Sonatype Nexus interface. It’s a pain.

npm’s ease-of-use has a weakness, though. Given that the majority of Node projects use caret- or tilde-versioning, and given that it’s so easy to npm publish, what’s to stop some nogoodnik from stealing a prolific Node developer’s laptop (let’s say Substack or Mikeal Rogers), and then publishing some malware as a patch release to all their popular libraries? Bam, suddenly everybody’s continuous integration systems are downloading malware from npm and pushing it out to thousands of running systems.

You may trust Substack, but do you trust that he’s secured his laptop?

Of course, if you avoid caret- and tilde-versioning in your package.json, then this isn’t a problem. You can already inspect the code you’re running, and make sure you trust it. One might argue that this is the “more secure” approach, but that would negate one of auto-updating’s main selling points, which is that patch releases can supposedly contain security patches.
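If you do decide to go the exact-version route, stripping the range operators is mechanical. Below is a rough sketch (the function name and sample dependencies are made up) that takes a "dependencies" object and removes any leading caret or tilde. Note that it pins to the version written in package.json, which is not necessarily the version currently installed.

```javascript
// Rough sketch: strip a leading ^ or ~ from every dependency range.
// `pinDependencies` is a hypothetical helper, not part of npm.
// Caveat: this pins to the base version in the range, which may differ
// from whatever version is actually sitting in node_modules.
function pinDependencies(dependencies) {
  const pinned = {};
  for (const name of Object.keys(dependencies)) {
    pinned[name] = dependencies[name].replace(/^[\^~]/, '');
  }
  return pinned;
}

console.log(pinDependencies({
  'some-dependency': '^1.2.1',
  'another-dependency': '~0.4.0',
  'already-pinned': '2.0.0'
}));
```

npm can also do this for you going forward: npm install --save-exact writes an exact version instead of a range, and npm shrinkwrap records the full tree of versions you currently have installed.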

Drawback 3: auto-updating has a limited shelf life

This is the point I would really like to get across to other Node developers.

Right now it’s mostly fine for dependencies to break upstream, because we can remain pretty confident that if we file a bug on a project, the author will respond quickly and fix it.

For instance, a month ago I found a bug in Express, and not only did the maintainer (the awesome Doug Wilson) fix it in a matter of minutes, he also took it upon himself to come into the express-pouchdb project and submit a bunch of PRs. Experiences like that really exemplify what’s great about OSS development.

However, right now Node and npm are in their heyday. Changes are coming fast and furious, the community is active and engaged, and so of course caret- and tilde-versioning are pretty low-risk. Even if a bug is introduced in a minor or patch version upstream, it’ll probably get resolved quickly.

Imagine a future after the current boom, though, where npm occupies a position more like CPAN – still useful, but long in the tooth. Popular modules have fallen into disrepair, GitHub issues go unresolved. Maybe everyone’s moved on to Go.

In this post-apocalyptic future for Node, I can easily imagine developers saying, “Oh yeah, npm? That’s that thing where whenever you require() something, you have to immediately go in and remove all the tildes and carets.” Or worse, maybe someone will have to write a proxy in front of npm to act as a sort of Wayback Machine, shrinkwrapping each module to the dependencies it had when it was last published.

Don’t kid yourself, Noders – someday this future will be upon us. Project maintainers will eventually lose interest, move on to other projects, or maybe find that the obligations of family/work/whatever have reduced their ability to respond to bugs on GitHub. Maintainers will even die – yes, young coder, you too are mortal – and ideally whatever software we write should remain useful even after we’re gone. Ideally.

I don’t have the answers, but I do know that we as a community need to start preparing for eventualities like this. Right now it may feel like the party’s never going to end, but eventually the booze will run out, the music will stop, and we will have to make a sober evaluation of our software’s legacy. Recognizing the limitations of Semver and caret- and tilde-versioning is a step in that direction.

Postscript: why do people rebel against Semver?

I have a hunch about this: I think it’s because the larger culture hasn’t adjusted to Semver yet.

For someone steeped in Node practices, it may be obvious that version 3.0.0 of a module has introduced breaking changes since 2.0.0. To the average layman, though, a major version change indicates some big overhaul of the software, along with a slew of new features. This is a holdover from the shrinkwrap era, when a new major version meant a shiny new box in the store, and it’s still the prevailing view in popular understanding: “web 2.0,” “government 2.0,” etc.

What Semver ignores is that bumping a major version has marketing value.

We definitely experienced that recently in PouchDB. I found it funny that after we released version 3.0.0, we suddenly got a lot more traffic and stargazers, and we were even featured in JavaScript Weekly. However, the biggest change in 3.0.0 was subtracting features – that’s why we incremented the major version!

By contrast, the previous version, 2.2.3, constituted a huge internal restructuring that brought better stability, but you wouldn’t really know it, since it was just a patch version. And it got much less attention than 3.0.0.

I suppose the payoffs from incrementing a major version may dwindle once you get into Chrome-like territory with version 36, 37, etc., but for the low versions, it definitely seems to help boost your project’s public visibility.

Read the Tea Leaves Software and other dark arts, by Nolan Lawson