Progressive enhancement isn’t dead, but it smells funny

Update: this blog post sparked a lively debate. You may want to read the responses from Laurie Voss, Jeremy Keith, Aaron Gustafson, and Christian Heilmann.

Progressive enhancement is a touchy subject. It can be hard to discuss dispassionately because, like accessibility, it’s often framed as an issue of empathy and compassion.

The insinuation is that if you don’t practice progressive enhancement, then maybe you’re just a careless elitist, building slick websites for the wealthy 1% who have modern browsers and fast connections, and offering only a sneering “let them eat cake” to everybody else. Using progressive enhancement can be seen as a moral decision as much as a technical one.

So what exactly is progressive enhancement? At the risk of grossly oversimplifying things, here are the two major interpretations I’ve seen:

  1. Broad version: start with a baseline of functionality, enhance upwards based on capabilities.
  2. Narrow version: start with HTML, then add CSS, then add JavaScript.

In this post, I’d like to argue that, while the broad version of progressive enhancement is still enormously useful, the narrow version doesn’t make much sense in the modern context of web applications running on smartphones in evergreen browsers. It doesn’t make sense for the western world, it doesn’t make sense for the developing world, and it doesn’t make sense given the state of web tooling. It is a holdover from a bygone era, repeated endlessly by those who have not recognized that the world has moved on without it, and publicly unchallenged by those who have already privately (although perhaps guiltily) abandoned it.

Before making my case, though, let’s explore the meaning of “progressive enhancement” a bit more.

What even is progressive enhancement?

Ask 10 different people, and you’ll likely get 10 different definitions of progressive enhancement. One of the main points of contention, though, is around whether or not a website should work without JavaScript.

In a poll by Remy Sharp, he says, “out of 800 responses, 25% said that progressive enhancement was making the site work without JavaScript.” This viewpoint is apparently shared by PE advocates who disable JavaScript and are disturbed by what they see. (Spoiler alert: many top websites do not bother to make their core functionality work without JavaScript.)

There are plenty of progressive enhancement “moderates,” though, who don’t take such a hard-line stance. Jake Archibald says “each phase of the enhancement needs a user,” and that sometimes a no-JS version wouldn’t have any users at all. Paul Lewis is a big fan of progressive rendering for performance reasons, and given the popularity of server-side React, Ember FastBoot, and Angular 2 universal JavaScript, I’d say plenty of web developers agree with them.

For many proponents of progressive enhancement, however, the issue of JavaScript remains a “magic line that must not be crossed.” I discovered this myself when I somewhat clumsily crossed this line, live on stage at Fronteers Conference in Amsterdam. I had a slide in my talk that read:

In 2016, it’s okay to build a website that doesn’t work without JavaScript.

To me, and to the kind of JavaScript-focused crowd I run with, this isn’t such a controversial statement. For the majority of websites I’ve built in my career, the question of how it functions without JavaScript has been largely irrelevant (except from a performance perspective).

However, it turned out that Fronteers was perhaps the crowd least likely to be amenable to this particular message. When I showed this slide, all hell broke loose.

The condemnation was as swift as it was vocal. Many prominent figures in the web community – Eric Meyer, Sara Soueidan, Jen Simmons – felt compelled not only to disagree with me, but to disagree loudly and publicly. Subtweets and dot-replies ran rampant. As one commentator put it, “you’d swear you had killed a baby panda the way some people react.”

Now, I have nothing against these folks personally. (In fact, I’m a big fan of Jen Simmons’ Web Ahead podcast, and of Sara Soueidan’s articles.) But the fact that their reaction wasn’t just of disagreement but of anger or exasperation is worth dissecting. I believe it harks back to what I said earlier about progressive enhancement being conflated with access – the assumption is that I’m probably just some privileged white dude advocating for a kind of web design that leaves anyone who’s less fortunate out in the cold.

Is that really true, though? Is JavaScript actually harmful for a certain segment of web users? As Jake Archibald pointed out, it’s not really about users who have disabled JavaScript, so who exactly are we helping when we make our websites work without it?

Progressive enhancement for the next billion

Tom Dale (who once famously declared progressive enhancement to be dead, but has softened since then) has a fabulous talk that pretty much cemented my thinking on progressive enhancement, so this next section owes a huge debt to him.

As Benedict Evans has noted, the next billion people who are poised to come online will be using the internet almost exclusively through smartphones. And if Google’s plans with Android One are any indication, then we have a fairly good idea of what kind of devices the “next billion” will be using:

  • They’ll mostly be running Android.
  • They’ll have decent specs (1GB RAM, quad-core processors).
  • They’ll have an evergreen browser and WebView (Android 5+).
  • What they won’t have, however, is a reliable internet connection.

In a world where the lowest common denominator is a very capable browser with a modern JavaScript engine, running on a smartphone that would have been considered desktop-class ten years ago, but where the network is now the bottleneck, what does that mean for progressive enhancement?

Simple: it means that, if you care about those users, you should be focusing on offline-first, i.e. treating the network as an enhancement. After the first load (which yes, should be server-rendered via isomorphic JavaScript), you’ll want to run as much logic as possible on the user’s device so that it can operate autonomously – regardless of whether the network conditions are good, bad, or nonexistent. And today, the way we accomplish this on the web is by using IndexedDB and Service Workers, i.e. with JavaScript.
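
To make that concrete, here’s a minimal sketch of the offline-first pattern using a Service Worker. The cache name and asset list are placeholders of my own choosing, not a prescription:

// sw.js – cache a few core assets at install time, then serve cache-first
var CACHE = 'app-shell-v1'

self.addEventListener('install', function (event) {
  event.waitUntil(
    caches.open(CACHE).then(function (cache) {
      // precache whatever the app needs to boot while offline
      return cache.addAll(['/', '/app.js', '/app.css'])
    })
  )
})

self.addEventListener('fetch', function (event) {
  // treat the network as the enhancement: local cache first, network as fallback
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      return cached || fetch(event.request)
    })
  )
})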

Personally, I’ve found this method remarkably effective for building performant progressive web apps. I find that, by starting with a baseline of a low-end Android phone, throttled to 2G or 3G, and using that as my primary test device, I can build a web application that works quite well on a variety of hardware, browser, and network conditions. Optimizing for such devices tends to naturally lead to a heavily client-side approach, because by avoiding network round-trips the UI interactions become snappy and app-like. And thanks to advances in JavaScript frameworks, it’s easier than ever to move UI logic from the client to the server (using Node.js), in order to achieve a fast initial render.

The insight of offline-first is that, when you optimize for conditions where the network is unavailable, you end up making a better experience for everyone, even those on blazing-fast connections. The local cache is nearly always faster than the network, and even users on supposed “4G” connections will occasionally experience some amount of 2G speeds or offline, so the local cache is a good bet for them as well.

Offline-first is a form of progressive enhancement that directly targets the baseline experience that a high-quality progressive web app ought to support, rather than focusing on the more reductionist mindset of “first HTML, then CSS, then JavaScript.”

Truly robust web apps

Tom Dale and I aren’t the only ones who have come to this conclusion. The Chrome team has been pushing both for offline-first and the app shell architecture, which advocates for a server-rendered “shell” that then manages most of the actual app content using JavaScript. This is the way that most progressive web apps are being built these days, including applications designed for folks in developing countries, by folks in developing countries.

To demonstrate, here are screenshots of the progressive web apps Housing.com (India), Konga (Nigeria), and Flipkart (India), each with JavaScript deliberately disabled. What you’ll notice is that the authors of these apps have elected to show their script-less users an endless loading state. The “no-JS” case is clearly irrelevant to them, even if the offline case is not. (Each of these apps uses a Service Worker to cache data offline, and works fabulously well when JavaScript is enabled.)

Screenshots of Housing.com, Konga, and Flipkart without JavaScript.

Now, you might argue, as Jeremy Keith has in “Regressive web apps,” that maybe these folks have been led astray by Google’s cheerleading for the app-shell model, and in fact it’d be nice to see some examples of PWAs that don’t require JavaScript. In his words:

“I hope we’ll see more examples of Progressive Web Apps that don’t require JavaScript to render content.”

My question to Jeremy, however, is: why? Why is it considered an unqualified good to make a website that works without JavaScript? Is it possible that this mindset – “start with HTML, add CSS, sprinkle on JavaScript” – is only a best practice in a world of incapable browsers (such as IE6) hooked up to stable desktop connections, and now that we’re in a world of smart, JavaScript-enabled browsers with unreliable connections, we need to re-evaluate our thinking?

I help maintain an open-source project called PouchDB, which enables offline storage and synchronization using (you guessed it) JavaScript. One of the more interesting projects PouchDB has been involved in was the fight against Ebola in West Africa, where it was used in an Angular app to store patient data offline (including symptom and treatment details, as well as photos), which were then synced to the server whenever a connection was re-established. In a region of the world where the network was neither fast nor reliable, this was a key feature.
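
That project’s code isn’t mine to reproduce here, but the offline-first sync pattern that PouchDB enables looks roughly like this (the database name and remote URL are made up for the sake of illustration):

var PouchDB = require('pouchdb')

// write records to local storage first, so the app keeps working
// even with no connection at all
var db = new PouchDB('patients')

// then continuously sync with the server whenever a connection is available;
// 'retry' means the sync picks itself back up after the network drops
db.sync('https://example.com/db/patients', { live: true, retry: true })
  .on('error', function (err) {
    console.log('sync error (will retry)', err)
  })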

Now, even with some very clever usage of AppCache, there’s no way the authors of this app could have built such an offline experience without JavaScript. And yet, it was perfectly adapted for the task at hand, and I’m proud that PouchDB actually played a role in stamping out Ebola. For anyone who is convinced that “first HTML, then CSS, then JavaScript” is the best approach for users in developing countries, I’d encourage them to actually talk to folks building apps for that demographic, and ask them if they don’t find offline-first (with JavaScript!) to be a more effective strategy.

My assertion is that, because of the reality of network and device conditions in those countries, the “HTML-first” approach has become almost completely obsolete (with the minor exception of server-side rendering), and the offline-first approach now reigns supreme. In those markets, PWAs as currently promoted are a big hit, which is clear from a fascinating Opera interview with developers in Nigeria, a Google report by Flipkart on their increased engagements with PWAs, and similar feedback from Konga.

The web is a big tent

On the other hand, I wouldn’t be so presumptuous as to say that I’ve unlocked The One True Way™ to build websites. I don’t believe every website needs to be an offline-first JavaScript app, any more than I believe every website needs to be an HTML5 canvas game, or a static blog, or a WebGL sandbox world, or whatever weirdo WebVR thing I might be strapping onto my face in the future.

When I said that in 2016 it’s okay to build a site that requires JavaScript, what I’m getting at is this: by 2016, the web has fundamentally changed. It’s expanded. It’s blossomed. There are more people building more kinds of websites than ever before, and no one-size-fits-all set of “best practices” can cut the mustard anymore.

There’s still plenty of room on the web for sites that rely primarily on HTML and CSS; in many cases, it’s still the best choice! Especially if it’s better suited to the skill set of your team, or if you’re primarily focused on static content, then “traditional” progressively-enhanced sites are a no-brainer. Such sites are certainly easier to manage and maintain than client-side webapps, and plus you often get accessibility and performance for free. Declarative languages like HTML and CSS are also easier to learn than imperative ones like JavaScript, and in many cases they’re also more robust. There are lots of good reasons to choose this architecture.

Different sites are optimized for different use cases, and I wouldn’t be so presumptuous as to tell folks that they all need to be building websites exactly the way I like them built. I certainly don’t think we should be chiding people for not building websites that work without JavaScript, or to make damning statements like this one:

“Pages that are empty without JS: dead to history, unreliable for search results, and thus ignorable. No need to waste time reading or responding.”

This attitude, and others like it, stem from a set of myths about JavaScript that web developers have internalized based on conditions of the past. These days, JavaScript actually does run on search engines, in screen readers, and even in Opera Mini (for a strict but fair 5 seconds). JavaScript is a well-established part of the web platform – and unlike Flash (to which it’s often unflatteringly compared) JavaScript is standardized to ensure it will pass the test of time. Expending extra effort to make your website work without JavaScript is often not only fruitless; in the cases I mentioned above with PWAs, it can actually lead to a poorer user experience.

But besides just finding these attitudes wrong, I find them toxic. Any community that’s eager to tear each other down at the slightest whiff of unorthodoxy is not a community that I want to be a part of. I want the web to be a place where we celebrate each other’s accomplishments, where we remain ever curious and inquisitive and questioning, and where above all else, we make newcomers (who might not know everything already!) feel welcome. That’s the web I love – a big tent that’s always growing bigger.

Final thoughts

We as a community need to realize that the question of “JavaScript – yes or no?” is less about access and ubiquity, and more about performance and robustness. Only then can we stop all this ugly shaming and vitriol towards those who happen to choose JavaScript as their primary tool for building for the web. I believe that, once the moral and emotional dimension is removed, the issue of JavaScript can be more clearly viewed as just another tradeoff among the many tradeoffs we inevitably make when we build websites. So next time your gut instinct is to blame and shame: try to ask and understand instead.

And to the advocates of progressive enhancement: if you still believe requiring JavaScript is a universally bad idea, then don’t hesitate to voice that opinion! Any idea deserves to be evaluated in a public forum – that’s the only way we can suss out the good ideas from the bad ones. But purely on a strategic level, I would invite you to make your case in a less caustic and condescending way. Communities that welcome outsiders with open arms are usually better at winning converts than those that sneer and denigrate (something about flies and honey and vinegar and all that).

So if you believe in empathy – if you believe that the web is about building up good experiences for everyone, regardless of their background or ability – then I would encourage you to demonstrate that empathy, especially towards those you disagree with. Thankfully, I will admit that even those at Fronteers Conference who disagreed with me were perfectly polite and respectful in person; often these things only get out of hand on Twitter, which is not famous for enabling subtlety or nuance. So keep that in mind, and try to remember the human behind the keyboard.

The web is for everyone. The web platform is for everyone. Let’s keep it that way.

Thanks to Tom Dale, Jan Lehnardt, and Addy Osmani for reviewing a draft of this blog post.

Also, apologies to folks whose tweets I called out in this post, but I consider turnabout to be fair play. 😉 And to clarify: Sara Soueidan was perfectly courteous and respectful towards me online; I wouldn’t lump any of her comments in with the “caustic” ones.

The title is a reference to Frank Zappa.

The cost of small modules

Update (30 Oct 2016): since I wrote this post, a bug was found in the benchmark which caused Rollup to appear slightly better than it would otherwise. However, the overall results are not substantially different (Rollup still beats Browserify and Webpack, although it’s not quite as good as Closure anymore), so I’ve merely updated the charts and tables. Additionally, the benchmark now includes the RequireJS and RequireJS Almond bundlers, so those have been included as well. To see the original blog post without these edits, check out this archived version.

About a year ago I was refactoring a large JavaScript codebase into smaller modules, when I discovered a depressing fact about Browserify and Webpack:

“The more I modularize my code, the bigger it gets. 😕”
Nolan Lawson

Later on, Sam Saccone published some excellent research on Tumblr and Imgur’s page load performance, in which he noted:

“Over 400ms is being spent simply walking the Browserify tree.”
Sam Saccone

In this post, I’d like to demonstrate that small modules can have a surprisingly high performance cost depending on your choice of bundler and module system. Furthermore, I’ll explain why this applies not only to the modules in your own codebase, but also to the modules within dependencies, which is a rarely-discussed aspect of the cost of third-party code.

Web perf 101

The more JavaScript included on a page, the slower that page tends to be. Large JavaScript bundles cause the browser to spend more time downloading, parsing, and executing the script, all of which lead to slower load times.

Even when breaking up the code into multiple bundles – Webpack code splitting, Browserify factor bundles, etc. – the cost is merely delayed until later in the page lifecycle. Sooner or later, the JavaScript piper must be paid.

Furthermore, because JavaScript is a dynamic language, and because the prevailing CommonJS module system is also dynamic, it’s fiendishly difficult to extract unused code from the final payload that gets shipped to users. You might only need jQuery’s $.ajax, but by including jQuery, you pay the cost of the entire library.

The JavaScript community has responded to this problem by advocating the use of small modules. Small modules have a lot of aesthetic and practical benefits – easier to maintain, easier to comprehend, easier to plug together – but they also solve the jQuery problem by promoting the inclusion of small bits of functionality rather than big “kitchen sink” libraries.

So in the “small modules” world, instead of doing:

var _ = require('lodash')
_.uniq([1,2,2,3])

You might do:

var uniq = require('lodash.uniq')
uniq([1,2,2,3])

Rich Harris has already articulated why the “small modules” pattern is inherently beginner-unfriendly, even though it tends to make life easier for library maintainers. However, there’s also a hidden performance cost to small modules that I don’t think has been adequately explored.

Packages vs modules

It’s important to note that, when I say “modules,” I’m not talking about “packages” in the npm sense. When you install a package from npm, it might only expose a single module in its public API, but under the hood it could actually be a conglomeration of many modules.

For instance, consider a package like is-array. It has no dependencies and only contains one JavaScript file, so it has one module. Simple enough.

Now consider a slightly more complex package like once, which has exactly one dependency: wrappy. Both packages contain one module, so the total module count is 2. So far, so good.

Now let’s consider a more deceptive example: qs. Since it has zero dependencies, you might assume it only has one module. But in fact, it has four!

You can confirm this by using a tool I wrote called browserify-count-modules, which simply counts the total number of modules in a Browserify bundle:

$ npm install qs
$ browserify node_modules/qs | browserify-count-modules
4

What’s going on here? Well, if you look at the source for qs, you’ll see that it contains four JavaScript files, representing four JavaScript modules which are ultimately included in the Browserify bundle.

This means that a given package can actually contain one or more modules. These modules can also depend on other packages, which might bring in their own packages and modules. The only thing you can be sure of is that each package contains at least one module.
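
As a hypothetical illustration (loosely modeled on qs), a dependency-free package might look like this on the inside:

// lib/index.js – the package's single public entry point
var parse = require('./parse')
var stringify = require('./stringify')
module.exports = { parse: parse, stringify: stringify }

// lib/parse.js and lib/stringify.js (each possibly require()ing './utils')
// all count as separate modules once Browserify bundles them up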

Module bloat

How many modules are in a typical web application? Well, I ran browserify-count-modules on a few popular Browserify-using sites, and came up with these numbers:

For the record, my own Pokedex.org (the largest open-source site I’ve built) contains 311 modules across four bundle files.

Ignoring for a moment the raw size of those JavaScript bundles, I think it’s interesting to explore the cost of the number of modules themselves. Sam Saccone has already blown this story wide open in “The cost of transpiling es2015 in 2016”, but I don’t think his findings have gotten nearly enough press, so let’s dig a little deeper.

Benchmark time!

I put together a small benchmark that constructs a JavaScript module importing 100, 1000, and 5000 other modules, each of which merely exports a number. The parent module just sums the numbers together and logs the result:

// index.js
var total = 0
total += require('./module_0')
total += require('./module_1')
total += require('./module_2')
// etc.
console.log(total)
// module_0.js
module.exports = 0
// module_1.js
module.exports = 1

(And so on.)
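
If you’re wondering how such a benchmark gets thrown together, it’s just a few lines of Node. Here’s a rough sketch of a generator script (the filenames and module count are illustrative, not necessarily the exact script I used):

// generate.js – writes module_0.js through module_N.js, plus an index.js
var fs = require('fs')

var NUM_MODULES = 1000

var index = 'var total = 0\n'
for (var i = 0; i < NUM_MODULES; i++) {
  fs.writeFileSync('module_' + i + '.js', 'module.exports = ' + i + '\n')
  index += "total += require('./module_" + i + "')\n"
}
index += 'console.log(total)\n'
fs.writeFileSync('index.js', index)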

I tested five bundling methods: Browserify, Browserify with the bundle-collapser plugin, Webpack, Rollup, and Closure Compiler. For Rollup and Closure Compiler I used ES6 modules, whereas for Browserify and Webpack I used CommonJS, so as not to unfairly disadvantage them (since they would need a transpiler like Babel, which adds its own overhead).

In order to best simulate a production environment, I used Uglify with the --mangle and --compress settings for all bundles, and served them gzipped over HTTPS using GitHub Pages. For each bundle, I downloaded and executed it 15 times and took the median, noting the (uncached) load time and execution time using performance.now().
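
For the Browserify case, the production step looked roughly like this – the commands and flags here are illustrative rather than a transcript of what I ran; the other bundlers were minified the same way, and GitHub Pages handles the gzipping at serve time:

$ browserify index.js -o bundle.js
$ uglifyjs bundle.js --mangle --compress -o bundle.min.js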

Bundle sizes

Before we get into the benchmark results, it’s worth taking a look at the bundle files themselves. Here are the byte sizes (minified but ungzipped) for each bundle (chart view):

bundler                 100 modules   1000 modules   5000 modules
browserify                     7982          79987         419985
browserify-collapsed           5786          57991         309982
webpack                        3955          39057         203054
rollup                         1265          13865          81851
closure                         758           7958          43955
rjs                           29234         136338         628347
rjs-almond                    14509         121612         613622

And the minified+gzipped sizes (chart view):

bundler                 100 modules   1000 modules   5000 modules
browserify                     1650          13779          63554
browserify-collapsed           1464          11837          55536
webpack                         688           4850          24635
rollup                          629           4604          22389
closure                         302           2140          11807
rjs                            7940          19017          62674
rjs-almond                     2732          13187          56135

What stands out is that the Browserify and Webpack versions are much larger than the Rollup and Closure Compiler versions (update: the gap is especially wide before gzipping, which still matters, because the ungzipped code is what the browser actually parses and executes). If you take a look at the code inside the bundles, it becomes clear why.

The way Browserify and Webpack work is by isolating each module into its own function scope, and then declaring a top-level runtime loader that locates the proper module whenever require() is called. Here’s what our Browserify bundle looks like:

(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o<r.length;o++)s(r[o]);return s})({1:[function(require,module,exports){
module.exports = 0
},{}],2:[function(require,module,exports){
module.exports = 1
},{}],3:[function(require,module,exports){
module.exports = 10
},{}],4:[function(require,module,exports){
module.exports = 100
// etc.

Whereas the Rollup and Closure bundles look more like what you might hand-author if you were just writing one big module. Here’s Rollup:

(function () {
        'use strict';
        var module_0 = 0
        var module_1 = 1
        // ...
        total += module_0
        total += module_1
        // etc.

The important thing to notice is that every module in Webpack and Browserify gets its own function scope, and is loaded at runtime when require()d from the main script. Rollup and Closure Compiler, on the other hand, just hoist everything into a single function scope (creating variables and namespacing them as necessary).

If you understand the inherent cost of functions-within-functions in JavaScript, and of looking up a value in an associative array, then you’ll be in a good position to understand the following benchmark results.

Results

Update: as noted above, results have been re-run with corrections and the addition of the r.js and r.js Almond bundlers. For the full tabular data, see this gist.

I ran this benchmark on a Nexus 5 with Android 5.1.1 and Chrome 52 (to represent a low- to mid-range device) as well as an iPod Touch 6th generation running iOS 9 (to represent a high-end device).

Here are the results for the Nexus 5:

Nexus 5 results

And here are the results for the iPod Touch:

iPod Touch results

At 100 modules, the variance between all the bundlers is pretty negligible, but once we get up to 1000 or 5000 modules, the difference becomes severe. The iPod Touch is hurt the least by the choice of bundler, but the Nexus 5, being an aging Android phone, suffers a lot under Browserify and Webpack.

I also find it interesting that, on the iPod, the execution cost for both Rollup and Closure is essentially free, regardless of the number of modules. And in the case of the Nexus 5, the runtime costs aren’t free, but they’re still much cheaper for Rollup/Closure than for Browserify/Webpack, the latter of which chew up the main thread for several frames if not hundreds of milliseconds, meaning that the UI is frozen just waiting for the module loader to finish running.

Note that both of these tests were run on a fast Gigabit connection, so in terms of network costs, it’s really a best-case scenario. Using the Chrome Dev Tools, we can manually throttle that Nexus 5 down to 3G and see the impact:

Nexus 5 3G results

Once we take slow networks into account, the difference between Browserify/Webpack and Rollup/Closure is even more stark. In the case of 1000 modules (which is close to Reddit’s count of 1050), Browserify takes about 400 milliseconds longer than Rollup. And that 400ms is no small potatoes, since Google and Bing have both noted that sub-second delays have an appreciable impact on user engagement.

One thing to note is that this benchmark doesn’t measure the precise execution cost of 100, 1000, or 5000 modules per se, since that will depend on your usage of require(). Inside of these bundles, I’m calling require() once per module, but if you are calling require() multiple times per module (which is the norm in most codebases) or if you are calling require() multiple times on-the-fly (i.e. require() within a sub-function), then you could see severe performance degradations.
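
To illustrate the difference (with a made-up utility function, purely for the sake of example):

// top-level require(): the module lookup runs once, when the bundle loads
var uniq = require('lodash.uniq')
function dedupe (list) {
  return uniq(list)
}

// on-the-fly require(): the loader's lookup runs every time this is called,
// so the cost scales with how hot this code path is
function dedupeLazily (list) {
  return require('lodash.uniq')(list)
}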

Reddit’s mobile site is a good example of this. Even though they have 1050 modules, I clocked their real-world Browserify execution time as much worse than the “1000 modules” benchmark. When profiling on that same Nexus 5 running Chrome, I measured 2.14 seconds for Reddit’s Browserify require() function, and 197 milliseconds for the equivalent function in the “1000 modules” script. (In desktop Chrome on an i7 Surface Book, I also measured it at 559ms vs 37ms, which is pretty astonishing given we’re talking desktop.)

This suggests that it may be worthwhile to run the benchmark again with multiple require()s per module, although in my opinion it wouldn’t be a fair fight for Browserify/Webpack, since Rollup/Closure both resolve duplicate ES6 imports into a single hoisted variable declaration, and it’s also impossible to import from anywhere but the top-level scope. So in essence, the cost of a single import for Rollup/Closure is the same as the cost of n imports, whereas for Browserify/Webpack, the execution cost will increase linearly with n require()s.

For the purposes of this analysis, though, I think it’s best to just assume that the number of modules is only a lower bound for the performance hit you might feel. In reality, the “5000 modules” benchmark may be a better yardstick for “5000 require() calls.”

Conclusions

First off, the bundle-collapser plugin seems to be a valuable addition to Browserify. If you’re not using it in production, then your bundle will be a bit larger and slower than it would be otherwise (although I must admit the difference is slight). Alternatively, you could switch to Webpack and get an even faster bundle without any extra configuration. (Note that it pains me to say this, since I’m a diehard Browserify fanboy.)

However, these results clearly show that Webpack and Browserify both underperform compared to Rollup and Closure Compiler, and that the gap widens the more modules you add. Unfortunately I’m not sure Webpack 2 will solve any of these problems, because although they’ll be borrowing some ideas from Rollup, they seem to be more focused on the tree-shaking aspects and not the scope-hoisting aspects. (Update: a better name is “inlining,” and the Webpack team is working on it.)

Given these results, I’m surprised Closure Compiler and Rollup aren’t getting much traction in the JavaScript community. I’m guessing it’s due to the fact that (in the case of the former) it has a Java dependency, and (in the case of the latter) it’s still fairly immature and doesn’t quite work out-of-the-box yet (see Calvin Metcalf’s comments for a good summary).

Even without the average JavaScript developer jumping on the Rollup/Closure bandwagon, though, I think npm package authors are already in a good position to help solve this problem. If you npm install lodash, you’ll notice that the main export is one giant JavaScript module, rather than what you might expect given Lodash’s hyper-modular nature (require('lodash/uniq'), require('lodash.uniq'), etc.). For PouchDB, we made a similar decision to use Rollup as a prepublish step, which produces the smallest possible bundle in a way that’s invisible to users.

I also created rollupify to try to make this pattern a bit easier to drop into existing Browserify projects. The basic idea is to use imports and exports within your own project (cjs-to-es6 can help migrate), and then use require() for third-party packages. That way, you still have all the benefits of modularity within your own codebase, while exposing more-or-less one big module to your users. Unfortunately, you still pay the costs for third-party modules, but I’ve found that this is a good compromise given the current state of the npm ecosystem.
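
If you want to try the prepublish approach yourself, the Rollup configuration is pretty small – something along these lines (the paths are placeholders, and the option names reflect the Rollup of this era, so double-check against the current docs):

// rollup.config.js
export default {
  entry: 'src/index.js',  // your many small internal modules
  dest: 'lib/index.js',   // the single module that actually gets published
  format: 'cjs'           // CommonJS, so require() consumers are unaffected
}

Wire it up with a "prepublish": "rollup -c" script in package.json, and npm consumers never need to know the difference.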

So there you have it: one horse-sized JavaScript duck is faster than a hundred duck-sized JavaScript horses. Despite this fact, though, I hope that our community will eventually realize the pickle we’re in – advocating for a “small modules” philosophy that’s good for developers but bad for users – and improve our tools, so that we can have the best of both worlds.

Bonus round! Three desktop browsers

Normally I like to run performance tests on mobile devices, since that’s where you see the clearest differences. But out of curiosity, I also ran this benchmark on Chrome 54, Edge 14, and Firefox 48 on an i5 Surface Book using Windows 10 RS1. Here are the results:

Chrome 54

Chrome 54 Surfacebook results

Edge 14 (tabular results)

Edge 14 Surfacebook results

Firefox 48 (tabular results)

Firefox 48 Surfacebook results

The only interesting tidbits I’ll call out in these results are:

  1. bundle-collapser is definitely not a slam-dunk in all cases.
  2. The ratio of network-to-execution time is always extremely high for Rollup and Closure; their runtime costs are basically zilch. ChakraCore and SpiderMonkey eat them up for breakfast, and V8 is not far behind.

This latter point could be extremely important if your JavaScript is largely lazy-loaded, because if you can afford to wait on the network, then using Rollup and Closure will have the additional benefit of not clogging up the UI thread, i.e. they’ll introduce less jank than Browserify or Webpack.

Update: in response to this post, JDD has opened an issue on Webpack. There’s also one on Browserify.

Update 2: Ryan Fitzer has generously added RequireJS and RequireJS with Almond to the benchmark, both of which use AMD instead of CommonJS or ES6.

Testing shows that RequireJS has the largest bundle sizes but surprisingly its runtime costs are very close to Rollup and Closure. (See updated results above for details.)

Update 3: I wrote optimize-js, which alleviates some of the performance costs of parsing functions-within-functions.

On joining Microsoft Edge and moving to Seattle

TL;DR: I work for Microsoft now. Hit me up to tell me what bugs you about Edge – I want to hear it!

My relationship with the web has had a funny trajectory. It took me a long time to figure out what this weird nebulous thing called “the web” even was, and why it’s so remarkable.

As it turns out, I wrote a lot of Android apps long before I began to tinker with the web. Why Android? Well, I had an Android phone, and I knew Java, so it seemed like the sensible choice. I had just graduated from university in 2008, with limited programming experience, and I wanted to practice my craft with some hobby projects.

Most of the Android apps I wrote were little one-off sketches, designed to scratch a personal itch. I wrote a Japanese transliterator, a Pokédex (no, not that one), a debug logger, and many others. Looking back, they form a pretty motley portfolio.

For instance, I loved playing guitar, but my vocal range is Ringo-esque at best, so I wrote an app to transpose chord charts, shifting the key into a more comfortable range. Another app was born of a night playing board games at the pub, where my friends and I found there weren’t any good scorekeeping apps on the Play Store. So I wrote one.

Board games at the Royal Oak in Ottawa, where I used to hang. Source: Lauren Rockburn.

These apps were fun to write, and I often got positive feedback from friends, colleagues, and countless folks on the Internet. The feeling of creating something, seeing it in use by hundreds of thousands of people, and then hearing their stories about how it impacted their lives is something I can’t adequately describe.

From Android to the web

However, I always had this nagging thought in the back of my head: sure, I could write apps for Android, and that was fine because I had an Android phone, as did most of my friends. But what about people on iPhones, or Windows Phones, or desktops? Often I’d get a feature request to support some other platform, but the idea of learning Objective-C or C# was a daunting proposition.

So I started turning my attention to the web – that one platform that truly is “write once, run anywhere.” The web as a platform had always scared me: I imagined it as this big amorphous thing with vestigial junk jutting out everywhere, compared to the smooth linear path of writing an Android app.

The web, as I imagined it. Source: Katamari Damacy

However, around 2012 Android had already started to accumulate its own evolutionary baggage, as Ice Cream Sandwich added Fragments, Action Bars, and a panoply of new features that, from my perspective, only served to aggravate the fragmentation problem. It seemed like a good time to give the web a go.

So I started building web apps, often with Cordova, but sometimes just as pure web sites. And I discovered that, yes, although the web was messy, it was amazing! My friends with iPhones could use my app just as easily as my Android friends. And “installing” it was as simple as clicking a link.

The web: still messy

However, my experience with Android often led me to be frustrated and dissatisfied with the tools available on the web. Things that are easy in native apps – storing data locally, animating at 60FPS, smooth scrolling – proved to be a challenge for web apps. Sometimes the APIs were there, but they were deprecated, or half-baked, or inconsistent across browsers.

But I didn’t give up. Instead, I followed a progression that might be familiar to many folks who work with the web:

  1. Build an app, become dissatisfied that something doesn’t work cross-browser.
  2. Use a library or polyfill, become dissatisfied due to bugs or missing features.
  3. Contribute to a library or write a new one, become dissatisfied that the solution isn’t performant or elegant.
  4. File issues on browser vendors, become dissatisfied at the pace of adoption.
  5. Go work for a browser vendor. [1]

In 2016, I find myself at step #5. I love the web, I want to see it grow in new and exciting ways, and I want to be a part of that transformation. That’s why I’ve decided to join Microsoft on the Edge team. Starting next week, I’ll be a Program Manager with a focus on the Web Platform.

Going to Microsoft is a big decision, which may surprise some folks given my cred in the open-source community. So it’s worth explaining my thought process.

Why Microsoft?

I maintain a lot of open-source projects, mostly in the JavaScript and Node.js communities. As part of that crowd, I frequently interact with folks from various browser vendors: Mozilla, Google, Microsoft, Opera, even Apple. In fact, the person I collaborate with the most – PouchDB co-maintainer Dale Harvey – is a Mozillian working on Firefox.

I admire the work that all of the browser vendors are doing, and I’ve shared drinks, code, and conversation with many of them. However, when I thought about where I could go to have the biggest impact on the web, I found myself drawn to the same conclusions as Christian Heilmann, and I turned toward the browser vendor that puts the big blue “e” in “Redmond.”

To add to what Christian already said, Microsoft has come a long way since the dark days of IE6. They’ve licked their wounds, acknowledged their mistakes, and are doubling down on the web platform with a renewed zeal. They’ve open-sourced the Chakra JavaScript engine, signaling a new commitment to openness. In terms of HTML5 support, Edge is now neck-and-neck with Firefox, and at the rate it’s been improving with each release, I wouldn’t be shocked if it surpassed Chrome this year or the next.

Web standards are about more than just scoring points on HTML5Test, though. Hard work has to be done at the fringes, in order to make the web platform a truly painless experience for developers. When writing JavaScript libraries, I often find nasty little bugs in Edge (as well as other browsers) that either call for elaborate workarounds or force me to just forgo some useful feature. I’ve tried to solve a lot of these problems at the library and bugtracker levels, but I want to go deeper.

How will this affect your open-source projects?

If anything, I’m hoping this new direction will deepen my relationship with the open-source community. Rather than just filing bugs on Edge, I’ll be in a position to actually fix those bugs, or at least to vote internally for the kinds of improvements I think are important. (As always, IndexedDB is top of my list, but everyone has their own pet API.)

To be an effective browser vendor, I believe it’s important to keep an eye on what’s cooking over in Library-Framework-Polyfill Land, to listen to both developers and users, and then to figure out which features and bugfixes ought to be prioritized. To that end, I hope my friends in the open-source world will let me know when a bug in Edge is blocking them, or when there’s some unsupported feature that would just really be a home run for their use case.

I know browser vendors can often seem distant and aloof. But having filed many bugs on browsers in the past, I can tell you from personal experience that if you just come to them on their turf, they’re usually very receptive.

Are you going to switch to Windows?

This is a tough one for me. I was an ardent Linux user from 2007 on, until I finally relented to the programmer hive-mind and switched to a Mac in 2012. Phonewise, I’ve been an Android user since the very first one – the HTC Dream in 2008.

However, even though Microsoft doesn’t require employees to use any particular operating system, I plan on switching over to Windows. I’ll probably get a Surface Book and a Lumia 950, since both run Windows 10 and the latest version of Edge. The craftsmanship on both devices seems really great, and the recent unveiling of Bash on Windows eases the transition quite a bit.

For me, though, switching to Windows is a matter of principle rather than of convenience. My buddy Nick Hehr likes to talk a lot about empathy, and to add to the points he’s already made, I believe this is just a case of showing empathy for the people who use my software. I’m simply not going to understand the day-to-day pain points and frustrations of Edge users unless I become one myself.

Also, I’ve been inspired by Dave Rupert’s quest to go Windows, and, like him, I worry that our current Mac monoculture is driving us to a homogeneity of tools and products. During my interview at Microsoft, I saw Jacob Rossi type into his keyboard and then seamlessly flick the screen to scroll down a list. How many web developers are totally unaware that such a UI paradigm even exists, and how many consider it when coding for their Windows users? (Who still account for 90% of desktop browser share, by the way.)

Web browsers and diversity

Furthermore, I think that using Edge is a good act of web citizenship. I’ve been a Firefox user (on both desktop and mobile!) for the past couple of years, both because I admire Mozilla as a company, and because I think it’s important to get an alternative perspective on the web.

At a previous job, my coworkers would sometimes rib me for not using the One True Browser (or at least its respectable cousin, Safari), but honestly, being a Firefox user gave me a superpower: I could immediately discover bugs in our product, usually due to improper use of nonstandard WebKit features. For instance, someone might decide to use -webkit-background-clip: text; on a gradient background, which made the text invisible on Firefox and IE. Oops! These kinds of problems are incredibly easy to miss when you live in a Blink/WebKit bubble.

This also points back to why I’m joining Microsoft in the first place. I think the web is healthiest when there is a diversity of browsers, each bringing their unique perspective to the table. Web developers who sigh and say, “Ugh, everything would be so much easier if everyone was using Chrome” would be wise to remember that people were saying the same thing back in 2001 about IE6. The web succeeds when there’s competition, and it stagnates without it.

Now to be sure, Chrome is an excellent browser, and Google is taking the web in some exciting new directions. In particular, I think folks like Alex Russell and Jake Archibald are 1000% correct about Progressive Web Apps, and I’ll be gunning hard for those features to land in Edge. (Spoiler alert: it’s on the roadmap!) Progressive Web Apps are, in my opinion, just a consummation of everything HTML5 was meant to be – a pure web experience that’s fast, immersive, and reliable. It can’t land soon enough.

However, I don’t believe it’s the duty of browser vendors to blindly follow the Chrome Consensus. Web standards shouldn’t be about one browser dominating and everybody else playing catch-up. This is why I’m excited to join up on the side of a smaller player like Microsoft (how weird is it to be calling them that?). I want to help influence the future direction of the web platform, and Edge – being a browser with a little something to prove – seems like the perfect place to do that.

Leaving New York

I’m also moving from New York back to my home city of Seattle. To be honest, my decision was primarily for family and relationship reasons – my stepdad is undergoing serious health issues, and my girlfriend (another Seattleite) agreed it was better to settle here than in New York. Seeing as I was already moving back to the Emerald City, Microsoft was an easy choice.

I’m going to miss Squarespace, to which I’m grateful for contributing to my personal growth and for giving me a relaxed yet challenging work environment. I hope to keep in close contact with my former coworkers, so they can let me know how Edge can best improve the web experience for Squarespace and its users. (I’ve already been told that mix-blend-mode is high on the wishlist!)

Most of all, though, I’m going to miss BoroJS – the family of NYC JavaScript meetups that include BrooklynJS, ManhattanJS, QueensJS, JerseyScript, NodeBots NYC, and probably another one by the time you finish this sentence. It’s an amazing group of talented people, and the community is constantly growing thanks to a welcoming environment, a grassroots vibe, and a focus on fun.

I was the first to speak at four different BoroJS meetups – superfecta! Source: @brooklyn_js

I could never adequately describe the magic of BoroJS, but Jed Schmidt has already done an excellent job, so go read that. Suffice it to say that the BoroJS community meant a lot to me, and I’m leaving it with a heavy heart.

Conclusion

The web is the largest open platform (or medium!) for expression that human beings have ever created. It isn’t owned by any one individual or organization, but it brings direct benefit to the lives of billions of people. It is a wondrous and precious thing, which gives a global voice to everyone, from indie bloggers and hobby-app creators to multibillion-dollar businesses.

As Anne van Kesteren recently said, the web is a public good. I look forward to serving it on the Microsoft Edge team.

Many thanks to Nick Hehr and Jan Lehnardt for reviewing a draft of this blog post.

Footnotes

[1] Note that I’m not saying I think everyone needs to follow this progression. If you feel comfortable at step 1, you should stay there, and keep building awesome stuff for the web! However, this flowchart seems to match the careers of lots of folks that I see working for browser vendors.

Introducing the Cordova SQLite Plugin 2

TL;DR: I rewrote the Cordova SQLite Plugin; it’s faster and better-tested. Try it out!

For better or worse, WebSQL is still a force to be reckoned with in web development. Although the spec was deprecated over 5 years ago, it still lives on, mostly as a fallback from its more standards-friendly successor, IndexedDB. (See LocalForage, PouchDB, IndexedDBShim, and YDN-DB for popular examples of this.)

Thankfully, this WebSQL-as-polyfill practice is becoming less and less necessary, as pre-KitKat Android slowly fades into memory, and Safari fixes its lingering IndexedDB issues. That said, there is still good reason to doubt that web developers will be able to safely hop onto the IndexedDB bandwagon anytime soon, at least without fallbacks.

For one, it’s unclear when the fixes from WebKit will be released in Safari proper (and how soon we can stop worrying about old versions of Safari). Secondly, although Safari’s “modern IndexedDB” rewrite has resolved many of its gnarliest bugs, their implementation is still around 50x slower (!) than WebSQL, even in the WebKit nightlies. (It depends on the use case, but see my database comparison tool for a demonstration of batch insert performance).

Even more saddening for the web platform as a whole is that, despite being blessed with no less than three storage engines (LocalStorage, WebSQL, and IndexedDB), many developers are still electing to go native for their storage needs. The Cordova SQLite plugin (which mimics the WebSQL API via native access to SQLite) remains a popular choice for hybrid developers, and may even be influencing the decision to go hybrid.

As a proponent of web standards, I’ve always felt a bit uneasy about the SQLite Plugin. However, after struggling with the alternatives, I must admit that it does have some nice properties:

  1. It works in iOS’s WKWebView, the successor to UIWebView, which boasts better performance but unfortunately dropped WebSQL support.
  2. It allows unlimited storage in iOS: no hard cutoff after 50MB.
  3. It allows durable storage – i.e. the browser cannot start arbitrarily deleting data when disk space runs low. This is something neither IndexedDB nor WebSQL can provide until the Durable Storage API has shipped (which no browser currently has). If you think this isn’t a real problem in practice, watch this talk.
  4. It offers the ability to bundle prepopulated database files within the app, avoiding the overhead of initializing a large database at startup.

So while IndexedDB is definitely the future of storage on the web (how many years have we been saying that?), the SQLite Plugin still has its place.

I’ve actually contributed to the project before, but over the past couple years I’ve found myself unable to keep up with the changing project direction, and from my vantage point on PouchDB, I’ve watched several regressions, breaking changes, and API complexities creep into the project. I wanted to contribute, but I think my goals for the SQLite Plugin differed too much from that of the current maintainer.

So I did what’s beautiful in open source: I forked it! Actually I mostly rewrote it, while taking some snippets here and there, but in spirit it’s a fork. The new library, which I’ve creatively christened SQLite Plugin 2, diverges from its forebear in the following ways:

  1. It (mostly) just implements the WebSQL spec – no extra API complexity where possible. Under the hood, node-websql is used to maximize code reuse.
  2. It’s heavily tested – I ported over 600 tests from node-websql and PouchDB, which I’ve verified pass on Android 4.0+ and iOS 8+.
  3. In order to keep the footprint and API surface small, it only uses the built-in SQLite APIs on Android and iOS, rather than bundling SQLite itself with the plugin.

In all other ways, it works almost exactly the same as the original SQLite Plugin, on both iOS and Android. (For Windows Phone, cordova-plugin-websql already has us covered.)
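
Usage looks like garden-variety WebSQL. Here’s a quick sketch – the exact openDatabase arguments may differ slightly from what’s shown, so check the README, and the table is obviously just for illustration:

var db = sqlitePlugin.openDatabase('mydb.db', '1.0', '', 1)

db.transaction(function (tx) {
  tx.executeSql('CREATE TABLE IF NOT EXISTS pokemon (id INTEGER PRIMARY KEY, name TEXT)')
  tx.executeSql('INSERT INTO pokemon (id, name) VALUES (?, ?)', [25, 'Pikachu'])
  tx.executeSql('SELECT name FROM pokemon WHERE id = ?', [25], function (tx, res) {
    console.log(res.rows.item(0).name) // 'Pikachu'
  })
})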

Performance test

I didn’t set out to write the fastest possible WebSQL shim, but I figured folks would be interested in how my remake stacks up against the original. So I put together a small benchmark.

Again, these tests were borrowed from PouchDB: one test mostly involves reads, and the other mostly involves writes. As it turns out, PouchDB “writes” are not purely INSERTs, and PouchDB reads are not simple SELECTs (due to the CouchDB-style revision model), but hopefully this test should serve as a pretty good representation of what an actual app would do.

Each test was run 5 times with 1000 iterations each, with the median of the 5 runs taken as the final result. The test devices were a 6th generation iPod Touch running iOS 9.3.1 and a Nexus 5X running Android 6.0.1. For completeness, I also tested against pure WebSQL.

Here are the results:

SQLite Plugin 2 benchmark

                    SQLite Plugin 2   Original SQLite Plugin    WebSQL
Writes (iOS)                29321ms                  30374ms   21764ms
Reads (iOS)                  8004ms                   9588ms    3053ms
Writes (Android)            29043ms                  33173ms   23806ms
Reads (Android)              8172ms                  11540ms    7277ms

And a summary comparing SQLite Plugin 2 to the competition:

                    vs Original SQLite Plugin       vs WebSQL
Writes (iOS)                     3.59% faster   25.77% slower
Reads (iOS)                     19.79% faster   61.86% slower
Writes (Android)                14.22% faster      22% slower
Reads (Android)                 29.19% faster    12.3% slower

(Full results are available in this spreadsheet.)

As it turns out, SQLite Plugin 2 actually outperforms the original SQLite Plugin by quite a bit, which I credit to a smaller data size when communicating with the native layer, as well as some minor optimizations to the way SQLite itself is accessed (e.g. avoiding calculating the affected rows for a straight SELECT query).

Of course, one should also note that pure WebSQL is much faster than either plugin. This doesn’t surprise me; any Cordova plugin will always be at a disadvantage to straight WebSQL, due to the overhead of serializing the messages that are sent between the WebView and the native layer. (N.B.: just because something is “native” doesn’t necessarily mean it’s faster!)

Furthermore, if you’re storing large binary data (images, audio files, etc.), the performance will probably get even worse relative to regular WebSQL, since that large data needs to be encoded as a string (base64 or otherwise) when sent to the native side. In those cases, the most efficient choice is undoubtedly IndexedDB on Android and WebSQL on iOS, since Safari IndexedDB lacks Blob support and is already quite slow as-is. (Both PouchDB and LocalForage will intelligently store Blobs in this manner, preferring built-in Blob support where available.)

So please, heed some advice from the author himself: avoid this plugin whenever possible. Unless you absolutely need WKWebView support, unlimited storage, durable storage, or prepopulated databases, just use regular IndexedDB or WebSQL instead. Or at the very least, try to architect your app so that you can easily swap in a more standards-based approach in the future (i.e., IndexedDB!). LocalForage, PouchDB, and YDN-DB are great libraries for this, since they largely abstract away the underlying storage engine.

Conclusion

Hopefully the SQLite Plugin 2 will serve as a useful tool for hybrid developers, and can help ease the transition to the rosy future where IndexedDB and Durable Storage are well-supported in every browser. Until then, please try it out, file bugs, and let me know what you think!

High-performance Web Worker messages

In recent posts and talks, I’ve explored how Web Workers can vastly improve the responsiveness of a web application, by moving work off the UI thread and thereby reducing DOM-blocking. In this post, I’ll delve a bit more deeply into the performance characteristics of postMessage(), which is the primary interface for communicating with Web Workers.

Since Web Workers run in a separate thread (although not necessarily a separate process), and since JavaScript environments don’t share memory across threads, messages have to be explicitly sent between the main thread and the worker. As it turns out, the format you choose for this message can have a big impact on performance.

TL;DR: always use JSON.stringify() and JSON.parse() to communicate with a Web Worker. Be sure to fully stringify the message.

I first came across this tip from IndexedDB spec author and Chrome developer Joshua Bell, who mentioned offhand:

We know that serialization/deserialization is slow. It’s actually faster to JSON.stringify() then postMessage() a string than to postMessage() an object.

This insight was further confirmed by Parashuram N., who demonstrated experimentally that stringify was a key factor in making a worker-based React implementation that improved upon vanilla React. He says:

By “stringifying” all messages between the worker and the main thread, React implemented on a Web Worker [is] faster than the normal React version. The perf benefit of the Web Worker approach starts to increase as the number of nodes increases.

Malte Ubl, tech lead of the AMP project, has also been experimenting with postMessage() in Web Workers. He had this to say:

On phones, [stringifying] is quickly relevant, but not with just 3 or so fields. Just measured the other day. It is bad.

This made me curious as to where, exactly, the tradeoffs lie with stringifying messages. So I decided to create a simple benchmark and run it on a variety of browsers. My tests confirmed that stringifying is indeed faster than sending raw objects, and that the message size has a dramatic impact on the speed of worker communication.

Furthermore, the only real benefit comes if you stringify the entire message. Even a small object that wraps the stringified message (e.g. {msg: JSON.stringify(message)}) performs worse than the fully-stringified case. (These results differ between Chrome, Firefox, and Safari, but keep reading for the full analysis.)
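To make the distinction concrete, here is a rough sketch of the three variants I'm describing (my own illustration, not the benchmark code itself):

// 1) Raw object: the structured clone algorithm serializes it for you
worker.postMessage({ foo: 'bar', baz: 42 })

// 2) Partial stringification: the wrapper object still goes through
//    the structured clone algorithm
worker.postMessage({ msg: JSON.stringify({ foo: 'bar', baz: 42 }) })

// 3) Full stringification: the entire message is a single string
worker.postMessage(JSON.stringify({ foo: 'bar', baz: 42 }))

// ...and on the receiving side of variant 3:
self.onmessage = function (e) {
  var message = JSON.parse(e.data)
  // do something with message
}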

Test results

In this test, I ran 50,000 iterations of postMessage() (both to and from the worker) and used console.time() to measure the total time spent posting messages back and forth. I also varied the number of keys in the object between 0 and 30 (keys and values were both just Math.random()).

Clarification: the test does include the overhead of JSON.parse() and JSON.stringify(). The worker even re-stringifies the message when echoing it back.
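For reference, the shape of the test looks roughly like this (a simplified sketch, not the actual test suite, which is linked at the end of this post):

// main.js: bounce a stringified message off the worker 50,000 times
var ITERATIONS = 50000
var NUM_KEYS = 10 // varied between 0 and 30 in the real tests

var message = {}
for (var i = 0; i < NUM_KEYS; i++) {
  message[Math.random()] = Math.random()
}

var worker = new Worker('worker.js')
var remaining = ITERATIONS

console.time('roundtrips')
worker.onmessage = function (e) {
  JSON.parse(e.data) // include the parse cost in the measurement
  if (--remaining === 0) {
    console.timeEnd('roundtrips')
  } else {
    worker.postMessage(JSON.stringify(message))
  }
}
worker.postMessage(JSON.stringify(message))

// worker.js: parse the message, then re-stringify it when echoing it back
self.onmessage = function (e) {
  var message = JSON.parse(e.data)
  self.postMessage(JSON.stringify(message))
}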

First, here are the results in Chrome 48 (running on a 2013 MacBook Air with Yosemite):

Chrome 48 test results

And in Chrome 48 for Android (running on a Nexus 5 with Android 5.1):

Nexus 5 Chrome test results

What’s clear from these results is that full stringification beats both partial stringification and no-stringification across all message sizes. The difference is fairly stark on desktop Chrome for small message sizes, but it starts to narrow as message size increases. On the Nexus 5, there’s no such dramatic swing.

In Firefox 46 (also on the MacBook Air), stringification is still the winner, although by a smaller margin:

Firefox test results

In Safari 9, it gets more interesting. For Safari, at least, stringification is actually slower than posting raw messages:

Safari test results

Based on these results, you might be tempted to think it’s a good idea to UA-sniff for Safari, and avoid stringification in that browser. However, it’s worth considering that Safari is consistently faster than Chrome (with or without stringification), and that it’s also faster than Firefox, at least for small message sizes. Here are the stringified results for all three browsers:

Stringification results for all browsers

So the fact that Safari is already fast for small messages would reduce the attractiveness of any UA-sniffing hack. Also notice that Firefox, to its credit, maintains a fairly consistent response time regardless of message size, and actually starts to beat both Safari and Chrome at the larger message sizes.

Now, assuming we were to use the UA-sniffing approach, we could swap in the raw results for Safari (i.e. showing the fastest times for each browser), which gives us this:

Results with the best time for each browser

So it appears that avoiding stringification in Safari allows it to handily beat the other browsers, although it does start to converge with Firefox for larger message sizes.

On a whim, I also tested Transferables, i.e. using ArrayBuffers as the data format to transfer the stringified JSON. In theory, Transferables can offer some performance gains when sending large data, because the ArrayBuffer is instantly zapped from one thread to the other, without any cloning or copying. (After transfer, the ArrayBuffer is unavailable to the sender thread.)
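For reference, here is roughly what that looks like in code (a sketch; the TextEncoder/TextDecoder step is my own illustration of converting the stringified JSON to bytes):

// main.js: encode the stringified JSON into an ArrayBuffer and transfer it
var encoded = new TextEncoder().encode(JSON.stringify(message))
// The second argument transfers ownership of the buffer to the worker
// instead of copying it; after this call, the buffer is detached
// (unusable) on the main thread.
worker.postMessage(encoded.buffer, [encoded.buffer])

// worker.js: decode the bytes back into a string, then parse
self.onmessage = function (e) {
  var message = JSON.parse(new TextDecoder().decode(e.data))
}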

As it turned out, though, this didn’t perform well in either Chrome or Firefox. So I didn’t explore it any further.

Chrome test results, with arraybuffer

Firefox results with arraybuffer

Transferables might be useful for sending binary data that’s already in that format (e.g. Blobs, Files, etc.), but for JSON data it seems like a poor fit. On the bright side, they do have wide browser support, including Chrome, Firefox, Safari, IE, and Edge.

Speaking of Edge, I would have run these tests in that browser, but unfortunately my virtual machine kept crashing due to the intensity of the tests, and I didn’t have an actual Windows device handy. Contributions welcome!

Correction: this post originally stated that Safari doesn’t support Transferables. It does.

Update: Boulos Dib has graciously run the numbers for Edge 13, and they look very similar to Safari (in that raw objects are faster than stringification):

Edge 13 results

Conclusion

Based on these tests, my recommendation would be to use stringification across the board, or to UA-sniff for Safari and avoid stringification in that browser (but only if you really need maximum performance!).

Another takeaway is that, in general, message sizes should be kept small. Firefox seems to be able to maintain a relatively speedy delivery regardless of the message size, but Safari and Chrome tend to slow down considerably as the message size increases. For very large messages, it may even make sense to save the data to IndexedDB from the worker, and then simply fetch the saved data from the main thread, but I haven’t verified this idea with a benchmark.

The full results for my tests are available in this spreadsheet. I encourage anybody who wants to reproduce these results to check out the test suite and offer a pull request or the results from their own browser.

And if you’d like a simple Web Worker library that makes use of stringification, check out promise-worker.
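A minimal usage sketch (check the project README for the authoritative details):

// main.js
var PromiseWorker = require('promise-worker')
var worker = new Worker('worker.js')
var promiseWorker = new PromiseWorker(worker)

promiseWorker.postMessage({ hello: 'world' }).then(function (response) {
  console.log(response) // 'got: world'
})

// worker.js
var registerPromiseWorker = require('promise-worker/register')
registerPromiseWorker(function (message) {
  return 'got: ' + message.hello
})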

Update: Chris Thoburn has offered another Web Worker performance test that adds some additional ways of sending messages, like MessageChannels. Here are his own browser results.

How to think about databases

As a maintainer of PouchDB, I get a lot of questions from developers about how best to work with databases. Since PouchDB is a JavaScript library, and one with fairly approachable documentation (if I do say so myself), many of these folks tend toward the more beginner-ish side of the spectrum. However, even with experienced developers, I find that many of them don’t have a clear picture of how a database should fit into their overall app structure.

The goal of this article is to lay out my perspective on the proper place for a database within your app code. My focus will be on the frontend – e.g. SQLite in an Android app, CoreData in an iOS app, or IndexedDB in a webapp – but the discussion could apply equally well to a server-side app using MongoDB, MySQL, etc.

What is a database, anyway?

I have a friend who recently went through a developer bootcamp. He’s a smart guy, but totally inexperienced with JavaScript (or any kind of coding) before he started. So I found his questions endlessly fascinating, because they reminded me what it was like learning to code.

Part of his coursework was on MongoDB, and I recall spending some time coaching him on Mongoose queries. As I was explaining the concepts to him, he got a little frustrated and asked, “What’s the point of a database, anyway? Why do I need this thing?”

For a beginner, this is a perfectly valid question. You’ve already spent a long time learning to work with data in the form of objects and arrays (or “dictionaries” and “lists,” or whatever your language calls them), and now suddenly you’re told you need to learn about this separate thing called a “database” that has similar kinds of operations, but they’re a lot more awkward. Instead of your familiar for-loops and assignments, you’re structuring queries and defining schemas. Why all the overhead?

To answer that question, let’s take a step back and remember why we have databases in the first place.

#1 goal of a database: don’t forget stuff

When you create an object or an array in your code, what you have is data:

var array = [1, 2, 3, 4, 5];

This data feels tangible. You can iterate through it, you can print it out, you can insert and remove things, and you can even .map() and .filter() it to transform it in all sorts of interesting ways. Data structures like this are the raw material your code is made of.

However, there’s an ephemeral side to this data. We happen to call the space that it lives in “memory” or “RAM” (Random Access Memory), but in fact “memory” is kind of a nasty misnomer, because as soon as your application stops, that data is gone forever.

You can imagine that if computers only had memory to work with, then computer programs would be pretty frustrating to use. If you wanted to write a Word document, you’d need to be sure to print it out before you closed Word, because otherwise you’d lose your work. And of course, once you restarted Word, you’d have to laboriously type your document back in by hand. Even worse, if you ever had a power outage or the program crashed, your data would vanish into the ether.

Thankfully, we don’t have to deal with this awful scenario, because we have hard disks, i.e. a place where data can be stored more permanently. Sometimes this is called “storage,” so for instance when you buy a new laptop with 200GB of storage but only 8GB of RAM, you’re looking at the difference between disk (or storage) and memory (or RAM). One is permanent, the other is fleeting.

So if disk is so awesome, why don’t computers just use that? Why do we have RAM at all?

The reason is that there’s a pretty big tradeoff in speed between “storage” and “memory.” You’ve felt it if you’ve ever copied a large file to a USB stick, or if you’ve seen an old low-RAM machine take a long time to switch between windows. That’s called paging, and it happens when your computer runs out of RAM and starts hot-swapping between RAM and disk.

Latency numbers every programmer should know

Latency numbers, visualized.

This performance difference cannot be overstated. If you look at a chart of latency numbers every programmer should know, you’ll see that reading 1MB sequentially from memory takes about 250 microseconds, whereas reading 1MB from disk is 20 milliseconds. If those numbers both sound small, consider the scale: if 250 microseconds were the time it took to brush your teeth (5 minutes, if you listen to your dentist!), then 20 milliseconds would be 4.6 days, which is enough time to drive east-to-west across North America, with plenty of breaks in between.

And if you think reading 1MB from SSD is much better (1 millisecond), then consider that in our toothbrush-scale, it would still be 5.5 hours. That’s the time it would take for you to fly from New York to San Francisco, which is quite a bit shorter than our road trip, but still something you’d need to pack your bags for.

In a computer program, the kind of operations you can “get away with” in the toothbrush-scale of 5 minutes are totally different than what you can do in 5 hours or 4 days. This is the difference between a snappy application and a sluggish application, and it’s also at the heart of how you should be thinking about databases within your app.

Storage vs memory

Let’s move away from toothbrushes for a moment and try a different analogy. This is the one I find most useful myself when I’m writing an app.

Memory (including objects, arrays, variables, etc.) is like the counter space in your kitchen when you’re preparing a meal. You have all the tools available to you, you can quickly chop your carrots and put them into a bowl, you can mix the onions with the celery, and all of these things can be done fairly quickly without having to move around the kitchen.

Storage, on the other hand (including filesystems and databases), is like the freezer. It’s a place where you put food that you know you’re going to need later. However, when you pull it out of the freezer, there’s often a thawing period. You also don’t want to be constantly opening your freezer to pull ingredients in and out, or your electric bill is going to go through the roof! Plus, your food will probably end up tasting awful.

Probably the biggest mistake I see beginners make when working with databases is that they want to treat their freezer like their counter space. They want their application data to be perfectly mirrored in their database schemas, and they don’t want to have to think about where their food comes from – whether it’s been sitting on the counter for a few seconds, or in the freezer for a few days.

This is at the root of a lot of suffering when working with databases. You either end up constantly reading things in and out of disk, which means that your app runs slowly (and you probably blame your database!), or you have to meticulously manage your schemas and painstakingly migrate your data whenever anything in your in-memory representation changes.

Unfortunately, this idea that we can treat our databases like our RAM is a by-product of the ORM (Object-Relational Mapping) mentality, which in my opinion is one of the most toxic and destructive ideas in software engineering, because it sells you a false vision of hope. The ORM salesman promises that you can work with your in-memory objects and make them as fancy as you like, and then magically those objects will be persisted to the database (exactly as you left them!), and you’ll never even have to think about what a database is or how you’re accessing it.

In my experience, this is never how it works out with ORMs. It may seem easy at first, but eventually your usage of the database will become inefficient, and you’ll have to drop down into the murky details of the ORM layer, figure out the queries you wish it were doing, and then try to guess the incantation needed to make it perform that query. In effect, the promise of not having to think about the database is a sham, because you end up having to learn two layers: the database layer and the ORM layer. It’s a classic leaky abstraction.

Even if you manage to tame your ORM, you usually end up with a needlessly complex schema format, as the inflexibility of working with stored data collides with the needs of a flexible in-memory format. You might find that you wind up with a SQLite table with 20 columns, merely because your class has 20 variables – even if none of those 20 columns are ever used for querying, and in fact are just wasted space.

This partially explains the attraction of NoSQL databases, but I believe that even without rigid schemas, this problem of the “ORM mindset” remains. Mongoose is a good example of this, as it tries to mix JavaScript and MongoDB in a way that you can’t tell where one starts and the other ends. Invariably, though, this leads developers to hope that their in-memory format can exactly match their database format, which leads to irreconcilable situations (such as classes with behavior) or slowdowns (such as over-fetching or over-storing).

All of this is pretty abstract, so let me take some concrete examples from a recent app I wrote, Pokedex.org, and how I carefully modeled my database structure to maximize performance. (If you’re unfamiliar with Pokedex.org, you may want to read the introductory blog post.)

Case study: Pokedex.org

The first consideration I had to make for Pokedex.org was which database to use in the first place. Without going into the details of browser databases, I ended up choosing two:

  • LocalForage, because it has a simple key-value API that’s good for storing application state.
  • PouchDB, because it has good APIs for working with larger datasets, and can serve as an offline-first layer in front of Cloudant or CouchDB.

PouchDB can also store key-value data, so I might have used it for both. However, another benefit of LocalForage is that the bundle size is much smaller (8KB vs PouchDB’s 45KB). And in my case I had three JavaScript bundles (one for the service worker, one for the web worker, and one for the main JavaScript app), so I didn’t want to push 45KB down the wire three times. Hence I chose LocalForage for the simple stuff.

Pokedex.org database usage

You can see what kind of data I stored in LocalForage if you go into the Chrome Dev Tools on Pokedex.org and open the “Resources” tab. You’ll see I’m using it to store the ServiceWorker data version (so it knows when to update), as well as "informedOffline", which just tells me whether I’ve already shown the dialog that says, “Hey, this app works offline.” If I had more app data to store (such as the user’s favorite Pokémon, or how many times they’ve opened the app), I might store that in LocalForage.
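The pattern for a flag like "informedOffline" is about as simple as it gets; here is a rough sketch (showOfflineDialog() is a made-up helper, not the app’s actual code):

localforage.getItem('informedOffline').then(function (informed) {
  if (!informed) {
    showOfflineDialog() // hypothetical UI helper
    return localforage.setItem('informedOffline', true)
  }
})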

PouchDB, however, is responsible for storing the majority of the Pokémon data – i.e. the 649 species of monsters, their stats, and their moves. So this is much more interesting.

First off, you’ll notice that as you type into the search bar, you immediately get a filtered list showing Pokémon that match your search string. This is a simple prefix search, so if you type “bu” you will see “Bulbasaur” and “Butterfree” amongst others.

 

This search bar is super fast, and it ought to be, because it’s supposed to respond to user input. There’s a debounce on the actual <input> handler, but in principle every keystroke represents a database query, meaning that there’s a lot of data flying back and forth.

I considered using PouchDB for this, but I decided it would be too slow. PouchDB does offer built-in prefix search, but I don’t want to have to go back and forth to IndexedDB (i.e. disk) for every keystroke. So instead, I wrote a simple in-memory database layer that stores Pokémon summary data, i.e. only the things that are necessary to show in the list, which happens to be their name, number, and types. (The sprite comes from a CSS class based on their number.)

To perform the search itself, I just used a sorted array of String names, with a binary search to ensure that lookups take O(log n) time. If the list were larger, I might try to condense it as a trie, but I figured that would be overkill for this app.
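In case it isn’t obvious how a sorted array plus binary search gives you prefix matching, here is a sketch of the idea (my own illustration, not the app’s exact code):

// Find the index of the first name >= prefix (a classic "lower bound").
function lowerBound (sortedNames, prefix) {
  var low = 0
  var high = sortedNames.length
  while (low < high) {
    var mid = (low + high) >> 1
    if (sortedNames[mid] < prefix) {
      low = mid + 1
    } else {
      high = mid
    }
  }
  return low
}

// Then walk forward while the names still start with the prefix.
function prefixSearch (sortedNames, prefix) {
  var results = []
  for (var i = lowerBound(sortedNames, prefix); i < sortedNames.length; i++) {
    if (!sortedNames[i].startsWith(prefix)) {
      break
    }
    results.push(sortedNames[i])
  }
  return results
}

// prefixSearch(['bulbasaur', 'butterfree', 'charmander'], 'bu')
// => ['bulbasaur', 'butterfree']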

For a small amount of data, this in-memory strategy works great. However, when you click on a Pokémon, it brings up a detail page with stats, evolutions, and moves, which is much too large to keep in memory. So for this, I used PouchDB.

 

Given that I am the primary author of PouchDB map/reduce, relational-pouch, and pouchdb-find, you may be surprised to learn that I didn’t use any of them for this task. Obviously I put a lot of care into those libraries, and I do think they’re useful for beginners who are unsure how to structure their data. But from a performance standpoint, none of them can beat the potential gains from rolling your own, so that’s what I did.

In this case, I used my knowledge of IndexedDB performance subtleties to get the maximum possible throughput in the shortest amount of time. Essentially, what I did was split up my data into seven separate PouchDB databases, representing seven different IndexedDB databases on disk:

  • Monster basic data
  • Monster descriptions
  • Monster evolutions
  • Monster supplemental data (anything not covered above)
  • Types
  • Monster moves
  • Moves

The first four all use IDs based on the number of the Pokémon (e.g. Bulbasaur is 1, Ivysaur is 2, etc.), and map to data such as evolutions, stats, and descriptions. This means that tapping on a Pokémon involves a simple key-value lookup.

The reason I segmented this data into multiple databases is that IndexedDB happens to do a lot of transaction-level blocking at the database level. If you have the luxury of specifying separate IndexedDB objectStores, you can allow your database queries to run in parallel under the hood, but in the case of PouchDB all of the objectStores are predefined (due to the CouchDB-style revision semantics written on top of IndexedDB).

In practice, this usually means that read/write operations (such as the initial import of the data) will run sequentially unless you use separate PouchDB objects. Sequential is bad – we want the database to do as much work as possible, as quickly as possible – so I avoided using one large PouchDB database. (If you were using a lower-level library like Dexie, though, you could use a single database with separate objectStores and probably get a similar result.)

So when you tap on a Pokémon, the app fires off six concurrent get() requests, which the underlying IndexedDB layer is free to run in parallel. This is why you barely have to wait at all to see the Pokémon data, although it helps that I have a snazzy animation while the lookup is in progress. (Animations are a great way to mask slow operations!) The query is also run in a web worker, which is why you won’t see any UI blocking from IndexedDB during database interactions.
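In code, the fan-out looks something like this (the database and field names here are illustrative, not the app’s exact code):

var basicDB = new PouchDB('monsters')
var descriptionsDB = new PouchDB('descriptions')
var evolutionsDB = new PouchDB('evolutions')
var supplementalDB = new PouchDB('supplemental')
var typesDB = new PouchDB('types')

function fetchMonsterDetail (monsterId, typeNames) {
  // Each get() hits a separate IndexedDB database, so the underlying
  // transactions are free to run in parallel.
  return Promise.all([
    basicDB.get(monsterId),
    descriptionsDB.get(monsterId),
    evolutionsDB.get(monsterId),
    supplementalDB.get(monsterId),
    typesDB.get(typeNames[0]),
    typeNames[1] ? typesDB.get(typeNames[1]) : null
  ])
}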

Pokémon's type(s) determine its strengths/weaknesses

A Pokémon’s type(s) determine its strengths/weaknesses relative to other types

Now, two of the six requests I described above are for a Pokémon’s “type” information, which merit some explanation. Each Pokémon has up to two types (e.g. Fire and Water), and types also have strengths and weaknesses relative to each other: Fire beats Grass, Water beats Fire, etc. The “types” database contains this big rock-paper-scissors grid, which isn’t keyed by Pokémon ID like the other four, but rather by type name.

However, since the type names of each Pokémon are already available in-memory (due to the summary view), the queries for a Pokémon’s strengths and weaknesses can be fired off in parallel with the other queries. And since they’re equally simple get() requests, they take about the same amount of time to complete. This was a nice side effect of my previous in-memory optimizations.

The last two databases are a bit trickier than the others, and are quite relation-y. I called these the “monster moves” and “moves” databases, and I modeled their implementation after relational-pouch (although I didn’t feel the need to use relational-pouch itself).

 

Essentially, the “monster moves” database contains a mapping from monster IDs to a list of learned moves (e.g. Bulbasaur learns Razor Leaf at level 27), while the “moves” database contains a mapping from move IDs to information about the move (e.g. Razor Leaf has a certain power, accuracy, and description). If you’re familiar with SQL, you might recognize that I would need a JOIN to combine this data together, although in my case I just did the join explicitly myself, in JavaScript.
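Here is a sketch of that explicit join (the document shapes are illustrative, loosely based on the description above):

// monsterMovesDB doc: { _id: '1', moves: [{ id: 'razor-leaf', level: 27 }, ...] }
// movesDB doc:        { _id: 'razor-leaf', power: 55, accuracy: 95, ... }
var monsterMovesDB = new PouchDB('monster-moves')
var movesDB = new PouchDB('moves')

function getMovesForMonster (monsterId) {
  return monsterMovesDB.get(monsterId).then(function (monsterMoves) {
    return movesDB.allDocs({
      keys: monsterMoves.moves.map(function (move) { return move.id }),
      include_docs: true
    }).then(function (res) {
      // zip the "learned move" info together with the move details
      return res.rows.map(function (row, i) {
        return Object.assign({ level: monsterMoves.moves[i].level }, row.doc)
      })
    })
  })
}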

Since this is a many-to-many relationship (Pokémon can learn many moves, and moves can be learned by many Pokémon), it would be prohibitively redundant to include the “move” data inside the “monster move” database – that’s why I split it apart. However, the relational query (i.e. the JOIN) has a cost, and I saw it while developing my app – it takes nearly twice as long to fetch the full “moves” data (75ms on a Nexus 5X) as it does to fetch the more basic data (40ms – these numbers are much larger on a slow device). So what to do?

Well, I pulled off a sleight-of-hand. You’ll notice that, especially in a mobile browser, the list of Pokémon moves is “below the fold.” Thus, I can simply load the above-the-fold data first, and then lazily fetch the rest before the user has scrolled down. On a fast mobile browser, you probably won’t even notice that anything was fetched in two stages, although on a huge monitor you might be able to glimpse it. (I considered adding a loading spinner, but the “moves” data was already fast enough that I felt it was unnecessary.)

So there you have it: the queries that ought to feel “instant” are done in memory, the queries that take a bit longer are fetched in parallel (with an animation to keep the eye busy), and the queries that are super slow are slipped in below-the-fold. This is a subtle ballet with lots of carefully orchestrated movements, and the end result is an app that feels pretty seamless, despite all the work going on behind the scenes.

Conclusion

When you’re working with databases, it’s worthwhile to understand the APIs you’re dealing with, and what they’re doing under the hood. Unfortunately, databases are not magic, and there’s no abstraction in the world (I believe) that can obviate the need to learn at least a little bit about how a database works.

So when you’re using a database, be sure to ask yourself the following questions:

  1. Is this database in-memory (Redis, LokiJS, MemDOWN, etc.) or on-disk (PouchDB, LocalForage, Lovefield, etc.)? Is it a mix between the two (e.g. LevelDB)?
  2. What needs to be stored on disk? What data should survive the application being closed or crashing?
  3. What needs to be indexed in order to perform fast queries? Can I use an in-memory index instead of going to disk?
  4. How should I structure my in-memory data relative to my database data? What’s my strategy for mapping between the two?
  5. What are the query needs of my app? Does a summary view really need to fetch the full data, or can it just fetch the little bit it needs? Can I lazy-load anything?

Once you’ve answered these questions, you can write an app that is fast, responsive, and doesn’t lose user data. You’ll also make your own job easier as a programmer, if you try to maintain a good grasp of the differences between your in-memory data (your counter space) and your on-disk data (your freezer).

Nobody likes freezer burn, but spoiled meat that’s been left on the counter overnight is even worse. Understand the difference between the two, and you’ll be a master chef in no time.

Notes

Of course there are more advanced topics I could have covered here, such as indexes, sync, caching, B-trees, and on and on. (We could even extend the metaphor to talk about “tagging” food in the freezer as an analogy for indexing!) But I wanted to keep this blog post small and focused, and just communicate the bare basics of the common mistakes I’ve seen people make with databases.

I also apologize for all the abstruse IndexedDB tricks – those probably merit their own blog post. In particular, I need to provide some experimental data to back up my claim that it’s better to break up a single IndexedDB database into multiple smaller ones. This trick is based on my personal experience with IndexedDB, where I noticed a high cost of fetching and storing large monolithic documents, but I should do a more formal study to confirm it.

Thanks to Nick Colley, Chris Gullian, Jan Lehnardt, and Garren Smith for feedback on a draft of this blog post.

How to fix a bug in an open-source project

So, you’ve found a bug in an open-source project. First off: don’t panic! This is perfectly normal. Software is written by humans, and humans make mistakes.

You might also be thinking to yourself, “Gee, I’d love to fix this bug.” I mean, who wouldn’t want to be the hero who swoops in and fixes a project used by thousands, if not millions, of people? You’d feel the warm glow of knowing you gave back to the open-source community, and plus it’s a nice notch in the belt of your Github résumé. [1]

Bug in the "buffer" module

A typical open-source bug.

 

For new coders, however, the idea of contributing to an open-source project can be intimidating. One of my friends, who is currently learning JavaScript from online tutorials, told me she found Github’s UI “bewildering.”

There’s also the social aspect: “Will the project maintainer accept my pull request?” “What if they criticize or dismiss me?” These are legitimate concerns for newer coders, who may be self-conscious about the perceived gap between their own knowledge and that of the experts.

There’s no need to be timid, though – open-source folks love to get pull requests from newcomers! Recent efforts like First Timers Only and Your First PR have shown that, given enough of a helping hand, anyone can contribute to an open-source project.

The purpose of this blog post is a bit different. Rather than give detailed instructions for a particular bug in a particular project, I’d like to explain how I go about fixing any bug in any project. To illustrate my problem-solving process, I’ll use the example of a recent bug in the “buffer” module, which I solved in about 1 hour, despite having almost no prior experience with the project.

If I can fix a bug in an unfamiliar project, so can you!

Setting the stage

“buffer” is a JavaScript implementation of the Node.js Buffer API for the browser. It allows you to use Node.js modules that depend on the Buffer object even in the browser, where that API doesn’t exist. (Instead, browsers have other APIs for working with binary data, like Uint8Array and ArrayBuffer.)

It might seem like a weird, esoteric library, but in fact “buffer” is downloaded nearly 2 million times per month, since it is a core dependency of both Browserify and Webpack. Despite the project’s high profile, though, I noticed an issue that had been languishing, unresolved, for over three months.

Open bug in the "buffer" project

This bug was opened in September, but was still open three months later.

 

This issue is no minor glitch, either – it’s a showstopper that causes “buffer” to fail entirely on certain browsers (notably Chrome 43+) and in certain build setups (notably with Babel). Many people hopped onto the thread to confirm the issue (“+1”, “same here,” etc.), and a few offered workarounds using project-specific Webpack configuration. But nobody fixed it.

I decided to take a crack at this bug, because at first glance it seemed easier than everybody was making it out to be. Also, I thought it would be instructive to fix a bug in an unfamiliar codebase and document how I went about solving it.

Note: to be fair, I have contributed to “buffer” before, but these were very minor pull requests, and I still consider myself pretty inexperienced with the project. In taking up this bug, I basically had to start from scratch, to jog my memory about how the project works. [2]

Step 1: download the code

Before I could investigate the bug, I needed to be sure I could build the code myself and run the tests. This is an important step, because it confirms that the project’s tests work on my machine, as written, with my current setup.

For instance: am I using the correct version of Node for this project? The correct version of npm? Is there a global dependency (such as a linter or test runner) that I need to install? Does it work on Mac, or only on Linux or Windows? If you try to fix a bug before establishing that the code works on your machine as-is, you can end up going down a rabbit hole before you even start.

Note: the following steps are JavaScript-specific, but they can also be applied to other languages. It helps to know the conventions for your particular language, such as the typical package manager, linter, and test runner.

First, I cloned the code. You can usually find the Git URL at the top of the project, and HTTPS is recommended unless you have committer rights:

Where to find the Git URL in the Github UI

Once I had the URL, I went to my terminal (iTerm 2 in my case), and I typed:

git clone https://github.com/feross/buffer.git
cd buffer

After this, I had the code on my machine, representing the master branch of the remote Git repository.

Step 2: run the tests

Once I had the code, I needed to figure out how to run the tests. Usually this information is provided in the README.md, but in this case, I searched for the word “test” and didn’t find anything. I also checked for a CONTRIBUTING.md (a document that gives instructions for contributors), but this project didn’t have one.

This is one of those cases where knowledge of the language and ecosystem can be helpful. I happen to know that most JavaScript projects are distributed with npm and can be installed and tested using:

npm install
npm test

Unfortunately, in this case, the above steps completed with an error:

Test output showing Saucelabs failure

However, I noticed the most important part of the error message, which was:

Error: Zuul tried to run tests in saucelabs, 
  however no saucelabs credentials were provided.

From reading this, I understood that this project was using Zuul and Saucelabs to run automated browser tests. Saucelabs is a remote browser-testing service, and I didn’t have my Saucelabs username or password declared as environment variables, so the tests didn’t run.

Furthermore, I didn’t really want to use Saucelabs in this case. I just wanted to test on my own machine, in my own browser. So I needed to figure out how to do that.

Luckily, in most JavaScript projects, you can just snoop around the package.json file and see what other commands are in the "scripts" section. In this case, I looked in the package.json and saw:

...
"scripts": {
  "test": "standard && node ./bin/test.js",
  "test-browser": "zuul -- test/*.js test/node/*.js",
  "test-browser-local": "zuul --local -- test/*.js test/node/*.js",
...

Aha! test-browser-local. That sounds promising!

So I ran:

npm run test-browser-local

And this time, I got the output:

open the following url in a browser:
http://localhost:62466/__zuul

Opening this URL in Chrome, I saw a nice UI where all the tests passed:

tests passing in Chrome

Yay! Success! At this point, I was confident that I could build and test the code on my own machine.

Note: if you’re thinking to yourself, “Wow, you shouldn’t have to do all that detective work just to test a project,” then you’re right! So I also took the time to open a pull request to document the testing procedure. This is one of the most valuable things you can do as a new contributor to a project, because seasoned contributors may be so accustomed to their workflow that they forget to include basic instructions for newbies.

Step 3: find a failing test

Next, we want to find a failing test, to confirm that we have reproduced the issue. (This is a core part of test-driven development, which is vital for many open-source projects.)

In this case, I had to read through the Github thread to try to figure out the source of the problem. Based on the discussion, it seemed that some combinations of tools (notably Babel and Webpack) might force a JavaScript module to run in strict mode. However, “buffer” is apparently not written in strict mode, so it fails in Chrome 43+ due to that browser’s interpretation of strict mode.

Based on this information, I figured I could reproduce the issue by simply adding 'use strict' to the top of the index.js file. (I knew index.js was the source file by checking the "main" field in package.json. But it’s also kinda obvious, because it’s the only top-level JavaScript file in this project.)

So I added 'use strict' to the top of index.js:

adding "use strict" to index.js

And lo and behold, when I refreshed the Zuul test page, I immediately saw the bug that everyone was talking about:

test failure

(Notice that the tests didn’t even manage to run. The page is yellow rather than green, and it says “0 failing, 0 passing.”)

At this point, I had successfully reproduced the bug, using the project’s own test suite. This is an invaluable step, for a few reasons:

  1. Even if you can’t solve the bug, you can open a pull request with just the failing test. This helps bypass a lot of lengthy discussion about how to reproduce the bug.
  2. And if you can solve the bug, then this gives you a way to prove that you’ve fixed it.

Here I just got lucky, because the existing tests were enough to suss out the problem. Sometimes, though, you may need to modify the tests yourself to reproduce the issue. In those cases, my workflow is usually:

  1. Try to break an existing test, e.g. by changing an assertTrue() to assertFalse(), then confirm that you see the test failing. (Believe me, this is a good sanity check!)
  2. Next, copy-paste a test that looks similar to the one you want to write. Then modify that new test until it fails.

For this particular bug, though, I already had a failing test. So I could move on to the next step.

Step 4: fix the bug

Unfortunately, the stacktrace didn’t provide a lot of guidance. And even the line numbers were messed up, because Zuul seems to mangle the code when it transpiles it. So that didn’t help.

unhelpful stacktrace

At this point, I was a bit puzzled. But I tried to think logically: it says we’re setting a property called length on an object that only supports a getter, not a setter. So is there any place where we do something like foo.length = bar? I tried searching the code for instances of .length =:

searching the code for instances of "length"

I found three places where .length is set. The most interesting is the first one, because it’s wrapped in a conditional if/else on Buffer.TYPED_ARRAY_SUPPORT. I have no idea what TYPED_ARRAY_SUPPORT is, but it immediately set off alarm bells for me that the .length assignment was guarded in one case, but not in the other two.

Trying to figure out what this TYPED_ARRAY_SUPPORT thing is, I came across this line of code:

if (Buffer.TYPED_ARRAY_SUPPORT) {
  Buffer.prototype.__proto__ = Uint8Array.prototype
  Buffer.__proto__ = Uint8Array
}

Aha, so when we have TYPED_ARRAY_SUPPORT (whatever that is), we set the prototype of the Buffer object (which is what index.js exports) to be the same as the prototype of the built-in Uint8Array. Hmm, and then in some cases, we’re setting that same prototype.length ourselves. Could it be that Chrome is blocking us from modifying the prototype of the built-in object, in strict mode? A theory for the bug was starting to materialize.

So I did a very simple fix: I took the two cases where length was being set, and I wrapped them both in checks for if (!Buffer.TYPED_ARRAY_SUPPORT):

wrapping the offending code in an if () check

(I also wrapped the .parent assignment, even though I wasn’t sure what it was doing, because it seemed related to the .length assignment.)
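In spirit, the change looked something like this (a simplified sketch rather than the literal diff, which you can see in the merged pull request):

// Before (simplified): the length was assigned unconditionally
buf.length = length

// After: only assign it when Buffer isn't backed by Uint8Array, whose
// length is a getter-only property that strict mode refuses to overwrite
if (!Buffer.TYPED_ARRAY_SUPPORT) {
  buf.length = length
}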

Then I refreshed the browser tests, and suddenly all 84 tests were passing! So apparently, that did the trick.

Step 5: open a pull request

At this point, it’s tempting to pat yourself on the back and declare victory. However, with most browser libraries, you can only be sure that your fix works if you test it against a wide variety of browsers. In this case, the Buffer.TYPED_ARRAY_SUPPORT fix seemed to be working for Chrome, but what about other browsers?

Rather than run the tests myself against every browser installed on my laptop (which would exclude browsers like IE and Android, because I’m on a Mac), a simpler trick is to just open a pull request on the project. Most well-run open-source projects have automated tests that run on every commit, including pull requests. This is an invaluable part of the pull request process, because it gives maintainers the peace of mind to know that the patch doesn’t break any tests.

I could tell that “buffer” was indeed using automated tests against many browsers, because I saw the Saucelabs badge in the project README:

The Saucelabs badge in the "buffer" README.

The Saucelabs badge shows the current state of the tests across various browsers.

 

Before opening my pull request, though, I also needed to check that my code conformed to the project style, which in this case is the somewhat presumptuously-named Standard. Personally I don’t much like Standard (semicolons for life!), but this isn’t my project, so I follow the age-old advice of “when in Rome, do as the Romans do.”

To check if your code conforms to Standard style, you can simply run:

npm install -g standard && standard

This passed, so I knew I was following the style guide of the project.

Next, to commit the fix, I created a separate Git branch. I called it 79, because that’s the issue number, and it’s a habit of mine to just name branches after the issue:

git checkout -b 79
git add index.js
git commit -m 'Proper strict mode support. Fixes #79'

Then, I forked the project and submitted a pull request using hub, which is a convenient git wrapper with some Github-specific tools:

hub fork
git push nolanlawson 79
hub pull-request

At this point, hub will create the pull request and print out a URL so you can view it in a browser. After waiting a bit for the tests to complete, I saw that I had a green checkmark – the tests passed in all browsers!

Github UI showing the test results

If the tests failed and you’re unsure why, you can always click the “Show all checks” link, then click “Details” for any failed test:

Github UI showing all checks

In this case, I could see that the tests were run by Travis CI, and I could also see the full log output for the tests:

Travis CI UI

Reassuringly, it was also clear that the tests were run in multiple browsers:

Travis CI output

(If your tests are not passing, you can simply keep pushing your commits to that branch, and Travis will re-run the tests for each commit.)

Final step: wait for PR approval or feedback

At this point, I felt confident that my pull request was a good candidate for merging. Even though it made a behavioral change to the code (using strict mode instead of non-strict mode), I predicted it would be uncontroversial, because most JavaScript projects prefer strict mode anyway. Also, strict code will run in non-strict environments, but the reverse is not true. So there is no practical reason to keep the code non-strict.

You might wonder: couldn’t I just remove the 'use strict' now, since we have the fix anyway? Yes, I could, but then it’s always possible that the project will regress in the future, because if anybody changes the code in a way that makes it non-strict, the automated tests won’t catch it. It’s important to guard against future regressions, because as Dale Harvey put it (paraphrasing):

Any untested code will eventually break.
– Harvey’s Law of Software Entropy

This might sound a bit hyperbolic or paranoid, but I’ve seen this truism play out time and time again in my career as an open-source maintainer. If you don’t test something, then eventually someone will commit code (maybe in a seemingly unrelated part of the codebase), and the untested code will start silently failing.

In any case, the maintainer seemed to agree with my choice, because he merged the code and published a new version within a few days. And this fix actually managed to get the “buffer” project down to 0 open PRs and 0 open issues!

PR was merged!

Yay, the PR got merged!

Conclusion

I hope that this blog post demonstrates that it’s neither impossible, nor even particularly difficult, to fix a bug in an open-source project. Open-source projects tend to play by different rules than other code (they’re more heavily tested, they discuss bugs out in the open, etc.), but if you’re comfortable committing code to personal or closed-source projects, then there’s no practical reason you couldn’t apply those same skills to the open-source world.

In the case of “buffer,” I found this issue to be sadly emblematic of a lot of open-source bugs. The module itself is heavily relied upon, and the bug is a showstopper, but it remained unresolved for months despite many people running into it. Lots of folks were offering temporary workarounds, but nobody made an attempt to fix the underlying problem.

I suppose the expectation was that the maintainer, Feross Aboukhadijeh, would fix the issue himself. But as you can see from his Github page, he maintains a lot of projects. Personally, I’m a fairly small-time open-source author, but it’s not uncommon for me to get 100 new Github notifications in a day. Feross undoubtedly gets even more than that.

So if you think Feross is going to drop everything to work on one particular bug, you should consider that he probably has dozens of other high-priority tasks on his to-do list. Perhaps he doesn’t even use Webpack or Babel (he seems to prefer Browserify), which means he himself might not get a lot of value out of this bugfix. It’s also arguable that this is actually a bug in Webpack or Babel, since it’s incorrectly trying to run non-strict code in a strict environment.

My takeaway is this: if a bug is impacting you personally, and you’re the one who ran into it, then you are in the best position to fix it. Asking a project maintainer, who has limited time and possibly limited interest in your issue, to both reproduce the bug based on your description and then fix it, is probably the least efficient way to get the issue resolved.

So the next time you run into a bug in an open-source project and are tempted to open a new issue (or to say “+1” or “me too”), please consider trying to fix it instead. Even if you’re only able to reproduce the issue, it’s an enormous help to the project maintainers, who are constantly working to triage new issues and decipher longwinded bug reports.

Open-source software is not manna from heaven. Nor is it a self-renewing resource that magically appears in your codebase, with zero responsibility on your part. Nope: open-source software is the tireless product of human labor and ingenuity, and it needs help from the community to survive.

Projects with an asymmetry of contribution – i.e., many more people benefiting than contributing – will eventually sputter out and even die due to maintainer burnout. You can help prevent this situation, while also giving yourself the sense of personal satisfaction from helping your fellow coders, by simply opening up a pull request.

a job well done

Comments like this make my day.

 

So if there’s an open-source project you benefit from, consider giving the maintainers the gift of a pull request. Go check out their Github page, open up the list of unresolved issues, and see if anything strikes your fancy. Even if you don’t succeed in fixing a bug, maybe you’ll find ways to improve the documentation, or to make the testing process easier.

There are lots of ways to contribute to open-source software – both big and small. But you won’t know just how easy it can be until you take that first step, and try.

Footnotes

 

1. For an alternate take on the “Github is your résumé” article, see Why GitHub is not your CV. Although it’s debatable whether your Github profile should influence hiring decisions (or if that only favors folks with the luxury of spare time and energy), I think it’s undeniable that your Github presence does influence your job-hunting prospects. So I still find “beef up your Github profile” to be valuable advice for new programmers. (Anyway, other open-source folks will look at your profile to learn more about you!)

 

2. Eagle-eyed readers may notice that yes, I am actually a collaborator on the “buffer” project. However, Feross gave me collaborator rights after only two pull requests (because he’s awesome), and I still felt that I was very unfamiliar with the codebase when I tried to tackle this bug. (For instance, I couldn’t remember how to run the tests, or maybe they had changed since I last submitted code.) So I still think this is a good example of fixing a bug in an unfamiliar repo.