
Offline-first is people-first

A lot of the advice we get as programmers comes with an expiration date. It’s valuable for exactly the lifespan of a particular framework or tool, and then we can safely ignore it when the next framework rolls around.

Other advice is timeless. I consider Joel Spolsky’s blog, Joel on Software, and his associated books on UI design, to fall into this category.

Even though Joel worked at Microsoft on such now-ancient products as Excel 97, his advice rings as true today as it did in the last century.

Some of my favorite bits of wisdom from Joel’s blog:

One of my favorite pieces of advice from Joel is one based on empathy. Let’s call it the bathtub principle.

The bathtub principle

Joel says:

Hotel bathtubs have big grab bars. They’re just there to help disabled people, but everybody uses them anyway to get out of the bathtub. They make life easier even for the physically fit.

In the same way, Joel argues that we should design UIs for the least-capable among us – those with poor sight, or limited motor skills, or limited linguistic capacity. The reason being: if you can design a UI that your grandparent can use, then you’ve probably designed a UI that’s pleasant for you to use as well.

As a concrete example, consider the familiar “File Edit …” menu at the top of the screen in most Mac OS X applications. These menu items are “half an inch wide and a mile high,” because you can keep moving your mouse up past the top, and still be able to click on them.

This UI pattern is a godsend for arthritic folks. They no longer have to struggle with a stubborn mouse that just won’t point at the right spot. But even those of us who are adept with mice will appreciate this feature. It’s just easier to use.

Microsoft only belatedly realized the value of this design, and early versions of Windows forced you to position your mouse at a very precise distance from the edge of the screen in order to hit that Start button. Later versions of Windows fixed this by allowing you to jam your mouse all the way to the corner.

Offline-first

When I try to convince my peers that offline-first is a valuable design principle to embrace, I’m often faced with the response, “But people are rarely offline! Our users might spend a fraction of their time in the subway, or in an airplane, or on the road. Why should we code for an edge case?”

This perspective is badly mistaken. If you focus on the “offline” part of “offline-first,” you’re missing the point.

Offline-first is about more than just users who are literally offline – instead, it’s a corollary of the bathtub principle. If you design your UIs for people who are disconnected or only infrequently connected, you create better interfaces for everyone, including those who are online with fast connections.

That’s because, in the offline-first mindset, your primary data interaction is with the local data store, rather than a remote data store. Local data stores are always faster than remote data stores, so this leads to snappier, and therefore better, user experience. (Don’t believe the marketers trying to sell you their cloud service du jour. The speed of light happens to be a fixed constant in our universe.)

As a real-world example, consider php.net. You may notice that the autosuggestion box is ridiculously fast – much faster than we’re used to seeing with, say, Google’s autosuggestions. You type, and the words appear as quick as your keystrokes. How about that.

If you want to know what enables this otherworldly speed, it’s simple: they’re just using localStorage. When you first visit php.net, a fat 1-megabyte wad of JSON is immediately downloaded into localStorage. By the time you’ve absorbed the UI and clicked the search box, all the APIs are available locally for you to query. That’s it.
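
Here is a minimal sketch of that pattern. It is not php.net's actual code: the key name `suggestions`, the function names, and the sample data are all hypothetical, and the storage object is passed in as a parameter so the same code can run against `window.localStorage` in a browser or against a simple stand-in elsewhere.

```javascript
// Hypothetical key under which the suggestion list is cached.
var CACHE_KEY = 'suggestions';

// Store the downloaded list once. localStorage only holds strings,
// so the array is serialized as JSON.
function cacheSuggestions(storage, names) {
  storage.setItem(CACHE_KEY, JSON.stringify(names));
}

// Answer a query entirely from the local copy: no network round trip.
function suggest(storage, prefix, limit) {
  var raw = storage.getItem(CACHE_KEY);
  if (raw === null) {
    return []; // nothing cached yet
  }
  var needle = prefix.toLowerCase();
  return JSON.parse(raw).filter(function (name) {
    return name.toLowerCase().indexOf(needle) === 0;
  }).slice(0, limit);
}

// Stand-in with the same getItem/setItem shape as window.localStorage,
// for running the sketch outside a browser.
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key, value) {
      data[key] = String(value);
    }
  };
}
```

In a browser you would pass `window.localStorage` as `storage`. A real implementation would probably parse the JSON once and keep the array in memory, rather than re-parsing it on every keystroke as this sketch does.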

Web vs. native

Web developers should take note of this mentality. It’s a trick we’ve been using in the native app space for a long time, and at the risk of sounding smug, we’ve been doing pretty well as a result of it.

I consider myself both an Android developer and a web developer. When I write an Android app, one of the first things I think about is how to design the tables and schemas for the SQLite database. From there, I translate that vision into UI elements that the user actually interacts with, and then only later do I think about how to update the local data store with data from the server.

This is offline-first in a nutshell, and many native developers are already doing it. It’s second nature to us. It’s in our blood. To a lot of native developers I talk to, “offline-first development” is just “development.”

Given the recent success of native apps vs. web apps, this is an area where web developers could really benefit by taking a lesson from native developers. And fortunately, we no longer have to give up on the web as a platform when we give up on the notion of an ever-present Internet connection.

Today, there are a variety of tools that web developers can take advantage of to adopt an offline-first mindset. Most notably, we can take advantage of new in-browser databases like Web SQL and IndexedDB. Here’s a list of some tools that can ease this process.

Just remember: offline-first isn’t only about those who are offline. It’s about making a faster and smoother experience for everyone, regardless of whether they’re offline, intermittently online, online with a poor connection, or surfing on a cool 75 Mbps FiOS connection.

Offline-first is for everyone. Offline-first is people-first.

The limitations of semantic versioning

With the recent underscore 1.7.0 brouhaha, there’s been a lot of discussion about the value of semantic versioning. Most of the JavaScript community seems to take the side of Semver, with Dominic Tarr even offering a satirical Sentimental Versioning spec.

Semver is so deeply entrenched in the Node community that it’s hard to question it without making yourself an easy target for ridicule. Plus, much of the value of Semver comes from everybody collectively agreeing on it, so as with vaccines, dissenters risk being labeled as a danger to the community at large.

To me, though, most of this discussion is missing the point. The issue is not semantic versioning, but rather the build systems we’ve created that assume and promote automatic updates based on semantic versioning – i.e. npm. We wouldn’t be so worried about a breaking change in underscore 1.7.0 if thousands of projects weren’t primed to auto-update their underscore dependencies.

As a developer, I divide my time pretty evenly between Java and JavaScript, so I may have a unique perspective here. I love the npm and Node communities, and I’ve been happily using and publishing modules for the past year or so. But as a community, I think it’s time we started being honest with ourselves about what Semver and auto-updating are actually buying us.

Recap of auto-updating

Recently, npm changed its default settings to automatically add dependencies like this:

"some-dependency" : "^1.2.1"

instead of like this:

"some-dependency" : "~1.2.1"

The caret ^ means the dependency will automatically update to the latest minor version relative to 1.2.1 (e.g. 1.3.5), whereas the tilde ~ means the dependency will update to the latest patch version relative to 1.2.1 (e.g. 1.2.7).

So that means that when you do npm install some-dependency, by default your package.json will be modified to do caret-updating rather than the more humble tilde-updating. But in any case, the default has traditionally been some flavor of auto-updating.
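
To make the difference concrete, here is a simplified model of the two matching rules. It is a sketch, not npm's actual implementation (which lives in the semver package and also handles pre-release tags, wildcards, and compound ranges), and the function names are made up. For 0.x versions it assumes the caret bumps the leftmost non-zero component, which is my understanding of how node-semver behaves.

```javascript
// Parse "1.2.3" into [1, 2, 3]. Pre-release tags are not handled.
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Compare two parsed versions: negative, zero, or positive.
function compareVersions(a, b) {
  for (var i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return 0;
}

// Exclusive upper bound for a caret or tilde range.
function upperBound(operator, base) {
  if (operator === '~') {
    return [base[0], base[1] + 1, 0];      // ~1.2.1 -> <1.3.0
  }
  // Caret: bump the leftmost non-zero component (assumed rule).
  if (base[0] > 0) {
    return [base[0] + 1, 0, 0];            // ^1.2.1 -> <2.0.0
  }
  if (base[1] > 0) {
    return [0, base[1] + 1, 0];            // ^0.2.3 -> <0.3.0
  }
  return [0, 0, base[2] + 1];              // ^0.0.3 -> <0.0.4
}

// Does `version` fall inside a range such as "^1.2.1" or "~1.2.1"?
function satisfies(version, range) {
  var operator = range.charAt(0);
  if (operator !== '^' && operator !== '~') {
    return version === range;              // exact pin
  }
  var base = parseVersion(range.slice(1));
  var candidate = parseVersion(version);
  return compareVersions(candidate, base) >= 0 &&
         compareVersions(candidate, upperBound(operator, base)) < 0;
}
```

Under this model, 1.3.5 satisfies ^1.2.1 but not ~1.2.1, while 1.2.7 satisfies both, which matches the examples above.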

Life in Javaland

A comparison with Java is useful. The standard packaging/dependency system in the Java world is Maven, where it’s always been recommended to nail down your dependencies to a very precise version in your pom.xml.

<dependency>
  <groupId>com.somecompany</groupId>
  <artifactId>somepackage</artifactId>
  <version>4.2.0</version>
</dependency>

This is partly a reflection of Java’s enterprisey-ness, where the thought of auto-updating would probably horrify the local Chief Security Officer (or whatever). But it’s also a reflection of the fact that Maven predates Semver’s current boom in popularity, back in the bad old days when there were few expectations about what might change between versions.

However, if you actually want to use auto-updating, Maven does have so-called “snapshot” dependencies (e.g. 1.2.1-SNAPSHOT), which can change anytime, according to the whims of the publisher. Usually these are only used for internal development, with the big uppercase letters SNAPSHOT designed to warn you in a stern voice that what you’re doing is dangerous.

Google actually flirted with auto-updating for a while with the + modifier in Android dependencies (e.g. 1.2.1+), but now they’ve shied away from it, and if you try to add a + dependency in Android Studio, it’ll throw a big warning at you to let you know you should nail down your dependencies.

So okay, we have one community where auto-updating is a dirty word, and another where it’s the default. They can’t both be right, so as Node developers, let’s consider why we might want to be suspicious of auto-updating.

Drawback 1: minor/patch versions can still break something

Since Node was one of the first communities to really embrace Semver, it’s tempting to say “it’s different this time,” and that that’s why we can get away with auto-updating but nobody else can. However, humans make mistakes, and Semver isn’t as airtight a guarantee as we’d like to believe.

When we publish a patch or a minor release, we try our darnedest not to include breaking changes, but sometimes a bug slips through. Or sometimes a change that we consider to be non-breaking actually turns out to be breaking for somebody else further downstream.

I help maintain a fairly large open-source project (PouchDB), so I’ve seen this play out plenty of times. It’s not common, but it happens. One day I push a new pull request, and suddenly the Travis tests are failing due to some mysterious error. My first instinct is to assume the bug is in the pull request, but after some digging I realize that the master branch is failing too. What gives?

Well, the author of the foo module, which is depended upon by the bar module, which is depended upon by PouchDB, decided to change something in a patch release, but we weren’t prepared for it downstream, so now our tests are broken. This is an annoying situation to debug, because a git bisect is not enough. The same exact code that worked yesterday is broken today.

Typically what I do in these cases is step through the code to identify the offending module, check to see when the last master branch was pushed, try to figure out what versions of that module were published in the interim, and then either file a bug on that module or write new code to work around it.

It’s a tedious process. And it can be especially irritating, because when you’re writing a PR, you’re usually trying to fix some unrelated issue. Hunting down bugs in somebody else’s module is just an unwelcome distraction.

Drawback 2: it’s a security problem

This is such a big issue that I’m surprised nobody else in the Node community has, to my knowledge, mentioned it.

npm has made it trivially easy to publish modules, which is awesome. I love that when I want to publish a new Node module, it’s just an npm publish away. Whereas if I want to publish a Java project to Maven Central, there’s a lot of ceremony in configuring my Maven credentials, doing a gradle uploadArchives, and then clicking around in the Sonatype Nexus interface. It’s a pain.

npm’s ease-of-use has a weakness, though. Given that the majority of Node projects use caret- or tilde-versioning, and given that it’s so easy to npm publish, what’s to stop some nogoodnik from stealing a prolific Node developer’s laptop (let’s say Substack or Mikeal Rogers), and then publishing some malware as a patch release to all their popular libraries? Bam, suddenly everybody’s continuous integration systems are downloading malware from npm and pushing it out to thousands of running systems.

You may trust Substack, but do you trust that he’s secured his laptop?

Of course, if you avoid caret- and tilde-versioning in your package.json, then this isn’t a problem. You can already inspect the code you’re running, and make sure you trust it. One might argue that this is the “more secure” approach, but that would negate one of auto-updating’s main selling points, which is that patch releases can supposedly contain security patches.
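
As a rough illustration of what removing the carets and tildes amounts to, here is a small sketch. The function name is hypothetical, and it only rewrites simple ranges of the form ^x.y.z or ~x.y.z in a package.json-style object.

```javascript
// Return a copy of a package.json-style object with every "^x.y.z" or
// "~x.y.z" dependency replaced by the exact version "x.y.z".
function pinDependencies(pkg) {
  var pinned = JSON.parse(JSON.stringify(pkg)); // deep copy via JSON
  ['dependencies', 'devDependencies'].forEach(function (section) {
    var deps = pinned[section] || {};
    Object.keys(deps).forEach(function (name) {
      // Only simple caret/tilde ranges are rewritten; anything else
      // (URLs, "*", compound ranges) is left untouched.
      var match = /^[\^~](\d+\.\d+\.\d+)$/.exec(deps[name]);
      if (match) {
        deps[name] = match[1];
      }
    });
  });
  return pinned;
}
```

Note that this pins each dependency to the bottom of its range, which is not necessarily the version currently installed, and it does nothing about the dependencies of your dependencies. Locking down the whole tree is the job of npm shrinkwrap.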

Drawback 3: auto-updating has a limited shelf life

This is the point I would really like to get across to other Node developers.

Right now it’s mostly fine for dependencies to break upstream, because we can remain pretty confident that if we file a bug on a project, the author will respond quickly and fix it.

For instance, a month ago I found a bug in Express, and not only did the maintainer (the awesome Doug Wilson) fix it in a matter of minutes, he also took it upon himself to come into the express-pouchdb project and submit a bunch of PRs. Experiences like that really exemplify what’s great about OSS development.

However, right now Node and npm are in their heyday. Changes are coming fast and furious, the community is active and engaged, and so of course caret- and tilde-versioning are pretty low-risk. Even if a bug is introduced in a minor or patch version upstream, it’ll probably get resolved quickly.

Imagine a future after the current boom, though, where npm occupies a position more like CPAN – still useful, but long in the tooth. Popular modules have fallen into disrepair, GitHub issues go unresolved. Maybe everyone’s moved on to Go.

In this post-apocalyptic future for Node, I can easily imagine developers saying, “Oh yeah, npm? That’s that thing where whenever you require() something, you have to immediately go in and remove all the tildes and carets.” Or worse, maybe someone will have to write a proxy in front of npm to act as a sort of Wayback Machine, shrinkwrapping each module to the dependencies it had when it was last published.

Don’t kid yourself, Noders – someday this future will be upon us. Project maintainers will eventually lose interest, move on to other projects, or maybe find that the obligations of family/work/whatever have reduced their ability to respond to bugs on GitHub. Maintainers will even die – yes, young coder, you too are mortal – and ideally whatever software we write should remain useful even after we’re gone. Ideally.

I don’t have the answers, but I do know that we as a community need to start preparing for eventualities like this. Right now it may feel like the party’s never going to end, but eventually the booze will run out, the music will stop, and we will have to make a sober evaluation of our software’s legacy. Recognizing the limitations of Semver and caret- and tilde-versioning is a step in that direction.

Postscript: why do people rebel against Semver?

I have a hunch about this: I think it’s because the larger culture hasn’t adjusted to Semver yet.

For someone steeped in Node practices, it may be obvious that version 3.0.0 of a module has introduced breaking changes since 2.0.0. To the average layman, though, a major version change indicates some big overhaul of the software, along with a slew of new features. This is a holdover from the shrinkwrap era, when a new major version meant a shiny new box in the store, and it’s still the prevailing view in popular understanding: “web 2.0,” “government 2.0,” etc.

What Semver ignores is that bumping a major version has marketing value.

We definitely experienced that recently in PouchDB. I found it funny that after we released version 3.0.0, we suddenly got a lot more traffic and stargazers, and we were even featured in JavaScript Weekly. However, the biggest change in 3.0.0 was subtracting features – that’s why we incremented the major version!

By contrast, the previous version, 2.2.3, constituted a huge internal restructuring that brought better stability, but you wouldn’t really know it, since it was just a patch version. And it got much less attention than 3.0.0.

I suppose the payoffs from incrementing a major version may dwindle once you get into Chrome-like territory with version 36, 37, etc., but for the low versions, it definitely seems to help boost your project’s public visibility.

Web SQL Database: In Memoriam

All signs seem to indicate that Apple will finally ship IndexedDB in Safari 7.1 sometime this year. This means that Safari, the last holdout against IndexedDB, will finally relent to the inevitable victory of HTML5’s new, new storage engine.

So I thought this would be a good time to hold a wake for Web SQL – that much maligned, much misunderstood also-ran that still proudly ships in Safari, Chrome, Opera, iOS, and every version of Android since 2.0.

Often in the tech industry we’re too quick to eviscerate some recently-obsoleted technology (Flash, SVN, Perl), because, as with politics and religion, nothing needs discrediting so much as the most recently reigning zeitgeist. Web SQL deserves better than that, though, so I’m here to give it its due.

// openDatabase returns the database handle directly; the version is a string.
var db = openDatabase('mydatabase', '1', 'mydatabase', 5000000);
db.transaction(function (tx) {
  // "if not exists" keeps the example re-runnable on later page loads.
  tx.executeSql('create table if not exists rainstorms (mood text, severity int)',
      [], function () {
    tx.executeSql('insert into rainstorms values (?, ?)',
        ['somber', 6], function () {
      tx.executeSql('select * from rainstorms where mood = ?',
          ['somber'], function (tx, res) {
        var row = res.rows.item(0);
        console.log('rainstorm severity: ' + row.severity +
            ',  my mood: ' + row.mood);
      });
    });
  });
}, function (err) {
  // err is a SQLError; its message property describes the failure.
  console.log('boo, transaction failed!: ' + err.message);
}, function () {
  console.log('yay, transaction succeeded!');
});

The gist of the story is this: in 2009 or so, native iOS and Android apps were starting to give the web a run for its money, and one area where the W3C recognized some room for improvement was in client-side storage. So Apple and Google hacked up the Web SQL Database API, which basically acknowledged that SQLite was great, mobile devs on iOS and Android loved it, and so both companies were happy to ship it in their browsers [1].

However, Microsoft and (especially) Mozilla balked, countering that the SQL language is not really a standard, and having one implementation in WebKit didn’t meet the “independent implementations” requirement necessary to be considered a serious spec.

So by 2010, Web SQL was abandoned in favor of IndexedDB, which is a document store that can be thought of as the NoSQL answer to Web SQL. It was designed by Nikunj Mehta at Oracle (of all places), and by 2014 every major browser, including IE 10 and Android 4.4, has shipped a version of IndexedDB, with Safari expected to join later this year.

As a rank-and-file developer, though, who’s worked with both Web SQL and IndexedDB, I can’t shake the feeling that the W3C made the wrong choice here. Let’s remember what Web SQL actually gave us:

  • SQLite in the browser. Seriously, right down to the sqlite_master table, FTS indexes for full-text search, and the idiosyncratic type system. The only thing you didn’t get was PRAGMA commands – other than that, you still had transactions, joins, binary blobs, regexes, you name it.
  • 5MB of storage by default, up to 50MB or more depending on the platform, to be confirmed by the user with a popup window at various increments.
  • The ability to easily hook into the native mobile SQLite databases, e.g. using the SQLite plugin for Cordova/PhoneGap.
  • A high-level, performant API based on an expressive language most everybody knows (SQL).
  • A database which had already been battle-tested on mobile devices, i.e. the place where performance matters.
  • A database which, let’s not forget, is also open-source.

Now what we have instead is IndexedDB, which basically lets you store key/value pairs, where the values are JavaScript object literals and the keys can be one or more fields from within that object. It supports gets, puts, deletes, and iteration. In Chrome it’s built on Google’s LevelDB, whereas in Firefox it’s actually backed by SQLite. In IE, who knows.

Enough has been written already about the failure of IndexedDB to capture the hearts of developers. And the API certainly won’t win any beauty contests:

html5rocks.indexedDB.open = function() {
  var version = 1;
  var request = indexedDB.open("todos", version);

  // We can only create Object stores in a versionchange transaction.
  request.onupgradeneeded = function(e) {
    var db = e.target.result;

    // A versionchange transaction is started automatically.
    e.target.transaction.onerror = html5rocks.indexedDB.onerror;

    if(db.objectStoreNames.contains("todo")) {
      db.deleteObjectStore("todo");
    }

    var store = db.createObjectStore("todo",
      {keyPath: "timeStamp"});
  };

  request.onsuccess = function(e) {
    html5rocks.indexedDB.db = e.target.result;
    html5rocks.indexedDB.getAllTodoItems();
  };

  request.onerror = html5rocks.indexedDB.onerror;
};
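
One common way to tame this callback style is to wrap each request in a Promise. The sketch below is only an illustration: promisifyRequest and getTodo are hypothetical helpers, not part of IndexedDB, and the wrapper relies only on the onsuccess, onerror, result, and error members that IndexedDB requests expose.

```javascript
// Wrap an IDBRequest-style object (anything that reports its outcome
// through onsuccess/onerror and exposes result/error) in a Promise.
function promisifyRequest(request) {
  return new Promise(function (resolve, reject) {
    request.onsuccess = function () {
      resolve(request.result);
    };
    request.onerror = function () {
      reject(request.error);
    };
  });
}

// Hypothetical usage against the "todo" store from the example above:
// look up a single item by its timeStamp key.
function getTodo(db, timeStamp) {
  var store = db.transaction('todo', 'readonly').objectStore('todo');
  return promisifyRequest(store.get(timeStamp));
}
```

A caveat: IndexedDB transactions commit on their own once no requests are pending, so mixing wrapped requests with other asynchronous work inside one transaction needs care.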

Instead of retreading the same old ground, though, I’d like to give my own spin on the broken promises of IndexedDB, as well as acknowledge where it has succeeded.

The death of Web SQL: a play in 1 act

To understand the context of how IndexedDB won out over Web SQL, let’s flash back to 2009. Normally you’d need sleuthing skills to solve a murder mystery, but luckily for us the W3C does everything out in the open, so the whole story is publicly available on the Internet.

The best sources I’ve found are this IRC log from late 2009, the corresponding minutes, the surprisingly heated follow-up thread, and Mozilla’s June 2010 blog post acting as the final nail in the coffin [3].

Here’s my retelling of what went down, starting with the 2009 IRC log:

PROLOGUE

Six houses, all alike in dignity,
In fair IRC, where we lay our scene,
From ancient grudge to new mutiny,
Where civil blood makes civil hands unclean.
From Oracle, that SQL seer of IndexedDB,
To Google, the stronghold of search,
We add Mozilla, the Web SQL killa,
And Apple, peering from its mobile perch.
Here, a storage war would set keys to clack,
Tongues to wag, and specs to shatter, 
There was also Microsoft and Opera,
Who don't really seem to matter.

THE PLAYERS

NIKUNJ MEHTA, of House ORACLE, an instigator
JONAS SICKING, of House MOZILLA, an assassin
MACIEJ STACHOWIAK, of House APPLE, a pugilist
IAN FETTE, of House GOOGLE, a pleader
CHARLES MCCATHIENEVILE, of House OPERA, a peacemaker

ACT 1

SCENE: A dark and gloomy day in Mountain View, or
perhaps a bright and cheery one, depending on your 
IRC client's color scheme.

OK, enough joking around. Let’s let the players tell the story in their own words. I’ll try not to editorialize too much [2].

Jonas Sicking (Mozilla):

we’ve had a lot of discussions
primarily with MS and Oracle, Oracle stands behind Nikunj
we’ve talked to a lot of developers
the feedback we got is that we really don’t want SQL

Ian Fette (Google):

We’ve implemented WebDB, we’re about to ship it

Maciej Stachowiak (Apple):

We’ve implemented WebDB and have been shipping it for some time
it’s shipping in Safari

(At the time, Web SQL was called “Web DB,” and IndexedDB was called “Web Simple DB,” or just “Nikunj.”)

So basically, Sicking (of Mozilla) throws down the gauntlet: users don’t want SQL, and the solution proposed by Nikunj Mehta is backed by all three of Oracle, Microsoft, and Mozilla. Fette (of Google) and Stachowiak (of Apple) respond huffily that they’re already shipping Web SQL.

Ian Fette (Google):

we’re also interested in the Nikunj One

Fette makes a concession here. Recall that Google was quick to implement both Web SQL and IndexedDB, at least in Chrome. The Android stock browser/WebView didn’t get IndexedDB until version 4.4.

Ian Fette (Google):

the Chrome implementation shares some but not quite all of the code
beside shipping it, web sites have versions that target the iPhone and use it
we can’t easily drop it in the near future for that reason

However, Google doesn’t want to stop shipping Web SQL: that genie’s already out of the bottle, and web sites are already using it.

Later on they discuss LocalStorage. This is an interesting part of the conversation: it’s acknowledged that LocalStorage is limited because it’s synchronous only. It’s suggested that instead of IndexedDB, they could simply extend LocalStorage, but nobody bites on that proposal.

Jeremy Orlow (Google):

Google is not happy with the various proposals

Adrian Bateman (Microsoft):

Microsoft’s position is that WebSimpleDB is what we’d like to see
we don’t think we’ll reasonably be able to ship an interoperable version of WebDB
trying to arrive at an interoperable version of SQL will be too hard

Here we arrive at one of the best arguments against Web SQL: creating a separate implementation to match the WebKit version would just be too hard – Microsoft and Mozilla would have to rewrite SQLite itself, with all its funky idiosyncrasies, or just include it wholesale, in which case it’s not an independent implementation.

Chris Wilson (Microsoft):

it seems with multiple interoperable implementations
that you can’t really call it stillborn
when we started looking at WebDB
the reason we liked Nikunj was that it doesn’t impose
but it has the power
the part that concerned us with WebDB is that it presupposes SQLite
we’re not really sure

Ivan Herman (W3C):

Proposal
all the browsers shipping WebDB are WebKit based
proposal: we move WebDB to WebKit.org, and we kill it as a deliverable from this group

Charles C. McCathieNevile (Opera):

I think we’re likely to ship it

At this point, it’s pretty much taken for granted that Web SQL will be dropped from the HTML5 spec. The only thing they’re deciding now is whether to give it to a nice farm family upstate, or take it out back and shoot it.

Google won’t let it go, though. And several times, the speakers are even reminded that they’ve run out of time for discussion of web storage. Google makes the case for full-text search:

Sam Ruby (Apache):

Apple and Google have expressed an interest in added full text search to the api we’ve used

Jeremy Orlow (Google):

that’s extremely important to Google too

Ian Fette (Google):

To use this for gmail we have to be able to do fulltext and we don’t think we can do that performant in JS so we would like native code to do that.

Nikunj Mehta (Oracle):

In some discussions we can provide keyword/context, but fulltext incoroprates some more concepts that can get hairy in different languages. It should perform aequately with a qiuck index.

Spec was originally written on berkeleyDB which had no way to retrieve object based on key index. had a way to join dbs but we added a way to lookup an object from the index and treating the indices, and use of joins dropped.

So Google was really adamant that they needed full-text search for Gmail, but nobody else besides Apple was convinced.

Here’s an interesting experiment: open up Gmail in Chrome, and set your developer tools to emulate a mobile device, say the Nexus 4. Perform a search, and then check out the Resources tab to see if Google is doing anything interesting with storage.

[Screenshot: Gmail client storage on mobile browsers, showing Gmail using Web SQL.]

If you’re not sure what you’re looking at, I’ll let you in on the secret: that’s a virtual table, created with the full-text search (FTS) capabilities of SQLite. Note that IndexedDB is not being used at all. And if you use desktop Gmail, neither database is used.

So clearly, Google has already voted with their code to support Web SQL, at least on mobile.

Correction: it turns out that’s not actually a FTS table – it’s just a regular table with some fancy triggers. Queries are cached, but they’re not actually run on the client. Still, FTS is indeed possible in Web SQL, and I think my point about Google preferring Web SQL over IndexedDB still stands.
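
For reference, full-text search through Web SQL looks roughly like the sketch below. It assumes the browser's SQLite build includes the FTS3 module, which is not guaranteed everywhere. The table, column, and function names are made up for illustration, and tx is the transaction object handed to a db.transaction() callback.

```javascript
// Create a full-text index as an FTS3 virtual table.
// Run this once, when the database is first set up.
function createMessageIndex(tx) {
  tx.executeSql('create virtual table messages using fts3(subject, body)');
}

// Add one document to the index.
function indexMessage(tx, subject, body) {
  tx.executeSql('insert into messages (subject, body) values (?, ?)',
      [subject, body]);
}

// Find messages matching a term. The MATCH operator is evaluated
// by SQLite itself rather than by JavaScript.
function searchMessages(tx, term, onResults) {
  tx.executeSql('select subject from messages where messages match ?',
      [term], function (tx, res) {
    var subjects = [];
    for (var i = 0; i < res.rows.length; i++) {
      subjects.push(res.rows.item(i).subject);
    }
    onResults(subjects);
  });
}
```

The point of the exercise is that tokenizing and matching happen in native code inside the database engine, which is the property Google was asking for in the IRC discussion.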

Back to the IRC log:

Jeremy Orlow (Google):

In Gmail example, if you are searching for a to and from address you might have zillions of addresses so it might be a big burden on the system

Ian Fette (Google):

In terms of the world, Mozilla won’t implement WebDB, and we want to get Gmail working with a DB and there are others who want to get apps working. Plus or minus some detail, it seems Web Simple Database can do taht

Famous last words. It’s five years later, and clearly Google still doesn’t think IndexedDB is ready for primetime, at least in Gmail. Maybe IndexedDB v2 will save the day, though: the working draft contains a proposal for FTS, among other goodies.

The email follow-up: shots fired

After the 2009 meeting, there’s this follow-up email thread, which makes for great reading if you want to see what a W3C fist fight looks like. Curiously, nobody at Google joins in the fray, and we have only Stachowiak at Apple rising to Web SQL’s defense:

Maciej Stachowiak (Apple):

We actually have a bit of a chicken-and-egg problem here. Hixie has
said before he’s willing to fully spec the SQL dialect used by Web
Database. But since Mozilla categorically refuses to implement the
spec (apparently regardless of whether the SQL dialect is specified),
he doesn’t want to put in the work since it would be a comparatively
poor use of time.

Great point. It’s a little disingenuous of Mozilla to cite a lack of independent implementations when that lack stems from their own refusal to participate.

Maciej Stachowiak (Apple):

At the face-to-face, Mozilla representatives said that most if not all of the developers they spoke to said they wanted “anything but SQL” in a storage solution. This clashes with our experience at Apple, where we have been shipping Web Database for nearly two years now, and where we have seen a number of actual Web applications deployed using it (mostly targeting iPhone).

To me, this argument is so obvious it’s heartbreaking: SQL is a very newbie-friendly language, and iOS (and Android) developers are already familiar with SQLite. So why fix what ain’t broke?

Maciej Stachowiak (Apple):

It seems pretty clear to me that, even if we provide Web SimpleDB as an alternative, our mobile-focused developers will continue to use the SQL database. First, they will not see a compelling reason to change. Second, SimpleDB seems to require more code to perform even simple tasks (comparing the parallel examples in the two specs) and seems to be designed to require a JS library to be layered on top to work well. For our mobile developers, total code size is at a premium. They seem less willing than desktop-focused Web developers to ship large JS libraries, and have typically used mobile-specific JS libraries or aggressively pruned versions of full JS libraries.

An excellent point, and the Gmail example shows that this prediction has been borne out in practice. Jonas Sicking responds:

Jonas Sicking (Mozilla):

If we do specify a specific SQL dialect, that leaves us having to implement it. It’s very unlikely that the dialect would be compatible with SQLite (especially given that SQLite uses a fairly unusual SQL dialect with regards to datatypes) which likely leaves us implementing our own SQL engine.

I definitely agree that we don’t want a solution that punishes the mobile market. I think the way to do that is to ensure that SimpleDB is useful even for mobile platforms.

Sicking is right about the difficulty that Mozilla faces here, but in hindsight he was a little optimistic about IndexedDB on mobile. HTML5 is only now starting to catch up to native apps in terms of performance, while big players like Facebook have stopped betting on it entirely.

Maciej Stachowiak (Apple):

> Indeed. I still personally wouldn’t call it multiple independent
> implementations though.

Would you call multiple implementations that use the standard C library independent? Obviously there’s a judgment call to be made here. I realize that in this case a database implementation is a pretty key piece of the problem. But I also think it would be more fruitful for you to promote solutions you do like, than to try to find lawyerly reasons to stop the advancement of specs you don’t (when the later have been implemented and shipped and likely will see more implementations).

Stachowiak is clearly bitter that Web SQL got rejected for what he cites as “lawyerly” reasons. He continues:

Maciej Stachowiak (Apple):

I don’t think SimpleDB is useless for mobile platforms. You certainly *could* use it. But it does have three significant downsides compared to the SQL database: (1) it’s very different from what developers have already (happily) been using on mobile; (2) the target design point is that it’s primarily expected to be used through JavaScript libraries layered on top, and not directly (so you have to ship more code over the wire); and (3) for more complex queries, more of the work has to be done in JavaScript instead of in the database engine (so performance will likely be poor on low-power CPUs). For these reasons, I expect a lot of mobile developers will stick with the SQL database, even if we also provide something else.

Sicking had admitted earlier that he was “… not experienced enough to articulate all of the reasons well enough.” So after this onslaught from Stachowiak, another Mozilla employee, Robert O’Callahan, rushes to his colleague’s aid:

Robert O’Callahan (Mozilla):

> Would you call multiple implementations that use the standard C library
> independent? Obviously there’s a judgment call to be made here.

Yes. Multiple implementations passing query strings (more or less) verbatim to SQLite for parsing and interpretation would not pass that judgement call… IMHO, but wouldn’t you agree?

I think the problem is rather coming up with a SQL definition that can be implemented by anything other than SQLite (or from scratch, of course). One weird thing about SQLite is that column types aren’t enforced. So either the spec requires something like SQLite’s “type affinity” (in which case it doesn’t fit well with most other SQL implementations, and precludes common performance optimizations), or it requires strict type checking (which perhaps you could implement in SQLite by adding CHECK constraints?). But the latter course is probably incompatible with deployed content, so contrary to Jonas I expect the spec would be implementable *only* on top of SQLite (or from scratch, of course), or perhaps some unnatural embedding into other engines where all values are text or variants. Experience with alternative implementations would be important.

All valid points. SQLite has its own quirks, and Web SQL is basically a thin layer over SQLite. Although Stachowiak does point out later that “WebKit has around 15k lines of code which implement asynchronicity, do checking and rewrites on the queries, export DOM APIs, manage transactions, expose result sets, etc.”

O’Callahan continues:

Robert O’Callahan (Mozilla):

Do you have easy access to knowledge about the sort of complex queries these mobile apps do? That would be very useful.

To Apple’s discredit, such data was never provided (as far as I know). Although again, the example with Gmail is pretty instructive here.

Robert O’Callahan (Mozilla):

We already ship SQLite and implementing Web Database using SQLite would definitely be the path of least resistance for us. We’re just concerned it might not be the right thing for the Web.

This, of course, is why Mozilla is awesome. Whatever the advantages of Web SQL may have been, you can’t say that Mozilla didn’t have the best interests of the web in mind when they killed it.

Stachowiak counters somewhat weakly, ceding to most of O’Callahan’s points, but taking up the utilitarian argument that Web SQL is better for developers:

Maciej Stachowiak (Apple):

It seems that a database layer with a good amount of high-level concepts (including some kind of query language) is likely to be easier to code against directly for many use cases. Thus, application programmers, particularly in environments where extra abstraction layers are particularly costly

[Furthermore,] some mobile web developers have existing investment in SQL in particular, and do not appear to have had problems with it as a model. It would be a shame to abandon them, as in many ways they have been better pioneers of offline Web apps than mainline desktop-focused Web developers.

It seems plausible to me that SQL is not the best solution for all storage use cases. But it seems like a pretty aggressive position to say that, as a result, it should be out of the Web platform (and not just augmented by other facilities). It seems like that would underserve other use cases

Smelling blood, O’Callahan moves in for the kill:

Robert O’Callahan (Mozilla):

> Thus, it did not seem there would be a practical benefit to
> specifying the SQL dialect. Thus, those present said they were satisfied to
> specify that SQLite v3 is the dialect.

What exactly does that mean? Is it a specific version of SQLite? Almost every SQLite release, even point releases, adds features.

The fact that SQLite bundles new features, bug fixes and performance improvements together into almost every release makes it especially difficult to build a consistent Web API on. Have you frozen your SQLite import to a particular version? Or do you limit the SQLite dialect by parsing and validating queries? Or do you allow the dialect to change regularly as you update your SQLite import?

I thought there was a consensus that pointing to a pile of C code isn’t a good way to set standards for the Web. That’s why we write specs, and require independent implementations so we’re not even accidentally relying on a specific pile of C code. This seems to be a departure from that.

Another great point from O’Callahan. I recall writing a Cordova app where I actually had to fetch the SQLite version at runtime (with a query along the lines of SELECT sqlite_version()), in order to figure out what features of FTS were supported. It wasn’t pretty. (Although, to be fair, we web developers are no strangers to such hacks; just take a look at the jQuery source code some time.)
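To give a flavor of that kind of hack, here is a minimal sketch of version-based feature detection. The helper names (compareVersions, pickFtsModule) are made up for illustration, and it assumes the version string was obtained with a query like SELECT sqlite_version() and that FTS3 is compiled in at all. The one hard fact it leans on is that FTS4 arrived in SQLite 3.7.4.

```javascript
// Compare two dotted version strings such as "3.7.11" and "3.7.4".
// Returns a negative number, zero, or a positive number, like a sort comparator.
// (Plain string comparison would get "3.7.11" vs "3.7.4" wrong.)
function compareVersions(a, b) {
  var pa = a.split('.').map(Number);
  var pb = b.split('.').map(Number);
  var len = Math.max(pa.length, pb.length);
  for (var i = 0; i < len; i++) {
    var diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

// Hypothetical helper: FTS4 was added in SQLite 3.7.4, so older builds
// have to make do with FTS3 (assuming FTS is compiled in at all).
function pickFtsModule(sqliteVersion) {
  return compareVersions(sqliteVersion, '3.7.4') >= 0 ? 'fts4' : 'fts3';
}

// In a Web SQL transaction, the version could be read roughly like this:
//   tx.executeSql('SELECT sqlite_version() AS v', [], function (tx, res) {
//     var module = pickFtsModule(res.rows.item(0).v);
//   });
```

The point is less the code than the fact that it has to exist: the "dialect" your app sees depends on whichever SQLite build the browser or OS happens to ship.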

There’s a little more back-and-forth in the thread, and Charles McCathieNevile (of Opera) jumps in to mediate a bit. They discuss performance, and whether any guarantees can be made about the big-O performance of IndexedDB. Ultimately, Nikunj Mehta has the last word:

Nikunj Mehta (Oracle):

WebSimpleDB will always remain easy and good to use directly, even though it will also support those who want to use libraries on top. Whether people would still prefer to use libraries or not, will depend on their use case. Specific use cases would help to find a more objective solution to your issue.

So here we arrive at a major selling point of IndexedDB: it’s low-level – much, much lower than SQL – so it’s not designed to be used directly by developers. In fact, I tend to think of IndexedDB as a thin transactional layer over LevelDB (on Chrome, anyway), which itself is best described as a tool for building databases rather than a database itself.

Also, from working on PouchDB, where we support all three of Web SQL, IndexedDB, and LevelDB, I can confirm that the first is the easiest to work with, and the last is the hardest. IndexedDB is definitely a far cry from raw LevelDB, but it has nothing close to the flexibility provided by Web SQL’s diverse toolkit. (Disclaimer: the other authors may disagree.)

Broken promises of IndexedDB: did library authors fill the gap?

So let’s return to Mehta’s original point: IndexedDB was designed to be low-level enough that the void could be filled by JavaScript libraries. In the same way that nobody uses the native XMLHttpRequest or DOM APIs ever since jQuery came along, the assumption was that library authors would pick up the slack for IndexedDB’s cumbersome API.

And although I count myself as a member of that cohort (hint, hint: try PouchDB), with the benefit of hindsight I’d like to evaluate how well that plan has played out:

  • To date, there are plenty of libraries built on top of IndexedDB/WebSQL, although none has achieved jQuery-like dominance yet. (Maybe PouchDB will.)
  • On the other hand, native apps continue to trounce web apps on mobile.
  • Meanwhile, Google and (especially) Apple have dragged their feet on IndexedDB, slowing its adoption on mobile devices.
  • Although arguably, they had no choice, given the performance needs of mobile devices. One of the downsides of a low-level JavaScript API is that the rest has to be implemented in, well, JavaScript, which tends to be slower than native C code. Unsurprisingly, in our performance tests with PouchDB, we’ve found that the Web SQL backend is nearly always faster than the IndexedDB backend, sometimes by a decimal order of magnitude.
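To make that last bullet concrete, here is a small sketch of the kind of work that moves into JavaScript when the storage layer is low-level. A plain array stands in for records read one by one from a key-value store, and the sample data and the countByType helper are made up for illustration.

```javascript
// Records as they might come back from a key-value store, one by one.
// (Sample data is invented for this sketch.)
var monsters = [
  {name: 'Bulbasaur', type1: 'Grass'},
  {name: 'Oddish', type1: 'Grass'},
  {name: 'Charmander', type1: 'Fire'}
];

// With a SQL engine, the grouping runs inside the database:
//   SELECT type1, COUNT(*) AS n FROM Monsters GROUP BY type1;
// With a low-level store, the application does the same work in JavaScript.
function countByType(records) {
  var counts = {};
  records.forEach(function (record) {
    counts[record.type1] = (counts[record.type1] || 0) + 1;
  });
  return counts;
}

var counts = countByType(monsters);
console.log(counts);
```

For three records nobody will notice the difference, but the same loop over tens of thousands of rows on a low-power CPU is exactly the scenario Stachowiak was worried about.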

My own take on IndexedDB

I have a few personal theories as to why IndexedDB still hasn’t really taken off, and they mostly circle back to the same points made by Stachowiak and Fette five years ago.

First off, it’s hard to get developers to care about offline functionality for any platform other than mobile – you just don’t have the same problems with poor performance and spotty Internet connections. And on mobile devices, Web SQL is king (sorry Windows Phone), meaning that in practice mobile devs can just forget that IndexedDB exists.

Secondly, IndexedDB doesn’t offer much beyond what you can already get with LocalStorage, and its API is a lot tougher to understand. It’s asynchronous, which is already a challenge for less experienced developers. And if you don’t need to do any fancy pagination, then usually a plain old localStorage.getItem()/localStorage.setItem() along with some JSON parsing/serializing will serve you just fine.
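For reference, the LocalStorage-plus-JSON approach amounts to something like the sketch below. The saveObject and loadObject helpers are hypothetical names, and createMemoryStorage is only a stand-in so the snippet runs outside a browser; the two methods it mimics, getItem and setItem, are the standard Storage ones.

```javascript
// Minimal in-memory stand-in for the browser's Storage interface, so the
// sketch also runs outside a browser. In a page, pass window.localStorage.
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key, value) {
      data[key] = String(value);
    }
  };
}

// Storage only holds strings, so objects are serialized to JSON on the way in...
function saveObject(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// ...and parsed on the way out. A missing key yields null.
function loadObject(storage, key) {
  var raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

var storage = createMemoryStorage();
saveObject(storage, 'robot', {name: 'R2-D2', iq: 135});
var robot = loadObject(storage, 'robot');
```

That is more or less the whole API surface a lot of apps need, which is a tough act for IndexedDB to follow.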

Compare this with the Web SQL database, which is also asynchronous, but which provides a fluent query language and a bevy of additional features, one of the most underrated of which is full-text search. Just think about what a client-side search engine with support for tokenization and stemming (the Porter stemmer is baked right in!) could do for your app’s comboboxes, and then compare that with IndexedDB’s meager offerings.

Another theory is that Apple’s criticisms of IndexedDB became a self-fulfilling prophecy. Clearly they’ve put more effort into Web SQL than IndexedDB, the spec be damned, and by failing to implement IndexedDB in Safari and iOS, they’ve probably stunted its growth by years.

Finally, it’s worth acknowledging that IndexedDB is just a crummy API. If you look at the HTML5 Rocks example and don’t start having flashbacks to xmlHttpRequest.onreadystatechange = function() ..., then you haven’t been doing web dev for very long.

However, nobody wants to have to resort to a third-party wrapper unless it offers the kinds of benefits that jQuery gave us over the DOM – interoperability, robustness, and an API that’s so convenient and understandable that a generation of web developers probably believes the $ is just a part of the language.
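At its very simplest, the wrapping a library does looks something like the sketch below: take a request object that reports back through onsuccess/onerror handlers (the shape of an IndexedDB request) and turn it into a promise. The promisifyRequest and fakeRequest names are made up, the fake request exists only so the snippet runs outside a browser, and a real wrapper would have a great deal more to worry about (transactions, cursors, browser bugs).

```javascript
// Wrap a request object that reports via onsuccess/onerror handlers
// (the shape of an IndexedDB request) in a Promise.
function promisifyRequest(request) {
  return new Promise(function (resolve, reject) {
    request.onsuccess = function () {
      resolve(request.result);
    };
    request.onerror = function () {
      reject(request.error);
    };
  });
}

// Stand-in for a real request so the sketch runs outside a browser:
// it "completes" on a later tick, after the handlers have been attached.
function fakeRequest(result) {
  var request = {result: undefined, error: null};
  setTimeout(function () {
    request.result = result;
    if (request.onsuccess) {
      request.onsuccess();
    }
  }, 0);
  return request;
}

var pending = promisifyRequest(fakeRequest({name: 'Bender'}));
pending.then(function (doc) {
  console.log(doc.name);
});
```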

PouchDB: the jQuery of databases

Of course, this is exactly the problem we’re trying to solve with PouchDB. (I know, here comes the shameless plug.) PouchDB isn’t just a great tool for syncing data between JavaScript environments and CouchDB; it’s also a general-purpose storage API designed to work well regardless of the browser it’s running in. Think of it as jQuery for databases.

Currently, PouchDB falls back to Web SQL on browsers that don’t support IndexedDB, and it can fall back to a remote CouchDB on browsers that don’t support either. In the future, we’ll also support LocalStorage and a simple in-memory store, which will basically extend our reach everywhere, and give developers a drop-in database that “just works.”
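The fallback order described above can be sketched roughly as follows. To be clear, this is not PouchDB’s actual detection code: pickAdapter is a hypothetical helper, the adapter names are only illustrative, and the object passed in stands for the browser’s window.

```javascript
// Hypothetical helper: returns the name of the first storage backend that
// the given global object (e.g. window) appears to support.
function pickAdapter(globalObj) {
  if (globalObj.indexedDB) {
    return 'idb';
  }
  if (typeof globalObj.openDatabase === 'function') {
    return 'websql';
  }
  // No local database available: talk to a remote CouchDB instead.
  return 'http';
}
```

Real-world detection is messier, since a browser can expose one of these APIs and still have it be broken, which is where the workarounds come in.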

Of course, we also do a lot of magic under the hood to work around browser bugs, in both Web SQL and IndexedDB. And there are a lot of bugs – enough for a whole other blog post. So that’s another way that we’re like jQuery.

Mostly, though, we’re just trying to move HTML5 storage forward, and to fulfill the original vision of web developers having access to neat JavaScript libraries built on top of IndexedDB. If PouchDB (or some similar library) manages to achieve mainstream success, then Nikunj Mehta will be vindicated, regardless of how developers feel about IndexedDB itself.

Conclusion

Web SQL will probably never truly die. Google and Apple are invested enough that they can’t remove it from their browsers without breaking thousands of mobile apps and web sites (including their own).

And when I write web apps, I tend to care enough about mobile performance that, until IndexedDB catches up, I’ll probably continue giving a nod to Web SQL with code like this:

var pouch = new PouchDB('mydb', {adapter: 'websql'});
if (!pouch.adapter) { // fall back to IndexedDB
  pouch = new PouchDB('mydb');
}

Web SQL, I salute you. You’re no longer in our hearts, but you’ll remain in our pockets for years to come.

Disclaimer: I apologize if I’ve misquoted anyone or taken what they said out of context. Please feel free to rip me a new one in the comments, on Twitter, or on Hacker News.

Notes:

[1]: In fact, Web SQL had been shipping in Safari since 2007. Presumably they wanted to test it out in the wild before committing to a formal spec.

[2]: I editorialize a lot.

[3]: I’m skipping some details of the story; Web SQL certainly wasn’t killed in a day. The criticisms of Web SQL, especially the “SQLite is not a standard” part, can be traced back to an April 2009 blog post and email by Vladimir Vukicevic of Mozilla. The conclusion reached by both Stachowiak and Sicking at the end of that thread was, to quote Stachowiak, that “the best path forward is to spec a particular SQL dialect, even though that task may be boring and unpleasant and not as fun as inventing a new kind of database.” Nikunj Mehta disagreed, and then went on to invent a new kind of database.

What happened to PouchDroid?

Just in case anyone is wondering about PouchDroid, a few things changed since I started working on it three months ago:

  1. I found out Couchbase Lite exists. So you may want to try that instead of some crazy JavaScript thing.
  2. I started working on PouchDB itself.

I do plan on eventually updating PouchDroid, but for the time being there’s still plenty to be done in PouchDB. Long-term goals will be:

  1. API parity with PouchDB (at least, the important parts).
  2. Get a rigorous suite of tests in place.
  3. Once the tests are passing, port it to pure Java.

If you still want to use PouchDroid in a Cordova/PhoneGap app, I’d recommend using PouchDB itself and the SQLite plugin instead. PouchDroid uses a slightly modified version of the SQLite plugin, which probably won’t give you any noticeable performance improvements. As for the XHR overrides, I’ll probably take that out, so you should just set up CORS on your CouchDB.

If you want to use PouchDroid in a Java Android project, it’s still pretty nifty for small sync tasks (e.g. the PouchDroidMigrationTask, which can sync a SQLite database to CouchDB). But if you want something more full-featured and reliable, wait for 1.0.

PouchDroid v0.1.0 is out!

PouchDroid

I’ve managed to nail down the bulk of the API for PouchDroid, and I’m releasing a super-tentative 0.1.0 version today, just in time for Christmas.

Obligatory caveat: Please do not use it in production yet. God no, not yet. Your users’ data is more precious to me than that.

Do go try it out, though! There’s now a well-thought-out README and a “Getting Started” tutorial. And a dorky logo I made in GIMP. (Aw yeah.)

So, it’s official: PouchDB’s empire has spread to Android. And now that JavaScript is a first-class citizen on iOS, could that platform be far behind?

Porting PouchDB to Android: initial work and thoughts

Update: CouchDroid has been renamed to PouchDroid, and I’ve released version 0.1.0. The code examples below are out of date. Please refer to the instructions and tutorials on the GitHub page.

I love PouchDB. It demonstrates the strength and flexibility of CouchDB, and since it supports both WebSQL and IndexedDB under the hood, it obviates the need to learn their separate APIs (or to worry about the inevitable browser inconsistencies). If you know CouchDB, you already know PouchDB.

And most importantly, it offers two-way sync in just a few lines of code. To me, this is magical:

var db = new PouchDB('mydb');
db.replicate.to('http://foo.com:5984/db', {continuous : true});
db.replicate.from('http://foo.com:5984/db', {continuous : true});

I wanted to bring this same magic to Android, so I started working on an Android adapter for PouchDB. I’m calling it CouchDroid, until I can think of a better name. The concept is either completely crazy or kinda clever, which is why I’m writing this post, in the hopes of getting early feedback.

The basic idea is this: instead of rewriting PouchDB in Java, I fire up an invisible WebView that runs PouchDB in JavaScript. I override window.openDatabase to redirect to the native Java SQLite APIs, so that all of the SQL queries run on a background thread (instead of tying up the UI thread, like they normally would). I also redirect XMLHttpRequest into Java, giving me control over the HTTP request threads, and helping avoid any messy server-side configuration of CORS/JSONP for web security.

Result: it works on a fresh CouchDB installation, no assembly required. And it’s actually pretty damned fast.

The code is still a little rough around the edges, but it can already do bidirectional sync, which is great. Callbacks look weird in Java, but static typing, generics, and content assist make the Pouch APIs a dream to work with. (My precious Ctrl+space works!)

Here’s an example of bidirectional sync between two Android devices and a CouchDB server using CouchDroid. First, we define what kinds of documents we want to sync by extending PouchDocument. This is Android, so let’s store some robots:

public class Robot extends PouchDocument {

  private String name;
  private String type;
  private String creator;
  private double awesomenessFactor;
  private int iq;
  private List<RobotFunction> functions;

  // constructors, getters, setters, toString...
}
public class RobotFunction {

  private String name;

  // constructors, getters, setters, toString...
}

I’m using Jackson for JSON serialization/deserialization, which means that your standard POJOs “just work.” The PouchDocument abstract class simply adds the required CouchDB fields _id and _rev.

In our Activity, we extend CouchDroidActivity (needed to set up the Java <-> JavaScript bridge), and we add a bunch of robots to a PouchDB<Robot>:

public class MainActivity extends CouchDroidActivity {

  private PouchDB<Robot> pouch;

  // onCreate()...

  @Override
  protected void onCouchDroidReady(CouchDroidRuntime runtime) {

    pouch = PouchDB.newPouchDB(Robot.class, runtime, "robots.db");

    List<Robot> robots = Arrays.asList(
      new Robot("C3P0", "Protocol droid", "George Lucas", 0.4, 200, 
        Arrays.asList(
          new RobotFunction("Human-cyborg relations"),
          new RobotFunction("Losing his limbs"))),
      new Robot("R2-D2", "Astromech droid", "George Lucas", 0.8, 135,
        Arrays.asList(
          new RobotFunction("Getting lost"),
          new RobotFunction("Having a secret jetpack"),
          new RobotFunction("Showing holographic messages")))    
    );

    pouch.bulkDocs(robots, new BulkCallback() {

      @Override
      public void onCallback(PouchError err, List<PouchInfo> info) {
        Log.i("Pouch", "loaded: " + info);
      }
    });
  }
}

Meanwhile, on another Android device, we load a completely different list of robots:

List<Robot> robots = Arrays.asList(
  new Robot("Mecha Godzilla", "Giant monster", "Toho", 0.4, 82, 
    Arrays.asList(
      new RobotFunction("Flying through space"),
      new RobotFunction("Kicking Godzilla's ass"))),
  new Robot("Andy", "Messenger robot", "Stephen King", 0.8, 135,
    Arrays.asList(
      new RobotFunction("Relaying messages"),
      new RobotFunction("Betraying the ka-tet"),
      new RobotFunction("Many other functions"))),
  new Robot("Bender", "Bending Unit", "Matt Groening", 0.999, 120,
    Arrays.asList(
      new RobotFunction("Gettin' drunk"),
      new RobotFunction("Burping fire"),
      new RobotFunction("Bending things"),
      new RobotFunction("Inviting you to bite his lustrous posterior")))
  );

And, of course, we set up bidirectional replication on both pouches:

String remoteCouch = "http://user:password@myhost:5984/robots";
Map<String, Object> options = Maps.quickMap("continuous", true);

pouch.replicateFrom(remoteCouch, options);
pouch.replicateTo(remoteCouch, options);

Wait a few seconds (or pass in a callback), and voilà! You can check the contents on CouchDB:
C3P0 on CouchDB

And then check the contents of each PouchDB on Android:

pouch.allDocs(true, new AllDocsCallback<Robot>() {

  @Override
  public void onCallback(PouchError err, AllDocsInfo<Robot> info) {
    List<Robot> robots = info.getDocuments();
    Log.i("Pouch", "pouch contains " + robots);
  }
});

This prints:

pouch contains [Robot [name=Bender, type=Bending Unit, creator=Matt Groening, 
awesomenessFactor=0.999, iq=120, functions=[RobotFunction [name=Gettin' drunk], RobotFunction 
[name=Burping fire], RobotFunction [name=Bending things], RobotFunction [name=Inviting you to bite 
his lustrous posterior]]], Robot [name=C3P0, type=Protocol droid, creator=George Lucas, 
awesomenessFactor=0.4, iq=200, functions=[RobotFunction [name=Human-cyborg relations], RobotFunction 
[name=Losing his limbs]]], Robot [name=Mecha Godzilla, type=Giant monster, creator=Toho, 
awesomenessFactor=0.4, iq=82, functions=[RobotFunction [name=Flying through space], RobotFunction 
[name=Kicking Godzilla's ass]]], Robot [name=R2-D2, type=Astromech droid, creator=George Lucas, 
awesomenessFactor=0.8, iq=135, functions=[RobotFunction [name=Getting lost], RobotFunction 
[name=Having a secret jetpack], RobotFunction [name=Showing holographic messages]]], Robot 
[name=Andy, type=Messenger robot, creator=Stephen King, awesomenessFactor=0.8, iq=135, functions=
[RobotFunction [name=Relaying messages], RobotFunction [name=Betraying the ka-tet], RobotFunction 
[name=Many other functions]]]]

So within seconds, all five documents have been synced to two separate PouchDBs and one CouchDB. Not bad!

In addition to adapting the PouchDB API for Java, I also wrote a simple migration tool to mirror an existing SQLite database to a remote CouchDB. It could be useful, if you just want a read-only web site where users can view their Android data:

new CouchDroidMigrationTask.Builder(runtime, sqliteDatabase)
    .setUserId("fooUser")
    .setCouchdbUrl("http://user:password@foo.com:5984/db")
    .addSqliteTable("SomeTable", "uniqueId")
    .addSqliteTable("SomeOtherTable", "uniqueId")
    .setProgressListener(MainActivity.this)
    .build()
    .start();

This converts SQLite data like this:

sqlite> .schema Monsters
CREATE TABLE Monsters (_id integer primary key autoincrement,
  uniqueId text not null,
  nationalDexNumber integer not null,
  type1 text not null,
  type2 text,
  name text not null);
sqlite> select * from Monsters limit 1;
1|001|1|Grass|Poison|Bulbasaur

into CouchDB data like this:

{
   "_id": "fooUser~pokemon_11d1eaac.db~Monsters~001",
   "_rev": "1-bd52d48dba37ce490c38d455726296f0",
   "table": "Monsters",
   "user": "fooUser",
   "sqliteDB": "pokemon_11d1eaac.db",
   "appPackage": "com.nolanlawson.couchdroid.example1",
   "content": {
       "_id": 1,
       "uniqueId": "001",
       "nationalDexNumber": 1,
       "type1": "Grass",
       "type2": "Poison",
       "name": "Bulbasaur"
   }
}

Notice that the user is included as a field and as part of the _id, so you can easily set up per-user write privileges. For per-user read privileges, you still need to set up one database per user.
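As a rough sketch of what those per-user write privileges could look like: CouchDB runs a design document’s validate_doc_update function on every write, and throwing an object with a forbidden property rejects the update. The buildDocId helper and the exact rule below are my own illustration (modeled on the _id shape in the sample document), not something the migration tool generates for you.

```javascript
// Builds an _id in the same user~database~table~key shape as the sample above.
// (Hypothetical helper, shown only to make the format explicit.)
function buildDocId(user, sqliteDB, table, uniqueId) {
  return [user, sqliteDB, table, uniqueId].join('~');
}

// Body of a validate_doc_update function for a CouchDB design document.
// CouchDB passes the new document, the old one, and the requesting user's
// context; throwing {forbidden: ...} rejects the write.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  var prefix = userCtx.name + '~';
  if (newDoc.user !== userCtx.name || newDoc._id.indexOf(prefix) !== 0) {
    throw {forbidden: 'You may only write your own documents.'};
  }
}
```

A real rule would also need to deal with deletions (where the new document carries little more than _id, _rev and _deleted) and with admin users, so treat this as a starting point rather than a finished policy.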

CouchDroid isn’t ready for a production release yet. But even in its rudimentary state, I think it’s pretty damn exciting. As Android developers, wouldn’t it be great if we didn’t have to write so many SQL queries, and we could just put and get our POJOs? And wouldn’t it be awesome if that data were periodically synced to the server, so we didn’t even have to think about intermittent availability or incremental sync or conflict resolution or any of that junk? And wouldn’t our lives be so much easier if the data was immediately available in a RESTful web service like CouchDB, so we didn’t even need to write any server code? The dream is big, but it’s worth pursuing.

For more details on the project, check it out on GitHub. The sample apps in the examples directory are a good place to start. Example #1 is the migration script above, Example #2 is some basic CRUD operations on the Pouch API, and Example #3 is the full bidirectional sync described above. More to come!
