Posts Tagged ‘couch apps’

CouchDB doesn’t want to be your database. It wants to be your web site.

I’d like to talk to you today about Couch apps. No, not CouchApps. No, not necessarily CouchApps either. The phrase has been bandied around a lot, so it’s worth explaining what I mean: I’m talking about webapps that exclusively use CouchDB for their backend, whether or not they’re actually hosted within CouchDB and regardless of how they’re built.

Yes, this is a thing people are actually trying to do, and no, it’s not crazy. The purpose of this article is to explain why.

First off, some background: CouchDB is a NoSQL database (or key-value store, as the cool kids say) written in Erlang. It is probably the origin of this joke. Nobody who uses CouchDB cares that it is written in Erlang, though, because the big selling point is that you can interact with it using Javascript, JSON, and plain ol’ HTTP. It is “a database for the web,” the first of its kind.

CouchDB: it’s a database, right?

When I first started using CouchDB, I tried to treat it like any other database. I looked for connectors based on the language I was using: Ektorp for Java, AnyEvent::CouchDB for Perl, Nano for Node. And I used the web interface (bewilderingly called “Futon”) as I would a query browser – neat for debugging, but not much else. The fact that it ran in a web browser just kinda seemed like a gimmick.

Recently, though, when I was working on a Node app that didn’t go anywhere but was a fun diversion, I came across this quote by Couch apostle J. Chris Anderson:

Because CouchDB is a web server, you can serve applications directly [to] the browser without any middle tier. When I’m feeling punchy, I like to call the traditional application server stack “extra code to make CouchDB uglier and slower.”

Suddenly, I realized what CouchDB was all about.

No wait, CouchDB is a miracle

See, here I was, using client-side Javascript to talk to Express to talk to Node to talk to Nano to talk to Couch, and at each step I was converting parameter names from underscores to camel case (or whatever my petty hangups are), all the while introducing bugs as I tried to make each layer fit nicely with the next one. And I had a working web server right in front of me! CouchDB! Why not just call it directly, you fool?! (I shout at myself in hindsight.)

I think the reason a lot of developers, like myself, might have missed this epiphany is that we’re used to treating databases as, well, databases. Whether it’s MongoDB or MySQL or Oracle, you gotta have your JDBC connector for Java and perhaps an ORM layer or maybe you just give up on Hibernate and write all the database objects yourself, so half of your code is getters and setters, but that’s OK, because that’s how we abstract the database.

You see, you can’t just have your peanut butter and jelly sandwich! You need an interface between the bread and the peanut butter, and an abstraction layer between the peanut butter and the jelly, and don’t even get me started on the jelly and the bread! What, you want your bread to get soggy?

As a programmer, I’m so used to treating databases as this other, alien thing that needs to be handled with latex gloves, separately from my application code, that reaching for the nearest library has become a reflex.

But you don’t need that with CouchDB. Because… it’s just HTTP. Any extra layers just give you another API to learn.

CouchDB is the web done right

And in fact, CouchDB is better than HTTP, because CouchDB actually fulfills the promise of what RESTful services were supposed to be, instead of the kludges we’ve come to expect. Look! DELETE actually deletes things! POST isn’t just what you use when you need to send more data than a GET allows! And HEAD and PUT are actually useful, instead of just being trivia to impress your friends at dinner parties — “Oh, did you know that there are actually more HTTP commands than just GET and POST?” “Oh, how fascinating!”

You see, once you set aside your preconceived notion of what a database is supposed to be, you can actually get rid of all your fancy connectors and just use a standard HTTP library. (I like Requests for Python.) You can even use the network debugger in a browser window to see how CouchDB does everything. It’s all just AJAX!

And then, if you make it this far down the rabbit hole, you might notice that CouchDB actually has a user authentication database, with password hashing. You might also notice that it’s even got roles and privileges and administrator controls. And that’s when you realize, with fascinated horror, the most insidious thing about CouchDB:

CouchDB doesn’t want to be your database; it wants to be your web site.

And finally, this is where we come back to the subject of Couch apps. A Couch app is just a pure HTML/CSS/Javascript application, with only CouchDB as its backend, and this is the intended use case for CouchDB.

Now, think about what this proposition means to you as a developer. The web is moving more and more towards rich, client-side applications — we’ve had jQuery for years, and now we even have MVC with platforms like Ember, Knockout, and AngularJS. If CouchDB does user authentication (it’s got a “signup” button right on the home page, for crying out loud), paging, indexing, full-text search, geo data, and it all speaks HTTP, well… what does that actually leave us to do on the server?

Take a long look in the mirror, and really ask yourself! And yes, for those of you who do machine learning and scientific computing and business intelligence, I can already see you raising your hands, but for the rest of us who get paid to write Twitter clones, the answer is: not much. Your average CRUD app can magically transform into a PGPD app (PUT, GET, POST, DELETE), you can throw it up on CouchDB with some nice HTML and CSS to style it, and be at your local brewpub by 3. Or maybe you could just send the default Futon interface to the client and tell them you wrote it.

Futon interface in CouchDB

“See, it’s a collaborative document editor, and the dude on the Couch is a lazy writer…”

Now, this is the dream. And CouchDB, as it stands in 2013, actually gets us pretty damn far toward that dream. The app I’m releasing this week, Ultimate Crossword, is a testament to that. It’s a pure Couch app that only cheats by using Solr for full-text search (because I was too lazy to learn the Lucene plugin). It’s got user accounts, data aggregation, and even continuous syncing between the client and server thanks to the wonderful PouchDB.

Building this site gave me a lot of insight into what’s possible with a Couch app. However, I also got a reality check about where CouchDB still falls short of achieving the dream. I’ve got four big complaints:

1) No per-document read privileges

This is a big one. CouchDB has three basic security modes:

  • Everyone can do everything.
  • Some people can write (some documents), everyone can read (all documents).
  • Some people can write (some documents), some people can read (all documents).

If you want to give users exclusive read access to certain documents, you have to create a separate database for each user. And unfortunately, CouchDB has no feature to do this automatically. So you need a process on the server with administrative privileges to do it, breaking the pure “Couch app” ideal. Then, if you want to aggregate the data, you actually need another process to sync to a separate database, and… well, it just gets messy. I’m strongly rooting for this feature to show up in a future CouchDB release.

2) No password recovery.

This is a feature that users have come to expect from modern web sites. And despite all its security flaws (in that it makes your email a single point of failure), it seems here to stay.

Now, CouchDB can store arbitrary data in the users table (like email addresses), and you can even do custom validation. But for the whole “give us your email, and we’ll send you a new password” thing, you’re on your own.

On the bright side, the passwords are all salted and PBKDF2-hashed, so no attacker has much to gain from cracking your Couch.

3) No database migration.

This is a big one for me, although I wonder if I’m the only one. Since my early days of Java development, I’ve appreciated having Liquibase so I could track my database schema changes in version control.

In theory, CouchDB should be ideal for something like this, since it versions everything, and even its views (aka indexes) are their own documents. But I haven’t found a good recipe for managing this yet. For the time being, I just keep a series of Python scripts that create the databases.

4) Views are not indexes, and documents are not tables.

One of the nice things about SQL databases as a development paradigm is the flexibility of the SQL language itself. Decided you wanna sort by dogsLastName instead of favoritePokemon? No problem, we’ll just add an index. Too much data getting sent across the wire? No big deal, we’ll just SELECT the fields we need, instead of SELECT(*).

In CouchDB, you can’t do a WHERE and you can’t just SELECT the fields you want. Any query that’s not simply fetching a whole document by its ID requires a view, and those are costly to create. I’ve worked with Couch databases containing millions of documents, and rebuilding a view would often take days. I’d have a coworker ask me to add a new filter criterion for a view, and on Friday I’d say, “Okay, it’ll be ready by Monday.” For the Ultimate Crossword app, I stupidly decided to use CouchDB to crunch the data itself, and I ended up needing five separate Couch servers running on solid state drives in order to process it in in a reasonable amount of time. (CouchDB is best thought of as a single-process application. It’s append-only, so it uses one process per database file.)

Also, the fact that you can’t SELECT arbitrary fields means you need to start thinking about how much data you want to send over the wire with each document, and how to threshold it. I found myself structuring my database into a summary/detail format early on, and modeling the documents very tightly to the user interface, in ways that just made me feel icky.

Database purists, of course, would say that this is where the latex gloves are supposed to come out. But I think that if CouchDB simply had a better system for managing migrations (see #3) and/or faster view creation, this would be a non-issue. I’d also love it if the output of a view could be put into its own database, so I could have endlessly kaleidoscoping views of my data. One more for the wishlist!


Despite these drawbacks, I still think CouchDB has a lot of potential to revolutionize the way people write webapps. I certainly still plan to use it for quick hacking (hell, the crossword app only took me ten days to write), and Couch’s append-only design means I’ll never have to worry about my data getting corrupted. (It’s been proudly touted as “the Honda Accord of databases.”)

But for all its developers’ humility, CouchDB is a really exciting technology. When you step back and look at it, it’s a daring, crazy proposition, a bold statement about how awesome web development would be if we could just let it be the web. It’s a raving streetside lunatic, grabbing random people by the shoulders and screaming at them with frantic urgency: “We don’t need the server anymore! We only need the database! The database is the server!”

In short, CouchDB is an expression of an ideal, a fantastical tale of science fiction told by wide-eyed dreamers. And if there’s one truth about wide-eyed dreamers, it’s this: with hindsight, their predictions either seem delusional, or inevitable.

(Psssst! Go check out my Ultimate Crossword app! It’ll make you feel bad about your user authentication!)

Update: I decided to remove the CouchDB user authentication from the Ultimate Crossword app (I realized it was irresponsible to let people collaboratively “solve” the puzzle), but it’s still a pure Couch app!