CouchDB doesn’t want to be your database. It wants to be your web site.

Posted November 15, 2013 by Nolan Lawson in Webapps. Tagged: couch apps, couchdb, ultimate crossword. 7 Comments

I’d like to talk to you today about Couch apps. No, not CouchApps. No, not necessarily CouchApps either. The phrase has been bandied around a lot, so it’s worth explaining what I mean: I’m talking about webapps that exclusively use CouchDB for their backend, whether or not they’re actually hosted within CouchDB and regardless of how they’re built.

Yes, this is a thing people are actually trying to do, and no, it’s not crazy. The purpose of this article is to explain why.

First off, some background: CouchDB is a NoSQL database (or key-value store, as the cool kids say) written in Erlang. It is probably the origin of this joke. Nobody who uses CouchDB cares that it is written in Erlang, though, because the big selling point is that you can interact with it using Javascript, JSON, and plain ol’ HTTP. It is “a database for the web,” the first of its kind.

CouchDB: it’s a database, right?

When I first started using CouchDB, I tried to treat it like any other database. I looked for connectors based on the language I was using: Ektorp for Java, AnyEvent::CouchDB for Perl, Nano for Node. And I used the web interface (bewilderingly called “Futon”) as I would a query browser – neat for debugging, but not much else. The fact that it ran in a web browser just kinda seemed like a gimmick.

Recently, though, when I was working on a Node app that didn’t go anywhere but was a fun diversion, I came across this quote by Couch apostle J. Chris Anderson:

Because CouchDB is a web server, you can serve applications directly [to] the browser without any middle tier. When I’m feeling punchy, I like to call the traditional application server stack “extra code to make CouchDB uglier and slower.”

Suddenly, I realized what CouchDB was all about.

No wait, CouchDB is a miracle

See, here I was, using client-side Javascript to talk to Express to talk to Node to talk to Nano to talk to Couch, and at each step I was converting parameter names from underscores to camel case (or whatever my petty hangups are), all the while introducing bugs as I tried to make each layer fit nicely with the next one. And I had a working web server right in front of me! CouchDB! Why not just call it directly, you fool?! (I shout at myself in hindsight.)

I think the reason a lot of developers, like myself, might have missed this epiphany is that we’re used to treating databases as, well, databases. Whether it’s MongoDB or MySQL or Oracle, you gotta have your JDBC connector for Java and perhaps an ORM layer or maybe you just give up on Hibernate and write all the database objects yourself, so half of your code is getters and setters, but that’s OK, because that’s how we abstract the database.

You see, you can’t just have your peanut butter and jelly sandwich! You need an interface between the bread and the peanut butter, and an abstraction layer between the peanut butter and the jelly, and don’t even get me started on the jelly and the bread! What, you want your bread to get soggy?

As a programmer, I’m so used to treating databases as this other, alien thing that needs to be handled with latex gloves, separately from my application code, that reaching for the nearest library has become a reflex.

But you don’t need that with CouchDB. Because… it’s just HTTP. Any extra layers just give you another API to learn.

CouchDB is the web done right

And in fact, CouchDB is better than HTTP, because CouchDB actually fulfills the promise of what RESTful services were supposed to be, instead of the kludges we’ve come to expect. Look! DELETE actually deletes things! POST isn’t just what you use when you need to send more data than a GET allows! And HEAD and PUT are actually useful, instead of just being trivia to impress your friends at dinner parties — “Oh, did you know that there are actually more HTTP commands than just GET and POST?” “Oh, how fascinating!”

You see, once you set aside your preconceived notion of what a database is supposed to be, you can actually get rid of all your fancy connectors and just use a standard HTTP library. (I like Requests for Python.) You can even use the network debugger in a browser window to see how CouchDB does everything. It’s all just AJAX!
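To make the "just use a standard HTTP library" point concrete, here is a minimal sketch that uses nothing but Python's standard library. It assumes a CouchDB server at the default `http://localhost:5984`; the `crosswords` database, the `puzzle-1` document, the `1-abc` revision, and the `doc_request` helper are all made up for illustration. The code only builds the requests, so nothing is sent until you hand one to `urllib.request.urlopen`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: CouchDB is listening on its default port on this machine.
COUCH_URL = "http://localhost:5984"


def doc_request(method, db, doc_id, body=None, rev=None):
    """Build (but do not send) an HTTP request for a single document."""
    url = f"{COUCH_URL}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    if rev is not None:
        # Updates and deletes must name the revision they are replacing.
        url += "?" + urlencode({"rev": rev})
    data = None
    headers = {"Accept": "application/json"}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical database and document: one request per HTTP verb.
create = doc_request("PUT", "crosswords", "puzzle-1", body={"title": "Monday"})
exists = doc_request("HEAD", "crosswords", "puzzle-1")
fetch = doc_request("GET", "crosswords", "puzzle-1")
remove = doc_request("DELETE", "crosswords", "puzzle-1", rev="1-abc")
# Send any of them with: urllib.request.urlopen(create)
```

Each verb maps onto exactly what it says: PUT stores the document, HEAD checks for it without downloading the body, GET fetches it, and DELETE removes the named revision.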

And then, if you make it this far down the rabbit hole, you might notice that CouchDB actually has a user authentication database, with password hashing. You might also notice that it’s even got roles and privileges and administrator controls. And that’s when you realize, with fascinated horror, the most insidious thing about CouchDB:
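As a rough illustration of that authentication database, the sketch below builds the two requests a signup-and-login flow needs: a PUT of a user document into `_users`, and a POST to `_session`. The server address, the user `alice`, her password, and the helper names are assumptions for the example; in a real deployment you would send these over HTTPS.

```python
import json
from urllib.parse import quote
from urllib.request import Request

COUCH_URL = "http://localhost:5984"  # assumption: default local address


def signup_request(name, password):
    """Build the PUT that creates a user document in the _users database."""
    doc_id = "org.couchdb.user:" + name  # the ID prefix CouchDB requires
    user_doc = {
        "name": name,
        "password": password,  # sent as-is; CouchDB hashes it when saving
        "roles": [],
        "type": "user",
    }
    return Request(
        f"{COUCH_URL}/_users/{quote(doc_id, safe=':')}",
        data=json.dumps(user_doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def login_request(name, password):
    """Build the POST to _session; a successful response sets a cookie."""
    return Request(
        f"{COUCH_URL}/_session",
        data=json.dumps({"name": name, "password": password}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical user, for illustration only.
signup = signup_request("alice", "s3cret")
login = login_request("alice", "s3cret")
```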

CouchDB doesn’t want to be your database; it wants to be your web site.

And finally, this is where we come back to the subject of Couch apps. A Couch app is just a pure HTML/CSS/Javascript application, with only CouchDB as its backend, and this is the intended use case for CouchDB.

Now, think about what this proposition means to you as a developer. The web is moving more and more towards rich, client-side applications — we’ve had jQuery for years, and now we even have MVC with platforms like Ember, Knockout, and AngularJS. If CouchDB does user authentication (it’s got a “signup” button right on the home page, for crying out loud), paging, indexing, full-text search, geo data, and it all speaks HTTP, well… what does that actually leave us to do on the server?

Take a long look in the mirror, and really ask yourself! And yes, for those of you who do machine learning and scientific computing and business intelligence, I can already see you raising your hands, but for the rest of us who get paid to write Twitter clones, the answer is: not much. Your average CRUD app can magically transform into a PGPD app (PUT, GET, POST, DELETE), you can throw it up on CouchDB with some nice HTML and CSS to style it, and be at your local brewpub by 3. Or maybe you could just send the default Futon interface to the client and tell them you wrote it.

“See, it’s a collaborative document editor, and the dude on the Couch is a lazy writer…”

Now, this is the dream. And CouchDB, as it stands in 2013, actually gets us pretty damn far toward that dream. The app I’m releasing this week, Ultimate Crossword, is a testament to that. It’s a pure Couch app that only cheats by using Solr for full-text search (because I was too lazy to learn the Lucene plugin). It’s got user accounts, data aggregation, and even continuous syncing between the client and server thanks to the wonderful PouchDB.

Building this site gave me a lot of insight into what’s possible with a Couch app. However, I also got a reality check about where CouchDB still falls short of achieving the dream. I’ve got four big complaints:

1) No per-document read privileges

This is a big one. CouchDB has three basic security modes:

Everyone can do everything.
Some people can write (some documents), everyone can read (all documents).
Some people can write (some documents), some people can read (all documents).
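Roughly speaking, those three modes come from combining two knobs: the database's `_security` object (who may read) and a `validate_doc_update` function in a design document (who may write). The sketch below shows one way to express them. The `author` field, the `_design/auth` name, the `solvers` role, and the helper names are hypothetical; only the shape of the `_security` object and the `validate_doc_update` signature come from CouchDB.

```python
def security_object(member_names=(), member_roles=()):
    """Body for PUT /{db}/_security. Empty members means anyone can read."""
    return {
        "admins": {"names": [], "roles": []},
        "members": {"names": list(member_names), "roles": list(member_roles)},
    }


# JavaScript, stored as a string in a design document. CouchDB runs it on
# every write and rejects the write if it throws.
# Assumption: documents carry a hypothetical "author" field.
ONLY_AUTHOR_CAN_WRITE = """
function (newDoc, oldDoc, userCtx, secObj) {
  if (userCtx.roles.indexOf('_admin') !== -1) { return; }
  var owner = oldDoc ? oldDoc.author : newDoc.author;
  if (owner !== userCtx.name) {
    throw({forbidden: 'Only the author may change this document.'});
  }
}
""".strip()


def write_guard_design_doc():
    """Design document that restricts writes to each document's author."""
    return {"_id": "_design/auth", "validate_doc_update": ONLY_AUTHOR_CAN_WRITE}


# Mode 1: everyone can do everything.
mode_1 = {"security": security_object(), "design_docs": []}
# Mode 2: some can write, everyone can read.
mode_2 = {"security": security_object(), "design_docs": [write_guard_design_doc()]}
# Mode 3: some can write, some can read (but readers see every document).
mode_3 = {
    "security": security_object(member_roles=["solvers"]),
    "design_docs": [write_guard_design_doc()],
}
```

Note what is missing: the write guard sees each document individually, but the read side is all-or-nothing per database.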

If you want to give users exclusive read access to certain documents, you have to create a separate database for each user. And unfortunately, CouchDB has no feature to do this automatically. So you need a process on the server with administrative privileges to do it, breaking the pure “Couch app” ideal. Then, if you want to aggregate the data, you actually need another process to sync to a separate database, and… well, it just gets messy. I’m strongly rooting for this feature to show up in a future CouchDB release.
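For a sense of what that server-side process has to do, here is a minimal sketch of the two admin requests per user: create the database, then lock its `_security` object down to that one user. The `userdb-` plus hex-encoded-username naming follows the convention used by the couchperuser plugin; the server address and helper names are assumptions, and the admin credentials these requests would need are deliberately left out.

```python
import json
from urllib.request import Request

COUCH_URL = "http://localhost:5984"  # assumption: default local address


def per_user_db_name(username):
    """Hex-encode the username so any name yields a legal database name."""
    return "userdb-" + username.encode("utf-8").hex()


def provisioning_requests(username):
    """Build the admin-only requests that give one user a private database."""
    db = per_user_db_name(username)
    only_this_user = {
        "admins": {"names": [], "roles": []},
        "members": {"names": [username], "roles": []},
    }
    return [
        # 1. Create the database.
        Request(f"{COUCH_URL}/{db}", method="PUT"),
        # 2. Restrict reads (and therefore writes) to this user.
        Request(
            f"{COUCH_URL}/{db}/_security",
            data=json.dumps(only_this_user).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        ),
    ]


requests_for_alice = provisioning_requests("alice")  # hypothetical user
```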

2) No password recovery.

This is a feature that users have come to expect from modern web sites. And despite all its security flaws (in that it makes your email a single point of failure), it seems here to stay.

Now, CouchDB can store arbitrary data in the users table (like email addresses), and you can even do custom validation. But for the whole “give us your email, and we’ll send you a new password” thing, you’re on your own.

On the bright side, the passwords are all salted and PBKDF2-hashed, so no attacker has much to gain from cracking your Couch.
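If "salted and PBKDF2-hashed" is unfamiliar, the following sketch shows the general scheme using Python's `hashlib`. It is an illustration of the technique, not CouchDB's own code: the HMAC-SHA1 digest, 20-byte key length, default of 10 iterations, and the field names are assumptions modeled on what a CouchDB 1.x user document appears to store, so check your own server's `_users` documents before relying on them.

```python
import hashlib
import os


def hash_password(password, salt=None, iterations=10):
    """Derive a salted PBKDF2 key and return it with its parameters."""
    if salt is None:
        # A fresh random salt per user defeats precomputed lookup tables.
        salt = os.urandom(16).hex()
    derived = hashlib.pbkdf2_hmac(
        "sha1",  # assumption: HMAC-SHA1 as the underlying hash
        password.encode("utf-8"),
        salt.encode("utf-8"),
        iterations,
        dklen=20,
    )
    return {
        "password_scheme": "pbkdf2",
        "iterations": iterations,
        "salt": salt,
        "derived_key": derived.hex(),
    }


def verify_password(password, stored):
    """Re-derive the key with the stored salt and compare."""
    candidate = hash_password(password, stored["salt"], stored["iterations"])
    return candidate["derived_key"] == stored["derived_key"]
```

Only the salt, the iteration count, and the derived key are stored, so a stolen user document does not reveal the password directly.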

3) No database migration.

This is a big one for me, although I wonder if I’m the only one. Since my early days of Java development, I’ve appreciated having Liquibase so I could track my database schema changes in version control.

In theory, CouchDB should be ideal for something like this, since it versions everything, and even its views (aka indexes) are their own documents. But I haven’t found a good recipe for managing this yet. For the time being, I just keep a series of Python scripts that create the databases.
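One possible shape for such a script, sketched under assumptions: keep the desired design documents in version control, fetch what the database currently holds, and write only what differs, carrying over the current `_rev` so the update is accepted. The function name and the sample documents below are hypothetical, and this is not the script used for Ultimate Crossword.

```python
def plan_migration(existing, desired):
    """Return the documents that must be written to bring a database up to date.

    existing: dict mapping _id to the document currently stored (with _rev)
    desired:  list of documents kept in version control (without _rev)
    """
    to_write = []
    for doc in desired:
        current = existing.get(doc["_id"])
        if current is None:
            # Brand-new document: write it as-is.
            to_write.append(dict(doc))
            continue
        current_body = {k: v for k, v in current.items() if k != "_rev"}
        if current_body != doc:
            # Changed document: CouchDB needs the revision being replaced.
            to_write.append({**doc, "_rev": current["_rev"]})
    return to_write


# Hypothetical state: one view is unchanged, one changed, one is new.
in_database = {
    "_design/a": {"_id": "_design/a", "_rev": "1-x", "views": {"v": {"map": "old"}}},
    "_design/b": {"_id": "_design/b", "_rev": "3-y", "views": {}},
}
in_version_control = [
    {"_id": "_design/a", "views": {"v": {"map": "new"}}},
    {"_id": "_design/b", "views": {}},
    {"_id": "_design/c", "views": {}},
]
plan = plan_migration(in_database, in_version_control)
```

The resulting list can be sent in one request as `{"docs": plan}` to the database's `_bulk_docs` endpoint. Because unchanged documents are skipped, the script can be re-run safely, which matters when touching a view means rebuilding it.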

4) Views are not indexes, and documents are not tables.

One of the nice things about SQL databases as a development paradigm is the flexibility of the SQL language itself. Decided you wanna sort by dogsLastName instead of favoritePokemon? No problem, we’ll just add an index. Too much data getting sent across the wire? No big deal, we’ll just SELECT the fields we need, instead of SELECT *.

In CouchDB, you can’t do a WHERE and you can’t just SELECT the fields you want. Any query that’s not simply fetching a whole document by its ID requires a view, and those are costly to create. I’ve worked with Couch databases containing millions of documents, and rebuilding a view would often take days. I’d have a coworker ask me to add a new filter criterion for a view, and on Friday I’d say, “Okay, it’ll be ready by Monday.” For the Ultimate Crossword app, I stupidly decided to use CouchDB to crunch the data itself, and I ended up needing five separate Couch servers running on solid state drives in order to process it in a reasonable amount of time. (CouchDB is best thought of as a single-process application. It’s append-only, so it uses one process per database file.)
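For readers who have not written a view, here is a minimal sketch of the CouchDB counterpart to "add an index on dogsLastName, then filter on it." The map function is JavaScript stored in a design document; the `pets` database, the `_design/dogs` and `by_last_name` names, and the `view_url` helper are hypothetical.

```python
import json
from urllib.parse import urlencode

COUCH_URL = "http://localhost:5984"  # assumption: default local address

# JavaScript map function: emits one index row per matching document.
BY_LAST_NAME_MAP = """
function (doc) {
  if (doc.dogsLastName) {
    emit(doc.dogsLastName, null);
  }
}
""".strip()


def view_design_doc():
    """Design document holding the view; saving it is what 'adds the index'."""
    return {
        "_id": "_design/dogs",
        "views": {"by_last_name": {"map": BY_LAST_NAME_MAP}},
    }


def view_url(db, ddoc, view, **params):
    """Build the GET URL for a view query."""
    # Query values are JSON, so a string key needs its quotes.
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    url = f"{COUCH_URL}/{db}/_design/{ddoc}/_view/{view}"
    return url + ("?" + query if query else "")


# The rough equivalent of: WHERE dogsLastName = 'Smith' LIMIT 10
url = view_url("pets", "dogs", "by_last_name", key="Smith", limit=10)
```

The first query against a new or changed view is what triggers the index build over every document in the database, which is where the days-long waits come from.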

Also, the fact that you can’t SELECT arbitrary fields means you need to start thinking about how much data you want to send over the wire with each document, and how to threshold it. I found myself structuring my database into a summary/detail format early on, and modeling the documents very tightly to the user interface, in ways that just made me feel icky.
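A summary/detail split along those lines might look like the sketch below: one small document for list screens, one full document fetched only when the user opens an item. The field names, the `:summary` and `:detail` ID suffixes, and the `type` marker are all hypothetical, and this is not necessarily how Ultimate Crossword stores its data.

```python
def split_document(doc, summary_fields):
    """Split one logical record into a small summary doc and a full detail doc."""
    # The summary carries only the fields a list screen needs.
    summary = {k: v for k, v in doc.items() if k in summary_fields}
    summary["_id"] = doc["_id"] + ":summary"
    summary["type"] = "summary"
    # The detail doc keeps everything and is fetched on demand.
    detail = dict(doc)
    detail["_id"] = doc["_id"] + ":detail"
    detail["type"] = "detail"
    return summary, detail


# Hypothetical record, for illustration only.
puzzle = {
    "_id": "puzzle-1",
    "title": "Monday",
    "difficulty": "easy",
    "grid": [["A", "B"], ["C", "D"]],
    "clues": {"1-across": "First two letters"},
}
summary, detail = split_document(puzzle, summary_fields={"title", "difficulty"})
```

The trade-off is the one described above: the document layout now mirrors the screens of the app rather than the data itself.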

Database purists, of course, would say that this is where the latex gloves are supposed to come out. But I think that if CouchDB simply had a better system for managing migrations (see #3) and/or faster view creation, this would be a non-issue. I’d also love it if the output of a view could be put into its own database, so I could have endlessly kaleidoscoping views of my data. One more for the wishlist!

Conclusion

Despite these drawbacks, I still think CouchDB has a lot of potential to revolutionize the way people write webapps. I certainly still plan to use it for quick hacking (hell, the crossword app only took me ten days to write), and Couch’s append-only design means I’ll never have to worry about my data getting corrupted. (It’s been proudly touted as “the Honda Accord of databases.”)

But for all its developers’ humility, CouchDB is a really exciting technology. When you step back and look at it, it’s a daring, crazy proposition, a bold statement about how awesome web development would be if we could just let it be the web. It’s a raving streetside lunatic, grabbing random people by the shoulders and screaming at them with frantic urgency: “We don’t need the server anymore! We only need the database! The database is the server!”

In short, CouchDB is an expression of an ideal, a fantastical tale of science fiction told by wide-eyed dreamers. And if there’s one truth about wide-eyed dreamers, it’s this: with hindsight, their predictions either seem delusional, or inevitable.

(Psssst! Go check out my Ultimate Crossword app! It’ll make you feel bad about your user authentication!)

Update: I decided to remove the CouchDB user authentication from the Ultimate Crossword app (I realized it was irresponsible to let people collaboratively “solve” the puzzle), but it’s still a pure Couch app!

7 responses to this post.

Posted by Scotty on August 4, 2014 at 4:38 PM

Not really. It’s a horrible design for a web server.

One typo in the URL, and you get nothing but cryptic error messages. Not even a decent 404 error, or a redirection.

Authentication with timeouts, and you get ‘Unauthorized DB’ messages.
It may ‘serve’ web pages as attachments fairly well… But it has absolutely NO error handling mechanism. You can’t redirect someone back to where they need to be if they make a mistake.

The keyword that real developers use is ‘robust’… it’s completely lacking in the ability to let you build something that is. Basically, it’s only good when it’s working.

There is no IDE that you can easily integrate with it. And any code has to be stripped of white-space in order to run, so you can forget pulling it back out to edit.

And I dare anyone to come up with a good authentication solution that isn’t incestuous (it only authenticates to itself). Forget Active Directory, LDAP, etc.

It’s only a toy. And one that is suffering from an identity crisis.

- Posted by Nolan Lawson on August 9, 2014 at 2:16 PM
  
  A lot of your complaints are fair. CouchDB provides English-language error messages, but of course they aren’t intended for users at all. However, I think the error codes are awesome, and they adhere to the spec really closely: 201 means something was created, 409 means conflict, 404 really means not found, etc.
  
  The authentication system does have some problems. I think even the Couch developers will admit as much – I seem to recall Jan Lehnardt mentioning he planned to move off of the _users database for Hoodie, and I know Dale Harvey has never been a fan, hence the Janus project. The biggest flaw I see is the lack of “I can only read and write my own data,” but apparently you can build CouchDB using Build CouchDB with the couchperuser plugin, so there’s always that.
  
  The identity crisis is actually a great thing, in my opinion. It means that everybody’s working on separate databases that are all forced to use the same original sync protocol if they want to hook into the existing ecosystem (Couchbase Sync Gateway, Cloudant, CouchDB, PouchDB, etc.) It’s almost like the browser wars, where the real winner will be the platform.
  
  I’m even more pro-Couch than when I wrote this article (natch, since I’m on PouchDB now). It may be true that ultimately your webapp will grow in complexity to the point where you’re forced to put a proxy server in front of CouchDB to smooth out its rougher edges, but I think you’ll still be amazed how far you can get with vanilla Couch.
  
Posted by 55+ on September 21, 2014 at 12:23 AM

I am predicting that the ultimate winner will be PouchDB – the Java edition. Merged with ElasticSearch, it would provide the ultimate in storage and searching capabilities.

Posted by David Boden on November 6, 2014 at 4:59 AM

I like the idea of CouchDB being the whole website. I like the model, in some circumstances, of one couchdb user per real world user. However, the whole topic of user authentication via Google, Facebook, Twitter etc. needs to be addressed head on rather than skirted around.

The CouchDB documentation says that OAuth is supported, but this is a couch-user flow and isn’t what most users would be hoping for:
http://docs.couchdb.org/en/latest/api/server/authn.html#api-auth-oauth

Perhaps an external service could be written to support an HTML5 “couchapps” user registration flow, handling new user creations as well as obtaining OAuth credentials. Until the folks at CouchDB state their objectives and preferred direction, developers will stick with 3-tier architecture where mature solutions already exist to authenticate users. If the CouchDB gurus expressed a desire to support an external server for user management flows, I might even write it! Just don’t want to waste my time doing something against the grain of the vision for CouchDB.

Posted by PouchDB & CouchDB: An interview with Nolan Lawson – CouchDB Blog on April 4, 2017 at 9:01 AM

[…] Sync, reliability, and simplicity. As J. Chris Anderson has said, CouchDB doesn’t aim to be the Ferrari of databases; it wants to be the Honda Accord of databases. (See my old blog post on the subject) […]

Posted by Credo on June 1, 2017 at 3:00 AM

Are any of the flaws you mentioned solved by now?

Posted by Wesley on September 12, 2017 at 3:39 AM

Imagine this setup for a Node/Express/Vue project:

desktop app (Electron) : writes to PouchDB (offline first) with a possibility to sync with remote db.
web app (online) : uses API server calls to write to CouchDB

So :

PouchDB is on client
CouchDB is on server

These two use the same data and I know I can sync (master-master db) them.

I’m wondering how I would manage the code to achieve this.

Is it possible to use the same controllers/models to interact with both databases?
Or would I have to write controllers/models on the server to interact with CouchDB and also write other controllers/models on the client to interact with PouchDB?
How would you structure a project like that ?

Thank you for clarifying this, as I can’t seem to find a clear answer for this and I’m a total noob.


Read the Tea Leaves Software and other dark arts, by Nolan Lawson