Posts Tagged ‘grails’

Relatedness Calculator: Part Deux

Recently I added a ton of features and improvements to the Relatedness Calculator. Not only is the new app faster, lighter, prettier, and lower-cholesterol, but it’s also made me a savvier web developer in the process. How about that! I find myself actually learning a lot about web development, and slowly correcting my shameful n00b errors from the first version.

Version One

Now, I think the first version of the Relatedness Calculator was pretty cool. Neato, even. It solved the original problem that got me excited about this project in the first place, which was:

How closely related am I to my grandma’s cousin’s daughter?

— A dude on a message board

Answering this question in an automated way required writing a parser, defining a hell of a lot of English relationships, and writing an algorithm to calculate the relatedness coefficient for arbitrarily chained relationships (e.g. “X’s X’s X’s (…) X”).

Grandma's cousin's daughter

Nothing could be simpler.

As it turns out, there’s a lot you have to know about genealogy in order to write such a system. At the minimum, you have to read something like Richard Dawkins’ essay on genesmanship from The Selfish Gene. But then there are also a lot of non-obvious things you learn that go beyond genealogy 101.

For instance, I learned that there’s a whole class of relationships that can be expressed with “half” – it’s not just half-brothers and half-sisters. There are half-cousins, half-uncles, half-second cousins, and even half-great-nieces too. What they all have in common is having two common ancestors in the family tree, which get reduced to just one in the “half” versions. Neat, huh?

There’s also a sort of “relatedness addition” that goes into calculating the relatedness coefficients for chained relations. If, for instance, I was parsing “uncle’s daughter,” I discovered I could take the common ancestors for “uncle” and the common ancestors for “daughter,” then use what the math geeks call matrix addition to get the right set of common ancestors. And it worked! I have no idea if it’s justified mathematically, but it made my unit tests pass, so that’s all I care about. (Man, am I a computer programmer, or what?)

There are also some relations that don’t make sense when you chain them together – for instance, “cousin’s brother” or “father’s son.” Examples like these kept screwing up the relatedness addition, because in fact they were redundant ways of expressing a much simpler concept. I.e., “cousin’s brother” is really just a cousin, and “father’s son” is really just you, or possibly your brother. The app had to handle all these errors to avoid barfing up nonsensical results, such as arbitrarily reduced relatedness coefficients for “father’s son’s father’s son’s father’s son’s…” etc.

So essentially, version one was an academic exercise. I barely paid any attention to the UI, and instead just made sure that the core logic was bulletproof. But all that changed with the second version.

User Confusion

From looking at the logs, I could see that users were typing a lot of unexpected queries that completely broke the system:

  • step-cousin twice removed on my mother’s side
  • john’s uncle’s wife
  • identical twin
  • father’s 2nd wife’s son
  • 4th great-grandfather’s granddaughter

The worst part about a user typing something the system doesn’t understand is that they get a big fat error page. Most users will probably leave the site the first time something like that happens. After all, they went to all the trouble of typing a query out, and the system puked. Who would want to waste their time on a site like that?

Yes, my anonymous Internet application knows all about your friend John.

Obviously some of these queries are things I can fix – “identical twin” and “twice removed” are all features I added in version two. But other things don’t really make any sense in terms of calculating relatedness. Who is “John”? How am I supposed to know if you’re biologically related to his wife? And you’re not biologically related to your step-cousins, either!

I decided the biggest problem here was not that the system was throwing error messages, but rather that users didn’t know what kind of input the system expected. This is a classic autocompletion problem. People hate staring at a blank page. So starting with version two, I added a nice little autocomplete box.

Showing the space of possible queries goes a long way to help increase the user’s comprehension of the system. “Do I have to start with the word ‘my’?” “Do I have to put a space after the word ‘great’?” “Do I have to type ‘grandfather’ or can I just put ‘gramps’?” All these questions are answered in real time, as you type. It also makes playing around with the system rewarding and fun.

Memory usage

One of the biggest problems I ran into after adding autocompletion was OutOfMemoryErrors. I had used a simple trie object to suggest autocompletions, but this was really taxing the available memory on my Amazon EC2 Micro instance (613 MB).

So the first thing I did was add a 1G swap file on the OS. This had the effect of degrading the performance, while avoiding a complete application crash. I knew I still had some work to do.

I deduced that the new trie object was the culprit, given that the system didn’t crash until after the first autocomplete request was made. So using NetBeans to profile the memory usage, I slowly started slimming down the trie.

The original trie used a classic node-based data structure, similar to a linked list:

public class Trie<T> {
    private TrieNode root = new TrieNode();

    private class TrieNode {

        T value;
        Map<Character, TrieNode> next = Maps.newHashMap();

    }

A parameterized Trie<T> could hold any object in the T, and out of laziness I was storing the entire String that led to a leaf TrieNode. So first off I axed that, since the String itself could just be reconstructed from the breadcrumb trail of Characters anyway.

Next, I cut the memory usage of the Trie roughly in half by replacing all instances of HashMap<Character, TrieNode> with a custom SparseCharArray<TrieNode>. My SparseCharArray is basically just an Object array that has some of the behavior of a Map when I need it. I had already used a similar data structure for my Japanese Name Converter.

In the end, the new trie greatly improved the memory usage of the application, to the point where I didn’t even really need the swap space anymore. (But it’s nice to keep, just in case!)

Browser optimization

Having very little experience in building webapps, I didn’t have the faintest clue how to optimize one. Luckily there are a lot of smart folks who have already figured most of this stuff out, and who publish nice, condensed best-practice guides. Also, the debugging tools are incredibly slick these days, and between Firebug and Chrome’s Developer Tools, I had everything I needed to get under the hood and start tinkering around.

As it turned out, the biggest drain on the application’s performance was just resource management. Grails automates a lot of that, and it’s nice when it works, but when it doesn’t, you have to get your hands dirty. I eventually noticed I had a misbehaving plugin that was requesting Javascript resources that didn’t exist, so I was getting 302 (“moved temporarily”) redirects. On my teensy Micro instance running in Oregon, halfway across the globe from me, this was adding a nearly half-second roundtrip for each request. Once I fixed the links, though, the resources could be cached in the browser and therefore loaded in almost 0 time after the first request.

Another basic problem with redirects came from the root of the app itself. Linking to “/relatedness-calculator” instead of “/relatedness-calculator/” (i.e. without the slash), amazingly, also added about half a second to the request, which would occur anytime the user clicked the logo. So the addition of a single character saved me half a second of response time.

De-uglification

I don’t think there’s much that needs to be said here. Just take a look at the before and after pictures:

Before.

After.

Yep, I finally broke down and learned how to use Gimp. Bubbly, glossy Web 2.0 text for the win!

Another subtle visual change I made was to the family tree graph. To make the relationships in the graph clearer, I added emphasis to the nodes that were most relevant to the user’s query. For instance, in the case of “grandpa’s cousin,” the system draws a green border around “you,” “your grandpa,” and “your grandpa’s cousin.”

I don’t think I’ve ever met a single one of my grandpa’s cousins.

Conclusion

In the end, I’m pretty pleased with the changes I made to the Relatedness Calculator, and I’m proud of the end product. The only thing I find that I’m missing is the thrill I always got from Android development, from getting my code immediately into users’ hands and finding out whether an app was a winner or a dud. With Android development, users always seemed to quickly find my apps and give me feedback on them, even if I never did any marketing.

For instance, even my all-time least popular app, App Tracker, has 4,000 downloads, 50 ratings, 30 comments, and a handful of emails I’ve received from interested users. But with the Relatedness Calculator, it’s been out for three months and I haven’t heard a peep about it. It doesn’t seem to rank highly in Google’s search results, and according to the logs it’s only gotten about 250 query hits.

So I suppose the next step with the Relatedness Calculator is to figure out how to get people to use the damn thing. “Search engine optimization” and all that. My first instinct is to start emailing around to genealogy sites to see if any of them would be interested in linking to me, or maybe even running a standalone version on their own servers.

Alternatively, I could just make an Android version and see if my clout on the Play Store helps push it up in the search results. But I’m hesitant to do that, because this isn’t a product that makes a whole lot of sense as a mobile app. And plus, the mobile site works just fine.

In any case, I don’t plan on abandoning the Relatedness Calculator anytime soon. It’s a nice, simple project that gives me an opportunity to write some gnarly regexes while learning a thing or two about web development. And anyway, I’ve got an app that can figure out your relatedness to your identical twin’s children. How cool is that?

Introducing the Relatedness Calculator

I’m releasing a new open-source app today. It’s called the Relatedness Calculator. It’s a pretty simple app: what it does is take the name of a relative, written in plain English, and calculate the relatedness coefficient for that relative. It even accepts complex English descriptions, like “father’s half-brother’s granddaughter.” Plus, it draws a family tree, so that you can visualize the relationship. It’s pretty fun if you’re into genealogy!

Why did I write this app? Well, the inspiration actually came from a strange place: a message board. On this message board, a guy was wondering if anyone though it would be weird to date a distant relative of his. He said she was his grandma’s cousin’s daughter, and although that’s a pretty big distance in the family tree, he was worried about the social stigma if the two of them were to get married. Having a layman’s interest in genetics, I did the math for him and told him that he was only 1.5% related to her. And if he compared that to cousins (12.5%), which is kind of the borderline of acceptability for most cultures, he could see that it’s not really a big deal.

Doing the math for such a far-flung relation, though, was kind of tough. I had to go back and re-read some of Richard Dawkin’s excellent The Selfish Gene to brush up on the calculations. And then I wanted to make sure that my math was right, so I wrote some Java classes to help me out. Then I started writing unit tests. Then I started writing a parser, and… well, sometimes when you’re moderately autistic, as many programmers are, you start to get carried away. So I did, and that became a standalone Java library.

At this point I knew I wanted to make an app out of this project, because I thought it was neat enough to interest a lot of people (and not just those who want to date their relatives). I could have easily built an Android app out of it, but I’ve already written 8 Android apps, and anyway I didn’t think it was the right platform. Nobody’s going to download such a trivial app to their phone. Also, there didn’t seem to be any general-purpose relatedness calculators on the web, so I thought: why not make my own?

This is my first published webapp, so in many ways it’s kind of a “hello world” for me. I used Grails, because it seemed like a nice framework, and Groovy is close to Java. Astute Grails fans will notice that I didn’t even bother to change the default theming, but that’s typically what I do when I write apps. I don’t have a head for design, so even with my Android apps I’ve always just used the default theme. I prefer focusing on functionality over flash.

To deploy the app was a bit complicated. I hesitated between choosing Google App Engine and Amazon EC2, but in the end I decided to go with the latter, because as it turned out I needed full OS access in order to run Graphviz on the command line in a separate process, which I don’t believe is supported by GAE.

I registered for an Amazon EC2 Micro instance because it’s free for one year, so we’ll see how it holds up. Even though the app is pretty lightweight – no persistence, no heavy computation, most everything cached – I’ve already seen some significant slowdown. Since the server is free, though, I can’t really complain.

The biggest difference I find with developing webapps versus Android apps is just the barrier to entry. When it comes to costs, being an Android developer will only run you the $25 signup fee, and then after that Google takes care of the rest. For a webapp, though, I need to pay $10-15/year for the domain name, and then my micro EC2 instance (after the first free year) will cost at least $15/month, plus more if the app is popular. That’s pretty steep for hobby apps like mine, which don’t even bring in any money.

As for deployment, you also don’t need to know how to set up a LAMP stack on Android, or how to build a WAR file, or even how to use the command line. You just submit your app to Google using the web interface, and you’re done.

Plus, of course, the project structure itself is much simpler. Android apps are just Java and XML, whereas with a modern webapp, you need to make sure your Javascript plays nice with your CSS, and that your HTML plays nice with whatever’s generating it on the server side – Java, Groovy, Perl, Python, Ruby, etc. That’s a lot to take in at once.

When it comes to Android, I think an undergrad, with one or two intro Java courses under their belt, could probably build a decent app just by reading the Android tutorials. Building a webapp, on the other hand, requires a lot of additional specialized knowledge. Having come from a server-side background, and having had to figure this stuff out myself by trial and error, I can definitely say that web development is not for the faint of heart. It’s a global standard, and the W3C has tried to please everybody, so much of it feels like a Frankenstein’s monster of barely compatible compromises, rather than a holistic framework built from the ground up, like what Google did with Android.

And yet, web development is so… satisfying. I love that I can build this app, and now anybody with a browser can just point themselves to my site and make use of my code. It even works on mobile! Just play around with the CSS a bit, and you can turn your webapp into a perfectly functional, cross-platform mobile application. I suppose these observations seem pretty banal to folks who have been doing web development for forever, but for me, it’s still kind of amazing.

Anyway, the Relatedness Calculator is live now, so go check it out. And if you have improvements to make, fork it on GitHub!