Building an accessible emoji picker

In my previous blog post, I introduced emoji-picker-element, a custom element that acts as an emoji picker. In the post, I said that accessibility “shouldn’t be an afterthought,” and made it clear that I took accessibility seriously.

But I didn’t explain how I actually made the component accessible! In this post I’ll try to make up for that oversight.

Reduce motion

If you have motion-based animations in your web app, one of the easiest things you can do to improve accessibility is to disable them with the prefers-reduced-motion CSS media query. (Note that this applies to transform animations but not opacity animations, as it’s my understanding that opacity changes don’t cause nausea for those with vestibular disorders.)

There were two animations in emoji-picker-element that I had to consider: 1) a small “indicator” under the currently-selected tab button, which moves as you click tab buttons, and 2) the skin tone dropdown, which slides down when you click it.


For the tab indicator, the fix was fairly simple:

.indicator {
  will-change: opacity, transform;
  transition: opacity 0.1s linear, transform 0.25s ease-in-out;
}

@media (prefers-reduced-motion: reduce) {
  .indicator {
    will-change: opacity;
    transition: opacity 0.1s linear;
  }
}

Note that there is also an opacity transition on this element (which plays when search results appear or disappear), so there is a bit of unfortunate repetition here. But the core idea is to remove the transform animation.

For the skin tone dropdown, the fix was a bit more complicated. The reason is that I have a JavaScript transitionend event listener on the element:

element.addEventListener('transitionend', listener);

If I were to remove the transform animation completely, then this listener would never fire. So I borrowed a technique from the cssremedy project, which looks like this:

@media (prefers-reduced-motion: reduce) {
  .skintone-list {
    transition-duration: 0.001s;
  }
}

Based on my testing in Safari, Firefox, and Chrome, this effectively removes the animation while ensuring that transitionend still fires. (There are other tricks mentioned in that thread, but I found that this solution was sufficient for my use case.)

With these fixes in place, the potentially-nauseating animations are removed for those who prefer things that way. The easiest way to test this is in the Chrome DevTools “Rendering” tab, under “Emulate CSS media feature prefers-reduced-motion”:

Screenshot of Chrome DevTools "prefers reduced motion" Rendering setting

Here it is with motion reduced:

Video of the emoji picker with reduced motion enabled

As you can see, the elements no longer move around. Instead, they instantly pop in or out of place.

Screen reader and keyboard accessibility

When testing screen reader accessibility, I use four main tools:

  1. NVDA in Firefox on Windows (with SpeechViewer enabled so I can see the text)
  2. VoiceOver in Safari on macOS
  3. Chrome’s Accessibility panel in DevTools (Firefox also has nice accessibility tools! I use them occasionally for a second opinion.)
  4. The axe extension (also available in Lighthouse)

I like testing in actual screen readers, because they sometimes have bugs or differing behavior (just like browsers!). Testing in both NVDA and VoiceOver gives me more confidence that I didn’t mess anything up.

In the following sections, I’ll go over the basic semantic patterns I used in emoji-picker-element, and how those work for screen reader or keyboard users.

Tab buttons and tab panel

For the main emoji picker UI, I decided to use the tab panel pattern with manual activation. This means that each emoji category (“Smileys and emoticons,” “People and body”) is a tab button (role=tab), and the list of emoji underneath is a tab panel (role=tabpanel).

Screenshot showing the emoji categories as tabs and the main grid of emoji as the tab panel

The only thing I had to do to make this pattern work was to add ← and → keydown listeners to move focus left and right between the tabs. (Technically I should also add Home and End – I have that as a todo!)
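For illustration, here's roughly what that keydown handling looks like. This is a simplified sketch (the tablist lookup and the wrap-around behavior are my own assumptions), not the exact code from emoji-picker-element:

// Sketch: move focus between the role=tab buttons with ← and →.
// Assumes the tab buttons live inside a container with role=tablist.
const tablist = document.querySelector('[role="tablist"]');

tablist.addEventListener('keydown', (event) => {
  if (event.key !== 'ArrowLeft' && event.key !== 'ArrowRight') {
    return;
  }
  const tabs = Array.from(tablist.querySelectorAll('[role="tab"]'));
  const currentIndex = tabs.indexOf(document.activeElement);
  if (currentIndex === -1) {
    return;
  }
  const delta = event.key === 'ArrowRight' ? 1 : -1;
  const nextIndex = (currentIndex + delta + tabs.length) % tabs.length;
  // "Manual activation" means we only move focus here – the tab panel
  // doesn't switch until the user activates the newly-focused button.
  tabs[nextIndex].focus();
  event.preventDefault();
});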

Aside from that, each tab button should clearly be a <button> (so that Enter and Spacebar activate it), and it should have an aria-label for assistive text. I also went ahead and added titles that echo the content of the aria-label, but I'm considering removing those, since they aren't accessible to keyboard users. (The title appears when you hover with a mouse, but not when you focus with the keyboard. Plus it sometimes adds extra spoken text in NVDA, which is less than ideal.)

Skin tone dropdown

The skin tone dropdown is modeled on the collapsible dropdown listbox pattern. In other words, it’s basically a fancy <select> element.

Annotated screenshot showing the collapsed skin tone button as a button, and the expanded list of skin tones as a listbox

The button that triggers the dropdown is just a regular <button>, whereas the listbox has role=listbox and its child <button>s have role=option. To implement this, I just used the DevTools to analyze the W3C demo linked above, then tested in both NVDA and VoiceOver to ensure that my implementation had the same behavior as the W3C example.

One pro tip: since the listbox disappears on the blur event, which would happen when you click inside the DevTools itself, you can use the DevTools to remove the blur event and make it easier to inspect the listbox in its “expanded” state.

Screenshot of Chrome DevTools on the W3C collapsible listbox example, showing an arrow pointing at the "remove" button in DevTools next to the "blur" listener in the Event Listeners section

Removing this blur listener may make debugging a bit easier.

Search input and search results

For the search input, I decided to do something a bit clever (which may or may not burn me later!).

By default, the emoji in the tabpanel are simple <button>s aligned in a CSS Grid. They’re given role=menuitem and placed inside of a role=menu container, modeled after the menu pattern.

However, when you start typing in the search input, the tabpanel emoji are instantly replaced with the search results emoji:

Video of the emoji grid instantly switching to search results while typing

Visually, this is pretty straightforward, and it aligns with the behavior of most other emoji pickers. I also found it to be good for performance, because I can have one single Svelte #each expression, and let Svelte handle the list updates (as opposed to clearing and re-creating the entire list whenever something changes).

If I had done nothing else, this would have been okay for accessibility. The user can type some text, and then change focus to the search results to see what the results are. But that sounded kind of awkward, so I wanted to do one better.

So instead, I implemented the combobox with listbox popup pattern. When you start typing into the input (which is an <input type=search> with role=combobox), the menu with menuitems immediately transforms into a listbox with options instead. Then, using aria-activedescendant and aria-controls, I link the listbox with the combobox, allowing the user to press ↑ or ↓ to cycle through the search results. (I also used an aria-describedby to explain this behavior.)
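Here's a rough sketch of how that role-swapping and wiring fits together. The element lookups, class names, IDs, and function names are illustrative, not the actual emoji-picker-element code:

// Sketch: the search input is the combobox; when a search begins, the emoji
// container switches from menu/menuitem to listbox/option, and
// aria-activedescendant points at the currently-highlighted option.
const input = document.querySelector('input[type="search"]');
const container = document.querySelector('.emoji-grid'); // hypothetical class

input.setAttribute('role', 'combobox');
input.setAttribute('aria-controls', 'search-results');

function onSearchStart() {
  input.setAttribute('aria-expanded', 'true');

  container.id = 'search-results';
  container.setAttribute('role', 'listbox');
  for (const button of container.querySelectorAll('button')) {
    button.setAttribute('role', 'option');
  }
}

function onArrowKey(activeOption) {
  // DOM focus stays on the input; aria-activedescendant tells the screen
  // reader which option is "selected" (each option needs its own id)
  input.setAttribute('aria-activedescendant', activeOption.id);
}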

Video of NVDA announcing the combobox state and search results while pressing ↑ and ↓

In the above video, you can see NVDA reading out the expanded/collapsed state of the combobox, as well as the currently-selected emoji as I cycle through the list by pressing ↑ and ↓. (My thanks to Assistiv Labs for providing a free account for OSS testing! This saved me a trip to go boot up my Windows machine.)

So here’s the upshot: from the perspective of a screen reader user, the search input works exactly like a standard combobox with a dropdown! They don’t have to know that the menu/menuitem elements are being replaced at all, or that they’re aligned in a grid.

Now, this isn't a perfect pattern: a sighted user might find it more intuitive to press ← and → to move horizontally through the grid of emoji (and ↑ and ↓ to move vertically). However, I found it would be tricky to properly handle the ←/→ keys, since the search input itself allows you to move the cursor left and right when it has focus. The ↑ and ↓ keys, on the other hand, are unambiguous in this situation, so they're safe to use.

Plus, this is really a progressive enhancement – mouse or touch users don’t have to know that these keyboard shortcuts exist. So I’m happy with this pattern for now.

Testing

To test this project, I decided to use Jest with testing-library. One of my favorite things about testing-library is that it forces you to put accessibility front-and-center. By design, it makes it difficult to use simple CSS selectors, and encourages you to query by ARIA roles instead.
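For example, a typical assertion in my test suite looks something like this (the role, accessible name, and attribute here are illustrative rather than copied from the real tests):

// Sketch of a Jest + testing-library test: query by ARIA role and
// accessible name instead of CSS selectors.
import { getByRole, fireEvent } from '@testing-library/dom';

test('clicking a tab selects it', () => {
  // "container" is wherever the component's DOM was rendered for the test
  const container = document.body;

  const tab = getByRole(container, 'tab', { name: 'Smileys and emoticons' });
  fireEvent.click(tab);

  expect(tab.getAttribute('aria-selected')).toBe('true');
});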

This made it a bit harder to debug, since I had to inspect the implicit accessibility tree of my component rather than the explicit DOM structure. But it helped keep me honest, and ensure that I was adding the proper ARIA roles during development.

If I had one criticism of this approach, I would say that I find it inherently harder to test using Jest and JSDom rather than a real browser. Rather than having access to the browser DevTools, I had to set debugger statements and use node --inspect-brk to walk through the code in the Chrome Node inspector.

It also wasn’t always clear when one of my ARIA role queries wasn’t working properly. Perhaps when the Accessibility Object Model gains wider adoption, it will become as easy to test the accessibility tree as it is to test CSS selectors.

Conclusion

To get accessibility right, it helps to consider it from the get-go. Like performance, it can be easy to paint yourself into a corner if you quickly build a solution based only on the visuals, without testing the semantics as well.

While building emoji-picker-element, I found that I often had to make tweaks to improve performance (e.g. using one big Svelte #each expression) or accessibility (e.g. changing the aria-hidden and tabindex attributes on elements that are visually hidden but not display: none). I also had to think hard about how to merge the menu component with the listbox to allow the search to work properly.

I don’t consider myself an accessibility expert, so I’m happy to hear from others who have ideas for improving accessibility. For instance, I don’t think the favorites bar is easily reachable to screen reader or keyboard users, and I’m still noodling on how I might improve that. Please feel free to open a GitHub issue if you have any ideas for what I can do better!

Introducing emoji-picker-element: a memory-efficient emoji picker for the web

Screenshot of emoji-picker-element, an emoji picker, in light and dark mode, with grids of emoji faces

Emoji pickers are ubiquitous. It seems that every social media and messaging app needs to have a little grid of cartoon faces you can click on.

There’s nothing inherently wrong with emoji (they’re fun! they’re popular! they make communication livelier!). But the way they’re currently used on the web is wasteful.

The emoji picker landscape

The main problem is that there are a lot of emoji: 1,814 as of the latest version, Emoji v.13.0. That doesn’t even include the skin tone variants (or combinations of skin tones), and each of those emoji has associated shortcodes, tags, ASCII emoticons, etc. And the way most web-based emoji pickers work, all of this data is stored in-memory.

A popular JavaScript library to manage emoji data, emojibase, currently offers two JSON files for English: the main one, which is 854kB, and the “compact” one, which is 543kB. Now imagine that, for every browser tab you have open, each website is potentially loading over half a megabyte of data, just to show a little grid of emoji!

The median web page is now around 2MB, according to the HTTP Archive. Hopefully any emoji picker data is lazy-loaded, but either way, half a megabyte is a big chunk of any reasonable perf budget.

You could say that these websites should ditch the custom picker and just ask people to use the built-in emoji picker for their OS. The problem is that a lot of people don’t know that these exist. (How many Mac users know about Cmd+Ctrl+Space? How many Windows users have memorized Win+.?) Some OSes don’t even have a native emoji picker (try Ubuntu or stock Android for an example). Even ignoring these problems, websites often need to tweak the emoji picker, such as adding their own custom emoji (e.g. Discord, Slack, Mastodon).

Screenshot of the built-in emoji pickers on macOS and Windows

The built-in emoji pickers on macOS and Windows. Ask a non-techie if they’ve ever even seen these.

Shouldn’t browser vendors offer a standard emoji picker element? Something like <input type="emoji">? I’ve actually proposed this in the past, but I’m not aware of any browser vendors that were interested in picking it up.

Also, standardizing an emoji picker would probably require coordination between the JavaScript (TC39) and web (W3C/WHATWG) standards, as you’d ideally want a JavaScript-based API for querying Intl-specific emoji data (e.g. to show autocompletions) in addition to the actual picker element. The degree of collaboration between the browser vendors and the standards bodies would have to be fairly involved, so it seems unlikely to happen soon.

Starting from scratch

When I first wrote Pinafore, I didn’t want to deal with this situation at all. I found the whole idea of custom emoji pickers to be absurd – just use the one built into the OS!

When I realized how unreliable the OS-provided emoji pickers were, though, and after considering the need for Mastodon custom emoji, I eventually decided to use emoji-mart, a React component that Mastodon also uses. For various reasons, though, I grew frustrated with emoji-mart (even as I contributed to it), and I mused about how I might build my own.

What would the ideal web-based emoji picker be like? I settled on a few requirements:

  1. Data should be stored in IndexedDB, not in-memory. The Unicode Consortium is never going to stop adding emoji, so at some point keeping all the emoji and their metadata in-memory is going to become unsustainable (or at least, unwieldy).
  2. It should be a custom element. Web components are a thing; using an emoji picker should be as simple as dropping <emoji-picker></emoji-picker> into your HTML.
  3. It should be lightweight. If every website is going to use their own emoji picker, then it should at least have a low JavaScript footprint.
  4. It should be accessible. Accessibility shouldn’t be an afterthought; the emoji picker should work well for screen reader users, keyboard users – everyone.

So against my better judgment, I embarked on the arduous task: building a full emoji picker, the way I thought it should be built. Today it’s available on npm, and you can try out a demo version here. I call it (somewhat presumptuously): emoji-picker-element.

Design

emoji-picker-element follows the vision I set out above. Usage can be as simple as:

<emoji-picker></emoji-picker>

<script type=module>
  import 'https://unpkg.com/emoji-picker-element'
</script>

Under the hood, emoji-picker-element will download the emojibase data, parse it, and store it in IndexedDB. (By default, it fetches from jsdelivr.) The second time the picker loads, it will just do a HEAD request and check the ETag to see if anything has changed.

This is a bonus of using IndexedDB: we can avoid downloading, parsing, and processing the emoji data a second time, because it’s already available on-disk! Following offline-first principles, emoji-picker-element also lazily updates in the background if the data has changed.
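A sketch of what that update check looks like conceptually. The URL and the database helpers here are stand-ins, not the picker's real internals:

// Sketch: re-download the emoji data only if the server's ETag has changed.
const DATA_URL = 'https://cdn.jsdelivr.net/npm/emojibase-data/en/data.json'; // illustrative

async function updateIfChanged(db) {
  const storedETag = await db.getStoredETag(); // hypothetical helper

  // A HEAD request fetches the headers (including the ETag) without the body
  const head = await fetch(DATA_URL, { method: 'HEAD' });
  const latestETag = head.headers.get('etag');

  if (latestETag && latestETag === storedETag) {
    return; // nothing changed – keep using the data already in IndexedDB
  }

  // Otherwise, download and re-populate the database in the background
  const response = await fetch(DATA_URL);
  const emojiData = await response.json();
  await db.replaceData(emojiData, latestETag); // hypothetical helper
}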


emoji-picker-element shows only native emoji (no spritesheets), in the categories defined by emojibase. If any emoji are not supported by the OS or browser, then those will be hidden.

Like most emoji pickers, you can also do a text search, set a skin tone, and see a list of frequently-used emoji. I also added support for custom emoji with custom categories, which is an important feature for Mastodon admins who want to add some pizzazz and personality to their instance.

To keep the bundle size small and the runtime performance fast, I’m using Svelte 3 to implement the picker. The tree-shaken Svelte “runtime” is bundled with the component, so consumers don’t have to care what framework I’m using – it “just works.”

Also, I’m using Shadow DOM, which keeps styles nicely encapsulated, while also offering a neat styling API using CSS variables:

emoji-picker {
  --num-columns: 6;
  --emoji-size: 14px;
  --border-color: black;
}

(If you’re wondering why this works, it’s because CSS variables pierce the shadow DOM.)

Evaluating

So how well did I do? The most important consideration in my mind was performance, so here’s how emoji-picker-element stacks up.

Memory usage

This was the most interesting one to me. Was my hypothesis correct, that storing the emoji data in IndexedDB would reduce the memory footprint?

Using the new Chrome measureMemory() API, I compared four pages:

  • A “blank” HTML page
  • The same page with emoji-picker-element loaded
  • The same page with the emojibase “compact” JSON object loaded
  • The same page with the emojibase full JSON object loaded

Here are the results:

Scenario   Bytes     Relative to blank page
blank      635 kB    0 B
picker     1.38 MB   744 kB
compact    1.43 MB   792 kB
full       1.77 MB   1.13 MB

As you can see, emoji-picker-element takes up less memory than merely loading the “compact” JSON itself! That’s all the HTML, CSS, and JavaScript needed to render the component, and it’s already smaller than the emoji data alone.

Note that the size of a JSON file (in this case, 854kB for the full data and 543kB for the compact data) is not the same as the memory usage when it’s parsed into a JavaScript object. That’s why it’s important to actually parse the JSON to get the true memory usage.
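If you want to reproduce this kind of comparison, the measurement itself is small. The API was experimental at the time (and has since been renamed to performance.measureUserAgentSpecificMemory()), so treat this as a sketch:

// Sketch: log the total memory attributed to the page. Chrome-only, and it
// requires a flag or cross-origin isolation depending on the version.
async function logMemory(label) {
  const measure = performance.measureUserAgentSpecificMemory
    ? await performance.measureUserAgentSpecificMemory()
    : await performance.measureMemory(); // older experimental name
  console.log(label, measure.bytes, 'bytes');
}

logMemory('after loading the picker');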

Disk usage

Given that emoji-picker-element moves data out of memory and into IndexedDB, you might also wonder how much space it takes up on disk. There are three major IndexedDB implementations in browsers (Chrome/Chromium, Firefox/Gecko, and Safari/WebKit), so I wrote a script to calculate the IndexedDB disk usage in each. Here's what I found:

Browser              Disk usage
Chrome               896kB
Firefox              1.26MB
GNOME Web (WebKit)   1.77MB

(Note that these values may be a bit inconsistent – it seems that all browsers do some amount of compacting over time. These are the lowest I saw.)

These numbers are not small, but they make sense given the size of the emoji data, and the fact that the database contains an index for text search. I find it to be a reasonable tradeoff given the memory savings.

Runtime performance

To calculate the load time, I measure from the beginning of the custom element’s constructor to the requestPostAnimationFrame (polyfilled) after rendering the first set of emoji and favorites bar.
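In sketch form, the measurement looks something like this. requestPostAnimationFrame isn't a standard API, so the polyfill shown here is just one rough approximation, and the onFirstRender hook is a stand-in for wherever the component finishes its first render:

// Rough approximation of requestPostAnimationFrame: wait for the next
// frame, then yield back to the event loop so the callback runs after paint.
function requestPostAnimationFrame(callback) {
  requestAnimationFrame(() => {
    setTimeout(callback, 0);
  });
}

class PickerSketch extends HTMLElement {
  constructor() {
    super();
    performance.mark('picker-start');
  }

  onFirstRender() {
    // hypothetical hook, called after the first emoji and favorites bar render
    requestPostAnimationFrame(() => {
      performance.measure('picker-load', 'picker-start');
    });
  }
}

customElements.define('picker-sketch', PickerSketch);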

On a Nexus 5 (released in 2013) running Chrome on Android 6, median of 5 iterations, I got the following numbers:

Type                       Duration
First load (uncached)      2982ms
Subsequent load (cached)   259ms

(This was running on a local network – I’m focusing on CPU performance more than network performance.)

Looking at a Chrome trace of first load shows that the Nexus 5 spends about 200ms rendering the initial view (i.e. the “loading” state), 200ms downloading the data on the local intranet, 400ms processing it, 650ms inserting it into IndexedDB, roughly another second of IndexedDB transaction costs, then 250ms rendering the final result:

Annotated screenshot of a Chrome timeline trace showing the durations described above

In the above screenshot, the total render took 2.76s.

On second load, most of those IndexedDB costs are gone, and we merely fetch from IDB and render:

Another Chrome timeline screenshot, this one showing a much shorter duration (~430ms)

In this case, the total render took 434ms.

Interestingly, because it’s IndexedDB, we can actually move the initial data loading to a web worker, which frees up the main thread. In that case, the initial load looks like this:

Screenshot of Chrome timeline showing a similar trace to the "first load", but now the IndexedDB costs are on the worker thread and the main thread is largely idle

The total time in this case was about 2.5s. Clearly, we’ve just shifted the costs from one thread to another (we haven’t made anything faster), but it’s pretty neat that we can avoid jank this way! In the end, only about 300ms of work happens on the main thread.

This points to a nice application-level optimization: assuming the emoji picker is lazy-loaded, the app can proactively spin up a worker thread to populate the database. This would ensure that the emoji data is already ready when the user first opens the picker.
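A sketch of what that warm-up might look like. The import path and the ready() call are assumptions about how the picker's database layer is exposed, so check the actual API before copying this:

// emoji-warm-up.js – a module worker whose only job is to populate the
// emoji database ahead of time, off the main thread.
import Database from 'emoji-picker-element/database'; // assumed import path

const db = new Database();
await db.ready(); // assumed: fetches, parses, and stores the emoji data
self.postMessage('emoji database populated');

On the main thread, the app would kick this off during idle time with something like new Worker('./emoji-warm-up.js', { type: 'module' }); by the time the user opens the picker, IndexedDB is already populated.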

Bundle size

Currently the bundle size of emoji-picker-element is 39.66kB minified, and 12.3kB minified+Brotli. Those are both a bit too large for my taste, but they do compare favorably with other emoji pickers on npm: the smallest one I could find with a similar feature set is interweave-emoji-picker, which is 40.7kB minified.

Of course, the other emoji pickers don’t also include the entire runtime for their framework. The reason I’m bundling Svelte with the component is 1) Svelte’s runtime is quite small already, 2) it’s also tree-shaken to only include what you need, and 3) I’m assuming that most folks out there aren’t using Svelte, since it’s still an up-and-coming framework.

That said, I do have a separate build for those who are already using Svelte 3, and in that case the bundle size is about 11kB smaller (minified). Either way, emoji-picker-element should ideally be lazy-loaded!

Pitfalls and future work

Not everything went swimmingly with this project, so I have some ideas for where it can improve in the future.

The woes of native emoji

First off, using native emoji is easier said than done. No OS has Emoji v13 yet, and unless you’re on an Apple device, it’s unlikely that you even have Emoji v12. (Only 19% of Android devices are running Android 10+, which has Emoji v12, whereas 75% of iOS devices are running iOS 13.2+, which has Emoji v12.1.) Or your OS may have incomplete emoji (Microsoft’s choice to use two-letter codes for flags is… odd at best). If you’re using Chrome on Ubuntu, you have no native color emoji at all, unless you install a separate package.

GitLab has a great post detailing all the headaches of supporting native emoji. It involves rendering emoji to canvas and checking the rendered color, as well as handling edge cases such as ligatures that may appear as “double emoji” when rendered improperly. (E.g. “person with red hair” may appear as a person with a floating wig of red hair next to them.) In short, it’s a mess.

"Person" emoji on the left and "red hair" emoji on the right

“Person with red hair” when rendered incorrectly.

However, my goal with this project is to skate to where the puck is going, not where it is. My hope is that browsers and OSes will get their acts together, and start to broadly support native color emoji without hacks or workarounds. (If they don’t, more websites than just emoji-picker-element will appear broken! So they have every incentive to do this.)

Plus, I don’t have to jump through as many hoops as GitLab, because I’m not concerned about supporting every emoji under the sun. (If I were, I would just use a spritesheet!) So my current strategy is a sniff test – check a representative emoji from each emoji version, and hide the entire set if one emoji is unsupported. This isn’t perfect, but I believe it’s good enough for most use cases. And it should improve over time as native emoji support improves.
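The sniff test itself is a variant of the canvas trick from the GitLab post: render the emoji and check whether anything colorful actually showed up. A simplified sketch (the real detection has to handle more edge cases than this):

// Sketch: rough check for whether the browser/OS can render an emoji in color.
function supportsEmoji(emoji) {
  const canvas = document.createElement('canvas');
  canvas.width = canvas.height = 16;
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = '16px sans-serif';
  ctx.fillText(emoji, 0, 0);

  // A missing glyph renders as nothing or as a monochrome "tofu" box,
  // so look for at least one colored, non-grayscale pixel.
  const { data } = ctx.getImageData(0, 0, 16, 16);
  for (let i = 0; i < data.length; i += 4) {
    const [r, g, b, alpha] = [data[i], data[i + 1], data[i + 2], data[i + 3]];
    if (alpha > 0 && (r !== g || g !== b)) {
      return true;
    }
  }
  return false;
}

// Check one representative emoji per Emoji version; if it fails, hide
// every emoji from that version.
console.log(supportsEmoji('🥲')); // representative of Emoji 13.0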

Shadow DOM

This could be a blog post in and of itself, but shadow DOM is a double-edged sword. On the one hand, I love how easy it is to encapsulate CSS, and to offer a styling API with CSS variables. On the other hand, I found that any libraries that manage focus – e.g. focus-visible, a11y-dialog, or my own arrow-key-navigation – had to be tweaked to account for shadow DOM. Otherwise, the focus behavior didn’t work properly.

I won’t go into the nitty-gritty details (see my Mastodon thread for that). Long story short: none of these libraries worked with shadow DOM, except for focus-visible, which required some manual work. So for a11y-dialog I forked it, and for arrow-key-navigation I had to implement shadow DOM support. Kind of annoying, when all I really wanted was style encapsulation!

But again: I’m trying to skate to where the puck is going. My hope is that eventually browsers will iron out problems with the shadow DOM API, and make it easier for libraries to implement focus traps, custom focus hotkeys, etc.

Conclusion

Building emoji-picker-element was not easy. Properly rendering native emoji required arcane knowledge of emoji support on various platforms and browsers, as well as how to detect it. Add in skin tone variants (which may or may not be supported, even if the base emoji is supported), and it can get very complicated very fast.

I’m deeply indebted to the work of those who came before me: emojibase for providing the emoji data, Emojipedia for being an inexhaustible resource on the subject, and if-emoji and GitLab for demystifying how to detect emoji support.

Of course, I’m also grateful for all the existing emoji pickers that I drew inspiration from – it’s nice not to have to reinvent the wheel! Most emoji pickers have converged on a strikingly similar design, so I didn’t feel the need to be particularly creative with my own design. (Although I do think some of my CSS animations are nice.)

I still believe that the emoji picker is probably something browsers should handle themselves, to avoid the inevitable bloat of every web page loading their own picker. But for the time being, there should at least be a web-based emoji picker that’s as lightweight and performant as possible. emoji-picker-element is my best attempt at that vision.

You can discuss this post on the fediverse and on Lobste.rs. Here is the code and the demo.

Linux on the desktop as a web developer

I’ve been using Ubuntu as my primary home desktop OS for the past year and a half, so I thought it would be a good time to write up my experiences. Hopefully this will be interesting to other web developers who are currently using Mac or Windows and may be Linux-curious.

Photo of a Dell laptop with an Ubuntu background and a monitor, keyboard, and mouse

My basic setup. Dell XPS 13, Kensington trackball mouse (yes I’m a weirdo who likes trackballs), Apple magic keyboard (I still prefer the feel), and a BenQ monitor (because I play some games where display lag matters)

Note: in this post, I’m mostly going to be talking about Ubuntu. I’ve played with other Linux distros, but I stick with Ubuntu because if I have a problem, I can Google it and find an answer 99.9% of the time.

Some history

I first switched to Linux in 2007, when I was at university. At the time I perceived it to be a huge step up over Windows Vista (so much faster! and better for programmers!), but it also came with plenty of headaches:

  • WiFi didn’t work out of the box. I had to use ndiswrapper to wrap Windows drivers.
  • Multi-monitor and presentations were terrible. Every time I used xrandr I knew I would suffer.
  • Poor support for a lot of consumer applications. I recall running Netflix on Firefox in Wine because this was the only way to get it to work.

Around 2012 I switched to Mac – mostly because I noticed that every web developer giving a conference talk was using one. Then I became a dual Windows/Mac user when I joined Microsoft in 2016, and I didn’t consider Linux again until after I left Microsoft in 2018.

I’m happy to say that none of my old Linux headaches exist anymore in 2020. On my Dell XPS 13 (which comes with Ubuntu preinstalled), WiFi and multi-monitor work out-of-the-box. And since it seems everything is either an Electron app or a website these days, it’s rare to find a consumer app that doesn’t support Linux. (At least, the ones I care about; I’m sure you can find a counter-example!) The biggest gripe I have nowadays is with fonts, which is a far cry from fiddling with WiFi drivers.

OK so enough history, let’s talk about the good and the bad about Linux in 2020, from a web developer’s perspective.

The command line

I tend to live and breathe on the command line, and for me the command line on Linux is second-to-none.

The main reason should be clear: if you’re writing code that’s going to run on a server somewhere, that server is probably going to run Linux. Even if you’re not doing much sysadmin stuff, you’re probably using Linux to run your test and CI infrastructure. So eventually your code is going to have to run on Linux.

Using Linux as your desktop machine just makes things that much simpler. All my servers run Ubuntu, as do my Travis CI tests, as does my desktop. I know that my shell scripts will run exactly the same on all these environments. If you’ve ever run into headaches with subtle differences between the Mac and Linux shell (e.g. incompatible versions of grep, tar, and sed with slightly different flags, so you have to brew install coreutils and use ggrep and gtar… ugh), then you know what I’m talking about.

If you’re a Mac user, the hardest part about switching to the Linux terminal is probably just going to be the iTerm keyboard shortcuts you’ve memorized to open tabs and panes. I found the simplest solution was to use tmux instead. As an added bonus, tmux also runs on Mac, so if you learn the keyboard shortcuts once, you can use them everywhere. I set my terminal to automatically open tmux on startup.

Screenshot of tmux running in Ubuntu with a few panes open

Ah, the command line on Linux. Feels like home.

Since the command line is the main selling point of Linux (IMO), it’s tempting to just use Windows with the Windows Subsystem for Linux instead. This is definitely a viable option, and totally reasonable, especially if there’s that one Windows program you really need (or you’re a PC gamer). For me, though, I don’t do much PC gaming, and my experience with WSL is that although compatibility was excellent, the performance was poor. npm install would take orders of magnitude more time on WSL compared to the equivalent Mac or Linux machine. (Keep in mind I was using WSL back in 2016-2018 though, and I’m told it’s improved since then.)

Still, for me, I just don’t find Windows to my taste. The UI has always felt slow and clunky to me, which may just be my perception, although when I read blog posts like this one from Bruce Dawson I feel a bit vindicated. (Right-clicking taskbar icons is slow! Why?) In any case, Ubuntu starts up fast, the system updates are quick and unobtrusive, and it’s not missing any must-have apps for me. So I run 100% Ubuntu, no dual-boot even.

Web development

For the average web developer, most of the stuff you need is going to work just fine on Linux. You can run Chrome and VS Code (or WebStorm, my preference), and all your command-line utilities like node and npm will work the same. You can use nvm to manage Node versions. So far, so good.

As a web developer, the biggest issue I’ve run into is not having a quick way to test all three major browser engines – Blink (Chrome), Gecko (Firefox), and WebKit (Safari). Especially now that Edge has gone Chromium and the Trident/EdgeHTML lineage is slowly dying out, it’s really attractive that, with a Mac, you can test all three major browser engines without having to switch to another machine or use a tool like BrowserStack.

On Linux of course we have Chrome and Firefox, and those run mostly the same as they do on a Mac, so they fit the bill just fine. For WebKit we even have GNOME Web (aka Epiphany Browser), but I only consider it “okay” as a stand-in for Safari. It doesn’t support some of the Safari-specific APIs (e.g. backdrop filter, Apple Pay, etc.), and it’s terribly slow, but it’s good for a quick gut-check to see if some bit of code will run well on Safari or not.

Screenshot of Gnome Web on HTML5Test showing a score of 432

GNOME Web on HTML5Test. Safari it is not.

Unfortunately for me, I don’t consider that “good enough,” especially since the vast majority of Safari users are on iOS, so that’s the platform you really want to test. And here is where Linux runs into its biggest drawback from a web developer’s perspective: debugging iOS Safari.

If you want to debug Chrome or Firefox on Android – no problem. adb runs just fine on Linux, you can run chrome:inspect on Chrome or Remote Debugging on Firefox, and it all works great. For iOS Safari, though, the best option you have is remotedebug-ios-webkit-adapter, which uses ios-webkit-debug-proxy under the hood.

Essentially this is an elaborate suite of tools that makes iOS Safari kinda-sorta look like Chrome, so that you can use the Chrome DevTools to debug it. The most amazing thing about it is… it actually works! As long as you can get the odd dependencies running correctly, you’ll have your familiar Chrome DevTools attached to an iOS device.

Screenshot of debugging example.com in iOS Safari in Chrome DevTools on Linux, showing some iOS-specific APIs in the console

Believe it or not, you can actually debug iOS Safari from Linux.

If you have a spare iPhone or iPod Touch lying around, this is not a bad option. But it’s still a far cry from the streamlined experience on a Mac, where you can quickly run an iOS Simulator with any iOS version of your choice, and where Safari debugging works out-of-the-box.

For accessibility testing, it’s a similar story. Of course Firefox and Chrome have accessibility tools, but they’re no substitute for VoiceOver on Mac or NVDA on Windows. Linux does have the Orca screen reader, but I don’t see much point in testing it, since it’s not representative of actual screen reader usage. Especially given that screen readers may have bugs or quirks, I prefer testing the real deal. So I keep a Mac Mini and cheap Windows desktop around for this reason.

Conclusion

So in short, using Linux as your desktop environment if you’re a web developer is pretty great. You probably won’t miss much, once you rewire your brain to get the keyboard shortcuts right.

I find that the main things I miss these days are some of Apple’s best built-in apps, such as Preview or Garage Band. I love Preview for taking a quick screenshot and drawing arrows and boxes on it (something I do surprisingly often), and I haven’t found any good substitutes on Linux. (I use Pinta, which is okay.) Other apps like ImageOptim also have no Linux version.

So if you depend on some Mac-only apps, or if you need best-in-class Safari and iOS debugging, then I wouldn’t recommend Linux over Mac. If your main focus is accessibility, it also might not be sufficient for you (although something like Assistiv Labs may change this calculus). But for everything else, it’s a great desktop OS for web development.

Thanks to Ben Frain for asking about my Linux experiences and inspiring this blog post.

Customizing fonts in Firefox on Linux

I recently updated my desktop machine from Ubuntu 18.04 to 20.04. (I tend to be skittish and stick to the LTS releases.) Everything went great, except that when I opened some of my go-to websites (such as GitHub), the fonts just looked… off.

Screenshot of GitHub rendered in Nimbus Sans

After some research, I learned a few things:

  1. Firefox relies on an OS-level utility called fontconfig to choose fonts.
  2. On Ubuntu 20.04, fontconfig has a default config file that automatically replaces proprietary fonts like Helvetica with free fonts like Nimbus Sans.
  3. If a web page’s CSS has Helvetica in the font-family property, Firefox will ask the OS, “Hey, do you have Helvetica?” And the OS will say, “Yep!” and serve up Nimbus Sans.

Nimbus Sans is not an attractive font to my eyes. I would have much preferred an alternative like DejaVu Sans or Liberation Sans (one of which, I’m pretty sure, is what Ubuntu 18.04 was using). So I did some digging.

Surprisingly, I couldn’t find a simple answer to “how do I tell Firefox which fonts to use on Linux?”. The fontconfig man pages are too dense to be much help. But after stumbling upon this helpful blog post, I eventually found a solution.

Create a fontconfig file

First, I created a file at ~/.config/fontconfig/fonts.conf (you may have to create the directory). Here’s mine:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>Helvetica</family>
    <prefer>
      <family>Liberation Sans</family>
    </prefer>
  </alias>
</fontconfig>

Basically, this file says, “If you’re trying to render Helvetica, prefer Liberation Sans instead.”

Create a test HTML page

Next, I created a small HTML page that just renders various CSS font-family values, like this:

<h2 style="font-family: Helvetica">Helvetica</h2>
<h2 style="font-family: system-ui">system-ui</h2>
<h2 style="font-family: sans-serif">sans-serif</h2>
<h2 style="font-family: Arial">Arial</h2>

If you’re curious, by default Ubuntu 20.04 will give Nimbus Sans for Helvetica, DejaVu Serif for system-ui, DejaVu Sans for sans-serif, and Liberation Sans for Arial (on Firefox 75).

Then my debug cycle became:

  1. Edit the fonts.conf file.
  2. Reload the page in Firefox (you may have to close the tab and open a new one).
  3. Open the Firefox developer tools to the “Fonts” section.

Screenshot of Firefox DevTools showing the "fonts used" section

Using this technique, I could finally get GitHub (as well as some other sites) to better suit my tastes:

Screenshot of GitHub rendered in Liberation Sans

Here is a zoomed-in before and after (Nimbus Sans on top, Liberation Sans on bottom):

Screenshot comparing Liberation Sans to Nimbus Sans in GitHub

Hopefully this blog post is helpful to someone else trying to figure out fontconfig. Good luck!

Notes

This research upended my understanding of how font-family works in CSS. I figured that if a website declares font-family: Helvetica, sans-serif, then it will fall back to sans-serif if Helvetica doesn’t exist on the user’s OS. But given that the OS can lie about which fonts it supports (and thus choose “Helvetica”), this can really change how fonts render.

Apparently fc-match Helvetica is supposed to show which font fontconfig chooses for Helvetica. This didn’t work for me. Running this command (with or without sudo) would always return “Nimbus Sans,” regardless of what Firefox was showing. Clearing the cache with fc-cache -rv also had no effect. Go figure.

Also, the Firefox font settings (Preferences → Fonts and Colors) don’t work. Well, maybe they work for choosing the default sans-serif font, but they don’t help if a website’s CSS explicitly lists Helvetica first. (As they often do, which is perhaps a testament to the Mac-centrism of most web developers. I even found this pattern in my own website’s CSS, which probably dates back to when I was a Mac user.)

Also, be aware that Chrome uses its own system for managing fonts, different from Firefox’s. If you test Chrome, you’ll get different results.

Fixing memory leaks in web applications

Part of the bargain we struck when we switched from building server-rendered websites to client-rendered SPAs is that we suddenly had to take a lot more care with the resources on the user’s device. Don’t block the UI thread, don’t make the laptop’s fan spin, don’t drain the phone’s battery, etc. We traded better interactivity and “app-like” behavior for a new class of problems that don’t really exist in the server-rendered world.

One of these problems is memory leaks. A poorly-coded SPA can easily eat up megabytes or even gigabytes of memory, continuing to gobble up more and more resources, even as it’s sitting innocently in a background tab. At this point, the page might start to slow to a crawl, or the browser may just terminate the tab and you’ll see Chrome’s familiar “Aw, snap!” page.

Chrome page saying "Aw snap! Something went wrong while displaying this web page."

(Of course, a server-rendered website can also leak memory on the server side. But it’s extremely unlikely to leak memory on the client side, since the browser will clear the memory every time you navigate between pages.)

The subject of memory leaks is not well-covered in the web development literature. And yet, I’m pretty sure that most non-trivial SPAs leak memory, unless the team behind them has a robust infrastructure for catching and fixing memory leaks. It’s just far too easy in JavaScript to accidentally allocate some memory and forget to clean it up.

So why is so little written about memory leaks? My guesses:

  • Lack of complaints: most users are not diligently watching Task Manager while they surf the web. Typically, you won’t hear about it from your users unless the leak is so bad that the tab is crashing or the app is slowing down.
  • Lack of data: the Chrome team doesn’t provide data about how much memory websites are using in the wild. Nor are websites often measuring this themselves.
  • Lack of tooling: it’s still not easy to identify or fix memory leaks with existing tooling.
  • Lack of caring: browsers are pretty good at killing tabs that consume too much memory. Plus people seem to blame the browser rather than the websites.

In this post, I’d like to share some of my experience fixing memory leaks in web applications, and provide some examples of how to effectively track them down.

Anatomy of a memory leak

Modern web app frameworks like React, Vue, and Svelte use a component-based model. Within this model, the most common way to introduce a memory leak is something like this:

window.addEventListener('message', this.onMessage.bind(this));

That’s it. That’s all it takes to introduce a memory leak. If you call addEventListener on some global object (the window, the <body>, etc.) and then forget to clean it up with removeEventListener when the component is unmounted, then you’ve created a memory leak.

Worse, you’ve just leaked your entire component. Because this.onMessage is bound to this, the component has leaked. So have all of its child components. And very likely, so have all the DOM nodes associated with the components. This can get very bad very fast.

Here is the fix:

// Mount phase
this.onMessage = this.onMessage.bind(this);
window.addEventListener('message', this.onMessage);

// Unmount phase
window.removeEventListener('message', this.onMessage);

Note that we saved a reference to the bound onMessage function. You have to pass in exactly the same function to removeEventListener that you passed in to addEventListener, or else it won’t work.

The memory leak landscape

In my experience, the most common sources of memory leaks are APIs like these:

  1. addEventListener. This is the most common one. Call removeEventListener to clean it up.
  2. setTimeout / setInterval. If you create a recurring timer (e.g. to run every 30 seconds), then you need to clean it up with clearTimeout or clearInterval. (setTimeout can leak if it’s used like setInterval – i.e., scheduling a new setTimeout inside of the setTimeout callback.)
  3. IntersectionObserver, ResizeObserver, MutationObserver, etc. These new-ish APIs are very convenient, but they are also likely to leak. If you create one inside of a component, and it’s attached to a globally-available element, then you need to call disconnect() to clean it up. (Note that DOM nodes which are garbage-collected will have their listeners and observers garbage-collected as well. So typically, you only need to worry about global elements, e.g. the <body>, the document, an omnipresent header/footer element, etc.)
  4. Promises, Observables, EventEmitters, etc. Any programming model where you’re setting up a listener can leak memory if you forget to stop listening. (A Promise can leak if it’s never resolved or rejected, in which case any .then() callbacks attached to it will leak.)
  5. Global object stores. With something like Redux the state is global, so if you’re not careful, you can just keep appending memory to it and it will never get cleaned up.
  6. Infinite DOM growth. If you implement an infinite scrolling list without virtualization, then the number of DOM nodes will grow without bound.

Of course, there are plenty of other ways to leak memory, but these are the most common ones I’ve seen.
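To make the cleanup side of the list above concrete, here's the general shape of a component that sets these things up on mount and tears them all down on unmount (the lifecycle method names are generic placeholders for whatever your framework uses):

// Sketch: every global listener, timer, and observer created on mount gets
// a matching cleanup call on unmount.
class WidgetSketch {
  mount() {
    this.onResize = this.onResize.bind(this);
    window.addEventListener('resize', this.onResize);

    this.pollTimer = setInterval(() => this.poll(), 30000);

    this.observer = new ResizeObserver(() => this.onResize());
    this.observer.observe(document.body);
  }

  unmount() {
    window.removeEventListener('resize', this.onResize);
    clearInterval(this.pollTimer);
    this.observer.disconnect();
  }

  onResize() { /* ... */ }
  poll() { /* ... */ }
}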

Identifying memory leaks

This is the hard part. I’ll start off by saying that I just don’t think any of the tooling out there is very good. I’ve tried Firefox’s memory tool, the Edge and IE memory tools, and even Windows Performance Analyzer. The best-in-class is still the Chrome Developer Tools, but it has a lot of rough edges that are worth knowing about.

In the Chrome DevTools, our main tool of choice is going to be the “heap snapshot” tool in the “Memory” tab. There are other memory tools in Chrome, but I don’t find them very helpful for identifying leaks.

Screenshot of the Chrome DevTools Memory tab with the Heap Snapshot tool

The Heap Snapshot tool allows you to take a memory capture of the main thread or web workers or iframes.

When you click the “take snapshot” button, you’ve captured all the live objects in a particular JavaScript VM on that web page. This includes objects referenced by the window, objects referenced by setInterval callbacks, etc. Think of it as a frozen moment in time representing all the memory used by that web page.

The next step is to reproduce some scenario that you think may be leaking – for instance, opening and closing a modal dialog. Once the dialog is closed, you’d expect memory to return back to the previous level. So you take another snapshot, and then diff it with the previous snapshot. This diffing is really the killer feature of the tool.

Diagram showing a first heapsnapshot followed by a leaking scenario followed by a second heap snapshot which should be equal to the first

However, there are a few limitations of the tool that you should be aware of:

  1. Even if you click the little “collect garbage” button, you may need to take a few consecutive snapshots for Chrome to truly clean up the unreferenced memory. In my experience, three should be enough. (Check the total memory size of each snapshot – it should eventually stabilize.)
  2. If you have web workers, service workers, iframes, shared workers, etc., then this memory will not be represented in the heap snapshot, because it lives in another JavaScript VM. You can capture this memory if you want, but just be sure you know which one you’re measuring.
  3. Sometimes the snapshotter will get stuck or crash. In that case, just close the browser tab and start all over again.

At this point, if your app is non-trivial, then you’re probably going to see a lot of leaking objects between the two snapshots. This is where things get tricky, because not all of these are true leaks. Many of these are just normal usage – some object gets de-allocated in favor of another one, something gets cached in a way that will get cleaned up later, etc.

Cutting through the noise

I’ve found that the best way to cut through the noise is to repeat the leaking scenario several times. For instance, instead of just opening and closing a modal dialog once, you might open and close it 7 times. (7 is a nice conspicuous prime number.) Then you can check the heap snapshot diff to see if any objects leaked 7 times. (Or 14 times, or 21 times.)

Screenshot of the Chrome DevTools heap snapshot diff showing six heap snapshot captures with several objects leaking 7 times

A heap snapshot diff. Note that we’re comparing snapshot #6 to snapshot #3, because I take three captures in a row to allow more garbage collection to occur. Also note that several objects are leaking 7 times.

(Another helpful technique is to run through the scenario once before recording the first snapshot. Especially if you are using a lot of code-splitting, then your scenario is likely to have a one-time memory cost of loading the necessary JavaScript modules.)

At this point, you might wonder why we should sort by the number of objects rather than the total memory. Intuitively, we’re trying to reduce the amount of memory leaking, so shouldn’t we focus on the total memory usage? Well, this doesn’t work very well, for an important reason.

When something is leaking, it’s because (to paraphrase Joe Armstrong) you’re holding onto the banana, but you ended up getting the banana, the gorilla holding the banana, and the whole jungle. If you measure based on total bytes, you’re measuring the jungle, not the banana.

Let’s go back to the addEventListener example above. The source of the leak is an event listener, which is referencing a function, which references a component, which probably references a ton of stuff like arrays, strings, and objects.

If you sort the heap snapshot diff by total memory, then it’s going to show you a bunch of arrays, strings, and objects – most of which are probably unrelated to the leak. What you really want to find is the event listener, but this takes up a minuscule amount of memory compared to the stuff it’s referencing. To fix the leak, you want to find the banana, not the jungle.

So if you sort by the number of objects leaked, you’re going to see 7 event listeners. And maybe 7 components, and 14 sub-components, or something like that. That “7” should stand out like a sore thumb, since it’s such an unusual number. And no matter how many times you repeat the scenario, you should see exactly that number of objects leaking. This is how you can quickly find the source of the leak.

Walking the retainer tree

The heap snapshot diff will also show you a “retainer” chain showing which objects are pointing to which other objects and thus keeping the memory alive. This is how you can figure out where the leaking object was allocated.

Screenshot of a retainer chain showing someObject referenced by a closure referenced by an event listener

The retainer chain shows you which object is referencing the leaked object. The way to read it is that each object is referenced by the object below it.

In the above example, there is a variable called someObject which is referenced by a closure (aka “context”), which is referenced by an event listener. If you click the source link, it will take you to the JavaScript declaration, which is fairly straightforward:

class SomeObject { /* ... */ }

const someObject = new SomeObject();
const onMessage = () => {
  console.log(someObject); // the closure (aka "context") captures someObject
};
window.addEventListener('message', onMessage);

In the above example, the “context” is the onMessage closure which references the someObject variable. (This is a contrived example; real memory leaks can be much less obvious!)

But the heap snapshotting tool has several limitations:

  1. If you save and re-load the snapshot file, then you will lose all the file references to where the object was allocated. So for instance, you won’t see that the event listener’s closure comes from line 22 of foo.js. Since this is really critical information, it’s almost useless to save and send heap snapshot files.
  2. If there are WeakMaps involved, then Chrome will show you those references even though they don’t really matter – those objects would be de-allocated as soon as the other references are cleaned up. So they are just noise.
  3. Chrome classifies the objects by their prototype. So the more you use actual classes/functions and the less you use anonymous objects, the easier it will be to see what exactly is leaking. As an example, imagine if our leak was somehow due to an object rather than an EventListener. Since object is extremely generic, we’re unlikely to see exactly 7 of them leaking.

This is my basic strategy for identifying memory leaks. I’ve successfully used this technique to find dozens of memory leaks in the past.

This guide is just the start, though – beyond this, you will also have to be handy with setting breakpoints, logging, and testing your fix to see if it resolves the leak. Unfortunately, this is just inherently a time-consuming process.

Automated memory leak analysis

I’ll preface this by saying that I haven’t found a great way to automate the detection of memory leaks. Chrome has a non-standard performance.memory API, but for privacy reasons it doesn’t have very precise granularity, so you can’t really use it in production to identify leaks. The W3C Web Performance Working Group has discussed memory tooling in the past, but has yet to agree on a new standard to replace this API.

In a lab or synthetic testing environment, you can increase the granularity on this API by using the Chrome flag --enable-precise-memory-info. You can also create heap snapshot files by calling the proprietary Chromedriver command :takeHeapSnapshot. This has the same limitation mentioned above, though – you probably want to take three in a row and discard the first two.
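With that flag enabled, a synthetic test can at least watch the heap size across repeated runs of a scenario. A rough sketch (performance.memory is non-standard and Chrome-only):

// Sketch: run the suspect scenario several times and report heap growth.
async function measureHeapGrowth(runScenario, iterations = 7) {
  const before = performance.memory.usedJSHeapSize;
  for (let i = 0; i < iterations; i++) {
    await runScenario(); // e.g. open and close the modal dialog
  }
  const after = performance.memory.usedJSHeapSize;
  console.log(`heap grew by ${after - before} bytes over ${iterations} runs`);
}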

Since event listeners are the most common source of memory leaks, another technique that I’ve used is to monkey-patch the addEventListener and removeEventListener APIs to count the references and ensure they return to zero. Here is an example of how to do that.
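The rough shape of that monkey-patch is below. This is a simplified version – a real one would also need to account for listener options, listeners on nodes that get removed from the DOM, and so on:

// Sketch: count addEventListener/removeEventListener calls so a test can
// assert that they balance out after mounting and unmounting a component.
const originalAdd = EventTarget.prototype.addEventListener;
const originalRemove = EventTarget.prototype.removeEventListener;
let listenerCount = 0;

EventTarget.prototype.addEventListener = function (type, listener, options) {
  listenerCount++;
  return originalAdd.call(this, type, listener, options);
};

EventTarget.prototype.removeEventListener = function (type, listener, options) {
  listenerCount--;
  return originalRemove.call(this, type, listener, options);
};

// After the scenario runs, listenerCount should be back to its starting value.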

In the Chrome DevTools, you can also use the proprietary getEventListeners() API to see the event listeners attached to a particular element. Note that this can only be used in DevTools, though.

Update: Mathias Bynens has informed me of another useful DevTools API: queryObjects(), which can show you all objects created with a particular constructor. Christoph Guttandin also has an interesting blog post about using this API for automated memory leak detection in Puppeteer.

Summary

The state of finding and fixing memory leaks in web apps is still fairly rudimentary. In this blog post, I’ve covered some of the techniques that have worked for me, but admittedly this is still a difficult and time-consuming process.

As with most performance problems, an ounce of prevention can be worth a pound of cure. You might find it worthwhile to put synthetic testing in place rather than trying to debug a memory leak after the fact. Especially if there are several leaks on a page, it will probably turn into an onion-peeling exercise – you fix one leak, then find another, then repeat (while weeping the whole time!). Code review can also help catch common memory leak patterns, if you know what to look for.

JavaScript is a memory-safe language, so it’s somewhat ironic how easy it is to leak memory in a web application. Part of this is just inherent to UI design, though – we need to listen for mouse events, scroll events, keyboard events, etc., and these are all patterns that can easily lead to memory leaks. But by trying to keep our web applications’ memory usage low, we can improve runtime performance, avoid crashes, and be respectful of resource constraints on the user’s device.

Thanks to Jake Archibald and Yang Guo for feedback on a draft of this post. And thanks to Dinko Bajric for inventing the “choose a prime number” technique, which I’ve found so helpful in memory leak analysis.

2019 end-year book review

After reviewing books for a couple of years, one thing that stood out to me is just how few of them were written by women. Well this year, I tried to buck that trend – of the 8 books on this list, 4 were written by women.

Happy new year!

Nonfiction

How to Do Nothing: Resisting the Attention Economy by Jenny Odell (2019)

A beautiful, moving book that’s hard to summarize or draw a “conclusion” from. You just have to read it.

This book impressed upon me the power of quiet, contemplative moments and of art. And it did so by being a great piece of art itself. That’s pretty rare for a nonfiction book, so it deserves a place on my must-read list of 2019.

Trick Mirror: Reflections on Self-Delusion by Jia Tolentino (2019)

A collection of essays on various topics, hilariously written and full of unexpected conclusions. Tolentino is one of those writers who can craft a sentence in just such a way that it makes you laugh out loud at the mere phrasing, but she’s also incredibly insightful, especially when writing about narcissism and self-delusion in the context of modern media.

Here she is on trolling in social media:

My column about trolling would, of course, attract an influx of trolling. Then, having proven my point, maybe I’d go on TV and talk about the situation, and then I would get trolled even more, and then I could go on defining myself in reference to trolls forever, positioning them as inexorable and monstrous, and they would return the favor in the interest of their own ideological advancement, and this whole situation could continue until we all died.

Whiteshift: Populism, Immigration, and the Future of White Majorities by Eric Kaufmann (2019)

After the 2016 election, I read a lot of commentary like The Retreat of Western Liberalism by Edward Luce, which argued that the rise of populism in Western democracies is fundamentally an economic phenomenon. (Luce’s book begins with the famous elephant chart). This was a common trope among political commentators at the time, as was the image of the out-of-work coal miner in West Virginia. (Think: Hillbilly Elegy by J. D. Vance.)

Eventually, though, a different picture started to emerge. Books like Identity Crisis pointed out that, according to the polling data, the most important factor in the 2016 election was actually ethnicity and immigration, not economics. This idea shocked me, but I didn’t really have a framework for understanding why anyone would be upset about demographic changes.

Whiteshift helps put this situation into perspective. Kaufmann compares our current moment to the anti-Catholic and anti-Jewish immigrant backlashes of the 19th and early 20th century, which caused political and social turmoil before these groups eventually became subsumed into the “white” majority. He makes a good case that the same thing will happen with current demographic trends, as long as we can craft a unifying narrative for the new majority.

I appreciate that this book is high on statistics and academic rigor, and low on emotional reasoning. I hope that the author’s optimistic outlook ends up being correct.

Because Internet: Understanding the New Rules of Language by Gretchen McCulloch

A really fun read. I enjoyed the historical outlook on how different groups became absorbed into internet culture at different points in time, and how their first contact point (be it Usenet, AIM, or Facebook) influenced the way they communicate on the internet.

I also appreciate that this book was written by a linguist, and that she takes the study of internet language as seriously as any other language. One of the more interesting arguments in this book is that emoji are more analogous to hand gestures than to anything we can identify in written language (hieroglyphs, ideograms, etc.).

Misquoting Jesus: The Story Behind Who Changed the Bible and Why by Bart D. Ehrman (2005)

This book is a great read if you’re looking for a historical perspective on early Christianity: how it spread from the Gospels through Paul the Apostle, and how different scholars clashed with one another in their interpretations.

I was especially interested in how many different sects there were among early Christians – such as the Marcionists, who believed that the God of the Old Testament was distinct from the God of the New Testament. It’s interesting to imagine how differently world religions might have turned out if one or another interpretation had happened to survive.

Upheaval: Turning Points for Nations in Crisis by Jared Diamond (2019)

A great book, just as engrossing as Guns, Germs, and Steel or Collapse. Even though we may live in “interesting times,” I appreciate the historical perspective, because it helps you realize that history tends to be cyclical, and you can learn a lot about the present by studying the past. I found the chapters on Finland’s Winter War and Chile’s Pinochet regime to be especially interesting.

Fiction

The Mists of Avalon by Marion Zimmer Bradley (1982)

This is a massive book, so it’s difficult to only say one or two things about it. I didn’t really know much about the Arthurian legend before reading it, so to me this is less of a “retelling” and more of the foundational way I’ll now look at the story of Arthur, Guinevere, the Holy Grail, etc.

What I found most interesting was the story of King Arthur as a kind of clash of civilizations – between the indigenous Druidic Celtic people and the conquering Romans (and later Saxons). The author paints a vivid picture of a population slowly shifting from largely pagan to largely Christian, and of the conflict among those who stubbornly hold to the old ways, those who eagerly embrace the new ones, and those who try to find some common ground between the two.

Fire and Blood by George R. R. Martin (2019)

I’ve been an avid Song of Ice and Fire reader since 2011, so I’ve been eagerly awaiting the sequel to A Dance with Dragons since what feels like forever.

A prequel is not what I would have wished for, but (like the Dunk and Egg novellas) this one will have to hold me over. Ultimately I think Fire and Blood is a great read, full of intriguing characters and richly-imagined world-building. If you’ve already read The World of Ice and Fire, though, you may find it a bit repetitive. (Although personally, I prefer the more novel-like format of Fire and Blood.)

What I’ve learned about accessibility in SPAs

Over the past year or so, I’ve learned a lot about accessibility, mostly thanks to working on Pinafore, which is a Single Page App (SPA). In this post, I’d like to share some of the highlights of what I’ve learned, in the hope that it can help others who are trying to learn more about accessibility.

One big advantage I’ve had in this area is the help of Marco Zehe, an accessibility expert who works at Mozilla and is blind himself. Marco has patiently coached me on a lot of these topics, and his comments on the Pinafore GitHub repo are a treasure trove of knowledge.

So without further ado, let’s dive in!

Misconceptions

One misconception I’ve observed in the web community is that JavaScript is somehow inherently anti-accessibility. This seems to stem from a time when screen readers did not support JavaScript particularly well, and so indeed the more JavaScript you used, the more likely things were to be inaccessible.

I’ve found, though, that most of the accessibility fixes I’ve made have actually involved writing more JavaScript, not less. So today, this rule is definitely more myth than fact. However, there are a few cases where it holds true:

divs and spans versus buttons and inputs

Here’s the best piece of accessibility advice for newbies: if something is a button, make it a <button>. If something is an input, make it an <input>. Don’t try to reinvent everything from scratch using <div>s and <span>s. This may seem obvious to more seasoned web developers, but for those who are new to accessibility, it’s worth reviewing why this is the case.

First off, for anyone who doubts that this is a thing, there was a large open-source dependency of Pinafore (and of Mastodon) that had several thousand GitHub stars, tens of thousands of weekly downloads on npm, and was composed almost entirely of <div>s and <span>s. In other words: when something should have been a <button>, it was instead a <span> with a click listener. (I’ve since fixed most of these accessibility issues, but this was the state I found it in.)

This is a real problem! People really do try to build entire interfaces out of <div>s and <span>s. Rather than chastise, though, let me analyze the problem and offer a solution.

I believe the reason people are tempted to use <div>s and <span>s is that they have minimal user agent styles, i.e. there is less you have to override in CSS. However, resetting the style on a <button> is actually pretty easy:

button {
  margin: 0;
  padding: 0;
  border: none;
  background: none;
}

99% of the time, I’ve found that this was all I needed to reset a <button> to have essentially the same style as a <div> or a <span>. For more advanced use cases, you can explore this CSS Tricks article.

In any case, the whole reason you want to use a real <button> over a <span> or a <div> is that you essentially get accessibility for free:

  • For keyboard users who Tab around instead of using a mouse, a <button> is automatically focusable and appears in the correct place in the tab order.
  • When focused, a <button> can be activated with the Space bar or Enter key.
  • Screen readers announce the <button> as a button.
  • Etc.

You could build all this yourself in JavaScript, but you’ll probably mess something up, and you’ll also have a bunch of extra code to maintain. So it’s best just to use the native semantic HTML elements.
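
To get a feel for how much you’d have to rebuild by hand, here’s a rough (and incomplete) sketch of what a <div> needs just to imitate the basics of a native <button>. The #fake-button id is made up for the example:

// Assumes markup like: <div id="fake-button">Favorite</div>
const fakeButton = document.querySelector('#fake-button')

// Announce it as a button and make it keyboard-focusable
fakeButton.setAttribute('role', 'button')
fakeButton.setAttribute('tabindex', '0')

// Let Space and Enter activate it, like a real button
fakeButton.addEventListener('keydown', e => {
  if (e.key === ' ' || e.key === 'Enter') {
    e.preventDefault()
    fakeButton.click()
  }
})

And even this sketch glosses over details that the native element handles for you, such as disabled states and form submission behavior.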

SPAs must manually handle focus and scroll position

There is another case where the “JavaScript is anti-accessibility” mantra has a kernel of truth: SPA navigation. Within SPAs, it’s common for JavaScript to handle navigation between pages, i.e. by modifying the DOM and History API rather than triggering a full page load. This causes several challenges for accessibility:

  • You need to manage focus yourself.
  • You need to manage scroll position yourself.

For instance, let’s say I’m in my timeline, and I want to click this timestamp to see the full thread of a post:

Screenshot of Pinafore timeline showing a post from me with an arrow pointing to the timestamp

When I click the link and then press the back button, focus should return to the element I last clicked (note the purple outline):

Screenshot showing the same image as above, but with a focus outline around the timestamp

For classic server-rendered pages, most browser engines [1] give you this functionality for free. You don’t have to code anything. But in an SPA, since you’re overriding the normal navigation behavior, you have to handle the focus yourself.

This also applies to scrolling, especially in a virtual list. In the above screenshot, note that I’m scrolled down to exactly the point in the page from before I clicked. Again, this is seamless when you’re dealing with server-rendered pages, but for SPAs the responsibility is yours.
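
A minimal sketch of the idea might look something like the following – Pinafore’s real implementation is more involved, and the history-state shape here is made up for illustration:

// Before navigating away, remember what was focused and how far we scrolled
function rememberState () {
  history.replaceState({
    ...history.state,
    focusedId: document.activeElement && document.activeElement.id,
    scrollY: window.scrollY
  }, '')
}

// When the user presses the back button, put things back the way they were
window.addEventListener('popstate', event => {
  const { focusedId, scrollY } = event.state || {}
  if (typeof scrollY === 'number') {
    window.scrollTo(0, scrollY)
  }
  const element = focusedId && document.getElementById(focusedId)
  if (element) {
    element.focus()
  }
})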

Easier integration testing

One thing I was surprised to learn is that, by making my app more accessible, I also made it easier to test. Consider the case of toggle buttons.

A toggle button is a button that can have two states: pressed or not pressed. For instance, in the screenshot below, the “boost” and “favorite” buttons (i.e. the circular arrow and the star) are toggle buttons, because it’s possible to boost or favorite a post, and they start off in unboosted/unfavorited states.

Screenshot of a Mastodon post in Pinafore with the favorite (star) button pressed and the boost button unpressed

Visually, there are plenty of styles you can use to signal the pressed/unpressed state – for instance, I’ve opted to make the colors darker when pressed. But for the benefit of screen reader users, you’ll typically want to use a pattern like the following:

<button type="button" aria-pressed="false">
  Unpressed
</button>
<button type="button" aria-pressed="true">
  Pressed
</button>
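
In JavaScript, the click handler’s job is to keep aria-pressed in sync with whatever visual state you use. Here’s a quick sketch (the “pressed” class is hypothetical):

// Grab the first toggle button from the markup above
const button = document.querySelector('[aria-pressed]')

button.addEventListener('click', () => {
  const pressed = button.getAttribute('aria-pressed') === 'true'
  // Flip the semantic state and the (hypothetical) visual state together
  button.setAttribute('aria-pressed', String(!pressed))
  button.classList.toggle('pressed', !pressed)
})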

Incidentally, this makes it easier to write integration tests (e.g. using TestCafe or Cypress). Why rely on classes and styles, which might change if you redesign your app, when you can instead rely on the semantic attributes, which are guaranteed to stay the same?
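
For example, a Cypress test against the toggle button above might look roughly like this (the aria-label is hypothetical, not Pinafore’s actual markup):

it('toggles the favorite button', () => {
  // Query by semantics rather than by CSS classes that might change in a redesign
  cy.get('[aria-label="Favorite"]').should('have.attr', 'aria-pressed', 'false')
  cy.get('[aria-label="Favorite"]').click()
  cy.get('[aria-label="Favorite"]').should('have.attr', 'aria-pressed', 'true')
})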

I observed this pattern again and again: the more I improved accessibility, the easier things were to test. For instance:

  • When using the feed pattern, I could use aria-posinset and aria-setsize to confirm that the virtual list had the correct number of items and in the correct order.
  • For buttons without text, I could test the aria-label rather than the background image or something that might change if the design changed.
  • For hidden elements, I could use aria-hidden to identify them.
  • Etc.

So make accessibility a part of your testing strategy! If something is easy for screen readers to interpret, then it’ll probably be easier for your automated tests to interpret, too. After all, screen reader users might not be able to see colors, but neither can your headless browser tests!

Subtleties with focus management

After watching this talk by Ian Forrest and playing around with KaiOS, I realized I could make some small changes to improve keyboard accessibility in my app.

As pointed out in the talk, it’s not necessarily the case that every mouse-accessible element also needs to be keyboard-accessible. If there are redundant links on the page, then you can skip them in the tabindex order, so a keyboard user won’t have to press Tab so much.

In the case of Pinafore, consider a post. There are two links that lead to the user’s profile page – the profile picture and the user name:

Screenshot showing a Mastodon post with red rectangles around the user avatar and the user name

These two links lead to exactly the same page; they are strictly redundant. So I chose to add tabindex="-1" to the profile picture, giving keyboard users one less link to have to Tab through. Especially on a KaiOS device with a tiny d-pad, this is a nice feature!


 

In the above video, note that the profile picture and timestamp are skipped in the tab order because they are redundant – clicking the profile picture does the same thing as clicking the user name, and clicking the timestamp does the same thing as clicking on the entire post. (Users can also disable the “click the entire post” feature, as it may be problematic for those with motor impairments. In that case, the timestamp is re-added to the tab order.)

Interestingly, an element with tabindex="-1" can still become focused if you click it and then press the back button. But luckily, tabbing out of that element does the right thing as long as the other tabbable elements are in the proper order.

The final boss: accessible autocomplete

After implementing several accessible widgets from scratch, including the feed pattern and an image carousel (which I described in a previous post), I found that the single most complicated widget to implement correctly was autocompletion.


 

Originally, I had implemented this widget by following this design, which relies largely on creating an element with aria-live="assertive" that explicitly speaks every change in the widget state (e.g. “the current selected item is number 2 of 3”). This is kind of a heavy-handed solution, though, and it led to several bugs.

After toying around with a few patterns, I eventually settled on a more standard design using aria-activedescendant. Roughly, the HTML looks like this:

<textarea
  id="the-textarea"
  aria-describedby="the-description"
  aria-owns="the-list"
  aria-expanded="false"
  aria-autocomplete="both"
  aria-activedescendant="option-1">
</textarea>
<ul id="the-list" role="listbox">
  <li 
    id="option-1"
    role="option" 
    aria-label="First option (1 of 2)">
  </li>
  <li
    id="option-2"
    role="option" 
    aria-label="Second option (2 of 2)">
  </li>
</ul>
<label for="the-textarea" class="sr-only">
  What's on your mind?
</label>
<span id="the-description" class="sr-only">
  When autocomplete results are available, press up or down 
  arrows and enter to select.
</span>

Explaining this pattern probably deserves a blog post in and of itself, but in broad strokes, what’s happening is:

  • The description and label are offscreen, using styles that make them visible only to screen readers. The description explains that you can press up or down on the results and press enter to select.
  • aria-expanded indicates whether there are autocomplete results or not.
  • aria-activedescendant indicates which option in the list is selected.
  • aria-labels on the options allow me to control how each option is spoken to a screen reader, and to explicitly include text like “1 of 2” in case the screen reader doesn’t speak this information.

After extensive testing, this was more-or-less the best solution I could come up with. It works perfectly in NVDA on the latest version of Firefox, although sadly it has some minor issues in VoiceOver on Safari and NVDA on Chrome. However, since this is the standards-based solution (and doesn’t rely on aria-live="assertive" hacks), my hope is that browsers and screen readers will catch up with this implementation.
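
To give a feel for the JavaScript side of the pattern, here’s a simplified sketch (not Pinafore’s actual code) of how the arrow keys might drive aria-activedescendant. The acceptOption() function is a stand-in for whatever inserts the chosen result into the textarea:

const textarea = document.getElementById('the-textarea')
const list = document.getElementById('the-list')
let selectedIndex = 0

textarea.addEventListener('keydown', e => {
  const options = list.querySelectorAll('[role="option"]')
  if (!options.length) {
    return // no autocomplete results are showing
  }
  // Clamp in case the result list shrank since the last keypress
  selectedIndex = Math.min(selectedIndex, options.length - 1)
  if (e.key === 'ArrowDown' || e.key === 'ArrowUp') {
    e.preventDefault()
    const delta = e.key === 'ArrowDown' ? 1 : -1
    selectedIndex = (selectedIndex + delta + options.length) % options.length
    // The textarea keeps focus; only aria-activedescendant changes, which is
    // what tells the screen reader which option is currently "active"
    textarea.setAttribute('aria-activedescendant', options[selectedIndex].id)
  } else if (e.key === 'Enter') {
    e.preventDefault()
    acceptOption(options[selectedIndex]) // hypothetical: insert the selected result
  }
})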

Update: I managed to get this widget working in Chrome+NVDA and Safari+VoiceOver. The fixes needed are described in this comment.

Manual and automated accessibility testing

There are a lot of automated tools that can give you good tips on improving accessibility in your web app. Some of the ones I’ve used include Lighthouse (which uses Axe under the hood), the Chrome accessibility tools, and the Firefox accessibility tools. (These tools can give you slightly different results, so I like to use multiple so that I can get second opinions!)

Screenshot of a post in Pinafore with the Firefox accessibility tools open in a side panel

However, I’ve found that, especially for screen reader accessibility, there is no substitute for testing in an actual browser with an actual screen reader. It gives you the exact experience that a screen reader user would have, and it helps build empathy for what kinds of design patterns work well for voice navigation and which ones don’t. Also, sometimes screen readers have bugs or slightly differing behavior, and these are things that accessibility auditing tools can’t tell you.

If you’re just getting started, I would recommend watching Rob Dodson’s “A11ycasts” series, especially the tutorials on VoiceOver for macOS and NVDA for Windows. (Note that NVDA is usually paired with Firefox and VoiceOver is optimized for Safari. So although you can use either one with other browsers, those are the pairings that tend to be most representative of real-world usage.)

Personally I find VoiceOver to be the easiest to use from a developer’s point of view, mostly because it has a visual display of the assistive text while it’s being spoken.

Screenshot of VoiceOver in Safari on macOS showing text at the bottom of the screen representing what's spoken

NVDA can also be configured to do this, but you have to know to go into the settings and enable the “Speech Viewer” option. I would definitely recommend turning this on if you’re using NVDA for development!

Screenshot of Speech Viewer in NVDA on Firefox showing lines of text representing what's spoken

Similar to testing with screen readers, it’s also a good idea to try Tabbing around your app to see how comfortable it is with a keyboard. Does the focus change unexpectedly? Do you have to do a lot of unnecessary Tabbing to get where you want? Are there any handy keyboard shortcuts you’d like to add?

For a lot of things in accessibility, there are no hard-and-fast rules. Like design or usability in general, sometimes you just have to experience what your users are experiencing and see where you can optimize.

Conclusion

Accessibility can be challenging, but ultimately it’s worth the effort. Working on accessibility has improved the overall usability of my app in a number of ways, leading to unforeseen benefits such as KaiOS arrow key navigation and better integration tests.

The greatest satisfaction, though, comes from users who are happy with the work I’ve done. I was beyond pleased when Marco had this to say:

“Pinafore is for now by far the most accessible way to use Mastodon. I use it on desktop as well as iOS, both iPhone & iPad, too. So thank you again for getting accessibility in right from the start and making sure the new features you add are also accessible.”
Marco Zehe, October 21 2019

Thank you, Marco, and thanks for all your help! Hopefully this blog post will serve as a way to pay your accessibility advice forward.

Thanks to Sorin Davidoi, Thomas Wilburn, and Marco Zehe for feedback on a draft of this post.

Footnotes

1. In the course of writing this article, I was surprised to learn that, for server-rendered pages, pressing the back button restores focus to the previously-clicked element in Firefox, Safari, and Edge (EdgeHTML), but not Chrome. I found a webcompat.com bug describing the browser difference, and I’ve gone ahead and filed a bug on Chrome.

The joy and challenge of developing for KaiOS

Photo of my hand holding a Nokia 8110 4G running Pinafore on KaiOS

Recently I spent some time getting Pinafore to run on a KaiOS device. Overall, I found it to be challenging but enjoyable. In this post, I’ll talk about some of the tricks that helped me to work with this curious little feature phone.

Why KaiOS? Well, I guess I’ve always been a bit of a gadget geek. I’ve been developing for Android since the early days of the HTC Dream, I managed to get my Nexus 5 to triple-boot into Android, Firefox OS, and Ubuntu Touch, and I’ve also played around with the Amazon Fire Phone, Windows Phone, iPod Touch… You get the idea.

Recently though, I’ve found the mobile landscape a bit boring. Android and iOS are the two big players, and it’s been that way for nearly a decade. Firefox OS, Blackberry OS, Windows Phone, and other would-be contenders have long since bitten the dust.

That’s why I’m excited about KaiOS. It’s a platform that’s growing surprisingly fast and is actually based on Firefox OS (may it rest in peace). It’s especially popular in India, where it’s already the second-most popular mobile OS after Android.

KaiOS can run regular websites, but it can also run “packaged” apps, à la Firefox OS. So how can an aspiring KaiOS developer get started?

Step one: buying a development phone

I like to test on real devices, because I want to experience my app with real-world hardware and performance constraints, as an actual user would. I decided to get the Nokia 8110 4G because it was available on Amazon for $70.

Note that this phone only supports AT&T in the U.S., so if you plan on using it as your actual personal phone, you may be out of luck.

When you first unbox the phone, you’ll have about 5 minutes’ worth of setup screens, after which you’re ready to go. I’d also recommend going into the settings and upgrading the OS, since mine needed an upgrade right away.

Next steps: development environment

The best resources I’ve found for KaiOS development are the official developer guide and Paul Kinlan’s quick start guide. Paul’s guide is the best place to start, since it’s short and will get you up and running quickly.

Paul says that he could only get WebIDE to work in Firefox 48, but I found that Firefox 59 also worked (per the KaiOS developer guide). So let’s use that.

First you’ll want to download Firefox 59 for your OS (in my case, Ubuntu) via Mozilla’s download site. Sadly I could not get Firefox 59 to run alongside the latest Firefox (which is my go-to browser), and I also found that Firefox 59 tried to aggressively update itself, so every time I reopened it, it would be running the latest version. To work around that, you can run this script:

# Wipe the previous Firefox install and its profile
rm -fr firefox profile
mkdir profile
# Unpack a fresh copy of Firefox 59 and launch it with the empty profile
tar -xjf firefox-59.0.3.tar.bz2
./firefox/firefox-bin -profile profile

This will delete Firefox’s stored data and restart it with a fresh profile every time. It’s handy for quickly restarting Firefox!

After you input the “secret” code that Paul mentions (*#*#33284#*#*), you should see a little bug icon in the corner:

Screenshot of KaiOS home screen showing a little bug icon in the corner

At this point you should be able to run:

adb devices

…and see your device in the list of devices:

List of devices attached
4939400 device

If adb isn’t running, you can run adb kill-server && adb start-server to restart it.

This didn’t work for me out of the box – I got an error saying it couldn’t connect to my device due to faulty udev rules. Luckily the KaiOS developer guide has a “Setting USB access” script that will fix that.

After running that script, I used Paul’s trick of setting up adb forwarding:

adb forward tcp:6000 localfilesystem:/data/local/debugger-socket

After this, you should be able to connect in Firefox WebIDE by clicking “Remote Runtime.”

Screenshot of Firefox WebIDE with Remote Runtime selected and localhost:6000 entered

One thing I noticed is that occasionally my device would get disconnected from WebIDE. In that case, you can just forward to another port:

adb forward tcp:6001 localfilesystem:/data/local/debugger-socket

…and then reconnect WebIDE to the new port, and it should work.

Next steps: actually writing code

As far as I can tell, KaiOS is based on Firefox 48. You can tell by the user agent string:

Mozilla/5.0 (Mobile; Nokia_8110_4G; rv:48.0) Gecko/48.0 Firefox/48.0 KAIOS/2.5.1

This means that you won’t necessarily have all the latest-and-greatest JavaScript or Web API features. Here’s a brief list of what I found didn’t work:

  • async functions
  • ServiceWorker
  • Intl

Update: it turns out ServiceWorker is supported! But you have to enable a permission in the manifest. Thanks, Fabrice!

As a quick gut-check of whether your app will run on KaiOS, you can try downloading Firefox 48, loading your website, and seeing if it works. (I found this easier than trying to debug on KaiOS right out of the gate.) You can use the same script above to run Firefox 48 with a fresh profile.

Once your code is sufficiently backwards-compatible (I found that @babel/preset-env with the default settings plus an Intl polyfill was good enough for me as a starting point), you have two options for testing your app:

  • as a packaged app
  • as a hosted app

(If you’ve done Firefox OS development before, you’ll find these options familiar.)

I tried both techniques, but found that the hosted app was easier for quick testing. If your development machine and phone are on the same WiFi network, then you can create a manifest.webapp file per KaiOS’s documentation and then load a URL like so:

http://192.168.1.2:4002/manifest.webapp

Screenshot of WebIDE showing a local manifest URL entered into the Hosted App popup

(If you’re not sure what IP address to use, run ifconfig -a | grep inet.)

Then you should see your app listed, and you can run it on the device by clicking the “play” button:

Screenshot of WebIDE showing Pinafore as a hosted app

You can also try the packaged-app route, but I found that it didn’t play well with the routing library in my framework of choice (Sapper) due to the extra /index.html. Plus, if I ever actually submit this app to the KaiStore, I’ll probably go with the hosted route since it’s less maintenance work.

Optimizing for KaiOS

In terms of web development, there are a few things worth knowing when targeting KaiOS, and in particular the Nokia 8110 4G.

First off, the available screen size is 240×294 pixels, which you can check by using window.innerWidth in JavaScript, @media (max-width: 240px) in CSS, etc. If you set "fullscreen": true to hide the status bar, you can get slightly more breathing room: 240×320.

I had already optimized my app for the iPhone 4’s screen size (which is 320×480), but I found I had to do extra work to make things show up correctly on an even tinier screen.

Screenshot comparing Pinafore on an iPhone 4 versus a smaller Nokia 8110 4G

The iPhone 4 is already pretty small, but it’s a giant compared to the Nokia 8110 4G.

Next, the input methods are quite limited. If you’ve actually taken the time to make your webapp keyboard-accessible, then you should be most of the way there. But there are still some extra optimizations you may have to make for KaiOS.

By default, the ↑ and ↓ arrows will typically scroll the page up or down. I decided to use the ↑ and ↓ arrows to change the focus instead – i.e. to act as proxies for the Tab and Shift + Tab keys. I wrote some fairly simple logic to navigate through the focusable elements on the page, taking special care to allow the arrow keys to work properly inside of text inputs and modal dialogs.
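
The core of that logic boils down to something like the sketch below – simplified from what Pinafore actually does (for instance, it omits the special-casing for text inputs and modals), and the FOCUSABLE selector is only a rough approximation:

const FOCUSABLE = 'a[href], button, input, select, textarea, [tabindex="0"]'

document.addEventListener('keydown', e => {
  if (e.key !== 'ArrowDown' && e.key !== 'ArrowUp') {
    return
  }
  const focusable = Array.from(document.querySelectorAll(FOCUSABLE))
    .filter(el => !el.disabled && el.offsetParent !== null) // rough check to skip disabled/hidden elements
  if (!focusable.length) {
    return
  }
  e.preventDefault()
  const delta = e.key === 'ArrowDown' ? 1 : -1
  const currentIndex = focusable.indexOf(document.activeElement)
  const nextIndex = (currentIndex + delta + focusable.length) % focusable.length
  // Move focus as if the user had pressed Tab or Shift + Tab
  focusable[nextIndex].focus()
})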

I also found that the focus outlines were a bit hard to see on the small screen, so I scaled them up for KaiOS and added some extra styling to make the focus more obvious.

Some apps may opt to display an onscreen cursor (this seems to be the default behavior of the web browser), but I found that to be a bit of a clumsy experience. So I think managing the focus is better.

 

Other KaiOS web developer hurdles

The next most difficult thing about KaiOS is still the lack of modern browser standards. For instance, I couldn’t get SVG <use> tags to work properly, so I fell back to inlining each SVG as a workaround.

CSS Grid is (mercifully) supported, so I didn’t have to backport my Grid logic. Since Grid is not supported in Firefox 48, this hints to me that KaiOS must have tweaked the Firefox OS code beyond what Firefox 48 can support. But I haven’t done the full rundown of all the supported features. (And I still find it helpful to use Firefox 48 as a minimum bar for a quick test target.)

As with all web development, this is the area where you’ll probably have to roll up your sleeves and test for what’s supported and what’s not. But overall I find KaiOS to be much easier to develop for than, say, IE11. It’s much further along on the standards track than a truly legacy browser.

Once you’ve gotten past those hurdles, though, the result can be surprisingly good! I was impressed at how well this tiny feature phone could run Pinafore:

 
(Much of the credit, I’m sure, is due to the excellent SvelteJS framework.)

Conclusion

Building for KaiOS is fun! The device is quirky and lightweight, and it makes me excited about mobile development in a way that I haven’t felt in a while. In terms of development, it feels like a nice compromise between a feature phone with limited hardware (small screen, no touch, limited key input) and a smartphone OS with cutting-edge web technologies (it has CSS Grid! And String.prototype.padStart()! It’s not so bad!).

However, I’m not a huge fan of app stores – I don’t like having to agree to the ToS, going through a review process, and ultimately subjecting myself to the whims of a private corporation. But KaiOS is neat, it’s cute, and it’s growing in developing markets, so I think it’s a worthy venue for development.

And unlike fully proprietary development platforms, you’re not writing throwaway code if KaiOS ever goes away. Since it’s ultimately just a webapp, improvements I’ve made to Pinafore for keyboard accessibility and small screens will accrue to any other device that can browse the web. Other than the manifest.webapp file, there’s nothing truly specific to KaiOS that I had to do to support it.

Overall, I find it fun and refreshing to build for something like KaiOS. It helps me see the web platform from a new perspective, and if nothing else, I’ve got a new gadget to play with.

Browsers, input events, and frame throttling

If there’s one thing I’ve learned about web performance, it’s that you have to approach it with a sense of open-mindedness and humility. Otherwise, prepare to be humbled.

Just as soon as you think you’ve got it all figured out, poof! Browsers change their implementation. Or poof! The spec changes. Or poof! You just flat-out turn out to be wrong. So you have to constantly test and revalidate your assumptions.

In a recent post, I suggested that pointermove events fire more frequently than requestAnimationFrame, and so it’s a good idea to throttle them to rAF. I also rattled off some other events that may fire faster than rAF, such as scroll, wheel, touchmove, and mousemove.

Do these events actually fire faster than rAF, though? It’s an important detail! If browsers already align/throttle these events to rAF, then there’s little point in recreating that same behavior in userland. (Thankfully an extra rAF won’t add an extra frame delay, though, assuming browsers fire the rAF-aligned events right before rAF. Thanks Jake Archibald for this tip!)

TL;DR: it varies across browsers and events. I’d still recommend the rAF-throttling technique described in my previous post.

Step one: check the spec

The first question to ask is: what does the spec say?

After reading the specs for pointermove, mousemove, touchmove, scroll, and wheel, I found that the only mention of animation frame timing was in pointermove and scroll. The spec for pointermove says:

A user agent MUST fire a pointer event named pointermove when a pointer changes coordinates. […] These events may be coalesced or aligned to animation frame callbacks based on UA decision.

(Emphasis mine.) So browsers are not required to coalesce or align pointermove events to animation frames, but they may do so. (Presumably, this is the point of getCoalescedEvents().)

As for scroll, it’s mentioned in the event loop spec, where it says “for each fully active Document […], run the scroll steps for that Document” as part of the steps before running rAF callbacks. So on the main document at least, scroll is definitely supposed to fire before rAF.

For contrast, here’s touchmove:

A user agent must dispatch this event type to indicate when the user moves a touch point along the touch surface. […] Note that the rate at which the user agent sends touchmove events is implementation-defined, and may depend on hardware capabilities and other implementation details.

(Emphasis mine.) So this time, nothing about animation frames, and also some language about “implementation-defined.” Similarly, here’s mousemove:

The frequency rate of events while the pointing device is moved is implementation-, device-, and platform-specific, but multiple consecutive mousemove events SHOULD be fired for sustained pointer-device movement, rather than a single event for each instance of mouse movement.

(Emphasis mine.) So we’re starting to get a pretty clear picture (or a hazy one, depending on your perspective). It seems that, aside from scroll, the specs don’t have much to say about whether events should be coalesced with rAF or not.

Step two: test it

However, this doesn’t mean browsers don’t do it! After all, it’s clearly in browsers’ interests to coalesce these events to animation frames. Assuming that most web developers do the simplest possible thing and handle the events directly, then any browser that aligns with rAF will avoid some unintentional jank from noisy input events.

Do browsers actually do this, though? Thankfully Jake has written a nice demo which makes it easy to test this. I’ve also extended his demo to test scroll events. And because I apparently have way too much free time on my hands (or I just hate uncertainty when it comes to browser stuff), I went ahead and compiled the data for various browsers and OSes:

| Browser (OS) | pointermove | mousemove | touchmove | wheel | scroll |
| --- | --- | --- | --- | --- | --- |
| Chrome 76 (Windows 10) | Y* | Y* | N/A | Y* | Y |
| Firefox 68 (Windows 10) | Y | Y | N/A | N | Y |
| Edge 18 (Windows 10) | N | N | N/A | N | Y |
| Chrome 76 (macOS 10.14.6) | Y* | Y* | N/A | Y* | Y |
| Firefox 68 (macOS 10.14.6) | Y | Y | N/A | N | Y |
| Safari 12.1.2 (macOS 10.14.6) | N/A | N | N/A | N | N |
| Safari Technology Preview 13.1 (macOS 10.14.6) | N | N | N/A | N | N |
| Chrome 76 (Ubuntu 16.04) | Y* | Y* | N/A | Y* | Y |
| Firefox 68 (Ubuntu 16.04) | Y | Y | N/A | N | Y |
| GNOME Web 3.28.5 (Ubuntu 16.04) | N/A | N | N/A | N | N |
| Chrome 76 (Android 6) | Y | N/A | Y | N/A | Y |
| Firefox 68 (Android 6) | N/A | N/A | Y | N/A | Y |
| Safari (iOS 12.4) | N/A | N/A | Y | N/A | N |

Abbreviations:

  • Y: Yes, events are coalesced and aligned to rAF.
  • N: No, events fire independently of and faster than rAF.
  • N/A: Event doesn’t apply to this device/browser.
  • *: Except when Dev Tools are opened, apparently.

Conclusion

As you can see from the data, there is a lot of variance in terms of which events and browsers align to rAF. Although for the most part, it seems consistent within browser engines (e.g. GNOME Web is a WebKit-based browser, and it patterns with macOS Safari). Note though that I only tested a regular mouse or trackpad, not exotic input devices such as a Wacom stylus, Surface Pen, etc.

Given this data, I would take the cautious approach and still do the manual rAF-throttling as described in my previous blog post. It has the upside of being guaranteed to work roughly the same across all browsers, at the cost of some extra bookkeeping. [1]

Depending on your supported browser matrix, though, and depending on when you’re reading this (maybe at a point in the future when all browser input events are rAF-aligned!), then you may just handle the input directly and trust the browser to align it to rAF. [2]

Thanks to Ben Kelly and Jake Archibald for feedback on a draft of this blog post. Thanks also to Jake for clueing me in to this rAF-throttling business in the first place.

Footnotes

1. Interestingly, in the case of pointermove at least, the browser behavior can be feature-detected by checking getCoalescedEvents (i.e. Firefox and Chrome have it, Edge and Safari Technology Preview don’t). So you can use PointerEvent.prototype.getCoalescedEvents as a feature check. But there’s little point in feature-detecting, since manual rAF-throttling doesn’t add an extra frame delay in browsers that already rAF-align.

2. Jake also pointed me to an interesting detail: “Although these events are synced to rendering, they’ll flush if another non-synced event happens.” So for instance, keyboard events will interfere with pointermove and cause them to no longer sync to rAF, which you can reproduce in Jake’s demo by typing on the keyboard and moving the mouse at the same time. Another good reason to just rAF-throttle and be sure!

High-performance input handling on the web

Update: In a follow-up post, I explore some of the subtleties across browsers in how they fire input events.

There is a class of UI performance problems that arise from the following situation: An input event is firing faster than the browser can paint frames.

Several events can fit this description:

  • scroll
  • wheel
  • mousemove
  • touchmove
  • pointermove
  • etc.

Intuitively, it makes sense why this would happen. A user can jiggle their mouse and deliver precise x/y updates faster than the browser can paint frames, especially if the UI thread is busy and thus the framerate is being throttled (also known as “jank”).

Screenshot of Chrome Dev Tools showing that a long frame of 546ms can contain as many as four pointermove events

In the above screenshot, pointermove events are firing faster than the framerate can keep up.[1] This can also happen for scroll events, touch events, etc.

Update: In Chrome, pointermove is actually supposed to align/throttle to requestAnimationFrame automatically, but there is a bug where it behaves differently with Dev Tools open.

The performance problem occurs when the developer naïvely chooses to handle the input directly:

element.addEventListener('pointermove', () => {
  doExpensiveOperation()
})

In a previous post, I discussed Lodash’s debounce and throttle functions, which I find very useful for these kinds of situations. Recently however, I found a pattern I like even better, so I want to discuss that here.

Understanding the event loop

Let’s take a step back. What exactly are we trying to achieve here? Well, we want the browser to do only the work necessary to paint the frames that it’s able to paint. For instance, in the case of a pointermove event, we may want to update the x/y coordinates of an element rendered to the DOM.

The problem with Lodash’s throttle()/debounce() is that we would have to choose an arbitrary delay (e.g. 20 milliseconds or 50 milliseconds), which may end up being faster or slower than the browser is actually able to paint, depending on the device and browser. So really, we want to throttle to requestAnimationFrame():

element.addEventListener('pointermove', () => {
  requestAnimationFrame(doExpensiveOperation)
})

With the above code, we are at least aligning our work with the browser’s event loop, i.e. firing right before style and layout are calculated.

However, even this is not really ideal. Imagine that a pointermove event fires three times for every frame. In that case, we will essentially do three times the necessary work on every frame:

Chrome Dev Tools screenshot showing an 82 millisecond frame where there are three pointermove events queued by requestAnimationFrame inside of the frame

This may be harmless if the code is fast enough, or if it’s only writing to the DOM. However, if it’s both writing to and reading from the DOM, then we will end up with the classic layout thrashing scenario,[2] and our rAF-based solution is actually no better than handling the input directly, because we recalculate the style and layout for every pointermove event.

Chrome Dev Tools screenshot of layout thrashing, showing two pointermove events with large Layout blocks and the text "Forced reflow is a likely performance bottleneck"

Note the style and layout recalculations in the purple blocks, which Chrome marks with a red triangle and a warning about “forced reflow.”

Throttling based on framerate

Again, let’s take a step back and figure out what we’re trying to do. If the user is dragging their finger across the screen, and pointermove fires 3 times for every frame, then we actually don’t care about the first and second events. We only care about the third one, because that’s the one we need to paint.

So let’s only run the final callback before each requestAnimationFrame. This pattern will work nicely:

function throttleRAF () {
  let queuedCallback
  return callback => {
    if (!queuedCallback) {
      requestAnimationFrame(() => {
        const cb = queuedCallback
        queuedCallback = null
        cb()
      })
    }
    queuedCallback = callback
  }
}

We could also use cancelAnimationFrame for this, but I prefer the above solution because it’s calling fewer DOM APIs. (It only calls requestAnimationFrame() once per frame.)
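
For comparison, here’s roughly what the cancelAnimationFrame-based version would look like (not the version I ended up using):

function throttleRAFWithCancel () {
  let rafId
  return callback => {
    // Cancel any frame that's already scheduled and schedule a new one,
    // so only the most recently queued callback runs on each frame
    if (rafId) {
      cancelAnimationFrame(rafId)
    }
    rafId = requestAnimationFrame(() => {
      rafId = null
      callback()
    })
  }
}

Either way, the effect is the same: only the latest callback runs per frame.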

This is nice, but at this point we can still optimize it further. Recall that we want to avoid layout thrashing, which means we want to batch all of our reads and writes to avoid unnecessary recalculations.

In “Accurately measuring layout on the web”, I explore some patterns for queuing a timer to fire after style and layout are calculated. Since writing that post, a new web standard called requestPostAnimationFrame has been proposed, and it fits the bill nicely. There is also a good polyfill called afterframe.

To best align our DOM updates with the browser’s event loop, we want to follow these simple rules:

  1. DOM writes go in requestAnimationFrame().
  2. DOM reads go in requestPostAnimationFrame().

The reason this works is because we write to the DOM right before the browser will need to calculate style and layout (in rAF), and then we read from the DOM once the calculations have been made and the DOM is “clean” (in rPAF).

If we do this correctly, then we shouldn’t see any warnings in the Chrome Dev Tools about “forced reflow” (i.e. a forced style/layout outside of the browser’s normal event loop). Instead, all layout calculations should happen during the regular event loop cycle.

Chrome Dev Tools screenshot showing one pointermove per frame and large layout blocks with no "forced reflow" warning

In the Chrome Dev Tools, you can tell the difference between a forced layout (or “reflow”) and a normal one because of the red triangle (and warning) on the purple style/layout blocks. Note that above, there are no warnings.

To accomplish this, let’s make our throttler more generic, and create one that can handle requestPostAnimationFrame as well:

function throttle (timer) {
  let queuedCallback
  return callback => {
    if (!queuedCallback) {
      timer(() => {
        const cb = queuedCallback
        queuedCallback = null
        cb()
      })
    }
    queuedCallback = callback
  }
}

Then we can create multiple throttlers based on whether we’re doing DOM reads or writes:[3]

const throttledWrite = throttle(requestAnimationFrame)
const throttledRead = throttle(requestPostAnimationFrame)

element.addEventListener('pointermove', e => {
  throttledWrite(() => {
    doWrite(e)
  })
  throttledRead(() => {
    doRead(e)
  })
})

Effectively, we have implemented something like fastdom, but using only requestAnimationFrame and requestPostAnimationFrame!

Pointer event pitfalls

The last piece of the puzzle (at least for me, while implementing a UI like this), was to avoid the pointer events polyfill. I found that, even after implementing all the above performance improvements, my UI was still janky in Firefox for Android.

After some digging with WebIDE, I found that Firefox for Android currently does not support Pointer Events, and instead only supports Touch Events. (This is similar to the current version of iOS Safari.) After profiling, I found that the polyfill itself was taking up a lot of my frame budget.

Screenshot of Firefox WebIDE showing a lot of time spent in pointer-events polyfill

So instead, I switched to handling pointer/mouse/touch events myself. Hopefully in the near future this won’t be necessary, and all browsers will support Pointer Events! We’re already close.

Here is the before-and-after of my UI, using Firefox on a Nexus 5:

 

When handling very performance-sensitive scenarios, like a UI that should respond to every pointermove event, it’s important to reduce the amount of work done on each frame. I’m sure that this polyfill is useful in other situations, but in my case, it was just adding too much overhead.

One other optimization I made was to delay updates to the store (which trigger some extra JavaScript computations) until the user’s drag had completed, instead of on every drag event. The end result is that, even on a resource-constrained device like the Nexus 5, the UI can actually keep up with the user’s finger!
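
For concreteness, here’s roughly what that “defer the store update” idea looks like, reusing the throttledWrite() helper from above. The store API shown here is hypothetical, not the actual Pinafore code:

let latestPosition

element.addEventListener('pointermove', e => {
  latestPosition = { x: e.clientX, y: e.clientY }
  throttledWrite(() => {
    // Cheap, DOM-only update on each frame while the drag is in progress
    element.style.transform =
      `translate(${latestPosition.x}px, ${latestPosition.y}px)`
  })
})

element.addEventListener('pointerup', () => {
  // The expensive store update (and any derived computations) happens once,
  // after the drag has completed
  store.set({ position: latestPosition })
})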

Conclusion

I hope this blog post was helpful for anyone handling scroll, touchmove, pointermove, or similar input events. Thinking in terms of how I’d like to align my work with the browser’s event loop (using requestAnimationFrame and requestPostAnimationFrame) was useful for me.

Note that I’m not saying to never use Lodash’s throttle or debounce. I use them all the time! Sometimes it makes sense to just let a timer fire every n milliseconds – e.g. when debouncing window resize events. In other cases, I like using requestIdleCallback – for instance, when updating a non-critical part of the UI based on user input, like a “number of characters remaining” counter when typing into a text box.

In general, though, I hope that once requestPostAnimationFrame makes its way into browsers, web developers will start to think more purposefully about how they do UI updates, leading to fewer instances of layout thrashing. fastdom was written in 2013, and yet its lessons still apply today. Hopefully when rPAF lands, it will be much easier to use this pattern and reduce the impact of layout thrashing on web performance.

Footnotes

1. In the Pointer Events Level 2 spec, it says that pointermove events “may be coalesced or aligned to animation frame callbacks based on UA decision.” So hypothetically, a browser could throttle pointermove to fire only once per rAF (and if you need precise x/y events, e.g. for a drawing app, you can use getCoalescedEvents()). It’s not clear to me, though, that any browser actually does this. Update: see comments below, some browsers do! In any case, throttling the events to rAF in JavaScript accomplishes the same thing, regardless of UA behavior.

2. Technically, the only DOM reads that matter in the case of layout thrashing are DOM APIs that force style/layout, e.g. getBoundingClientRect() and offsetLeft. If you’re just calling getAttribute() or classList.contains(), then you’re not going to trigger style/layout recalculations.

3. Note that if you have different parts of the code that are doing separate reads/writes, then each one will need its own throttler function. Otherwise one throttler could cancel the other one out. This can be a bit tricky to get right, although to be fair the same footgun exists with Lodash’s debounce/throttle.