JavaScript performance beyond bundle size

There’s an old story about a drunk trying to find his keys in the streetlight. Why? Well, because that’s where it’s the brightest. It’s a funny story, but also relatable, because as humans we all tend to take the path of least resistance.

I think we have the same problem in the web performance community. There’s a huge focus recently on JavaScript bundle size: how big are your dependencies? Could you use a smaller one? Could you lazy-load it? But I believe we focus on bundle size first and foremost because it’s easy to measure.

That’s not to say that bundle size isn’t important! Just like how you might have left your keys in the streetlight. And heck, you might as well check there first, since it’s the quickest place to look. But here are some other things that are harder to measure, but can be just as important:

  • Parse/compile time
  • Execution time
  • Power usage
  • Memory usage
  • Disk usage

A JavaScript dependency can affect all of these metrics. But they’re less discussed than bundle size, and I suspect it’s because they’re less straightforward to measure. In this post, I want to talk about how I approach bundle size, and how I approach the other metrics too.

Bundle size

When talking about the size of JavaScript code, you have to be precise. Some folks will say “my library is 10 kilobytes.” Is that minified? Gzipped? Tree-shaken? Did you use the highest Gzip setting (9)? What about Brotli compression?

This may sound like hair-splitting, but the distinction actually matters, especially between compressed and uncompressed size. The compressed size affects how fast it is to send bytes over the wire, whereas the uncompressed size affects how long it takes the browser to parse, compile, and execute the JavaScript. (These tend to correlate with code size, although it’s not a perfect predictor.)

The most important thing, though, is to be consistent. You don’t want to measure Library A using unminified, uncompressed size versus Library B using minified and compressed size (unless there’s a real difference in how you’re serving them).
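
If you want to check these numbers yourself, Node’s built-in zlib module makes it easy (a sketch – the bundle path is hypothetical, and brotliCompressSync assumes Node 11.7+):

import { readFileSync } from 'fs'
import { gzipSync, brotliCompressSync } from 'zlib'

// measure the same file three ways, so the comparison is apples-to-apples
const code = readFileSync('dist/bundle.min.js')
console.log('minified:', code.length, 'bytes')
console.log('gzip (level 9):', gzipSync(code, { level: 9 }).length, 'bytes')
console.log('brotli:', brotliCompressSync(code).length, 'bytes')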

Bundlephobia

For me, Bundlephobia is the Swiss Army knife of bundle size analysis. You can look up any dependency from npm and it will tell you both the minified size (what the browser parses and executes) as well as the minified and compressed size (what the browser downloads).

For instance, we can use this tool to see that react-dom weighs 121.1kB minified, but preact weighs 10.2kB. So we can confirm that Preact really is the real deal – a React-compatible framework at a fraction of the size!

In this case, I don’t get hung up on exactly which minifier or exactly what Gzip compression level Bundlephobia is using, because at least it’s using the same system everywhere. So I know I’m comparing apples to apples.

Now that said, there are some caveats with Bundlephobia:

  1. It doesn’t tell you the tree-shaken cost. If you’re only importing one part of a module, the other parts may be tree-shaken out.
  2. It won’t tell you about subdirectory dependencies. So for instance, I know how expensive it is to import 'preact', but import 'preact/compat' could be literally anything – compat.js could be a huge file, and I’d have no way to know.
  3. If there are polyfills involved (e.g. your bundler injecting a polyfill for Node’s Buffer API, or for the JavaScript Object.assign() API), you won’t necessarily see it here.

In all the above cases, you really just have to run your bundler and check the output. Every bundler is different, and depending on the configuration or other factors, you might end up with a huge bundle or a tiny one. So next, let’s move on to the bundler-specific tools.

Webpack Bundle Analyzer

I love Webpack Bundle Analyzer. It offers a nice visualization of every chunk in your Webpack output, as well as which modules are inside of those chunks.

Screenshot of Webpack Bundle Analyzer showing a list of modules and sizes on the left and a visual tree map of modules and sizes on the right, where the module is larger if it has a greater size, and modules-within-modules are also shown proportionally

In terms of the sizes it shows, the two most useful ones are “parsed” (the default) and “Gzipped”. “Parsed” essentially means “minified,” so these two measurements are roughly comparable with what Bundlephobia would tell us. But the difference here is that we’re actually running our bundler, so we know that the sizes are accurate for our particular application.

Rollup Plugin Analyzer

For Rollup, I would really love to have a graphical interface like Webpack Bundle Analyzer. But the next best thing I’ve found is Rollup Plugin Analyzer, which will output your module sizes to the console while building.

Unfortunately, this tool doesn’t give us the minified or Gzipped size – just the size as seen by Rollup before such optimizations occur. It’s not perfect, but it’s great in a pinch.

Other bundle size tools

There are plenty of other tools I’ve dabbled with and found useful, and I’m sure you can find more to add to the list!

Beyond the bundle

As I mentioned, though, I don’t think JavaScript bundle size is everything. It’s great as a first approximation, because it’s (comparatively) easy to measure, but there are plenty of other metrics that can impact page performance.

Runtime CPU cost

The first and most important one is the runtime cost. This can be broken into a few buckets:

  • Parsing
  • Compilation
  • Execution

These three phases are basically the end-to-end cost of calling require("some-dependency") or import "some-dependency". They may correlate with bundle size, but it’s not a one-to-one mapping.

For a trivial example, here is a (tiny!) JavaScript snippet that consumes a ton of CPU:

const start = Date.now()
while (Date.now() - start < 5000) {}

This snippet would get a great score on Bundlephobia, but unfortunately it will block the main thread for 5 seconds. This is a somewhat absurd example, but in the real world, you can find small libraries that nonetheless hammer the main thread. Traversing through all elements in the DOM, iterating through a large array in LocalStorage, calculating digits of pi… unless you’ve hand-inspected all your dependencies, it’s hard to know what they’re doing in there.

Parsing and compilation are both really hard to measure. It’s easy to fool yourself, because browsers have lots of optimizations around bytecode caching. For instance, browsers might not run the parse/compile step on second page load, or third page load (!), or when the JavaScript is cached in a Service Worker. So you might think a module is cheap to parse/compile, when really the browser has just cached it in advance.

Screenshot from Chrome DevTools showing main thread with Compilation followed by execution of some JavaScript anonymous call stacks

Compilation and execution in Chrome DevTools. Note that Chrome does some parsing and compilation off-main-thread.

The only way to be 100% safe is to completely clear the browser cache and measure first page load. I don’t like to mess around, so typically I will do this in a private/guest browsing window, or in a completely separate browser. You’ll also want to make sure that any browser extensions are disabled (private mode typically does this), since those extensions can impact page load time. You don’t want to get halfway into analyzing a Chrome trace and realize that you’re measuring your password manager!

Another thing I usually do is set Chrome’s CPU throttling to 4x or 6x. I think of 4x as “similar enough to a mobile device,” and 6x as “a super-duper slowed-down machine that makes the traces much easier to read, because everything is bigger.” Use whichever one you want; either will be more representative of real users than your (probably) high-end developer machine.

If I’m concerned about network speed, this is the point where I would turn on network throttling as well. “Fast 3G” is usually a good one that hits the sweet spot between “more like the real world” and “not so slow that I start yelling at my computer.”

So putting it all together, my steps for getting an accurate trace are typically:

  1. Open a private/guest browsing window.
  2. Navigate to about:blank if necessary (you don’t want to measure the unload event for your browser home page).
  3. Open the DevTools in Chrome.
  4. Go to the Performance tab.
  5. In the settings, turn on CPU throttling and/or network throttling.
  6. Click the Record button.
  7. Type the URL and press Enter.
  8. Stop recording when the page has loaded.

Screenshot of Chrome DevTools showing a page on about:blank, the CPU Throttling set to 6x, Network Throttling set to Fast 3G, and in a guest browsing window with no extensions

Now you have a performance trace (also known as a “timeline” or “profile”), which will show you the parse/compile/execution times for the JavaScript code in your initial page load. Unfortunately this part can end up being pretty manual, but there are some tricks to make it easier.

Most importantly, use the User Timing API (aka performance marks and measures) to mark parts of your web application with names that are meaningful to you. Focus on parts that you worry will be expensive, such as the initial render of your root application, a blocking XHR call, or bootstrapping your state object.
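
For instance, a minimal sketch (renderMyApp is a hypothetical stand-in for whatever bootstrapping code you care about):

performance.mark('renderStart')
renderMyApp() // hypothetical: your app's initial render
performance.mark('renderEnd')

// shows up as a labeled bar in the DevTools "User Timing" section
performance.measure('initial render', 'renderStart', 'renderEnd')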

You can strip out performance.mark/performance.measure calls in production if you’re worried about the (small) overhead of these APIs. I like to turn it on or off based on query string parameters, so that I can easily turn on user timings in production if I want to analyze the production build. Terser’s pure_funcs option can also be used to remove performance.mark and performance.measure calls when you minify. (Heck, you can remove console.logs here too. It’s very handy.)
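
For reference, the relevant Terser options might look roughly like this (a sketch – adapt it to however your bundler invokes Terser):

{
  compress: {
    // treat these as side-effect-free, so calls whose return values
    // are unused get dropped from the production build
    pure_funcs: ['performance.mark', 'performance.measure', 'console.log']
  }
}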

Another useful tool is mark-loader, which is a Webpack plugin that automatically wraps your modules in mark/measure calls so that you can see each dependency’s runtime cost. Why try to puzzle over a JavaScript call stack, when the tool can tell you exactly which dependencies are consuming exactly how much time?

Screenshot of Chrome DevTools showing User Timing section with bars marked for Three, Moment, and React. The JavaScript callstacks underneath mostly say "anonymous"

Loading Three.js, Moment, and React in production mode. Without the User Timings, would you be able to figure out where the time is being spent?

One thing to be aware of when measuring runtime performance is that the costs can vary between minified and unminified code. Unused functions may be stripped out, code will be smaller and more optimized, and libraries may define process.env.NODE_ENV === 'development' blocks that don’t run in production mode.

My general strategy for dealing with this situation is to treat the minified, production build as the source of truth, and to use marks and measures to make it comprehensible. As mentioned, though, performance.mark and performance.measure have their own small overhead, so you may want to toggle them with query string parameters.

Power usage

You don’t have to be an environmentalist to think that minimizing power use is important. We live in a world where people are increasingly browsing the web on devices that aren’t plugged into a power outlet, and the last thing they want is to run out of juice because of a misbehaving website.

I tend to think of power usage as a subset of CPU usage. There are some exceptions to this, like waking up the radio for a network connection, but most of the time, if a website is consuming excessive power, it’s because it’s consuming excessive CPU on the main thread.

So everything I’ve said above about improving JavaScript parse/compile/execute time will also reduce power consumption. But for long-lived web applications especially, the most insidious form of power drain comes after first page load. This might manifest as a user suddenly noticing that their laptop fan is whirring or their phone is growing hot, even though they’re just looking at an (apparently) idle webpage.

Once again, the tool of choice in these situations is the Chrome DevTools Performance tab, using essentially the same steps described above. What you’ll want to look for, though, is repeated CPU usage, usually due to timers or animations. For instance, a poorly-coded custom scrollbar, an IntersectionObserver polyfill, or an animated loading spinner may decide that they need to run code in every requestAnimationFrame or in a setInterval loop.
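
As a sketch of what to look for, here’s a hypothetical spinner widget that keeps the CPU busy even when nothing on the page has changed:

const spinner = document.querySelector('.spinner') // hypothetical element

function spin () {
  spinner.style.transform = `rotate(${(Date.now() / 4) % 360}deg)`
  requestAnimationFrame(spin) // re-schedules itself on every single frame
}
requestAnimationFrame(spin)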

Screenshot of Chrome DevTools showing little peaks of yellow JavaScript usage periodically in the timeline

A poorly-behaved JavaScript widget. Notice the little peaks of JavaScript usage, showing constant CPU usage even while the page is idle.

Note that this kind of power drain can also occur due to unoptimized CSS animations – no JavaScript required! (In that case, it would be purple peaks rather than yellow peaks in the Chrome UI.) For long-running CSS animations, be sure to always prefer GPU-accelerated CSS properties.

Another tool you can use is Chrome’s Performance Monitor tab, which is actually different from the Performance tab. I see this as a sort of heartbeat monitor of how your website is doing perf-wise, without the hassle of manually starting and stopping a trace. If you see constant CPU usage here on an otherwise inert webpage, then you probably have a power usage problem.

Screenshot of Chrome Performance Monitor showing steady 8.4% cpu usage on a chart, along with a chart of memory usage in a sawtooth pattern, going up and down

The same poorly-behaved JavaScript widget in Performance Monitor. Note the constant low hum of CPU usage, as well as the sawtooth pattern in the memory usage, indicating memory constantly being allocated and de-allocated.

Also: hat tip to the WebKit folks, who added an explicit Energy Impact panel to the Safari Web Inspector. Another good tool to check out!

Memory usage

Memory usage is something that used to be much harder to analyze, but the tooling has improved a lot recently.

I already wrote a post about memory leaks last year, but it’s important to remember that memory usage and memory leaks are two separate problems. A website can have high memory usage without explicitly leaking memory, whereas another website could start small but eventually balloon to a huge size due to runaway leaks.

You can read the above blog post for how to analyze memory leaks. But in terms of memory usage, we have a new browser API that helps quite a bit with measuring it: performance.measureUserAgentSpecificMemory (formerly performance.measureMemory, which sadly was much less of a mouthful). There are several advantages of this API:

  1. It returns a promise that automatically resolves after garbage collection. (No more need for weird hacks to force GC!)
  2. It measures more than just JavaScript VM size – it also includes DOM memory as well as memory in web workers and iframes.
  3. In the case of cross-origin iframes, which are process-isolated due to Site Isolation, it will break down the attribution. So you can know exactly how memory-hungry your ads and embeds are!
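
Basic usage looks something like this (a minimal sketch – note the feature detection, and it assumes an async context or a module with top-level await):

if (performance.measureUserAgentSpecificMemory) {
  // the promise resolves after the next garbage collection
  const result = await performance.measureUserAgentSpecificMemory()
  console.log('total bytes:', result.bytes)
}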

Here is a sample output from the API:

{
  "breakdown": [
    {
      "attribution": ["https://pinafore.social/"],
      "bytes": 755360,
      "types": ["Window", "JS"]
    },
    {
      "attribution": [],
      "bytes": 804322,
      "types": ["Window", "JS", "Shared"]
    }
  ],
  "bytes": 1559682
}

In this case, bytes is the banner metric you’ll want to use for “how much memory am I using?” The breakdown is optional, and the spec explicitly notes that browsers can decide not to include it.

That said, it can still be finicky to use this API. First off, it’s only available in Chrome 89+. (In slightly older releases, you can set the “enable experimental web platform features” flag and use the old performance.measureMemory API.) More problematic, though, is that due to the potential for abuse, this API has been limited to cross-origin isolated contexts. This effectively means that you have to set some special headers, and if you rely on any cross-origin resources (external CSS, JavaScript, images, etc.), they’ll need to set some special headers too.
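
In practice, “cross-origin isolated” currently means serving your top-level document with these two headers (and having your cross-origin resources opt in via CORS or Cross-Origin-Resource-Policy):

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp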

If that sounds like too much trouble, though, and if you only plan to use this API for automated testing, then you can run Chrome with the --disable-web-security flag. (At your own risk, of course!) Note, though, that measuring memory currently doesn’t work in headless mode.

Of course, this API also doesn’t give you a great level of granularity. You won’t be able to figure out, for instance, that React takes up X number of bytes, and Lodash takes up Y bytes, etc. A/B testing may be the only effective way to figure that kind of thing out. But this is still much better than the older tooling we had for measuring memory (which is so flawed that it’s really not even worth describing).

Disk usage

Limiting disk usage is most important in web application scenarios, where it’s possible to reach browser quota limits depending on the amount of available storage on the device. Excessive storage usage can come in many forms, such as stuffing too many large images into the ServiceWorker cache, but JavaScript can add up too.

You might think that the disk usage of a JavaScript module is a direct correlate of its bundle size (i.e. the cost of caching it), but there are some cases where this isn’t true. For instance, with my own emoji-picker-element, I make heavy use of IndexedDB to store the emoji data. This means I have to be cognizant of database-related disk usage, such as storing unnecessary data or creating excessive indexes.

Screenshot of Chrome DevTools Application tab under "Clear Storage" with a pie chart showing megabytes taken up in Cache Storage as well as IndexedDB, and a button saying "Clear Storage"

The Chrome DevTools has an “Application” tab which shows the total storage usage for a website. This is pretty good as a first approximation, but I’ve found that this screen can be a little bit inconsistent, and also the data has to be gathered manually. Plus, I’m interested in more than just Chrome, since IndexedDB has vastly different implementations across browsers, so the storage size could vary wildly.

The solution I landed on is a small script that launches Playwright, which is a Puppeteer-like tool that has the advantage of being able to launch more browsers than just Chrome. Another neat feature is that it can launch browsers with a fresh storage area, so you can launch a browser, write storage to /tmp, and then measure the IndexedDB usage for each browser.
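
The script looks roughly like this (a simplified sketch – the URL is hypothetical, and a fixed timeout is cruder than what you’d want in practice):

import { chromium } from 'playwright'
import { promises as fs } from 'fs'
import path from 'path'
import os from 'os'

// recursively sum the size of every file under a directory
async function dirSize (dir) {
  const entries = await fs.readdir(dir, { withFileTypes: true })
  const sizes = await Promise.all(entries.map(async (entry) => {
    const fullPath = path.join(dir, entry.name)
    return entry.isDirectory() ? dirSize(fullPath) : (await fs.stat(fullPath)).size
  }))
  return sizes.reduce((a, b) => a + b, 0)
}

// launch with a fresh, temporary profile directory
const userDataDir = await fs.mkdtemp(path.join(os.tmpdir(), 'profile-'))
const context = await chromium.launchPersistentContext(userDataDir)
const page = await context.newPage()
await page.goto('http://localhost:3000/') // hypothetical page that populates IndexedDB
await page.waitForTimeout(5000) // crude: give the page time to write its data
await context.close()

// in practice you'd point this at the browser-specific IndexedDB subdirectory
console.log('storage directory size:', await dirSize(userDataDir), 'bytes')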

To give you an example, here is what I get for the current version of emoji-picker-element:

Browser    IndexedDB directory size
Chromium   2.13 MB
Firefox    1.37 MB
WebKit     2.17 MB

Of course, you would have to adapt this script if you wanted to measure the storage size of the ServiceWorker cache, LocalStorage, etc.

Another option, which might work better in a production environment, would be the StorageManager.estimate() API. However, this is designed more for figuring out if you’re approaching quota limits rather than performance analysis, so I’m not sure how accurate it would be as a disk usage metric. As MDN notes: “The returned values are not exact; between compression, deduplication, and obfuscation for security reasons, they will be imprecise.”
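
Usage, at least, is simple:

navigator.storage.estimate().then(({ usage, quota }) => {
  console.log(`using ${usage} out of ${quota} estimated bytes`)
})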

Conclusion

Performance is a multi-faceted thing. It would be great if we could reduce it down to a single metric such as bundle size, but if you really want to cover all the bases, there are a lot of different angles to consider.

Sometimes this can feel overwhelming, which is why I think initiatives like the Core Web Vitals, or a general focus on bundle size, aren’t such a bad thing. If you tell people they need to optimize a dozen different metrics, they may just decide not to optimize any of them.

That said, for JavaScript dependencies in particular, I would love if it were easier to see all of these metrics at a glance. Imagine if Bundlephobia had a “Nutrition Facts”-type view, with bundle size as the headline metric (sort of like calories!), and all the other metrics listed below. It wouldn’t have to be precise: the numbers might depend on the browser, the size of the DOM, how the API is used, etc. But you could imagine some basic stats around initial CPU execution time, memory usage, and disk usage that wouldn’t be impossible to measure in an automated way.

If such a thing existed, it would be a lot easier to make informed decisions about which JavaScript dependencies to use, whether to lazy-load them, etc. But in the meantime, there are lots of different ways of gathering this data, and I hope this blog post has at least encouraged you to look a little bit beyond the streetlight.

Thanks to Thomas Steiner and Jake Archibald for feedback on a draft of this blog post.

Managing focus in the shadow DOM

One of the trickiest things about the shadow DOM is that it subverts web developers’ expectations about how the DOM works. In the normal rules of the game, document.querySelectorAll('*') grabs all the elements in the DOM. With the shadow DOM, though, it doesn’t work that way: shadow elements are encapsulated.

Other classic DOM APIs, such as element.children and element.parentElement, are similarly unable to traverse shadow boundaries. Instead, you have to use more esoteric APIs like element.shadowRoot and getRootNode(), which didn’t exist before shadow DOM came onto the scene.

In practice, this means that a lot of JavaScript libraries designed for the pre-shadow DOM era might not work well if you’re using web components. This comes up more often than you might think.

The problem

For example, sometimes you want to iterate through all the tabbable elements on a page. Maybe you’re doing this because you want to build a focus trap for a modal dialog, or because you’re implementing arrow key navigation for KaiOS devices.

Now, without doing anything, elements inside of the shadow DOM are already focusable or tabbable just like any other element on the page. For instance, with my own emoji-picker-element, you can tab through its <input>, <button>s, etc.

When implementing a focus trap or arrow key navigation, we want to preserve this existing behavior. So the first challenge is to emulate whatever the browser normally does when you press Tab or Shift+Tab. In this case, shadow DOM makes things a bit more complicated because you can’t just use a straightforward querySelectorAll() (or other pre-shadow DOM iteration techniques) to find all the tabbable elements.

Pedantic note: an element can be focusable but not tabbable. For instance, when using tabindex="-1", an element can be focused when clicking, but not when tabbing through the page.

While researching this, I found that a lot of off-the-shelf JavaScript libraries for focus management don’t handle the shadow DOM properly. For example, focusable provides a query selector string that you’re intended to use like so:

import focusable from 'focusable'

document.querySelectorAll(focusable)

Unfortunately, this can’t reach inside the shadow DOM, so it won’t work for something like emoji-picker-element. Bummer.

To be fair to focusable, though, many other libraries in the same category (focus traps, “get all tabbable elements,” accessible dialogs, etc.) have the same problem. So in this post, I’d like to explain what these libraries would need to do to support shadow DOM.

The solution

I’ve written a couple of JavaScript packages that deal with shadow DOM: kagekiri, which implements querySelectorAll() in a way that can traverse shadow boundaries, and arrow-key-navigation, which makes the left and right keys change focus.

To understand how these libraries work, let’s first understand the problem we’re trying to solve. In a non-shadow DOM context, what does this do?

document.querySelectorAll('*')

If you answered “grab all the elements in the DOM,” you’re absolutely right. But more importantly: what order are the elements returned in? It turns out that they’re returned in a depth-first tree traversal order, which is crucial because this is the same order as when the user presses Tab or Shift+Tab to change focus. (Let’s ignore positive tabindex values for the moment, which are an anti-pattern anyway.)

In the case of shadow DOM, we want to maintain this depth-first order, while also piercing into the shadow DOM for any shadow roots we encounter. Essentially, we want to pretend that the shadow DOM doesn’t exist.

There are a few ways you can do this. In kagekiri, I implemented a depth-first search myself, whereas in arrow-key-navigation, I used a TreeWalker, which is a somewhat obscure API that traverses elements in depth-first order. Either way, the main insight is that you need a way to enumerate a node’s “shadow children” as well as its actual children (which can be mixed together in the case of slotted elements). You also need to be able to run the reverse logic: finding the “light” parent of a shadow tree. And of course, this has to be recursive, since shadow roots can be nested within other shadow roots.

Rather than bore you with the details, suffice it to say that you need roughly a dozen lines of code, both for enumerating an element’s children and finding an element’s parent. In the non-shadow DOM world, these would be equivalent to a simple element.children and element.parentElement respectively.
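
For the curious, though, a simplified sketch of those two helpers might look like this (the real versions in kagekiri handle more edge cases):

// children, including "shadow children" and slotted ("light") children
function getChildren (element) {
  if (element.shadowRoot) { // open shadow root: descend into it
    return [...element.shadowRoot.children]
  }
  if (element.tagName === 'SLOT') { // slot: children are the slotted elements
    return element.assignedElements({ flatten: true })
  }
  return [...element.children]
}

// parent, crossing slot and shadow boundaries in the other direction
function getParent (element) {
  if (element.assignedSlot) { // slotted element: its "parent" is the slot
    return element.assignedSlot
  }
  const parent = element.parentNode
  if (parent instanceof ShadowRoot) { // top of a shadow tree: hop to the host
    return parent.host
  }
  return parent instanceof Element ? parent : null
}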

Why the browser should handle this

Here’s the thing: I don’t particularly want to explain every line of code required for this exercise. I just want to impress upon you that this is a lot of heavy lifting for something that should probably be exposed as a web API. It feels silly that the browser knows perfectly well which element it would focus if I typed Tab or Shift+Tab, but as a web developer I have to reverse-engineer this behavior.

You might say that I’m missing the whole point of shadow DOM: after all, encapsulation is one of its major selling points. But I’d counter that a lot of folks are using shadow DOM because it’s the only way to get native CSS encapsulation (similar to “scoped” CSS in frameworks like Vue and Svelte), not necessarily DOM API encapsulation. So the fact that it breaks querySelectorAll() is a downside rather than an upside.

Here’s a sketch of my dream API:

element.getNextTabbableElement()
element.getPreviousTabbableElement()

Perhaps, like getRootNode(), these APIs could also offer an option for whether or not you want to pierce the shadow boundary. In any case, an API like this would obviate the need for the hacks described in this post.

I’d argue that browsers should provide such an API not only because of shadow DOM, but also because of built-in elements like <video> and <audio>. These behave like closed shadow roots, in that they contain tabbable elements (i.e. the pause/play/track controls), but you can’t reach inside to manipulate them.

Screenshot of GNOME Web (WebKit) browser on an MDN video element demo page showing the dev tools open with closed user agent shadow content for the controls of the video

WebKit’s developer tools helpfully show the video controls as “shadow content (user agent).” You can look, but you can’t touch!

As far as I know, there’s no way to implement a WAI-ARIA compliant modal dialog with a standard <video controls> or <audio controls> inside. Instead, you would have to build your own audio/video player from scratch.

Brief aside: dialog element

There is the native <dialog> element now implemented in Chrome, and it does come with a built-in focus trap if you use showModal(). And this focus trap actually handles shadow DOM correctly, including closed shadow roots like <video controls>!

Unfortunately, though, it doesn’t quite follow the WAI-ARIA guidelines. The problems are that 1) closing the dialog doesn’t return focus to the previously focused element in the document, and 2) the focus trap doesn’t “cycle” through tabbable elements in the modal – instead, focus escapes to the browser chrome itself.

The first issue is irksome but not impossible to solve: you just have to listen for dialog open/close events and keep track of document.activeElement. It’s even possible to patch the correct behavior onto the native <dialog> element. (Shadow DOM, of course, makes this more complicated because activeElement can be nested inside shadow roots. I.e., you have to keep drilling into document.activeElement.shadowRoot.activeElement, etc.).
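
A sketch of that drilling logic:

// find the "true" active element, even when it's nested inside shadow roots
function getDeepActiveElement () {
  let element = document.activeElement
  while (element && element.shadowRoot && element.shadowRoot.activeElement) {
    element = element.shadowRoot.activeElement
  }
  return element
}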

As for the second issue, it might not be considered a dealbreaker – at least the focus is trapped, even if it’s not completely compliant with WAI-ARIA. But it’s still disappointing that we can’t just use the <dialog> element as-is and get a fully accessible modal dialog, per the standard definition of “accessible.”

Update: After publishing this post, Chris Coyier clued me in to the inert attribute. Although it’s not shipped in any browser yet, I did write a demo of building a modal dialog with this API. After testing in Chrome and Firefox with the right flags enabled, though, it looks like the behavior is similar to <dialog> – focus is correctly trapped, but escapes to the browser chrome itself.

Second update: After an informal poll of users of assistive technologies, the consensus seems to be that having focus escape to the browser chrome is not ideal, but not a show-stopper as long as you can Shift+Tab to get back into the dialog. So it looks like when inert or <dialog> are more widely available in browsers, that will be the only way to deal with <video controls> and <audio controls> in a focus trap.

Last update (I promise!): Native <dialog> also seems to be the only way to have the Esc key dismiss the modal while focus is inside the <video>/<audio> controls.

Conclusion

Handling focus inside of the shadow DOM is not easy. Managing focus in the DOM has never been particularly easy (see the source code for any accessible dialog component for an example), and shadow DOM just makes things that much trickier by complicating a basic routine like DOM traversal.

Normally, DOM traversal is the kind of straightforward exercise you’d expect to see in a web dev job interview. But once you throw shadow DOM into the mix, I’d expect most working web developers to be unable to come up with the correct algorithm off the tops of their heads. (I know I can’t, and I’ve written it twice.)

As I’ve said in a previous post, though, I think it’s still early days for web components and shadow DOM. Blog posts like this are my attempt to sketch out the current set of problems and working solutions, and to try to point toward better solutions. Hopefully the ecosystem and browser APIs will eventually adapt to support shadow DOM and focus management more broadly.

More discussion about native <dialog> and Tab behavior can be found in this issue.

Thanks to Thomas Steiner and Sam Thorogood for feedback on a draft of this post.

Options for styling web components

When I released emoji-picker-element last year, it was my first time writing a general-purpose web component that could be dropped in to any project or framework. It was also my first time really kicking the tires on shadow DOM.

In the end, I think it was a natural fit. Web components are a great choice when you want something to be portable and self-contained, and an emoji picker fits the bill: you drop it onto a page, maybe with a button to launch it, and when the user clicks an emoji, you insert it into a text box somewhere.

Screenshot of an emoji picker with a search box, a row of emoji categories, and a grid of emoji

What wasn’t obvious to me, though, was how to allow users to style it. What if they wanted a different background color? What if they wanted the emoji to be bigger? What if they wanted a different font for the input field?

This led me to researching all the different ways that a standalone web component can expose a styling API. In this post, I’ll go over each strategy, as well as its strengths and weaknesses from my perspective.

Shadow DOM basics

(Feel free to skip this section if you already know how shadow DOM works.)

The main benefit of shadow DOM, especially for a standalone web component, is that all of your styling is encapsulated. Like JavaScript frameworks that automatically do scoped styles (such as Vue or Svelte), any styles in your web component won’t bleed out into the page, and vice versa. So you’re free to pop in your favorite resets, like so:

* {
  box-sizing: border-box;
}

button {
  cursor: pointer;
}

Another impact of shadow DOM is that DOM APIs cannot “pierce” the shadow tree. So for instance, document.querySelectorAll('button') won’t list any buttons inside of the emoji picker.

Brief interlude: open vs closed shadow DOM

There are two types of shadow DOM: open and closed. I decided to go with open, and all of the examples below assume an open shadow DOM.

In short, closed shadow DOM does not seem to be heavily used, and the drawbacks seem to outweigh the benefits. Basically, “open” mode allows some limited JavaScript API access via element.shadowRoot (for instance, element.shadowRoot.querySelectorAll('button') will find the buttons), whereas “closed” mode blocks off all JavaScript access (element.shadowRoot is null). This may sound like a security win, but really there are plenty of workarounds even for closed mode.

So closed mode leaves you with an API surface that is harder to test, doesn’t give users an escape hatch (see below), and doesn’t offer any security benefits. Also, the framework I chose to use, Svelte, uses open mode by default and doesn’t have an option to change it. So it didn’t seem worth the hassle.

Styling strategies

With that brief tangent out of the way, let’s get back to styling. You may have noticed in the above discussion that shadow DOM seems pretty isolated – no styles go in, no styles get out. But there are actually some well-defined ways that styles can be tweaked, and these give you the opportunity to offer an ergonomic styling API to your users. Here are the main options:

  1. CSS variables (aka custom properties)
  2. Classes
  3. Shadow parts
  4. The “escape hatch” (aka inject whatever CSS you want)

One thing to note is that these strategies aren’t “either/or.” You can happily mix all of them in the same web component! It just depends on what makes the most sense for your project.

Option 1: CSS variables (aka custom properties)

For emoji-picker-element, I chose this option as my main approach, as it had the best browser support at the time and actually worked surprisingly well for a number of use cases.

The basic idea is that CSS variables can actually pierce the shadow DOM. No, really! You can have a variable defined at the :root, and then use that variable inside any shadow DOM you like. CSS variables at the :root are effectively global across the entire document (sort of like window in JavaScript).

Here is a CodePen to demonstrate:
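
The gist of the demo is something like this (a simplified sketch, where .picker is a hypothetical element inside the shadow tree):

/* on the page, outside the shadow DOM */
:root {
  --background: hotpink;
}

/* inside the shadow DOM – var() happily resolves to hotpink */
.picker {
  background: var(--background);
}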

Early on, I started to think of some useful CSS variables for my emoji picker:

  • --background to style the background color
  • --emoji-padding to style the padding around each emoji
  • --num-columns to choose the number of columns in the grid

These actually work! In fact, these are some of the variables I exposed in emoji-picker-element. Even --num-columns works, thanks to the magic of CSS grid.

However, it would be pretty awful if you had multiple web components on your page, and each of them had generic-sounding variables like --background that you were supposed to define at the :root. What if they conflicted?

Conflicting CSS variables

One strategy for dealing with conflicts is to prefix the variables. This is how Lightning Web Components, the framework we build at Salesforce, does it: everything is prefixed by --lwc-.

I think this makes sense for a design system, where multiple components on a page may want to reference the same variable. But for a standalone component like the emoji picker, I opted for a different strategy, which I picked up from Ionic Framework.

Take a look at Ionic’s button component and modal component. Both of them can be styled with the generic CSS property --background. But what if you want a different background for each? Not a problem!

Here is a simplified example of how Ionic is doing it. Below, I have a foo-component and a bar-component. They each have a different background color, but both are styled with the --background variable:

From the user’s perspective, the CSS is quite intuitive:

foo-component {
  --background: hotpink;
}

bar-component {
  --background: lightsalmon;
}

And if these variables are defined anywhere else, for instance at the :root, then they don’t affect the components at all!

:root {
  --background: black; /* does nothing */
}

Instead, the components revert back to the default background colors they each defined (in this case, lightgreen and lightblue).

How is this possible? Well, the trick is to declare the default value for the variable using the :host() pseudo-class from within the shadow DOM. For instance:

/* inside the shadow DOM */
:host {
  --background: lightblue; /* default value */
}

And then outside the shadow DOM, this can be overridden by targeting the foo-component, because it trumps :host in terms of CSS specificity:

/* outside the shadow DOM */
foo-component {
  --background: hotpink; /* overridden value */
}

It’s a bit hard to wrap your head around, but for anyone using your component, it’s very straightforward! It’s also unlikely to run into any conflicts, since you’d have to be targeting the custom element itself, not any of its ancestors or the :root.
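
Putting the two halves together, here’s a minimal vanilla sketch of the pattern (no framework, just a bare custom element – the markup is illustrative):

customElements.define('foo-component', class extends HTMLElement {
  constructor () {
    super()
    // open shadow root with a :host-scoped default for --background
    this.attachShadow({ mode: 'open' }).innerHTML = `
      <style>
        :host {
          --background: lightblue; /* default value */
        }
        div {
          background: var(--background);
        }
      </style>
      <div>foo</div>
    `
  }
})

A bare <foo-component> renders light blue, but a foo-component { --background: hotpink; } rule on the page overrides it.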

Of course, this may not be enough reassurance for you. Users can still shoot themselves in the foot by doing something like this:

* {
  --background: black; /* this will override the default */
}

So if you’re worried, you can prefix the CSS variables as described above. I personally feel that the risk is pretty low, but it remains to be seen how the web component ecosystem will shake out.

When you don’t need CSS variables

As it turns out, CSS variables aren’t the only case where styles can leak into the shadow DOM: inheritable properties like font-family and color will also seep in.

For something like fonts, you can use this to your advantage by… doing nothing. Yup, just leave your spans and inputs unstyled, and they will match whatever font the surrounding page is using. For my own case, this was just one less thing I had to worry about.

If inheritable properties are a problem for you, though, you can always reset them.

Option 2: classes

Building on the previous section, we get to the next strategy for styling web components: classes. This is another one I used in emoji-picker-element: if you want to toggle dark mode or light mode, it’s as simple as adding a CSS class:

<emoji-picker class="dark"></emoji-picker>
<emoji-picker class="light"></emoji-picker>

(It will also default to the right one based on prefers-color-scheme, but I figured people might want to customize the default behavior.)

Once again, the trick here is to use the :host pseudo-class. In this case, we can pass another selector into the :host() pseudo-class itself:

:host(.dark) {
  background: black;
}
:host(.light) {
  background: white;
}

And here is a CodePen showing it in action:

Of course, you can also mix this approach with CSS variables: for instance, defining --background within the :host(.dark) block. Since you can put arbitrary CSS selectors inside of :host(), you can also use attributes instead of classes, or whatever other approach you’d like.

One potential downside of classes is that they can also run into conflicts – for instance, if the user has dark and light classes already defined elsewhere in their CSS. So you may want to avoid this technique, or use prefixes, if you’re concerned by the risk.

Option 3: shadow parts

CSS shadow parts are a newer spec, so the browser support is not as widespread as CSS variables (still pretty good, though, and improving daily).

The idea of shadow parts is that, as a web component author, you can define “parts” of your component that users can style. CSS Tricks has a good breakdown, but the basic gist is this:

/* outside the shadow DOM */
custom-component::part(foo) {
  background: lightgray;
}

<!-- inside the shadow DOM -->
<span part="foo">
  My background will be lightgray!
</span>

And here is a demo:

I think this strategy is fine, but I actually didn’t end up using it for emoji-picker-element (not yet, anyway). Here is my thought process.

Downsides of shadow parts

First off, it’s hard to decide which “parts” of a web component should be styleable. In the case of the emoji picker, should it be the emoji themselves? What about the skin tone picker, which also contains emoji? What are the right boundaries and names here? (This is not an unsolvable problem, but naming things is hard!)

To be fair, this same criticism could also be applied to CSS variables: naming variables is still hard! But as it turns out, I already use CSS variables to organize my code internally; it just jibes with my own mental model. So exposing them publicly didn’t involve a lot of extra naming for me.

Second, by offering a ::part API, I actually lock myself in to certain design decisions, which isn’t necessarily the case with CSS variables. For instance, consider the --emoji-padding variable I use to control the padding around an emoji. The equivalent way of doing this with shadow parts might be:

emoji-picker::part(emoji) {
  padding: 2em;
}

But now, if I ever decide to define the padding some other way (e.g. through width or implicit positioning), or if I decide I actually want a wrapper div to handle the padding, I could potentially break anyone who is styling with the ::part API. Whereas with CSS variables, I can always redefine what --emoji-padding means using my own internal logic.

In fact, this is exactly what I do in emoji-picker-element! The --emoji-padding is not a padding at all, but rather part of a calc() statement that sets the width. I did this for performance reasons – it turned out to be faster (in Chrome anyway) to have fixed cell sizes in a CSS grid. But the user doesn’t have to know this; they can just use --emoji-padding without caring how I implemented it.
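
In other words, the internal CSS looks less like a padding and more like this (simplified – --emoji-size is a hypothetical stand-in for the real internal variables):

.emoji {
  width: calc(var(--emoji-size) + 2 * var(--emoji-padding));
}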

Finally, shadow parts expand the API surface in ways that make me a bit uncomfortable. For instance, the user could do something like:

emoji-picker::part(emoji) {
  margin: 1em;
  animation: 1s infinite some-animation;
  display: flex;
  position: relative;
}

With CSS shadow parts, there are just a lot of unexpected ways I could break somebody’s code by changing one of these properties. Whereas with CSS variables, I can explicitly define what I want users to style (such as the padding) and what I don’t (such as display). Of course, I could use semantic versioning to try to communicate breaking changes, but at this point any CSS change on any ::part is potentially breaking.

In (mild) defense of shadow parts

That said, I can definitely see where shadow parts have their place. If you look at my colleague Greg Whitworth’s Open UI definition of a <select> element, it has well-defined parts for everything that makes up a <select>: the button, the listbox, the options, etc. In fact, one of the main goals of the project is to standardize these parts across frameworks and specs. For this kind of situation, shadow parts are a natural fit.

Shadow parts also increase the expressiveness of the user’s CSS: the same thing that makes me squeamish about breaking changes is also what allows users to go hog-wild with styling any ::part however they choose. Personally, I like to restrict the API surface of the code that I ship, but there is an inherent tradeoff here between customizability and breakability, so I don’t believe there is one right answer.

Also, although I organized my own internal styling with CSS variables, it’s actually possible to do this with shadow parts as well. Inside the shadow DOM, you can use ::part(foo), and it works as expected. This can make your shadow parts less brittle, since they’re not just a public API but also used internally. Here is an example:

Once again: these strategies aren’t “either/or.” If you’d like to use a mix of variables, parts, and classes, then by all means, go ahead! You should use whatever feels most natural for the project you’re working on.

Option 4: the escape hatch

The final strategy I’ll mention is what I’ll call “the escape hatch.” This is basically a way for users to bolt whatever CSS they want onto your custom element, regardless of any other styling techniques you’re using. It looks like this:

const style = document.createElement('style')
style.innerHTML = 'h1 { font-family: "Comic Sans"; }'
element.shadowRoot.appendChild(style)

Because of the way open shadow DOM works, users aren’t prevented from appending <style> tags to the shadowRoot. So using this technique, they always have an “escape hatch” in case the styling API you expose doesn’t meet their needs.

This strategy is probably not the one you want to lean on as your primary styling interface, but it is kind of nice that it always exists. This means that users are never frustrated that they can’t style something – there’s always a workaround.

Of course, this technique is also fragile. If anything changes in your component’s DOM structure or CSS classes, then the user’s code may break. But since it’s obvious to the user that they’re using a loophole, I think this is acceptable. Appending your own <style> tag is clearly a “you broke it, you bought it” kind of situation.

Conclusion

There are lots of ways to expose a styling API on your web component. As the specs mature, we’ll probably have even more possibilities with constructable stylesheets or themes (the latter is apparently defunct, but who knows, maybe it will inspire another spec?). Like a lot of things on the web, there is more than one way to do it.

Web components have a bit of a bad rap, and I think the criticisms are mostly justified. But I also think it’s early days for this technology, and we’re still figuring out where web components work well and where they need to improve. No doubt the web components of 2021 will be even better than those of 2020, due to improved browser specs as well as the community’s understanding of how best to use this tool.

Thanks to Thomas Steiner for feedback on a draft of this blog post.

2020 book review

Like it was for most people, 2020 was a weird year for me. I found myself retreating into the cloistered comfort of my living room, playing a lot more videogames and doing less reading.

Maybe I just needed the escapism, or maybe reading itself felt more stressful when all the headlines were so dire. Either way, my Switch reports that I spent hundreds of hours on immersive games like Zelda: Breath of the Wild, Stardew Valley, and Octopath Traveler.

Those are all great games! But since I’ve made a tradition of it, here is a (somewhat shorter) list of the books I read and enjoyed in 2020.

Fiction

The Masters of Solitude by Marvin Kaye and Parke Godwin (1978)

I’ve mentioned it before, but post-apocalyptic fiction is one of my favorite genres. (I’m a natural pessimist, I guess!) This book is a bit of a hidden gem – it’s out of print, and if you check the reviews on Goodreads, you’ll see lots of comments saying that it’s a great book that almost nobody’s ever heard of. It’s also my top pick for 2020.

The book takes place hundreds of years in the future, focusing on a religious conflict between now-dominant Wiccans and minority Christians in the region of present-day America. There are lots of fun, subtle references to places on the east coast: “Shando” I assume is Shenandoah, “Charzen” is maybe Charleston, “Mrika” is America (but not the whole continent, more of a “Holy Roman Empire” kind of thing). The book also keeps you at arm’s length by not revealing too many of its secrets early on.

The mythology and world-building are pretty rich here, and I found myself sucked in even without (yet) checking out the second book in the series. For a moment, you can even forget that it’s supposed to take place in the future, as there are elements of magic and fantasy mixed in with the sci-fi. Overall it’s strongly recommended if you’re a sci-fi/fantasy fan.

Parable of the Sower and Parable of the Talents by Octavia Butler (1993, 1998)

Apparently a lot of people read the Earthseed books back in high school or earlier, but these ones weren’t on my radar until this year. The first book in particular I deeply enjoyed: its vision of the future is disturbing, but frankly it’s one of the more believable sci-fi books I’ve read. It’s less about whiz-bang excursions to Alpha Centauri and more about the daily struggle of life on Earth in a warming climate.

It’s also impressive that this series was written back in the 90s, at a time when climate change wasn’t being taken as seriously as today. Nowadays it feels downright prescient – especially when you get to the so-unbelievable-I-had-to-check-the-publish-date depiction of a populist demagogue being elected on a familiar slogan. Overall I found the first book stronger than the second, but both are worth reading.

The Southern Reach Trilogy (Annihilation, Authority, Acceptance) by Jeff VanderMeer (2014)

An interesting and somewhat maddening set of sci-fi books. Unfortunately I feel that, like a lot of mysteries, the first book writes checks that the later ones can’t quite cash. You’ll probably get the most enjoyment out of it if you read the first book and ignore the rest entirely.

Just let all the mysteries from the first book sit in your mind as a delicious enigma. The second and third books don’t do a great job of clearing things up anyway.

Nonfiction

Working in Public: The Making and Maintenance of Open Source Software by Nadia Eghbal (2020)

I’m a bit biased toward this book, since I’m actually quoted in it a couple of times, but I absolutely loved it. More than just a recapitulation of what I already know after working on open-source software for years, this book actually illuminated some things about modern open-source culture, and even some of my own motivations for writing OSS, that hadn’t really been clear to me before.

In particular, the way she draws a parallel between OSS developers and social media “content creators” was especially eye-opening for me. When you stop treating GitHub issues and pull requests as “contributions,” and start thinking of them more like comments on a YouTube video, the social dynamics start to make a lot more sense. Probably one of the best books on software I’ve ever read, up there with Don’t Make Me Think and The Design of Everyday Things in my personal pantheon.

It Doesn’t Have to Be Crazy at Work by Jason Fried and David Heinemeier Hansson (2018)

This book confirmed a lot of what I already believed, but it’s still nice to see it put to paper in a succinct way. Basecamp seems like a genuinely nice place to work, and a good example for other companies to follow.

If anything, it seems to me that software should be the opposite of an industry where people are encouraged to work 12-hour days, answer emails at all hours, and work on the weekend. The whole point of the job is to automate things so that the systems mostly run themselves. If you get into a purely reactive mode, then it can be a kind of death-spiral where you’re constantly inserting humans into the critical paths of the overall system, which makes everything more fragile and doesn’t play to the strengths of computing in general.

Twilight of Democracy: The Seductive Lure of Authoritarianism by Anne Applebaum (2020)

Over the past few years, I’ve been kind of obsessed with the question of why we’re experiencing a worldwide shift towards illiberalism. I think previous entries in my year-end book reviews do a better job of answering that question, but Applebaum’s book is a more intimate, insider’s story of what it feels like to see this shift play out even among one’s closest friends.

I get the feeling that, among conservatives in particular, the Cold War created an odd set of alliances and bedfellows (free-marketers, foreign-policy hawks, evangelicals), that’s starting to break down. This book is worth reading if you’re interested in those kinds of larger ideological shifts.

The New Class War: Saving Democracy from the Managerial Elite by Michael Lind (2017)

I picked up this book because it was recommended alongside Ezra Klein’s similar Why We’re Polarized. I didn’t actually finish Klein’s book (probably because I picked up enough bits and pieces from his excellent podcast), but I did read this, and I find it to be maybe a more complete picture of why politics feels so fractured nowadays.

The basic argument of the book is that policy decisions in western democracies are increasingly being made by a technocratic elite, and that a backlash is underway from the broader populace that doesn’t feel represented in the new system. In the broad strokes of history, that may be a pretty familiar picture, but the book tells an interesting story of how we got there. A good pairing would be Listen, Liberal by Thomas Frank, about how the Democratic party gradually lost its working-class base.

Programmers are bad at managing state

“Have you tried turning it off and back on again?” is one of the most familiar tropes associated with tech support. But as someone who is often asked by family members for help with misbehaving devices, I find it to be one of my most effective tools.

The solution is primitive, but the logic behind it is surprisingly profound: programmers are bad at managing state.

As a programmer, I understand this fact intuitively. When a program I’ve written has first booted up, it is in its most pristine, perfect state. I have lovingly crafted every variable and array to be exactly how I intend it to be. If I have automated tests, then this is the state that is most heavily tested.

It’s only after this initial bootup phase that everything starts to go to hell. The user clicked on something! The user typed in a field! How could they? How could they besmirch my beautiful, perfect program with their grubby little hands?

Before you know it, the user has not only clicked and typed – they’ve pressed the browser’s back and forward buttons, and the refresh button, and they’ve booted up the program after a few weeks of inactivity, so that some of the data has become stale, or maybe they installed an update and now they’ve got some locally cached files that were designed for an older version of the app… It’s here that users run into the countless errors, crashes, and freezes associated with software that finds itself in a state that the original programmers didn’t intend.

This is why “turn it off and on again” works so well. You put the software back in a state that the programmers predicted, and poof, everything works again. Sometimes, though, you have to take this logic even further.

Recently my wife was having problems launching Steam. She hadn’t run it in a few years, so she reinstalled it and clicked the icon, but it refused to load. I googled around and couldn’t find a solution, so, on a hunch, I deleted the ~/Library/Application Support/Steam folder, where Steam was apparently storing data, and relaunched it. Poof! It worked.

Another time, she was having trouble with a web app that was stuck on a loading bar, no matter how many times she refreshed. So I opened up the handy Chrome DevTools “Application” tab, clicked “Clear storage”, and what do you know? After refreshing and logging in again, everything worked.

Another time, her MacBook refused to print anything on our HP printer after a macOS update. After about an hour of searching various web forums, it turned out that I needed to run a program called HP Uninstaller to remove old HP software. Poof! Everything worked.

Having a dedicated program to clean up your own program’s files may seem a bit ridiculous, like an admission of defeat, but I actually think it’s kind of brilliant. As a programmer, it’s impossible to predict all the states that your program can end up in. You can use something like XState to help visualize it, but when you start multiplying the possible states by the cached configuration files by the different versions of the software by the different teams of people who built each version… it just becomes unfeasible to plan for every outcome. (Let alone write tests for every possible state!) So rather than having software that doesn’t work, maybe it’s better to just admit your own human frailty, and offer a way for users to start from scratch.

Mozilla’s “Refresh Firefox” feature is another brilliant application of this principle. If you haven’t used Firefox in a while, then it will pop up a little alert offering to blow everything away and start from scratch. This is probably a great way to avoid attrition to other browsers: I’m sure plenty of people switch from Browser A to Browser B because on day 1, Browser B feels so much faster – not because B is actually superior to A, but because B isn’t bogged down with a bunch of extensions, settings, history, leftover update files, etc.

If you’ve ever reinstalled Windows or Android from scratch, and observed how remarkably fast and lightweight everything suddenly feels, then you’ve probably also seen this phenomenon in action.

I don’t have a solution to this problem. Software is complicated, and unless you have the luxury of writing code that can be tested with formal proofs, it’s impossible to predict every possible state that your code can be in. So maybe the best bet is to just provide some kind of escape hatch, so that at least it’s easy to “turn it off and back on again.”

Building an accessible emoji picker

In my previous blog post, I introduced emoji-picker-element, a custom element that acts as an emoji picker. In the post, I said that accessibility “shouldn’t be an afterthought,” and made it clear that I took accessibility seriously.

But I didn’t explain how I actually made the component accessible! In this post I’ll try to make up for that oversight.

Reduce motion

If you have motion-based animations in your web app, one of the easiest things you can do to improve accessibility is to disable them with the prefers-reduced-motion CSS media query. (Note that this applies to transform animations but not opacity animations, as it’s my understanding that opacity changes don’t cause nausea for those with vestibular disorders.)

There were two animations in emoji-picker-element that I had to consider: 1) a small “indicator” under the currently-selected tab button, which moves as you click tab buttons, and 2) the skin tone dropdown, which slides down when you click it.


For the tab indicator, the fix was fairly simple:

.indicator {
  will-change: opacity, transform;
  transition: opacity 0.1s linear, transform 0.25s ease-in-out;
}

@media (prefers-reduced-motion: reduce) {
  .indicator {
    will-change: opacity;
    transition: opacity 0.1s linear;
  }
}

Note that there is also an opacity transition on this element (which plays when search results appear or disappear), so there is a bit of unfortunate repetition here. But the core idea is to remove the transform animation.

For the skin tone dropdown, the fix was a bit more complicated. The reason is that I have a JavaScript transitionend event listener on the element:

element.addEventListener('transitionend', listener);

If I were to remove the transform animation completely, then this listener would never fire. So I borrowed a technique from the cssremedy project, which looks like this:

@media (prefers-reduced-motion: reduce) {
  .skintone-list {
    transition-duration: 0.001s;
  }
}

Based on my testing in Safari, Firefox, and Chrome, this effectively removes the animation while ensuring that transitionend still fires. (There are other tricks mentioned in that thread, but I found that this solution was sufficient for my use case.)

With these fixes in place, the potentially-nauseating animations are removed for those who prefer things that way. The easiest way to test this is in the Chrome DevTools “Rendering” tab, under “Emulate CSS media feature prefers-reduced-motion”:

Screenshot of Chrome DevTools "prefers reduced motion" Rendering setting

Here it is with motion reduced:


As you can see, the elements no longer move around. Instead, they instantly pop in or out of place.

Screen reader and keyboard accessibility

When testing screen reader accessibility, I use four main tools:

  1. NVDA in Firefox on Windows (with SpeechViewer enabled so I can see the text)
  2. VoiceOver in Safari on macOS
  3. Chrome’s Accessibility panel in DevTools (Firefox also has nice accessibility tools! I use them occasionally for a second opinion.)
  4. The axe extension (also available in Lighthouse)

I like testing in actual screen readers, because they sometimes have bugs or differing behavior (just like browsers!). Testing in both NVDA and VoiceOver gives me more confidence that I didn’t mess anything up.

In the following sections, I’ll go over the basic semantic patterns I used in emoji-picker-element, and how those work for screen reader or keyboard users.

Tab buttons and tab panel

For the main emoji picker UI, I decided to use the tab panel pattern with manual activation. This means that each emoji category (“Smileys and emoticons,” “People and body”) is a tab button (role=tab), and the list of emoji underneath is a tab panel (role=tabpanel).

Screenshot showing the emoji categories as tabs and the main grid of emoji as the tab panel

The only thing I had to do to make this pattern work was to add ← and → keydown listeners to move focus left and right between tabs. (Technically I should also add Home and End – I have that as a todo!)
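
Here’s a rough sketch of what that keydown handling looks like (simplified, with tabList standing in for the role=tablist container – not the component’s exact code):

const tabList = document.querySelector('[role="tablist"]');
const tabs = [...tabList.querySelectorAll('[role="tab"]')];

tabList.addEventListener('keydown', (event) => {
  const index = tabs.indexOf(event.target);
  if (index === -1) {
    return;
  }
  if (event.key === 'ArrowLeft') {
    // wrap around from the first tab to the last
    tabs[(index - 1 + tabs.length) % tabs.length].focus();
  } else if (event.key === 'ArrowRight') {
    tabs[(index + 1) % tabs.length].focus();
  }
});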

Aside from that, clearly each tab button should be a <button> (so that Enter and Spacebar fire correctly), and it should have an aria-label for assistive text. I also went ahead and added titles that echo the content of aria-label, but I’m considering replacing those since they aren’t accessible to keyboard users. (I.e. title appears when you hover with a mouse, but not when you focus with the keyboard. Plus it sometimes adds extra spoken text in NVDA, which is less than ideal.)

Skin tone dropdown

The skin tone dropdown is modeled on the collapsible dropdown listbox pattern. In other words, it’s basically a fancy <select> element.

Annotated screenshot showing the collapsed skin tone button as a button, and the expanded list of skin tones as a listbox

The button that triggers the dropdown is just a regular <button>, whereas the listbox has role=listbox and its child <button>s have role=option. To implement this, I just used the DevTools to analyze the W3C demo linked above, then tested in both NVDA and VoiceOver to ensure that my implementation had the same behavior as the W3C example.
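
In rough outline, the markup looks something like this (the ids and labels here are illustrative, not the component’s actual ones):

<button id="skintone-button" aria-haspopup="listbox" aria-expanded="false">
  Choose a skin tone
</button>
<div role="listbox" aria-labelledby="skintone-button">
  <button role="option" aria-selected="true">Default</button>
  <button role="option">Light</button>
  <button role="option">Medium</button>
  <button role="option">Dark</button>
</div>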

One pro tip: since the listbox disappears on the blur event, which would happen when you click inside the DevTools itself, you can use the DevTools to remove the blur event and make it easier to inspect the listbox in its “expanded” state.

Screenshot of Chrome DevTools on the W3C collapsible listbox example, showing an arrow pointing at the "remove" button in DevTools next to the "blur" listener in the Event Listeners section

Removing this blur listener may make debugging a bit easier.

Search input and search results

For the search input, I decided to do something a bit clever (which may or may not burn me later!).

By default, the emoji in the tabpanel are simple <button>s aligned in a CSS Grid. They’re given role=menuitem and placed inside of a role=menu container, modeled after the menu pattern.

However, when you start typing in the search input, the tabpanel emoji are instantly replaced with the search results emoji:


Visually, this is pretty straightforward, and it aligns with the behavior of most other emoji pickers. I also found it to be good for performance, because I can have one single Svelte #each expression, and let Svelte handle the list updates (as opposed to clearing and re-creating the entire list whenever something changes).

If I had done nothing else, this would have been okay for accessibility. The user can type some text, and then change focus to the search results to see what the results are. But that sounded kind of awkward, so I wanted to do one better.

So instead, I implemented the combobox with listbox popup pattern. When you start typing into the input (which is <input type=search> with role=combobox), the menu with menuitems immediately transforms into a listbox with options instead. Then, using aria-activedescendant and aria-controls, I link the listbox with the combobox, allowing the user to press ↑ or ↓ to cycle through the search results. (I also used an aria-describedby to explain this behavior.)


In the above video, you can see NVDA reading out the expanded/collapsed state of the combobox, as well as the currently-selected emoji as I cycle through the list by pressing ↑ and ↓. (My thanks to Assistiv Labs for providing a free account for OSS testing! This saved me a trip to go boot up my Windows machine.)

So here’s the upshot: from the perspective of a screen reader user, the search input works exactly like a standard combobox with a dropdown! They don’t have to know that the menu/menuitem elements are being replaced at all, or that they’re aligned in a grid.
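
Concretely, the wired-up markup ends up looking roughly like this (the ids are illustrative):

<input type="search" role="combobox"
       aria-expanded="true"
       aria-controls="search-results"
       aria-activedescendant="emoji-1"
       aria-describedby="search-description">
<div id="search-results" role="listbox">
  <button id="emoji-0" role="option">😀</button>
  <button id="emoji-1" role="option" aria-selected="true">😃</button>
</div>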

Now, this isn’t a perfect pattern: a sighted user might find it more intuitive to press ← and → to move horizontally through the grid of emoji (and ↑ and ↓ to move vertically). However, I found it would be tricky to properly handle the ←/→ keys, as the search input itself allows you to move the cursor left and right when it has focus. The ↑ and ↓ keys, by contrast, are unambiguous in this situation, so they’re safe to use.

Plus, this is really a progressive enhancement – mouse or touch users don’t have to know that these keyboard shortcuts exist. So I’m happy with this pattern for now.

Testing

To test this project, I decided to use Jest with testing-library. One of my favorite things about testing-library is that it forces you to put accessibility front-and-center. By design, it makes it difficult to use simple CSS selectors, and encourages you to query by ARIA roles instead.
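
For example, instead of reaching for a CSS selector, a typical query looks something like this (a hedged sketch – the exact setup depends on how your tests reach into the shadow root):

import { getByRole } from '@testing-library/dom';

const picker = document.querySelector('emoji-picker');
// Query by ARIA role and accessible name rather than by CSS selector
const tab = getByRole(picker.shadowRoot, 'tab', { name: 'Smileys and emoticons' });
tab.click();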

This made it a bit harder to debug, since I had to inspect the implicit accessibility tree of my component rather than the explicit DOM structure. But it helped keep me honest, and ensure that I was adding the proper ARIA roles during development.

If I had one criticism of this approach, I would say that I find it inherently harder to test using Jest and JSDom rather than a real browser. Rather than having access to the browser DevTools, I had to set debugger statements and use node --inspect-brk to walk through the code in the Chrome Node inspector.

It also wasn’t always clear when one of my ARIA role queries wasn’t working properly. Perhaps when the Accessibility Object Model gains wider adoption, it will become as easy to test the accessibility tree as it is to test CSS selectors.

Conclusion

To get accessibility right, it helps to consider it from the get-go. Like performance, it can be easy to paint yourself into a corner if you quickly build a solution based only on the visuals, without testing the semantics as well.

While building emoji-picker-element, I found that I often had to make tweaks to improve performance (e.g. using one big Svelte #each expression) or accessibility (e.g. changing the aria-hidden and tabindex attributes on elements that are visually hidden but not display: none). I also had to think hard about how to merge the menu component with the listbox to allow the search to work properly.

I don’t consider myself an accessibility expert, so I’m happy to hear from others who have ideas for improving accessibility. For instance, I don’t think the favorites bar is easily reachable by screen reader or keyboard users, and I’m still noodling on how I might improve that. Please feel free to open a GitHub issue if you have any ideas for what I can do better!

Introducing emoji-picker-element: a memory-efficient emoji picker for the web

Screenshot of emoji-picker-element, an emoji picker, in light and dark mode, with grids of emoji faces

Emoji pickers are ubiquitous. It seems that every social media and messaging app needs to have a little grid of cartoon faces you can click on.

There’s nothing inherently wrong with emoji (they’re fun! they’re popular! they make communication livelier!). But the way they’re currently used on the web is wasteful.

The emoji picker landscape

The main problem is that there are a lot of emoji: 1,814 as of the latest version, Emoji v13.0. That doesn’t even include the skin tone variants (or combinations of skin tones), and each of those emoji has associated shortcodes, tags, ASCII emoticons, etc. And the way most web-based emoji pickers work, all of this data is stored in-memory.

A popular JavaScript library to manage emoji data, emojibase, currently offers two JSON files for English: the main one, which is 854kB, and the “compact” one, which is 543kB. Now imagine that, for every browser tab you have open, each website is potentially loading over half a megabyte of data, just to show a little grid of emoji!

The median web page is now around 2MB, according to the Internet Archive. Hopefully any emoji picker data is lazy-loaded, but either way, half a megabyte is a big chunk of any reasonable perf budget.

You could say that these websites should ditch the custom picker and just ask people to use the built-in emoji picker for their OS. The problem is that a lot of people don’t know that these exist. (How many Mac users know about Cmd+Ctrl+Space? How many Windows users have memorized Win+.?) Some OSes don’t even have a native emoji picker (try Ubuntu or stock Android for an example). Even ignoring these problems, websites often need to tweak the emoji picker, such as adding their own custom emoji (e.g. Discord, Slack, Mastodon).

Screenshot of the built-in emoji pickers on macOS and Windows

The built-in emoji pickers on macOS and Windows. Ask a non-techie if they’ve ever even seen these.

Shouldn’t browser vendors offer a standard emoji picker element? Something like <input type="emoji">? I’ve actually proposed this in the past, but I’m not aware of any browser vendors that were interested in picking it up.

Also, standardizing an emoji picker would probably require coordination between the JavaScript (TC39) and web (W3C/WHATWG) standards bodies, as you’d ideally want a JavaScript-based API for querying Intl-specific emoji data (e.g. to show autocompletions) in addition to the actual picker element. The degree of collaboration between the browser vendors and the standards bodies would have to be fairly involved, so it seems unlikely to happen soon.

Starting from scratch

When I first wrote Pinafore, I didn’t want to deal with this situation at all. I found the whole idea of custom emoji pickers to be absurd – just use the one built into the OS!

When I realized how unreliable the OS-provided emoji pickers were, though, and after considering the need for Mastodon custom emoji, I eventually decided to use emoji-mart, a React component that Mastodon also uses. For various reasons, though, I grew frustrated with emoji-mart (even as I contributed to it), and I mused about how I might build my own.

What would the ideal web-based emoji picker be like? I settled on a few requirements:

  1. Data should be stored in IndexedDB, not in-memory. The Unicode Consortium is never going to stop adding emoji, so at some point keeping all the emoji and their metadata in-memory is going to become unsustainable (or at least, unwieldy).
  2. It should be a custom element. Web components are a thing; using an emoji picker should be as simple as dropping <emoji-picker></emoji-picker> into your HTML.
  3. It should be lightweight. If every website is going to use their own emoji picker, then it should at least have a low JavaScript footprint.
  4. It should be accessible. Accessibility shouldn’t be an afterthought; the emoji picker should work well for screen reader users, keyboard users – everyone.

So against my better judgment, I embarked on the arduous task: building a full emoji picker, the way I thought it should be built. Today it’s available on npm, and you can try out a demo version here. I call it (somewhat presumptuously): emoji-picker-element.

Design

emoji-picker-element follows the vision I set out above. Usage can be as simple as:

<emoji-picker></emoji-picker>

<script type=module>
  import 'https://unpkg.com/emoji-picker-element'
</script>

Under the hood, emoji-picker-element will download the emojibase data, parse it, and store it in IndexedDB. (By default, it fetches from jsdelivr.) The second time the picker loads, it will just do a HEAD request and check the ETag to see if anything has changed.
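
In rough pseudocode, the second-load check works something like this (dataSource and storedETag are placeholder names, not the library’s actual internals):

const response = await fetch(dataSource, { method: 'HEAD' });
const eTag = response.headers.get('etag');
if (eTag !== storedETag) {
  // Something changed upstream: re-download, re-parse, and update
  // IndexedDB in the background.
}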

This is a bonus of using IndexedDB: we can avoid downloading, parsing, and processing the emoji data a second time, because it’s already available on-disk! Following offline-first principles, emoji-picker-element also lazily updates in the background if the data has changed.


emoji-picker-element shows only native emoji (no spritesheets), in the categories defined by emojibase. If any emoji are not supported by the OS or browser, then those will be hidden.

Like most emoji pickers, you can also do a text search, set a skin tone, and see a list of frequently-used emoji. I also added support for custom emoji with custom categories, which is an important feature for Mastodon admins who want to add some pizzazz and personality to their instance.

To keep the bundle size small and the runtime performance fast, I’m using Svelte 3 to implement the picker. The tree-shaken Svelte “runtime” is bundled with the component, so consumers don’t have to care what framework I’m using – it “just works.”

Also, I’m using Shadow DOM, which keeps styles nicely encapsulated, while also offering a neat styling API using CSS variables:

emoji-picker {
  --num-columns: 6;
  --emoji-size: 14px;
  --border-color: black;
}

(If you’re wondering why this works, it’s because CSS variables pierce the shadow DOM.)

Evaluating

So how well did I do? The most important consideration in my mind was performance, so here’s how emoji-picker-element stacks up.

Memory usage

This was the most interesting one to me. Was my hypothesis correct, that storing the emoji data in IndexedDB would reduce the memory footprint?

Using the new Chrome measureMemory() API, I compared four pages:

  • A “blank” HTML page
  • The same page with emoji-picker-element loaded
  • The same page with the emojibase “compact” JSON object loaded
  • The same page with the emojibase full JSON object loaded
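
Before getting to the results, here’s roughly what taking a measurement looks like (the API was experimental at the time, so the exact result shape may vary by Chrome version):

// Inside an async function, on a page served with the right headers/flags
if (performance.measureMemory) {
  const { bytes } = await performance.measureMemory();
  console.log(`Total memory usage: ${bytes} bytes`);
}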

Here are the results:

Scenario   Bytes      Relative to blank page
blank      635 kB     0 B
picker     1.38 MB    744 kB
compact    1.43 MB    792 kB
full       1.77 MB    1.13 MB

As you can see, emoji-picker-element takes up less memory than merely loading the “compact” JSON itself! That’s all the HTML, CSS, and JavaScript needed to render the component, and it’s already smaller than the emoji data alone.

Note that the size of a JSON file (in this case, 854kB for the full data and 543kB for the compact data) is not the same as the memory usage when it’s parsed into a JavaScript object. That’s why it’s important to actually parse the JSON to get the true memory usage.

Disk usage

Given that emoji-picker-element moves data out of memory and into IndexedDB, you might also wonder how much space it takes up on disk. There are three major IndexedDB implementations in browsers (Chrome/Chromium, Firefox/Gecko, and Safari/WebKit), so I wrote a script to calculate the IndexedDB disk usage. Here are the results:

Browser               Disk usage
Chrome                896kB
Firefox               1.26MB
GNOME Web (WebKit)    1.77MB

(Note that these values may be a bit inconsistent – it seems that all browsers do some amount of compacting over time. These are the lowest I saw.)

These numbers are not small, but they make sense given the size of the emoji data, and the fact that the database contains an index for text search. I find it to be a reasonable tradeoff given the memory savings.

Runtime performance

To calculate the load time, I measure from the beginning of the custom element’s constructor to the requestPostAnimationFrame (polyfilled) after rendering the first set of emoji and favorites bar.
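
The polyfill is tiny – something along these lines (one of several possible variants):

function requestPostAnimationFrame(callback) {
  requestAnimationFrame(() => {
    // A task queued during the rendering steps runs after paint
    setTimeout(callback, 0);
  });
}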

On a Nexus 5 (released in 2013) running Chrome on Android 6, median of 5 iterations, I got the following numbers:

Type                        Duration
First load (uncached)       2982ms
Subsequent load (cached)    259ms

(This was running on a local network – I’m focusing on CPU performance more than network performance.)

Looking at a Chrome trace of first load shows that the Nexus 5 spends about 200ms rendering the initial view (i.e. the “loading” state), 200ms downloading the data on the local intranet, 400ms processing it, 650ms inserting it into IndexedDB, roughly another second of IndexedDB transaction costs, then 250ms rendering the final result:

Annotated screenshot of a Chrome timeline trace showing the durations described above

In the above screenshot, the total render took 2.76s.

On second load, most of those IndexedDB costs are gone, and we merely fetch from IDB and render:

Another Chrome timeline screenshot, this one showing a much shorter duration (~430ms)

In this case, the total render took 434ms.

Interestingly, because it’s IndexedDB, we can actually move the initial data loading to a web worker, which frees up the main thread. In that case, the initial load looks like this:

Screenshot of Chrome timeline showing a similar trace to the "first load", but now the IndexedDB costs are on the worker thread and the main thread is largely idle

The total time in this case was about 2.5s. Clearly, we’ve just shifted the costs from one thread to another (we haven’t made anything faster), but it’s pretty neat that we can avoid jank this way! In the end, only about 300ms of work happens on the main thread.

This points to a nice application-level optimization: assuming the emoji picker is lazy-loaded, the app can proactively spin up a worker thread to populate the database. This would ensure that the emoji data is already ready when the user first opens the picker.
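
In sketch form (the file name and message protocol here are hypothetical):

// In the main app, whenever it's idle:
const worker = new Worker('./emoji-db-warmup.js');
worker.addEventListener('message', () => {
  // The worker has populated IndexedDB, so it's safe to terminate it.
  worker.terminate();
});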

Bundle size

Currently the bundle size of emoji-picker-element is 39.66kB minified, and 12.3kB minified+Brotli. Those are both a bit too large for my taste, but they do compare favorably with other emoji pickers on npm: the smallest one I could find with a similar feature set is interweave-emoji-picker, which is 40.7kB minified.

Of course, the other emoji pickers don’t also include the entire runtime for their framework. The reason I’m bundling Svelte with the component is 1) Svelte’s runtime is quite small already, 2) it’s also tree-shaken to only include what you need, and 3) I’m assuming that most folks out there aren’t using Svelte, since it’s still an up-and-coming framework.

That said, I do have a separate build for those who are already using Svelte 3, and in that case the bundle size is about 11kB smaller (minified). Either way, emoji-picker-element should ideally be lazy-loaded!

Pitfalls and future work

Not everything went swimmingly with this project, so I have some ideas for where it can improve in the future.

The woes of native emoji

First off, using native emoji is easier said than done. No OS has Emoji v13 yet, and unless you’re on an Apple device, it’s unlikely that you even have Emoji v12. (Only 19% of Android devices are running Android 10+, which has Emoji v12, whereas 75% of iOS devices are running iOS 13.2+, which has Emoji v12.1.) Or your OS may have incomplete emoji (Microsoft’s choice to use two-letter codes for flags is… odd at best). If you’re using Chrome on Ubuntu, you have no native color emoji at all, unless you install a separate package.

GitLab has a great post detailing all the headaches of supporting native emoji. It involves rendering emoji to canvas and checking the rendered color, as well as handling edge cases such as ligatures that may appear as “double emoji” when rendered improperly. (E.g. “person with red hair” may appear as a person with a floating wig of red hair next to them.) In short, it’s a mess.

"Person" emoji on the left and "red hair" emoji on the right

“Person with red hair” when rendered incorrectly.

However, my goal with this project is to skate to where the puck is going, not where it is. My hope is that browsers and OSes will get their acts together, and start to broadly support native color emoji without hacks or workarounds. (If they don’t, more websites than just emoji-picker-element will appear broken! So they have every incentive to do this.)

Plus, I don’t have to jump through as many hoops as GitLab, because I’m not concerned about supporting every emoji under the sun. (If I were, I would just use a spritesheet!) So my current strategy is a sniff test – check a representative emoji from each emoji version, and hide the entire set if one emoji is unsupported. This isn’t perfect, but I believe it’s good enough for most use cases. And it should improve over time as native emoji support improves.
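
The sniff test itself is conceptually simple: render the emoji to a canvas, and check whether any colored pixels come out. Here’s a simplified sketch of that if-emoji/GitLab-style approach (not my exact implementation):

function looksLikeColorEmoji(emoji) {
  const size = 20;
  const canvas = document.createElement('canvas');
  canvas.width = canvas.height = size;
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = `${size}px sans-serif`;
  ctx.fillText(emoji, 0, 0);
  const { data } = ctx.getImageData(0, 0, size, size);
  for (let i = 0; i < data.length; i += 4) {
    const [r, g, b, a] = [data[i], data[i + 1], data[i + 2], data[i + 3]];
    // An opaque, non-grayscale pixel suggests a real color emoji glyph.
    if (a > 0 && (r !== g || g !== b)) {
      return true;
    }
  }
  return false;
}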

Shadow DOM

This could be a blog post in and of itself, but shadow DOM is a double-edged sword. On the one hand, I love how easy it is to encapsulate CSS, and to offer a styling API with CSS variables. On the other hand, I found that any libraries that manage focus – e.g. focus-visible, a11y-dialog, or my own arrow-key-navigation – had to be tweaked to account for shadow DOM. Otherwise, the focus behavior didn’t work properly.

I won’t go into the nitty-gritty details (see my Mastodon thread for that). Long story short: none of these libraries worked with shadow DOM, except for focus-visible, which required some manual work. So for a11y-dialog I forked it, and for arrow-key-navigation I had to implement shadow DOM support. Kind of annoying, when all I really wanted was style encapsulation!

But again: I’m trying to skate to where the puck is going. My hope is that eventually browsers will iron out problems with the shadow DOM API, and make it easier for libraries to implement focus traps, custom focus hotkeys, etc.

Conclusion

Building emoji-picker-element was not easy. Properly rendering native emoji required arcane knowledge of emoji support on various platforms and browsers, as well as how to detect it. Add in skin tone variants (which may or may not be supported, even if the base emoji is supported), and it can get very complicated very fast.

I’m deeply indebted to the work of those who came before me: emojibase for providing the emoji data, Emojipedia for being an inexhaustible resource on the subject, and if-emoji and GitLab for demystifying how to detect emoji support.

Of course, I’m also grateful for all the existing emoji pickers that I drew inspiration from – it’s nice not to have to reinvent the wheel! Most emoji pickers have converged on a strikingly similar design, so I didn’t feel the need to be particularly creative with my own design. (Although I do think some of my CSS animations are nice.)

I still believe that the emoji picker is probably something browsers should handle themselves, to avoid the inevitable bloat of every web page loading their own picker. But for the time being, there should at least be a web-based emoji picker that’s as lightweight and performant as possible. emoji-picker-element is my best attempt at that vision.

You can discuss this post on the fediverse and on Lobste.rs. Here is the code and the demo.

Update: I also wrote about how I implemented accessibility in emoji-picker-element.

Linux on the desktop as a web developer

I’ve been using Ubuntu as my primary home desktop OS for the past year and a half, so I thought it would be a good time to write up my experiences. Hopefully this will be interesting to other web developers who are currently using Mac or Windows and may be Linux-curious.

Photo of a Dell laptop with an Ubuntu background and a monitor, keyboard, and mouse

My basic setup. Dell XPS 13, Kensington trackball mouse (yes I’m a weirdo who likes trackballs), Apple magic keyboard (I still prefer the feel), and a BenQ monitor (because I play some games where display lag matters)

Note: in this post, I’m mostly going to be talking about Ubuntu. I’ve played with other Linux distros, but I stick with Ubuntu because if I have a problem, I can Google it and find an answer 99.9% of the time.

Some history

I first switched to Linux in 2007, when I was at university. At the time I perceived it to be a huge step-up over Windows Vista (so much faster! and better for programmers!), but it also came with plenty of headaches:

  • WiFi didn’t work out of the box. I had to use ndiswrapper to wrap Windows drivers.
  • Multi-monitor and presentations were terrible. Every time I used xrandr I knew I would suffer.
  • Poor support for a lot of consumer applications. I recall running Netflix on Firefox in Wine because this was the only way to get it to work.

Around 2012 I switched to Mac – mostly because I noticed that every web developer giving a conference talk was using one. Then I became a dual Windows/Mac user when I joined Microsoft in 2016, and I didn’t consider Linux again until after I left Microsoft in 2018.

I’m happy to say that none of my old Linux headaches exist anymore in 2020. On my Dell XPS 13 (which comes with Ubuntu preinstalled), WiFi and multi-monitor work out-of-the-box. And since it seems everything is either an Electron app or a website these days, it’s rare to find a consumer app that doesn’t support Linux. (At least, the ones I care about; I’m sure you can find a counter-example!) The biggest gripe I have nowadays is with fonts, which is a far cry from fiddling with WiFi drivers.

OK so enough history, let’s talk about the good and the bad about Linux in 2020, from a web developer’s perspective.

The command line

I tend to live and breathe on the command line, and for me the command line on Linux is second-to-none.

The main reason should be clear: if you’re writing code that’s going to run on a server somewhere, that server is probably going to run Linux. Even if you’re not doing much sysadmin stuff, you’re probably using Linux to run your test and CI infrastructure. So eventually your code is going to have to run on Linux.

Using Linux as your desktop machine just makes things that much simpler. All my servers run Ubuntu, as do my Travis CI tests, as does my desktop. I know that my shell scripts will run exactly the same on all these environments. If you’ve ever run into headaches with subtle differences between the Mac and Linux shell (e.g. incompatible versions of grep, tar, and sed with slightly different flags, so you have to brew install coreutils and use ggrep and gtar… ugh), then you know what I’m talking about.

If you’re a Mac user, the hardest part about switching to the Linux terminal is probably just going to be the iTerm keyboard shortcuts you’ve memorized to open tabs and panes. I found the simplest solution was to use tmux instead. As an added bonus, tmux also runs on Mac, so if you learn the keyboard shortcuts once, you can use them everywhere. I set my terminal to automatically open tmux on startup.

Screenshot of tmux running in Ubuntu with a few panes open

Ah, the command line on Linux. Feels like home.

Since the command line is the main selling point of Linux (IMO), it’s tempting to just use Windows with the Windows Subsystem for Linux instead. This is definitely a viable option, and totally reasonable, especially if there’s that one Windows program you really need (or you’re a PC gamer). For me, though, I don’t do much PC gaming, and my experience with WSL is that although compatibility was excellent, the performance was poor. npm install would take orders of magnitude more time on WSL compared to the equivalent Mac or Linux machine. (Keep in mind I was using WSL back in 2016-2018 though, and I’m told it’s improved since then.)

Still, for me, I just don’t find Windows to my taste. The UI has always felt slow and clunky to me, which may just be my perception, although when I read blog posts like this one from Bruce Dawson I feel a bit vindicated. (Right-clicking taskbar icons is slow! Why?) In any case, Ubuntu starts up fast, the system updates are quick and unobtrusive, and it’s not missing any must-have apps for me. So I run 100% Ubuntu, no dual-boot even.

Web development

For the average web developer, most of the stuff you need is going to work just fine on Linux. You can run Chrome and VS Code (or WebStorm, my preference), and all your command-line utilities like node and npm will work the same. You can use nvm to manage Node versions. So far, so good.

As a web developer, the biggest issue I’ve run into is not having a quick way to test all three major browser engines – Blink (Chrome), Gecko (Firefox), and WebKit (Safari). Especially now that Edge has gone Chromium and the Trident/EdgeHTML lineage is slowly dying out, it’s really attractive that, with a Mac, you can test all three major browser engines without having to switch to another machine or use a tool like BrowserStack.

On Linux of course we have Chrome and Firefox, and those run mostly the same as they do on a Mac, so they fit the bill just fine. For WebKit we even have GNOME Web (aka Epiphany Browser), but I only consider it “okay” as a stand-in for Safari. It doesn’t support some of the Safari-specific APIs (e.g. backdrop filter, Apple Pay, etc.), and it’s terribly slow, but it’s good for a quick gut-check to see if some bit of code will run well on Safari or not.

Screenshot of Gnome Web on HTML5Test showing a score of 432

GNOME Web on HTML5Test. Safari it is not.

Unfortunately for me, I don’t consider that “good enough,” especially since the vast majority of Safari users are on iOS, so that’s the platform you really want to test. And here is where Linux runs into its biggest drawback from a web developer’s perspective: debugging iOS Safari.

If you want to debug Chrome or Firefox on Android – no problem. adb runs just fine on Linux, you can run chrome:inspect on Chrome or Remote Debugging on Firefox, and it all works great. For iOS Safari, though, the best option you have is remotedebug-ios-webkit-adapter, which uses ios-webkit-debug-proxy under the hood.

Essentially this is an elaborate suite of tools that makes iOS Safari kinda-sorta look like Chrome, so that you can use the Chrome DevTools to debug it. The most amazing thing about it is… it actually works! As long as you can get the odd dependencies running correctly, you’ll have your familiar Chrome DevTools attached to an iOS device.

Screenshot of debugging example.com in iOS Safari in Chrome DevTools on Linux, showing some iOS-specific APIs in the console

Believe it or not, you can actually debug iOS Safari from Linux.

If you have a spare iPhone or iPod Touch lying around, this is not a bad option. But it’s still a far cry from the streamlined experience on a Mac, where you can quickly run an iOS Simulator with any iOS version of your choice, and where Safari debugging works out-of-the-box.

For accessibility testing, it’s a similar story. Of course Firefox and Chrome have accessibility tools, but they’re no substitute for VoiceOver on Mac or NVDA on Windows. Linux does have the Orca screen reader, but I don’t see much point in testing it, since it’s not representative of actual screen reader usage. Especially given that screen readers may have bugs or quirks, I prefer testing the real deal. So I keep a Mac Mini and cheap Windows desktop around for this reason.

Conclusion

So in short, using Linux as your desktop environment if you’re a web developer is pretty great. You probably won’t miss much, once you rewire your brain to get the keyboard shortcuts right.

I find that the main things I miss these days are some of Apple’s best built-in apps, such as Preview or Garage Band. I love Preview for taking a quick screenshot and drawing arrows and boxes on it (something I do surprisingly often), and I haven’t found any good substitutes on Linux. (I use Pinta, which is okay.) Other apps like ImageOptim also have no Linux version.

So if you depend on some Mac-only apps, or if you need best-in-class Safari and iOS debugging, then I wouldn’t recommend Linux over Mac. If your main focus is accessibility, it also might not be sufficient for you (although something like Assistiv Labs may change this calculus). But for everything else, it’s a great desktop OS for web development.

Thanks to Ben Frain for asking about my Linux experiences and inspiring this blog post.

Customizing fonts in Firefox on Linux

I recently updated my desktop machine from Ubuntu 18.04 to 20.04. (I tend to be skittish and stick to the LTS releases.) Everything went great, except that when I opened some of my go-to websites (such as GitHub), the fonts just looked… off.

Screenshot of GitHub rendered in Nimbus Sans

After some research, I learned a few things:

  1. Firefox relies on an OS-level utility called fontconfig to choose fonts.
  2. On Ubuntu 20.04, fontconfig has a default config file that automatically replaces proprietary fonts like Helvetica with free fonts like Nimbus Sans.
  3. If a web page’s CSS has Helvetica in the font-family property, Firefox will ask the OS, “Hey, do you have Helvetica?” And the OS will say, “Yep!” and serve up Nimbus Sans.

Nimbus Sans is not an attractive font to my eyes. I would have much preferred an alternative like DejaVu Sans or Liberation Sans (one of which, I’m pretty sure, is what Ubuntu 18.04 was using). So I did some digging.

Surprisingly, I couldn’t find a simple answer to “how do I tell Firefox which fonts to use on Linux?”. The fontconfig man pages are too dense to be much help. But after stumbling upon this helpful blog post, I eventually found a solution.

Create a fontconfig file

First, I created a file at ~/.config/fontconfig/fonts.conf (you may have to create the directory). Here’s mine:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>Helvetica</family>
    <prefer>
      <family>Liberation Sans</family>
    </prefer>
  </alias>
</fontconfig>

Basically, this file says, “If you’re trying to render Helvetica, prefer Liberation Sans instead.”

Create a test HTML page

Next, I created a small HTML page that just renders various CSS font-family values, like this:

<h2 style="font-family: Helvetica">Helvetica</h2>
<h2 style="font-family: system-ui">system-ui</h2>
<h2 style="font-family: sans-serif">sans-serif</h2>
<h2 style="font-family: Arial">Arial</h2>

If you’re curious, by default Ubuntu 20.04 will give Nimbus Sans for Helvetica, DejaVu Serif for system-ui, DejaVu Sans for sans-serif, and Liberation Sans for Arial (on Firefox 75).

Then my debug cycle became:

  1. Edit the fonts.conf file.
  2. Reload the page in Firefox (you may have to close the tab and open a new one).
  3. Open the Firefox developer tools to the “Fonts” section.

Screenshot of Firefox DevTools showing the "fonts used" section

Using this technique, I could finally get GitHub (as well as some other sites) to better suit my tastes:

Screenshot of GitHub rendered in Liberation Sans

Here is a zoomed-in before and after (Nimbus Sans on top, Liberation Sans on bottom):

Screenshot comparing Liberation Sans to Nimbus Sans in GitHub

Hopefully this blog post is helpful to someone else trying to figure out fontconfig. Good luck!

Notes

This research upended my understanding of how font-family works in CSS. I figured that if a website declares font-family: Helvetica, sans-serif, then it will fall back to sans-serif if Helvetica doesn’t exist on the user’s OS. But given that the OS can lie about which fonts it supports (serving up Nimbus Sans when asked for “Helvetica”), this can really change how fonts render.

Apparently fc-match Helvetica is supposed to show which font fontconfig chooses for Helvetica. This didn’t work for me. Running this command (with or without sudo) would always return “Nimbus Sans,” regardless of what Firefox was showing. Clearing the cache with fc-cache -rv also had no effect. Go figure.

Also, the Firefox font settings (Preferences → Fonts and Colors) don’t work. Well, maybe they work for choosing the default sans-serif font, but they don’t help if a website’s CSS explicitly lists Helvetica first. (As they often do, which is perhaps a testament to the Mac-centrism of most web developers. I even found this pattern in my own website’s CSS, which probably dates back to when I was a Mac user.)

Also, be aware that Chrome uses its own system for managing fonts, different from Firefox’s. If you test Chrome, you’ll get different results.

Fixing memory leaks in web applications

Part of the bargain we struck when we switched from building server-rendered websites to client-rendered SPAs is that we suddenly had to take a lot more care with the resources on the user’s device. Don’t block the UI thread, don’t make the laptop’s fan spin, don’t drain the phone’s battery, etc. We traded better interactivity and “app-like” behavior for a new class of problems that don’t really exist in the server-rendered world.

One of these problems is memory leaks. A poorly-coded SPA can easily eat up megabytes or even gigabytes of memory, continuing to gobble up more and more resources, even as it’s sitting innocently in a background tab. At this point, the page might start to slow to a crawl, or the browser may just terminate the tab and you’ll see Chrome’s familiar “Aw, snap!” page.

Chrome page saying "Aw snap! Something went wrong while displaying this web page."

(Of course, a server-rendered website can also leak memory on the server side. But it’s extremely unlikely to leak memory on the client side, since the browser will clear the memory every time you navigate between pages.)

The subject of memory leaks is not well-covered in the web development literature. And yet, I’m pretty sure that most non-trivial SPAs leak memory, unless the team behind them has a robust infrastructure for catching and fixing memory leaks. It’s just far too easy in JavaScript to accidentally allocate some memory and forget to clean it up.

So why is so little written about memory leaks? My guesses:

  • Lack of complaints: most users are not diligently watching Task Manager while they surf the web. Typically, you won’t hear about it from your users unless the leak is so bad that the tab is crashing or the app is slowing down.
  • Lack of data: the Chrome team doesn’t provide data about how much memory websites are using in the wild. Nor are websites often measuring this themselves.
  • Lack of tooling: it’s still not easy to identify or fix memory leaks with existing tooling.
  • Lack of caring: browsers are pretty good at killing tabs that consume too much memory. Plus people seem to blame the browser rather than the websites.

In this post, I’d like to share some of my experience fixing memory leaks in web applications, and provide some examples of how to effectively track them down.

Anatomy of a memory leak

Modern web app frameworks like React, Vue, and Svelte use a component-based model. Within this model, the most common way to introduce a memory leak is something like this:

window.addEventListener('message', this.onMessage.bind(this));

That’s it. That’s all it takes to introduce a memory leak. If you call addEventListener on some global object (the window, the <body>, etc.) and then forget to clean it up with removeEventListener when the component is unmounted, then you’ve created a memory leak.

Worse, you’ve just leaked your entire component. Because the listener is bound to this, it holds a reference to the component, which means the component has leaked. So have all of its child components. And very likely, so have all the DOM nodes associated with the components. This can get very bad very fast.

Here is the fix:

// Mount phase
this.onMessage = this.onMessage.bind(this);
window.addEventListener('message', this.onMessage);

// Unmount phase
window.removeEventListener('message', this.onMessage);

Note that we saved a reference to the bound onMessage function. You have to pass in exactly the same function to removeEventListener that you passed in to addEventListener, or else it won’t work.

The memory leak landscape

In my experience, the most common sources of memory leaks are APIs like these:

  1. addEventListener. This is the most common one. Call removeEventListener to clean it up.
  2. setTimeout / setInterval. If you create a recurring timer (e.g. to run every 30 seconds), then you need to clean it up with clearTimeout or clearInterval. (setTimeout can leak if it’s used like setInterval – i.e., scheduling a new setTimeout inside of the setTimeout callback.)
  3. IntersectionObserver, ResizeObserver, MutationObserver, etc. These new-ish APIs are very convenient, but they are also likely to leak. If you create one inside of a component, and it’s attached to a globally-available element, then you need to call disconnect() to clean them up. (Note that DOM nodes which are garbage-collected will have their listeners and observers garbage-collected as well. So typically, you only need to worry about global elements, e.g. the <body>, the document, an omnipresent header/footer element, etc.)
  4. Promises, Observables, EventEmitters, etc. Any programming model where you’re setting up a listener can leak memory if you forget to stop listening. (A Promise can leak if it’s never resolved or rejected, in which case any .then() callbacks attached to it will leak.)
  5. Global object stores. With something like Redux the state is global, so if you’re not careful, you can just keep appending memory to it and it will never get cleaned up.
  6. Infinite DOM growth. If you implement an infinite scrolling list without virtualization, then the number of DOM nodes will grow without bound.

Of course, there are plenty of other ways to leak memory, but these are the most common ones I’ve seen.
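
The cleanup pattern is the same in every case: whatever you set up in the mount phase, tear down in the unmount phase. For example, using the timer and observer APIs from the list above (this.poll() and this.onResize() are stand-ins for your component’s own methods):

// Mount phase
this.intervalId = setInterval(() => this.poll(), 30000);
this.resizeObserver = new ResizeObserver(() => this.onResize());
this.resizeObserver.observe(document.body);

// Unmount phase
clearInterval(this.intervalId);
this.resizeObserver.disconnect();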

Identifying memory leaks

This is the hard part. I’ll start off by saying that I just don’t think any of the tooling out there is very good. I’ve tried Firefox’s memory tool, the Edge and IE memory tools, and even Windows Performance Analyzer. The best-in-class is still the Chrome Developer Tools, but it has a lot of rough edges that are worth knowing about.

In the Chrome DevTools, our main tool of choice is going to be the “heap snapshot” tool in the “Memory” tab. There are other memory tools in Chrome, but I don’t find them very helpful for identifying leaks.

Screenshot of the Chrome DevTools Memory tab with the Heap Snapshot tool

The Heap Snapshot tool allows you to take a memory capture of the main thread or web workers or iframes.

When you click the “take snapshot” button, you’ve captured all the live objects in a particular JavaScript VM on that web page. This includes objects referenced by the window, objects referenced by setInterval callbacks, etc. Think of it as a frozen moment in time representing all the memory used by that web page.

The next step is to reproduce some scenario that you think may be leaking – for instance, opening and closing a modal dialog. Once the dialog is closed, you’d expect memory to return back to the previous level. So you take another snapshot, and then diff it with the previous snapshot. This diffing is really the killer feature of the tool.

Diagram showing a first heapsnapshot followed by a leaking scenario followed by a second heap snapshot which should be equal to the first

However, there are a few limitations of the tool that you should be aware of:

  1. Even if you click the little “collect garbage” button, you may need to take a few consecutive snapshots for Chrome to truly clean up the unreferenced memory. In my experience, three should be enough. (Check the total memory size of each snapshot – it should eventually stabilize.)
  2. If you have web workers, service workers, iframes, shared workers, etc., then this memory will not be represented in the heap snapshot, because it lives in another JavaScript VM. You can capture this memory if you want, but just be sure you know which one you’re measuring.
  3. Sometimes the snapshotter will get stuck or crash. In that case, just close the browser tab and start all over again.

At this point, if your app is non-trivial, then you’re probably going to see a lot of leaking objects between the two snapshots. This is where things get tricky, because not all of these are true leaks. Many of these are just normal usage – some object gets de-allocated in favor of another one, something gets cached in a way that will get cleaned up later, etc.

Cutting through the noise

I’ve found that the best way to cut through the noise is to repeat the leaking scenario several times. For instance, instead of just opening and closing a modal dialog once, you might open and close it 7 times. (7 is a nice conspicuous prime number.) Then you can check the heap snapshot diff to see if any objects leaked 7 times. (Or 14 times, or 21 times.)

Screenshot of the Chrome DevTools heap snapshot diff showing six heap snapshot captures with several objects leaking 7 times

A heap snapshot diff. Note that we’re comparing snapshot #6 to snapshot #3, because I take three captures in a row to allow more garbage collection to occur. Also note that several objects are leaking 7 times.

(Another helpful technique is to run through the scenario once before recording the first snapshot. Especially if you are using a lot of code-splitting, then your scenario is likely to have a one-time memory cost of loading the necessary JavaScript modules.)

At this point, you might wonder why we should sort by the number of objects rather than the total memory. Intuitively, we’re trying to reduce the amount of memory leaking, so shouldn’t we focus on the total memory usage? Well, this doesn’t work very well, for an important reason.

When something is leaking, it’s because (to paraphrase Joe Armstrong) you wanted the banana, but you ended up getting the banana, the gorilla holding the banana, and the whole jungle. If you measure based on total bytes, you’re measuring the jungle, not the banana.

Let’s go back to the addEventListener example above. The source of the leak is an event listener, which is referencing a function, which references a component, which probably references a ton of stuff like arrays, strings, and objects.

If you sort the heap snapshot diff by total memory, then it’s going to show you a bunch of arrays, strings, and objects – most of which are probably unrelated to the leak. What you really want to find is the event listener, but this takes up a minuscule amount of memory compared to the stuff it’s referencing. To fix the leak, you want to find the banana, not the jungle.

So if you sort by the number of objects leaked, you’re going to see 7 event listeners. And maybe 7 components, and 14 sub-components, or something like that. That “7” should stand out like a sore thumb, since it’s such an unusual number. And no matter how many times you repeat the scenario, you should see exactly that number of objects leaking. This is how you can quickly find the source of the leak.

Walking the retainer tree

The heap snapshot diff will also show you a “retainer” chain showing which objects are pointing to which other objects and thus keeping the memory alive. This is how you can figure out where the leaking object was allocated.

Screenshot of a retainer chain showing someObject referenced by a closure referenced by an event listener

The retainer chain shows you which object is referencing the leaked object. The way to read it is that each object is referenced by the object below it.

In the above example, there is a variable called someObject which is referenced by a closure (aka “context”), which is referenced by an event listener. If you click the source link, it will take you to the JavaScript declaration, which is fairly straightforward:

class SomeObject { /* ... */ }

const someObject = new SomeObject();
const onMessage = () => { /* ... */ };
window.addEventListener('message', onMessage);

In the above example, the “context” is the onMessage closure which references the someObject variable. (This is a contrived example; real memory leaks can be much less obvious!)

But the heap snapshotting tool has several limitations:

  1. If you save and re-load the snapshot file, then you will lose all the file references to where the object was allocated. So for instance, you won’t see that the event listener’s closure comes from line 22 of foo.js. Since this is really critical information, it’s almost useless to save and send heap snapshot files.
  2. If there are WeakMaps involved, then Chrome will show you those references even though they don’t really matter – those objects would be de-allocated as soon as the other references are cleaned up. So they are just noise.
  3. Chrome classifies the objects by their prototype. So the more you use actual classes/functions and the less you use anonymous objects, the easier it will be to see what exactly is leaking. As an example, imagine if our leak was somehow due to an object rather than an EventListener. Since object is extremely generic, we’re unlikely to see exactly 7 of them leaking.

This is my basic strategy for identifying memory leaks. I’ve successfully used this technique to find dozens of memory leaks in the past.

This guide is just the start, though – beyond this, you will also have to be handy with setting breakpoints, logging, and testing your fix to see if it resolves the leak. Unfortunately, this is just inherently a time-consuming process.

Automated memory leak analysis

I’ll preface this by saying that I haven’t found a great way to automate the detection of memory leaks. Chrome has a non-standard performance.memory API, but for privacy reasons it doesn’t have very precise granularity, so you can’t really use it in production to identify leaks. The W3C Web Performance Working Group has discussed memory tooling in the past, but has yet to agree on a new standard to replace this API.

In a lab or synthetic testing environment, you can increase the granularity on this API by using the Chrome flag --enable-precise-memory-info. You can also create heap snapshot files by calling the proprietary Chromedriver command :takeHeapSnapshot. This has the same limitation mentioned above, though – you probably want to take three in a row and discard the first two.

Since event listeners are the most common source of memory leaks, another technique that I’ve used is to monkey-patch the addEventListener and removeEventListener APIs to count the references and ensure they return to zero. Here is an example of how to do that.
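
A stripped-down sketch of the idea (a real implementation also needs to handle duplicate registrations and the once/capture options):

const counts = new Map();
const { addEventListener, removeEventListener } = EventTarget.prototype;

EventTarget.prototype.addEventListener = function (type, listener, options) {
  counts.set(type, (counts.get(type) || 0) + 1);
  return addEventListener.call(this, type, listener, options);
};

EventTarget.prototype.removeEventListener = function (type, listener, options) {
  counts.set(type, (counts.get(type) || 0) - 1);
  return removeEventListener.call(this, type, listener, options);
};

// After running through a scenario, assert that every count is back
// to its starting value.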

In the Chrome DevTools, you can also use the proprietary getEventListeners() API to see the event listeners attached to a particular element. Note that this can only be used in DevTools, though.

Update: Mathias Bynens has informed me of another useful DevTools API: queryObjects(), which can show you all objects created with a particular constructor. Christoph Guttandin also has an interesting blog post about using this API for automated memory leak detection in Puppeteer.

Summary

The state of finding and fixing memory leaks in web apps is still fairly rudimentary. In this blog post, I’ve covered some of the techniques that have worked for me, but admittedly this is still a difficult and time-consuming process.

As with most performance problems, an ounce of prevention can be worth a pound of cure. You might find it worthwhile to put synthetic testing in place rather than trying to debug a memory leak after the fact. Especially if there are several leaks on a page, it will probably turn into an onion-peeling exercise – you fix one leak, then find another, then repeat (while weeping the whole time!). Code review can also help catch common memory leak patterns, if you know what to look for.

JavaScript is a memory-safe language, so it’s somewhat ironic how easy it is to leak memory in a web application. Part of this is just inherent to UI design, though – we need to listen for mouse events, scroll events, keyboard events, etc., and these are all patterns that can easily lead to memory leaks. But by trying to keep our web applications’ memory usage low, we can improve runtime performance, avoid crashes, and be respectful of resource constraints on the user’s device.

Thanks to Jake Archibald and Yang Guo for feedback on a draft of this post. And thanks to Dinko Bajric for inventing the “choose a prime number” technique, which I’ve found so helpful in memory leak analysis.