Archive for the ‘performance’ Category

SPAs: theory versus practice

I’ve been thinking a lot recently about Single-Page Apps (SPAs) and Multi-Page Apps (MPAs). I’ve been thinking about how MPAs have improved over the years, and where SPAs still have an edge. I’ve been thinking about how complexity creeps into software, and why a developer may choose a more complex but powerful technology at the expense of a simpler but less capable technology.

I think this core dilemma – complexity vs simplicity, capability vs maintainability – is at the heart of a lot of the debates about web app architecture. Unfortunately, these debates are so often tied up in other factors (a kind of web dev culture war, Twitter-stoked conflicts, maybe even a generational gap) that it can be hard to see clearly what the debate is even about.

At the risk of grossly oversimplifying things, I propose that the core of the debate can be summed up by these truisms:

  1. The best SPA is better than the best MPA.
  2. The average SPA is worse than the average MPA.

The first statement should be clear to most seasoned web developers. Show me an MPA, and I can show you how to make it better with JavaScript. Added too much JavaScript? I can show you some clever ways to minimize, defer, and multi-thread that JavaScript. Ran into some bugs, because now you’ve deviated from the browser’s built-in behavior? There are always ways to fix it! You’ve got JavaScript.

Whereas with an MPA, you are delegating some responsibility to the browser. Want to animate navigations between pages? You can’t (yet). Want to avoid the flash of white? You can’t, until Chrome fixes it (and it’s not perfect yet). Want to avoid re-rendering the whole page, when there’s only a small subset that actually needs to change? You can’t; it’s a “full page refresh.”

My second truism may be more controversial than the first. But I think time and experience have shown that, whatever the promises of SPAs, the reality has been less convincing. It’s not hard to find examples of poorly-built SPAs that score badly on a variety of metrics (performance, accessibility, reliability), and which could have been built better and more cheaply as a bog-standard MPA.

Example: subsequent navigations

To illustrate, let’s consider one of the main value propositions of an SPA: making subsequent navigations faster.

Rich Harris recently offered an example of using the SvelteKit website (SPA) compared to the Astro website (MPA), showing that page navigations on the Svelte site were faster.

Now, to be clear, this is a bit of an unfair comparison: the Svelte site is preloading content when you hover over links, so there’s no network call by the time you click. (Nice optimization!) Whereas the Astro site is not using a Service Worker or other offlining – if you throttle to 3G, it’s even slower relative to the Svelte site.

But I totally believe Rich is right! Even with a Service Worker, Astro would have a hard time beating SvelteKit. The amount of DOM being updated here is small and static, and doing the minimal updates in JavaScript should be faster than asking the browser to re-render the full HTML. It’s hard to beat element.innerHTML = '...'.

However, in many ways this site represents the ideal conditions for an SPA navigation: it’s small, it’s lightweight, it’s built by the kind of experts who build their own JavaScript framework, and those experts are also keen to get performance right – since this website is, in part, a showcase for the framework they’re offering. What about real-world websites that aren’t built by JavaScript framework authors?

Anthony Ricaud recently gave a talk (in French – apologies to non-Francophones) where he analyzed the performance of real-world SPAs. In the talk, he asks: What if these sites used standard MPA navigations?

To answer this, he built a proxy that strips the site of its first-party JavaScript (leaving the kinds of ads and trackers that, sadly, many teams are not allowed to forgo), as well as another version of the proxy that doesn’t strip any JavaScript. Then, he scripted WebPageTest to click an internal link, measuring the load times for both versions (on throttled 4G).

So which was faster? Well, out of the three sites he tested, on both mobile (Moto G4) and desktop, the MPA was either just as fast or faster, every time. In some cases, the WebPageTest filmstrips even showed that the MPA version was faster by several seconds. (Note again: these are subsequent navigations.)

On top of that, the MPA sites gave immediate feedback to the user when clicking – showing a loading indicator in the browser chrome. Whereas some of the SPAs didn’t even manage to show a “skeleton” screen before the MPA had already finished loading.

Screenshot from conference talk showing a speaker on the left and a WebPageTest filmstrip on the right. The filmstrip compares two sites: the first takes 5.5 seconds and the second takes 2.5 seconds

Screenshot from Anthony Ricaud’s talk. The SPA version is on top (5.5s), and the MPA version is on bottom (2.5s).

Now, I don’t think this experiment is perfect. As Anthony admits, removing inline <script>s removes some third-party JavaScript as well (the kind that injects itself into the DOM). Also, removing first-party JavaScript removes some non-SPA-related JavaScript that you’d need to make the site interactive, and removing any render-blocking inline <script>s would inherently improve the visual completeness time.

Even with a perfect experiment, there are a lot of variables that could change the outcome for other sites:

  • How fast is the SSR?
  • Is the HTML streamed?
  • How much of the DOM needs to be updated?
  • Is a network request required at all?
  • What JavaScript framework is being used?
  • How fast is the client CPU?
  • Etc.

Still, it’s pretty gobsmacking that JavaScript was slowing these sites down, even in the one case (subsequent navigations) where JavaScript should be making things faster.

Exhausted developers and clever developers

Now, let’s return to my truisms from the start of the post:

  1. The best SPA is better than the best MPA.
  2. The average SPA is worse than the average MPA.

The cause of so much debate, I think, is that two groups of developers may look at this situation, agree on the facts on the ground, but come to two different conclusions:

“The average SPA sucks? Well okay, I should stop building SPAs then. Problem solved.” – Exhausted developer

 

“The average SPA sucks? That’s just because people haven’t tried hard enough! I can think of 10 ways to fix it.” – Clever developer

Let’s call these two archetypes the exhausted developer and the clever developer.

The exhausted developer has had enough with managing the complexity of “modern” web sites and web applications. Too many build tools, too many code paths, too much to think about and maintain. They have JavaScript fatigue. Throw it all away and simplify!

The clever developer is similarly frustrated by the state of modern web development. But they also deeply understand how the web works. So when a tool breaks or a framework does something in a sub-optimal way, it irks them, because they can think of a better way. Why can’t a framework or a tool fix this problem? So they set out to find a new tool, or to build it themselves.

The thing is, I think both of these perspectives are right. Clever developers can always improve upon the status quo. Exhausted developers can always save time and effort by simplifying. And one group can even help the other: for instance, maybe Parcel is approachable for those exhausted by Webpack, but a clever developer had to go and build Parcel first.

Conclusion

The disparity between the best and the average SPA has been around since the birth of SPAs. In the mid-2000s, people wanted to build SPAs because they saw how amazing GMail was. What they didn’t consider is that Google had a crack team of experts monitoring every possible problem with SPAs, right down to esoteric topics like memory leaks. (Do you have a team like that?)

Ever since then, JavaScript framework and tooling authors have been trying to democratize SPA tooling, bringing us the kinds of optimizations previously only available to the Googles and the Facebooks of the world. Their intentions have been admirable (I would put my own fuite on that pile), but I think it’s fair to say the results have been mixed.

An expert developer can stand up on a conference stage and show off the amazing scores for their site (perfect performance! perfect accessibility! perfect SEO!), and then an excited conference-goer returns to their team, convinces them to use the same tooling, and two years later they’ve built a monstrosity. When this happens enough times, the same conference-goer may start to distrust the next dazzling demo they see.

And yet… the web dev community marches forward. Today I can grab any number of “starter” app toolkits and build something that comes out-of-the-box with code-splitting, Service Workers, tree-shaking, a thousand different little micro-optimizations that I don’t even have to know the names of, because someone else has already thought of it and gift-wrapped it for me. That is a miracle, and we should be grateful for it.

Given enough innovation in this space, it is possible that, someday, the average SPA could be pretty great. If it came batteries-included with proper scroll, focus, and screen reader announcements, tooling to identify performance problems (including memory leaks), progressive DOM rendering (e.g. Jake Archibald’s hack), and a bunch of other optimizations, it’s possible that developers would fall into the “pit of success” and consistently make SPAs that outclass the equivalent MPA. I remain skeptical that we’ll get there, and even the best SPA would still have problems (complexity, performance on slow clients, etc.), but I can’t fault people for trying.

At the same time, browsers never stop taking the lessons from userland and upstreaming them into the browser itself, giving us more lines of code we can potentially delete. This is why it’s important to periodically re-evaluate the assumptions baked into our tooling.

Today, I think the core dilemma between SPAs and MPAs remains unresolved, and will maybe never be resolved. Both SPAs and MPAs have their strengths and weaknesses, and the right tool for the job will vary with the size and skills of the team and the product they’re trying to build. It will also vary over time, as browsers evolve. The important thing, I think, is to remain open-minded, skeptical, and analytical, and to accept that everything in software development has tradeoffs, and none of those tradeoffs are set in stone.

Style scoping versus shadow DOM: which is fastest?

Last year, I asked the question: Does shadow DOM improve style performance? I didn’t give a clear answer, so perhaps it’s no surprise that some folks weren’t sure what conclusion to draw.

In this post, I’d like to present a new benchmark that hopefully provides a more solid answer.

TL;DR: My new benchmark largely confirmed my previous research, and shadow DOM comes out as the most consistently performant option. Class-based style scoping slightly beats shadow DOM in some scenarios, but in others it’s much less performant. Firefox, thanks to its multi-threaded style engine, is much faster than Chrome or Safari.

Shadow DOM and style performance

To recap: shadow DOM has some theoretical benefits to style calculation, because it allows the browser to work with a smaller DOM size and smaller CSS rule set. Rather than needing to compare every CSS rule against every DOM node on the page, the browser can work with smaller “sub-DOMs” when calculating style.

However, browsers have a lot of clever optimizations in this area, and userland “style scoping” solutions have emerged (e.g. Vue, Svelte, and CSS Modules) that effectively hook into these optimizations. The way they typically do this is by adding a class or an attribute to the CSS selector: e.g. * { color: red } becomes *.xxx { color: red }, where xxx is a randomly-generated token unique to each component.

After crunching the numbers, my post showed that class-based style scoping was actually the overall winner. But shadow DOM wasn’t far behind, and it was the more consistently fast option.

These nuances led to a somewhat mixed reaction. For instance, here’s one common response I saw (paraphrasing):

The fastest option overall is class-based scoped styles, ala Svelte or CSS Modules. So shadow DOM isn’t really that great.

But looking at the same data, you could reach another, totally reasonable, conclusion:

With shadow DOM, the performance stays constant instead of scaling with the size of the DOM or the complexity of the CSS. Shadow DOM allows you to use whatever CSS selectors you want and not worry about performance.

Part of it may have been people reading into the data what they wanted to believe. If you already dislike shadow DOM (or web components in general), then you can read my post and conclude, “Wow, shadow DOM is even more useless than I thought.” Or if you’re a web components fan, then you can read my post and think, “Neat, shadow DOM can improve performance too!” Data is in the eye of the beholder.

To drive this point home, here’s the same data from my post, but presented in a slightly different way:

Chart image, see table below for the same data

Click for details

This is 1,000 components, 10 rules per component.

Selector performance (ms) Chrome Firefox Safari
Class selectors 58.5 22 56
Attribute selectors 597.1 143 710
Class selectors – shadow DOM 70.6 30 61
Attribute selectors – shadow DOM 71.1 30 81

As you can see, the case you really want to avoid is the second one – bare attribute selectors. Inside of the shadow DOM, though, they’re fine. Class selectors do beat shadow DOM overall, but only by a rounding error.

My post also showed that more complex selectors are consistently fast inside of the shadow DOM, even if they’re much slower at the global level. This is exactly what you would expect, given how shadow DOM works – the real surprise is just that shadow DOM doesn’t handily win every category.

Re-benchmarking

It didn’t sit well with me that my post didn’t draw a firm conclusion one way or the other. So I decided to benchmark it again.

This time, I tried to write a benchmark to simulate a more representative web app. Rather than focusing on individual selectors (ID, class, attribute, etc.), I tried to compare a userland “scoped styles” implementation against shadow DOM.

My new benchmark generates a DOM tree based on the following inputs:

  • Number of “components” (web components are not used, since this benchmark is about shadow DOM exclusively)
  • Elements per component (with a random DOM structure, with some nesting)
  • CSS rules per component (randomly generated, with a mix of tag, class, attribute, :not(), and :nth-child() selectors, and some descendant and compound selectors)
  • Classes per component
  • Attributes per component

To find a good representative for “scoped styles,” I chose Vue 3’s implementation. My previous post showed that Vue’s implementation is not as fast as that of Svelte or CSS Modules, since it uses attributes instead of classes, but I found Vue’s code to be easier to integrate. To make things a bit fairer, I added the option to use classes rather than attributes.

One subtlety of Vue’s style scoping is that it does not scope ancestor selectors. For instance:

/* Input */
div div {}

/* Output - Vue */
div div[data-v-xxx] {}

/* Output - Svelte */
div.svelte-xxx div.svelte-xxx {}

(Here is a demo in Vue and a demo in Svelte.)

Technically, Svelte’s implementation is more optimal, not only because it uses classes rather than attributes, but because it can rely on the Bloom filter optimization for ancestor lookups (e.g. :not(div) div.svelte-xxx:not(div) div.svelte-xxx, with .svelte-xxx in the ancestor). However, I kept the Vue implementation because 1) this analysis is relevant to Vue users at least, and 2) I didn’t want to test every possible permutation of “scoped styles.” Adding the “class” optimization is enough for this blog post – perhaps the “ancestor” optimization can come in future work.

Note: In benchmark after benchmark, I’ve seen that class selectors are typically faster than attribute selectors – sometimes by a lot, sometimes by a little. From the web developer’s perspective, it may not be obvious why. Part of it is just browser vendor priorities: for instance, WebKit invented the Bloom filter optimization in 2011, but originally it only applied to tags, classes, and IDs. They expanded it to attributes in 2018, and Chrome and Firefox followed suit in 2021 when I filed these bugs on them. Perhaps something about attributes also makes them intrinsically harder to optimize than classes, but I’m not a browser developer, so I won’t speculate.

Methodology

I ran this benchmark on a 2021 MacBook Pro (M1), running macOS Monterey 12.4. The M1 is perhaps not ideal for this, since it’s a very fast computer, but I used it because it’s the device I had, and it can run all three of Chrome, Firefox, and Safari. This way, I can get comparable numbers on the same hardware.

In the test, I used the following parameters:

Parameter Value
Number of components 1000
Elements per component 10
CSS rules per component 10
Classes per element 2
Attributes per element 2

I chose these values to try to generate a reasonable “real-world” app, while also making the app large enough and interesting enough that we’d actually get some useful data out of the benchmark. My target is less of a “static blog” and more of a “heavyweight SPA.”

There are certainly more inputs I could have added to the benchmark: for instance, DOM depth. As configured, the benchmark generates a DOM with a maximum depth of 29 (measured using this snippet). Incidentally, this is a decent approximation of a real-world app – YouTube measures 28, Reddit 29, and Wikipedia 17. But you could certainly imagine more heavyweight sites with deeper DOM structures, which would tend to spend more time in descendant selectors (outside of shadow DOM, of course – descendant selectors cannot cross shadow boundaries).

For each measurement, I took the median of 5 runs. I didn’t bother to refresh the page between each run, because it didn’t seem to make a big difference. (The relevant DOM was being blown away every time.) I also didn’t randomize the stylesheets, because the browsers didn’t seem to be doing any caching that would require randomization. (Browsers have a cache for stylesheet parsing, as I discussed in this post, but not for style calculation, insofar as it matters for this benchmark anyway.)

Update: I realized this comment was a bit blasé, so I re-ran the benchmark with a fresh browser session between each sample, just to make sure the browser cache wasn’t affecting the numbers. You can find those numbers at the end of the post. (Spoiler: no big change.)

Although the benchmark has some randomness, I used random-seedable with a consistent seed to ensure reproducible results. (Not that the randomness was enough to really change the numbers much, but I’m a stickler for details.)

The benchmark uses a requestPostAnimationFrame polyfill to measure style/layout/paint performance (see this post for details). To focus on style performance only, a DOM structure with only absolute positioning is used, which minimizes the time spent in layout and paint.

And just to prove that the benchmark is actually measuring what I think it’s measuring, here’s a screenshot of the Chrome DevTools Performance tab:

Screenshot of Chrome DevTools showing a large amount of time taken up by the User Timing called "total" with most of that containing a time slice called "Recalculate style"

Note that the measured time (“total”) is mostly taken up by “Recalculate Style.”

Results

When discussing the results, it’s much simpler to go browser-by-browser, because each one has different quirks.

One of the things I like about analyzing style performance is that I see massive differences between browsers. It’s one of those areas of browser performance that seems really unsettled, with lots of work left to do.

That is… unless you’re Firefox. I’m going to start off with Firefox, because it’s the biggest outlier out of the three major browser engines.

Firefox

Firefox’s Stylo engine is fast. Like, really fast. Like, so fast that, if every browser were like Firefox, there would be little point in discussing style performance, because it would be a bit like arguing over the fastest kind of for-loop in JavaScript. (I.e., interesting minutia, but irrelevant except for the most extreme cases.)

In almost every style calculation benchmark I’ve seen over the past five years, Firefox smokes every other browser engine to the point where it’s really in a class of its own. Whereas other browsers may take over 1,000ms in a given scenario, Firefox will take ~100ms for the same scenario on the same hardware.

So keep in mind that, with Firefox, we’re going to be talking about really small numbers. And the differences between them are going to be even smaller. But here they are:

Chart data, see details in table below

Click for table
Scenario Firefox 101
Scoping – classes 30
Scoping – attributes 38
Shadow DOM 26
Unscoped 114

Note that, in this benchmark, the first three bars are measuring roughly the same thing – you end up with the same DOM with the same styles. The fourth case is a bit different – all the styles are purely global, with no scoping via classes or attributes. It’s mostly there as a comparison point.

My takeaway from the Firefox data is that scoping with either classes, attributes, or shadow DOM is fine – they’re all pretty fast. And as I mentioned, Firefox is quite fast overall. As we move on to other browsers, you’ll see how the performance numbers get much more varied.

Chrome

The first thing you should notice about Chrome’s data is how much higher the y-axis is compared to Firefox. With Firefox, we were talking about ~100ms at the worst, whereas now with Chrome, we’re talking about an order of magnitude higher: ~1,000ms. (Don’t feel bad for Chrome – the Safari numbers will look pretty similar.)

Chart data, see details in table below

Click for table
Scenario Chrome 102
Scoping – classes 357
Scoping – attributes 614
Shadow DOM 49
Unscoped 1022

Initially, the Chrome data tells a pretty simple story: shadow DOM is clearly the fastest, followed by style scoping with classes, followed by style scoping with attributes, followed by unscoped CSS. So the message is simple: use Shadow DOM, but if not, then use classes instead of attributes for scoping.

I noticed something interesting with Chrome, though: the performance numbers are vastly different for these two cases:

  • 1,000 components: insert 1,000 different <style>s into the <head>
  • 1,000 components: concatenate those styles into one big <style>

As it turns out, this simple optimization greatly improves the Chrome numbers:

Chart data, see details in table below

Click for table
Scenario Chrome 102 – separate styles Chrome 102 – concatenated
Classes 357 48
Attributes 614 43

When I first saw these numbers, I was confused. I could understand this optimization in terms of reducing the cost of DOM insertions. But we’re talking about style calculation – not DOM API performance. In theory, it shouldn’t matter whether there are 1,000 stylesheets or one big stylesheet. And indeed, Firefox and Safari show no difference between the two:

Chart data, see details in table below

Click for table
Scenario Firefox 101 – separate styles Firefox 101 – concatenated
Classes 30 29
Attributes 38 38

Chart data, see details in table below

Click for table
Scenario Safari 15.5 – separate styles Safari 15.5. – concatenated
Classes 75 73
Attributes 812 820

This behavior was curious enough that I filed a bug on Chromium. According to the Chromium engineer who responded (thank you!), this is because of a design decision to trade off some initial performance in favor of incremental performance when stylesheets are modified or added. (My benchmark is a bit unfair to Chrome, since it only measures the initial calculation. A good idea for a future benchmark!)

This is actually a pretty interesting data point for JavaScript framework and bundler authors. It seems that, for Chromium anyway, the ideal technique is to concatenate stylesheets similarly to how JavaScript bundlers do code-splitting – i.e. trying to concatenate as much as possible, while still splitting in some cases to optimize for caching across routes. (Or you could go full inline and just put one big <style> on every page.) Keep in mind, though, that this is a peculiarity of Chromium’s current implementation, and it could go away at any moment if Chromium decides to change it.

In terms of the benchmark, though, it’s not clear to me what to do with this data. You might imagine that it’s a simple optimization for a JavaScript framework (or meta-framework) to just concatenate all the styles together, but it’s not always so straightforward. When a component is mounted, it may call getComputedStyle() on its own DOM nodes, so batching up all the style insertions until after a microtask is not really feasible. Some meta-frameworks (such as Nuxt and SvelteKit) leverage a bundler to concatenate the styles and insert them before the component is mounted, but it feels a bit unfair to depend on that for the benchmark.

To me, this is one of the core advantages of shadow DOM – you don’t have to worry if your bundler is configured correctly or if your JavaScript framework uses the right kind of style scoping. Shadow DOM is just performant, all of the time, full stop. That said, here is the Chrome comparison data with the concatenation optimization applied:

Chart data, see details in table below

Click for table
Scenario Chrome 102 (with concatenation optimization)
Scoping – classes 48
Scoping – attributes 43
Shadow DOM 49
Unscoped 1022

The first three are close enough that I think it’s fair to say that all of the three scoping methods (class, attribute, and shadow DOM) are fast enough.

Note: You may wonder if Constructable Stylesheets would have an impact here. I tried a modified version of the benchmark that uses these, and didn’t observe any difference – Chrome showed the same behavior for concatenation vs splitting. This makes sense, as none of the styles are duplicated, which is the main use case Constructable Stylesheets are designed for. I have found elsewhere, though, that Constructable Stylesheets are more performant than <style> tags in terms of DOM API performance, if not style calculation performance (e.g. see here, here, and here).

Safari

In our final tour of browsers, we arrive at Safari:

Chart data, see details in table below

Click for table
Scenario Safari 15.5
Scoping – classes 75
Scoping – attributes 812
Shadow DOM 94
Unscoped 840

To me, the Safari data is the easiest to reason about. Class scoping is fast, shadow DOM is fast, and unscoped CSS is slow. The one surprise is just how slow attribute selectors are compared to class selectors. Maybe WebKit has some more optimizations to do in this space – compared to Chrome and Firefox, attributes are just a much bigger performance cliff relative to classes.

This is another good example of why class scoping is superior to attribute scoping. It’s faster in all the engines, but the difference is especially stark in Safari. (Or you could use shadow DOM and not worry about it at all.)

Conclusion

Performance shouldn’t be the main reason you choose a technology like scoped styles or shadow DOM. You should choose it because it fits well with your development paradigm, it works with your framework of choice, etc. Style performance usually isn’t the biggest bottleneck in a web application, although if you have a lot of CSS or a large DOM size, then you may be surprised by the amount of “Recalculate Style” costs in your next performance trace.

One can also hope that someday browsers will advance enough that style calculation becomes less of a concern. As I mentioned before, Stylo exists, it’s very good, and other browsers are free to borrow its ideas for their own engines. If every browser were as fast as Firefox, I wouldn’t have a lot of material for this blog post.

Chart data, see details in table below

This is the same data presented in this post, but on a single chart. Just notice how much Firefox stands out from the other browsers.

Click for table
Scenario Chrome 102 Firefox 101 Safari 15.5
Scoping – classes 357 30 75
Scoping – attributes 614 38 812
Shadow DOM 49 26 94
Unscoped 1022 114 840
Scoping – classes – concatenated 48 29 73
Scoping – attributes – concatenated 43 38 820

For those who dislike shadow DOM, there is also a burgeoning proposal in the CSS Working Group for style scoping. If this proposal were adopted, it could provide a less intrusive browser-native scoping mechanism than shadow DOM, similar to the abandoned <style scoped> proposal. I’m not a browser developer, but based on my reading of the spec, I don’t see why it couldn’t offer the same performance benefits we see with shadow DOM.

In any case, I hope this blog post was interesting, and helped shine light on an odd and somewhat under-explored space in web performance. Here is the benchmark source code and a live demo in case you’d like to poke around.

Thanks to Alex Russell and Thomas Steiner for feedback on a draft of this blog post.

Afterword – more data

Updated June 23, 2022

After writing this post, I realized I should take my own advice and automate the benchmark so that I could have more confidence in the numbers (and make it easier for others to reproduce).

So, using Tachometer, I re-ran the benchmark, taking the median of 25 samples, where each sample uses a fresh browser session. Here are the results:

Chart data, see details in table below

Click for table
Scenario Chrome 102 Firefox 101 Safari 15.5
Scoping – classes 277.1 45 80
Scoping – attributes 418.8 54 802
Shadow DOM 56.80000001 67 82
Unscoped 820.4 190 857
Scoping – classes – concatenated 44.30000001 42 80
Scoping – attributes – concatenated 44.5 51 802
Unscoped – concatenated 251.3 167 865

As you can see, the overall conclusion of my blog post doesn’t change, although the numbers have shifted slightly in absolute terms.

I also added “Unscoped – concatenated” as a category, because I realized that the “Unscoped” scenario would benefit from the concatenation optimization as well (in Chrome, at least). It’s interesting to see how much of the perf win is coming from concatenation, and how much is coming from scoping.

If you’d like to see the raw numbers from this benchmark, you can download them here.

Second afterword – even more data

Updated June 25, 2022

You may wonder how much Firefox’s Stylo engine is benefiting from the 10 cores in that 2021 Mac Book Pro. So I unearthed my old 2014 Mac Mini, which has only 2 cores but (surprisingly) can still run macOS Monterey. Here are the results:

Chart data, see details in table below

Click for table
Scenario Chrome 102 Firefox 101 Safari 15.5
Scoping – classes 717.4 107 187
Scoping – attributes 1069.5 162 2853
Shadow DOM 227.7 117 233
Unscoped 2674.5 452 3132
Scoping – classes – concatenated 189.3 104 188
Scoping – attributes – concatenated 191.9 159 2826
Unscoped – concatenated 865.8 422 3148

(Again, this is the median of 25 samples. Raw data.)

Amazingly, Firefox seems to be doing even better here relative to the other browsers. For “Unscoped,” it’s 14.4% of the Safari number (vs 22.2% on the MacBook), and 16.9% of the Chrome number (vs 23.2% on the MacBook). Whatever Stylo is doing, it’s certainly impressive.

Memory leaks: the forgotten side of web performance

I’ve researched and learned enough about client-side memory leaks to know that most web developers aren’t worrying about them too much. If a web app leaks 5 MB on every interaction, but it still works and nobody notices, then does it matter? (Kinda sounds like a “tree in the forest” koan, but bear with me.)

Even those who have poked around in the browser DevTools to dabble in the arcane art of memory leak detection have probably found the experience… daunting. The effort-to-payoff ratio is disappointingly high, especially compared to the hundreds of other things that are important in web development, like security and accessibility.

So is it really worth the effort? Do memory leaks actually matter?

I would argue that they do matter, if only because the lack of care (as shown by public-facing SPAs leaking up to 186 MB per interaction) is a sign of the immaturity of our field, and an opportunity for growth. Similarly, five years ago, there was much less concern among SPA authors for accessibility, security, runtime performance, or even ensuring that the back button maintained scroll position (or that the back button worked at all!). Today, I see a lot more discussion of these topics among SPA developers, and that’s a great sign that our field is starting to take our craft more seriously.

So why should you, and why shouldn’t you, care about memory leaks? Obviously I’m biased because I have an axe to grind (and a tool I wrote, fuite), but let me try to give an even-handed take.

Memory leaks and software engineering

In terms of actual impact on the business of web development, memory leaks are a funny thing. If you speed up your website by 2 seconds, everyone agrees that that’s a good thing with a visible user impact. If you reduce your website’s memory leak by 2 MB, can we still agree it was worth it? Maybe not.

Here are some of the unique characteristics of memory leaks that I’ve observed, in terms of how they actually fit into the web development process. Memory leaks are:

  1. Low-impact until critical
  2. Hard to diagnose
  3. Trivial to fix once diagnosed

Low-impact…

Most web apps can leak memory and no one will ever notice. Not the user, not the website author – nobody. There are a few reasons for this.

First off, browsers are well aware that the web is a leaky mess and are already ruthless about killing background tabs that consume too much memory. (My former colleague on the Microsoft Edge performance team, Todd Reifsteck, told me way back in 2016 that “the web leaks like a sieve.”) A lot of users are tab hoarders (essentially using tabs as bookmarks), and there’s a tacit understanding between browser and user that you can’t really have 100 tabs open at once (in the sense that the tab is actively running and instantly available). So you click on a tab that’s a few weeks old, boom, there’s a flash of white while the page loads, and nobody seems to mind much.

Second off, even for long-lived SPAs that the user may habitually check in on (think: GMail, Evernote, Discord), there are plenty of opportunities for a page refresh. The browser needs to update. The user doesn’t trust that the data is fresh and hits F5. Something goes wrong because programmers are terrible at managing state, and users are well aware that the old turn-it-off-and-back-on-again solves most problems. All of this means that even a multi-MB leak can go undetected, since a refresh will almost always occur before an Out Of Memory crash.

Screenshot of Chrome browser window with sad tab and "aw snap something went wrong" message

Chrome’s Out Of Memory error page. If you see this, something has gone very wrong.

Third, it’s a tragedy-of-the-commons situation, and people tend to blame the browser. Chrome is a memory hog. Firefox gobbles up RAM. Safari is eating all my memory. For reasons I can’t quite explain, people with 100+ open tabs are quick to blame the messenger. Maybe this goes back to the first point: tab hoarders expect the browser to automatically transition tabs from “thing I’m actively using” to “background thing that is basically a bookmark,” seamlessly and without a hitch. Browsers have different heuristics about this, some heuristics are better than others, and so in that sense, maybe it is the browser’s “fault” for failing to adapt to the user’s tab-hoarding behavior. In any case, the website author tends to escape the blame, especially if their site is just 1 out of 100 naughty tabs that are all leaking memory. (Although this may change as more browsers call out tabs individually in Task Manager, e.g. Edge and Safari.)

…Until critical

What’s interesting, though, is that every so often a memory leak will get so bad that people actually start to notice. Maybe someone opens up Task Manager and wonders why a note-taking app is consuming more RAM than DOTA. Maybe the website slows to a crawl after a few hours of usage. Maybe the users are on a device with low available memory (and of course the developers, with their 32GB workstations, never noticed).

Here’s what often happens in this case: a ticket lands on some web developer’s desk that says “Memory usage is too high, fix it.” The developer thinks to themselves, “I’ve never given much thought to memory usage, well let’s take a stab at this.” At some point they probably open up DevTools, click “Memory,” click “Take snapshot,” and… it’s a mess. Because it turns out that the SPA leaks, has always leaked, and in fact has multiple leaks that have accumulated over time. The developer assumes this is some kind of sudden-onset disease, when in fact it’s a pre-existing condition that has gradually escalated to stage-4.

The funny thing is that the source of the leak – the event listener, the subscriber, whatever – might not even be the proximate cause of the recent crisis. It might have been there all along, and was originally a tiny 1 MB leak nobody noticed, until suddenly someone attached a much bigger object to the existing leak, and now it’s a 100 MB leak that no one can ignore.

Unfortunately to get there, you’re going to have to hack your way through the jungle of the half-dozen other leaks that you ignored up to this point. (We fixed the leak! Oh wait, no we didn’t. We fixed the other leak! Oh wait, there’s still one more…) But that’s how it goes when you ignore a chronic but steadily worsening illness until the moment it becomes a crisis.

Hard to diagnose

This brings us to the second point: memory leaks are hard to diagnose. I’ve already written a lot about this, and I won’t rehash old content. Suffice it to say, the tooling is not really up to the task (despite some nice recent innovations), even if you’re a veteran with years of web development experience. Some gotchas that tripped me up include the fact that you have to ignore WeakMaps and circular references, and that the DevTools console itself can leak memory.

Oh and also, browsers themselves can have memory leaks! For instance, see these ResizeObserver/IntersectionObserver leaks in Chromium, Firefox, and Safari (fixed in all but Firefox), or this Chromium leak in lazy-loading images (not fixed), or this discussion of a leak in Safari. Of course, the tooling will not help you distinguish between browser leaks and web page leaks, so you just kinda have to know this stuff. In short: good luck!

Even with the tool that I’ve written, fuite, I won’t claim that we’ve reached a golden age of memory leak debugging. My tool is better than what’s out there, but that’s not saying much. It can catch the dumb stuff, such as leaking event listeners and DOM nodes, and for the more complex stuff like leaking collections (Arrays, Maps, etc.), it can at least point you in the right direction. But it’s still up to the web developer to decide which leaks are worth chasing (some are trivial, others are massive), and to track them down.

I still believe that the browser DevTools (or perhaps professional testing tools, such as Cypress or Sentry), should be the ones to handle this kind of thing. The browser especially is in a much better position to figure out why memory is leaking, and to point the web developer towards solutions. fuite is the best I could do with userland tooling (such as Puppeteer), but overall I’d still say we’re in the Stone Age, not the Space Age. (Maybe fuite pushed us to the Bronze Age, if I’m being generous to myself.)

Trivial to fix once diagnosed

Here’s the really surprising thing about memory leaks, though, and perhaps the reason I find them so addictive and keep coming back to them: once you figure out where the leak is coming from, they’re usually trivial to fix. For instance:

  • You called addEventListener but forgot to call removeEventListener.
  • You called setInterval, but forgot to call clearInterval when the component unloaded.
  • You added a DOM node, but forgot to remove it when the page transitions away.
  • Etc.

You might have a multi-MB leak, and the fix is one line of code. That’s a massive bang-for-the-buck! That is, if you discount the days of work it might have taken to find that line of code.

This is where I would like to go with fuite. It would be amazing if you could just point a tool at your website and have it tell you exactly which line of code caused a leak. (It’d be even better if it could open a pull request to fix the leak, but hey, let’s not get ahead of ourselves.)

I’ve taken some baby steps in this direction by adding stacktraces for leaking collections. So for instance, if you have an Array that is growing by 1 on every user interaction, fuite can tell you which line of code actually called Array.push(). This is a huge improvement over v1.0 of fuite (which just told you the Array was leaking, but not why), and although there are edge cases where it doesn’t work, I’m pretty proud of this feature. My goal is to expand this to other leaks (event listeners, DOM nodes, etc.), although since this is just a tool I’m building in my spare time, we’ll see if I get to it.

Screenshot of console output showing leaking collections and stacktraces for each

fuite showing stacktraces for leaking collections.

After releasing this tool, I also learned that Facebook has built a similar tool and is planning to open-source it soon. That’s great! I’m excited to see how it works, and I’m hoping that having more tools in this space will help us move past the Stone Age of memory leak debugging.

Conclusion

So to bring it back around: should you care about memory leaks? Well, if your boss is yelling at you because customers are complaining about Out Of Memory crashes, then yeah, you absolutely should. Are you leaking 5 MB, and nobody has complained yet? Well, maybe an ounce of prevention is worth a pound of cure in this case. If you start fixing your memory leaks now, it might avoid that crisis in the future when 5 MB suddenly grows to 50 MB.

Alternatively, are you leaking a measly ~1 kB because your routing library is appending some metadata to an Array? Well, maybe you can let that one slide. (fuite will still report this leak, but I would argue that it’s not worth fixing.)

On the other hand, all of these leaks are important in some sense, because even thinking about them shows a dedication to craftsmanship that is (in my opinion) too often lacking in web development. People write a web app, they throw something buggy over the wall, and then they rewrite their frontend four years later after users are complaining too much. I see this all the time when I observe how my wife uses her computer – she’s constantly telling me that some app gets slower or buggier the longer she uses it, until she gives up and refreshes. Whenever I help her with her computer troubles, I feel like I have to make excuses for my entire industry, for why we feel it’s acceptable to waste our users’ time with shoddy, half-baked software.

Maybe I’m just a dreamer and an idealist, but I really enjoy putting that final polish on something and feeling proud of what I’ve created. I notice, too, when the software I use has that extra touch of love and care – and it gives me more confidence in the product and the team behind it. When I press the back button and it doesn’t work, I lose a bit of trust. When I press Esc on a modal and it doesn’t close, I lose a bit of trust. And if an app keeps slowing down until I’m forced to refresh, or if I notice the memory steadily creeping up, I lose a bit of trust. I would like to think that fixing memory leaks is part of that extra polish that won’t necessarily win you a lot of accolades, but your users will subtly notice, and it will build their confidence in your software.

Thanks to Jake Archibald and Todd Reifsteck for feedback on a draft of this post.

Introducing fuite: a tool for finding memory leaks in web apps

Debugging memory leaks in web apps is hard. The tooling exists, but it’s complicated, cumbersome, and often doesn’t answer the simple question: Why is my app leaking memory?

Because of this, I’d wager that most web developers are not actively monitoring for memory leaks. And of course, if you’re not testing something, it’s easy for bugs to slip through.

When I first started looking into memory leaks, I assumed it was a rare thing. How could JavaScript – a language with an automatic garbage collector – be a big source of memory leaks? But the more I learned, the more I suspected that memory leaks were actually quite common in Single Page Apps (SPAs) – it’s just that nobody is testing for it!

Since most web developers aren’t fiddling with the Chrome memory tools for the fun of it, they probably won’t notice a leak until the browser tab crashes with an Out Of Memory error, or the page slows down, or someone happens to open up the Task Manager and notice that a website is using many megabytes (or even gigabytes!) of memory. But at that point, it’s gotten bad enough that there may be multiple leaks on the same page.

I’ve written about memory leaks in the past, but my advice basically boils down to: “Use the Chrome DevTools, follow these dozen tedious steps, and then maybe you can figure out why your page is leaking.” This is not a great developer experience, and I’m sure many readers just shook their heads in despair and moved on. It would be much better if a tool could find memory leaks automatically.

That’s why I wrote fuite (French for “leak”). fuite is a CLI tool that you can point at any URL, and it will analyze the page for memory leaks:

npx fuite https://example.com

That’s it! By default, it assumes that the site is a client-rendered SPA, and it will crawl the page for internal links (such as /about or /contact). Then, for each link, it runs the following steps:

  1. Click the link
  2. Press the browser back button
  3. Repeat to see if memory grows

If fuite finds any leaks, it will show which objects are suspected of causing the leak:

Test         : Go to /foo and back
Memory change: +10 MB
Leak detected: Yes

Leaking objects:

| Object            | # added | Retained size increase |
| ----------------- | ------- | ---------------------- |
| HTMLIFrameElement | 1       | +10 MB                 |

Leaking event listeners:

| Event        | # added | Nodes  |
| ------------ | ------- | ------ |
| beforeunload | 2       | Window |

Leaking DOM nodes:

DOM size grew by 6 node(s)  

To do this, fuite uses the basic strategy outlined in my blog post. It will launch Chrome, run some scenario n number of times (7 by default) and see if any objects are leaking a multiple of n times (7, 14, 21, etc.).

fuite will also analyze any Arrays, Objects, Maps, Sets, event listeners, and the overall DOM to see if any of those are leaking. For instance, if an Array grows by exactly 7 after 7 iterations, then it’s probably leaking.

Testing real-world websites

Somewhat surprisingly, the “basic” scenario of clicking internal links and pressing the back button is enough to find memory leaks in many SPAs. I tested fuite against the home pages for 10 popular frontend frameworks, and found leaks in all of them:

Site Leak detected Internal links Average growth Max growth
Site 1 yes 8 27.2 kB 43 kB
Site 2 yes 10 50.4 kB 78.9 kB
Site 3 yes 27 98.8 kB 135 kB
Site 4 yes 8 180 kB 212 kB
Site 5 yes 13 266 kB 1.07 MB
Site 6 yes 8 638 kB 1.15 MB
Site 7 yes 7 1.37 MB 2.25 MB
Site 8 yes 15 3.49 MB 4.28 MB
Site 9 yes 43 5.57 MB 7.37 MB
Site 10 yes 16 14.9 MB 186 MB

In this case, “internal links” refers to the number of internal links tested, “average growth” refers to the average memory growth for every link (i.e. clicking it and then pressing the back button), and “max growth” refers to whichever internal link was leaking the most. Note that these numbers don’t include one-time setup costs, as fuite does one preflight iteration before the normal 7 iterations.

To confirm these results yourself, you can use the Chrome DevTools Memory tab. Here is a screenshot of the worst-performing site from my set, where I click a link, press the back button, take a heap snapshot, and repeat:

Screenshot of the Chrome DevTools memory heapsnapshots list, showing memory starting at 18.7MB and increasing by roughly 6MB every iteration until reaching 41 MB on iteration 5

On this particular site, memory grows by about 6 MB every time you click a link and go back.

To avoid naming and shaming, I haven’t listed the actual websites. The point is just to show a representative sample of some popular SPAs – the authors of those websites are free to run fuite themselves and track down these leaks. (Please do!)

Caveats

Note, though, that not every leak in an SPA is an egregious problem that needs to be addressed. SPAs need to, for example, maintain the focus and scroll state to properly support accessibility, which means that there may be some small metadata that is stored for every page navigation. fuite will dutifully report such leaks (because they are leaks), but it’s up to the developer to decide if a tiny leak is worth chasing or not.

Some memory growth may also be due to browser-internal changes (such as JITing), which the web page can’t really control. So the memory growth numbers are an imperfect measure of what you stand to gain by fixing leaks – it could very well be that a few kBs of growth are unavoidable. (Although fuite tries to ignore browser-internal growth, and will only say “leaks detected” if there is actionable advice for the web developer.)

In rare cases, some memory growth may also be due to outright browser bugs. While analyzing the sites above, I actually found one (Site #4) that seems to be suffering from this Chrome bug due to <img loading="lazy"> not being unloaded. Unfortunately it’d be hard for fuite to detect browser bugs, so if you’re mystified by a leak, it’s good to cross-check against other browsers!

Also note that it’s almost impossible for a Multi-Page App (MPA) to leak, because the browser clears memory on every page navigation. (Assuming no browser bugs, of course.) During my testing, I found two frontend frameworks whose home pages were MPAs, and unsurprisingly, fuite couldn’t find any leaks in them. These were excluded from the results above.

Memory leaks are more of a concern for SPAs, where memory isn’t cleared automatically on each navigation. fuite is primarily designed for SPAs, although you can run it on MPAs too.

fuite currently only measures the JavaScript heap memory in the main frame of the page, so cross-origin iframes, Web Workers, and Service Workers are not measured. Something like performance.measureUserAgentSpecificMemory() would be more accurate, but it’s only available in cross-origin isolated contexts, so it’s not practical for a general-purpose tool right now.

Other memory leak scenarios

The “crawl for internal links” scenario is just the default one – you can also build your own. fuite is built on top of Puppeteer, so for whatever scenario you want to test, you essentially just need to write a Puppeteer script to tell the browser what to do. Some common scenarios you might test are:

  • Open a modal dialog and then close it
  • Hover over an element to show a tooltip, then mouse away to dismiss it
  • Scroll through an infinite-loading list, then navigate away and back
  • Etc.

In each of these scenarios, you would expect memory to be the same before and after. But of course, it’s not always so simple with web apps! You may be surprised how many of your dialogs and tooltips are harboring memory leaks.

To analyze leaks, fuite captures heap snapshot files, which you can load in the Chrome DevTools to inspect. It also has a --debug mode that you can use for more fine-grained analysis: stepping through the test as it’s running, debugging the browser in real-time, analyzing the leaking objects, etc.

Under the hood, fuite is a fairly basic tool, and I won’t claim that it can do 100% of the work of fixing memory leaks. There is still the human component of figuring out why your objects were allocated and retained, and then finding a reasonable fix. But my goal is to automate ~95% of the work, so that it actually becomes achievable to fix memory leaks in web apps.

You can find fuite on GitHub. Happy leak hunting!

Update: I made a video tutorial showing how to debug memory leaks with fuite.

One weird trick to improve your website’s performance

Every so often, I come across a web performance post from what I like to call the “one weird trick” genre. It goes something like this:

“I improved my page load time by 50% by adding one line of CSS!”

or

“It’s 2x faster to use this JavaScript API than this other one!”

The thing is, I love a good performance post. I love when someone finds some odd little unexplored corner of browser performance and shines a light on it. It might actually provide some good data that can influence framework authors, library developers, and even browser vendors to improve their performance.

But more often than not, the “one weird trick” genre drives me nuts, because of what’s not included in the post:

  • Did you test on multiple browsers?
  • Did you profile to try to understand why something is slower or faster?
  • Did you publish your benchmark so that others can verify your results?

That’s why I wrote “How to write about web performance”, where I tried to summarize everything that I think makes for a great web perf post. But of course, not everyone reads my blog religiously (how dare they?), so the “one weird trick” genre continues unabated.

Look, I get it. Writing about performance is hard. And we’re not all experts. I’ve made the same mistakes myself, in posts like “High performance web worker messages” (2016) – where I found the “one weird trick” that it’s faster to stringify an object before sending it to a web worker. Of course this makes little sense (the browser should be able to serialize the object faster than you can do it yourself), and Surma has demonstrated that there’s no need to do this stringify dance in modern versions of Chrome. (As I’ve said before: if you’re not wrong about web perf today, you’ll be wrong tomorrow when browsers change!)

That said, I do occasionally find a post that really exemplifies what’s great about the web perf genre. For instance, this post by Eoin Hennessy about improving Webpack performance really ticks all the boxes. The author wasn’t satisfied with finding “one weird trick” – they had to understand why the trick worked. So they actually went to the trouble of building Node from source (!) to find the true root cause, and they even submitted a patch to Webpack to fix it.

A post like this, like a good mystery novel, has everything that makes for a satisfying story: the problem, the search, the resolution, the ending. Unlike the “one weird trick” posts, this one doesn’t leave me craving more. Instead, it leaves me feeling like I truly learned something about how browser engines work.

So if you’ve found “one weird trick,” that’s great! There might actually be something really interesting there. But unless you do the extra research, it’s hard to say more than just “Well, this technique worked for me, on my website, in Chrome, in this scenario…” (etc.). If you want to extrapolate from your results to something more widely-applicable, you have to put in the work.

So here are some things you can do. Test in multiple browsers. File a browser bug if one is slower than the others. Ask around if you know any web perf experts or folks who work at browser vendors. Take a performance profile. And if you put in just a bit of extra effort, you might find more than “one weird trick” – you might find a valuable learning opportunity for web developers, browser vendors, or anyone interested in how the web works.

How to write about web performance

I’ve been writing about performance for a long time. I like to think I’ve gotten pretty good at it, but sometimes I look back on my older blog posts and cringe at the mistakes I made.

This post is an attempt to distill some of what I’ve learned over the years to offer as advice to other aspiring tinkerers, benchmarkers, and anyone curious about how browsers actually work when you put them to the test.

Why write about web performance?

The first and maybe most obvious question is: why bother? Why write about web performance? Isn’t this something that’s better left to the browser vendors themselves?

In some ways, this is true. Browser vendors know how their product actually works. If some part of the system is performing slowly, you can go knock on the door of your colleague who wrote the code and ask them why it’s slow. (Or send them a DM, I suppose, in our post-pandemic world.)

But in other ways, browser vendors really aren’t in a good position to talk frankly about web performance. Browsers are in the business of selling browsers. Web performance claims are often used in marketing, with claims like “Browser X is 25% faster than Browser Y,” which might need to get approved by the marketing department, the legal department, not to mention various owners and stakeholders…

And that’s only if your browser is the fast one. If you run a benchmark and it turns out that your browser is the slow one, or it’s a mixed bag, then browser vendors will keep pretty quiet about it. This is why whenever a browser vendor releases a new benchmark, surprise surprise! Their browser wins. So the browser vendors’ hands are pretty tied when it comes to accurately writing about how their product actually works.

Of course, there are exceptions to this rule. Occasionally you will find someone’s personal blog, or a comment on a public bugtracker, which betrays that their browser is actually not so great in some benchmark. But nobody is going to go out of their way to sing from the mountaintops about how lousy their browser is in a benchmark. If anything, they’ll talk about it after they’ve done the work to make things faster, meaning the benchmark already went through several rounds of internal discussion, and was maybe used to evaluate some internal initiative to improve performance – a process that might last years before the public actually hears about it.

Other times, browser vendors will release a new feature, announce it with some fanfare, and then make vague claims about how it improves performance without delving into any specifics. If you actually look into these claims, though, you might find that the performance improvement is pretty meager, or it only manifests in a specific context. (Don’t expect the team who built the feature to eagerly tell you this, though.)

By the way, I don’t blame the browser vendors at all for this situation. I worked on the performance team at Microsoft Edge (back in the EdgeHTML days, before the switch to Chromium), and I did the same stuff. I wrote about scrolling performance because, at the time, our engine was the best at scrolling. I wrote about input responsiveness after we had already made it faster. (Not before! Definitely not before.) I designed benchmarks that explicitly showed off the improvements we had made. I worked on marketing videos that showed our browser winning in experiments where we already knew we’d win.

And if you think I’m picking on Microsoft, I could easily find examples of the other browser vendors doing the same thing. But I choose not to, because I’d rather pick on myself. (If you work for a browser vendor and are reading this, I’m sure some examples come to mind.)

Don’t expect a car company to tell you that their competitor has better mileage. Don’t expect them to admit that their new model has a lousy safety rating. That’s what Consumer Reports is for. In the same way, if you don’t work at a browser vendor (I don’t, anymore), then you are blessedly free to say whatever you want about browsers, and to honestly assess their claims and compare them to each other in fair, unbiased benchmarks.

Plus, as a web developer, you might actually be in a better position to write a benchmark that is more representative of real-world code. Browser developers spend most of their day writing C, C++, and Rust, not necessarily HTML, CSS, and JavaScript. So they aren’t always familiar with the day-to-day concerns of working web developers.

The subtle science of benchmarking

Okay, so that was my long diatribe about why you’d bother writing about web performance. So how do you actually go about doing it?

First off, I’d say to write the benchmark before you start writing your blog post. Your conclusions and high-level takeaways may be vastly different depending on the results of the benchmark. So don’t assume you already know what the results are going to be.

I’ve made this mistake in the past! Once, I wrote an entire blog post before writing the benchmark, and then the benchmark completely upended what I was going to say in the post. I had to scrap the whole thing and start from scratch.

Benchmarking is science, and you should treat it with the seriousness of a scientific endeavor. Expect peer review, which means – most importantly! – publish your benchmark publicly and provide instructions for others to test it. Because believe me, they will! I’ve had folks notify me of a bug in my benchmark after I published a post, so I had to go back and edit it to correct the results. (This is annoying, and embarrassing, but it’s better than willfully spreading misinformation.)

Since you may end up running your benchmark multiple times, and even generating your charts and tables multiple times, make an effort to streamline the process of gathering the data. If some step is manual, try to automate it.

These days, I like Tachometer because it automates a lot of the boring parts of benchmarking – launching a browser, taking a measurement, taking multiple measurements, taking enough measurements to achieve statistical significance, etc. Unfortunately it doesn’t automate the part where you generate charts and graphs, but I usually write small scripts to output the data in a format where I can easily import it into a spreadsheet app.

This also leads to an important point: take accurate measurements. A common mistake is to use Date.now() – instead, you should use performance.now(), since this gives you a high-resolution timestamp. Or even better, use performance.mark() and performance.measure() – these are also high-resolution, but with the added benefit that you can actually see your measurements laid out visually in the Chrome DevTools. This is a great way to double-check that you’re actually measuring what you think you’re measuring.

Screenshot of a performance trace in Chrome DevTools with an annotation in the User Timing section for the "total" span saying "What the benchmark is measuring" and stacktraces in the main thread with the annotation "What the browser is doing"

Note: Sadly, the Firefox and Safari DevTools still don’t show performance marks/measures in their profiler traces. They really should; IE11 had this feature years ago.

As mentioned above, it’s also a good idea to take multiple measurements. Benchmarks will always show variance, and you can prove just about anything if you only take one sample. For best results, I’d say take at least three measurements and then calculate the median, or better yet, use a tool like Tachometer that will use a bunch of fancy statistics to find the ideal number of samples.

Humility

Writing about web performance is really hard, so it’s important to be humble. Browsers are incredibly complex, so you have to accept that you will probably be wrong about something. And if you’re not wrong, then you will be wrong in 5 years when browsers update their engines to make your results obsolete.

There are a few ways you can limit your likelihood of wrongness, though. Here are a few strategies that have worked well for me in the past.

First off, test in multiple browser engines. This is a good way to figure out if you’ve identified a quirk in a particular browser, or a fundamental truth about how the web works. Heck, if you write a benchmark where one browser performs much more poorly than the other ones, then congratulations! You’ve found a browser bug, and now you have a reproducible test case that you can file on that browser.

(And if you think they won’t be grateful or won’t fix the problem, then prepare to be surprised. I’ve filed several such bugs on browsers, and they usually at least acknowledge the issue if not outright fix it within a few releases. Sometimes browser developers are grateful when you file a bug like this, because they might already know something is a problem, but without bug reports from customers, they weren’t able to convince management to prioritize it.)

Second, reduce the variables. Test on the same hardware, if possible. (For testing the three major browser engines – Blink, Gecko, and WebKit – this sadly means you’re always testing on macOS. Do not trust WebKit on Windows/Linux; I’ve found its performance to be vastly different from Safari’s.) Browsers can differ based on whether the device is plugged into power or has low battery, so make sure that the device is plugged in and charged. Don’t run other applications or browser windows or tabs while you’re running the benchmark. If networking is involved, use a local server if possible to eliminate latency. (Or configure the server to always respond with a particular delay, or use throttling, as necessary.) Update all the browsers before running the test.

Third, be aware of caching. It’s easy to fool yourself if you run 1,000 iterations of something, and it turns out that the last 999 iterations are all cached. JavaScript engines have JIT compilers, meaning that the first iteration can be different from the second iteration, which can be different from the third, etc. If you think you can figure out something low-level like “Is const faster than let?”, you probably can’t, because the JIT will outsmart you. Browsers also have bytecode caching, which means that the first page load may be different from the second, and the second may even be different from the third. (Tachometer works around this by using a fresh browser tab each iteration, which is clever.)

My point here is that, for all of your hard work to do rigorous, scientific benchmarking, you may just turn out to be wrong. You’ll publish your blog post, you’ll feel very proud of yourself, and then a browser engineer will contact you privately and say, “You know, it only works like this on a 60FPS monitor.” Or “only on Intel CPUs.” Or “only on macOS Big Sur.” Or “only if your DOM size is greater than 1,000 and the layer depth is above 10 and you’re using a trackball mouse and it’s a Tuesday and the moon is in the seventh house.”

There are so many variables in browser performance, and you can’t possibly capture them all. The best you can do is document your methodology, explain what your benchmark doesn’t test, and try not to make grand sweeping statements like, “You should always use const instead of let; my benchmark proves it’s faster.” At best, your benchmark proves that one number is higher than another in your very specific benchmark in the very specific way you tested it, and you have to be satisfied with that.

Conclusion

Writing about browser performance is hard, but it’s not fruitless. I’ve had enough successes over the years (and enough stubbornness and curiosity, I guess) that I keep doing it.

For instance, I wrote about how bundlers like Rollup produced faster JavaScript files than bundlers like Webpack, and Webpack eventually improved its implementation. I filed a bug on Firefox and Chrome showing that Safari had an optimization they didn’t, and both browsers fixed it, so now all three browsers are fast on the benchmark. I wrote a silly JavaScript “optimizer” that the V8 team used to improve their performance.

I bring up all these examples less to brag, and more to show that it is possible to improve things by simply writing about them. In all three of the above cases, I actually made mistakes in my benchmarks (pretty dumb ones, in some cases), and had to go back and fix it later. But if you can get enough traction and get the right people’s attention, then the browsers and bundlers and frameworks can change, without you having to actually write the code to do it. (To this day, I can’t write a line of C, C++, or Rust, but I’ve influenced browser vendors to write it for me, which aligns with my goal of spending more time playing Tetris than learning new programming languages.)

My point in writing all this is to try to convince you (if you’ve read this far) that it is indeed valuable for you to write about web performance. Even if you don’t feel like you really understand how browsers work. Even if you’re just getting started as a web developer. Even if you’re just curious, and you want to poke around at browsers to see how they tick. At worst you’ll be wrong (which I’ve been many times), and at best you might teach others about performant programming patterns, or even influence the ecosystem to change and make things better for everyone.

There are plenty of upsides, and all you need is an HTML file and a bit of patience. So if that sounds interesting to you, get started and have fun benchmarking!

Speeding up IndexedDB reads and writes

Recently I read James Long’s article “A future for SQL on the web”. It’s a great post, and if you haven’t read it, you should definitely go take a look!

I don’t want to comment on the specifics of the tool James created, except to say that I think it’s a truly amazing feat of engineering, and I’m excited to see where it goes in the future. But one thing in the post that caught my eye was the benchmark comparisons of IndexedDB read/write performance (compared to James’s tool, absurd-sql).

The IndexedDB benchmarks are fair enough, in that they demonstrate the idiomatic usage of IndexedDB. But in this post, I’d like to show how raw IndexedDB performance can be improved using a few tricks that are available as of IndexedDB v2 and v3:

  • Pagination (v2)
  • Relaxed durability (v3)
  • Explicit transaction commits (v3)

Let’s go over each of these in turn.

Pagination

Years ago when I was working on PouchDB, I hit upon an IndexedDB pattern that, at the time, improved performance in Firefox and Chrome by roughly 40-50%. I’m probably not the first person to come up with this idea, but I’ll lay it out here.

In IndexedDB, a cursor is basically a way of iterating through the data in a database one-at-a-time. And that’s the core problem: one-at-a-time. Sadly, this tends to be slow, because at every step of the iteration, JavaScript can respond to a single item from the cursor and decide whether to continue or stop the iteration.

Effectively this means that there’s a back-and-forth between the JavaScript main thread and the IndexedDB engine (running off-main-thread). You can see it in this screenshot of the Chrome DevTools performance profiler:

Screenshot of Chrome DevTools profiler showing multiple small tasks separated by a small amount of idle time each

Or in Chrome tracing, which shows a bit more detail:

Screenshot of Chrome tracing tool showing multiple separate tasks, separated by a bit of idle time. The top of each task says RunNormalPriorityTask, and near the bottom each one says IDBCursor continue.

Notice that each call to cursor.continue() gets its own little JavaScript task, and the tasks are separated by a bit of idle time. That’s a lot of wasted time for each item in a database!

Luckily, in IndexedDB v2, we got two new APIs to help out with this problem: getAll() and getAllKeys(). These allow you to fetch multiple items from an object store or index in a single go. They can also start from a given key range and return a given number of items, meaning that we can implement a paginated cursor:

const batchSize = 100
let keys, values, keyRange = null

function fetchMore() {
  // If there could be more results, fetch them
  if (keys && values && values.length === batchSize) {
    // Find keys greater than the last key
    keyRange = IDBKeyRange.lowerBound(keys.at(-1), true)
    keys = values = undefined
    next()
  }
}

function next() {
  store.getAllKeys(keyRange, batchSize).onsuccess = e => {
    keys = e.target.result
    fetchMore()
  }
  store.getAll(keyRange, batchSize).onsuccess = e => {
    values = e.target.result
    fetchMore()
  }
}

next()

In the example above, we iterate through the object store, fetching 100 items at a time rather than just 1. Using a modified version of the absurd-sql benchmark, we can see that this improves performance considerably. Here are the results for the “read” benchmark in Chrome:

Chart image, see table below

Click for table

DB size (columns) vs batch size (rows):

100 1000 10000 50000
1 8.9 37.4 241 1194.2
100 7.3 34 145.1 702.8
1000 6.5 27.9 100.3 488.3

(Note that a batch size of 1 means a cursor, whereas 100 and 1000 use a paginated cursor.)

And here’s Firefox:

Chart image, see table below

Click for table

DB size (columns) vs batch size (rows):

100 1000 10000 50000
1 2 15 125 610
100 2 9 70 468
1000 2 8 51 271

And Safari:

Chart image, see table below

Click for table

DB size (columns) vs batch size (rows):

100 1000 10000 50000
1 11 106 957 4673
100 1 5 44 227
1000 1 3 26 127

All benchmarks were run on a 2015 MacBook Pro, using Chrome 92, Firefox 91, and Safari 14.1. Tachometer was configured with 15 minimum iterations, a 1% horizon, and a 10-minute timeout. I’m reporting the median of all iterations.

As you can see, the paginated cursor is particularly effective in Safari, but it improves performance in all browser engines.

Now, this technique isn’t without its downsides. For one, you have to choose an explicit batch size, and the ideal number will depend on the size of the data and the usage patterns. You may also want to consider the downsides of overfetching – i.e. if the cursor should stop at a given value, you may end up fetching more items from the database than you really need. (Although ideally, you can use the upper bound of the key range to guard against that.)

The main downside of this technique is that it only works in one direction: you cannot build a paginated cursor in descending order. This is a flaw in the IndexedDB specification, and there are ideas to fix it, but currently it’s not possible.

Of course, instead of implementing a paginated cursor, you could also just use getAll() and getAllKeys() as-is and fetch all the data at once. This probably isn’t a great idea if the database is large, though, as you may run into memory pressure, especially on constrained devices. But it could be useful if the database is small.

getAll() and getAllKeys() both have great browser support, so this technique can be widely adopted for speeding up IndexedDB read patterns, at least in ascending order.

Relaxed durability

The paginated cursor can speed up database reads, but what about writes? In this case, we don’t have an equivalent to getAll()/getAllKeys() that we can lean on. Apparently there was some effort put into building a putAll(), but currently it’s abandoned because it didn’t actually improve write performance in Chrome.

That said, there are other ways to improve write performance. Unfortunately, none of these techniques are as effective as the paginated cursor, but they are worth investigating, so I’m reporting my results here.

The most significant way to improve write performance is with relaxed durability. This API is currently only available in Chrome, but it has also been implemented in WebKit as of Safari Technology Preview 130.

The idea behind relaxed durability is to resolve some disagreement between the browser vendors as to whether IndexedDB transactions should optimize for durability (writes succeed even in the event of a power failure or crash) or performance (writes succeed quickly, even if not fully flushed to disk).

It’s been well documented that Chrome’s IndexedDB performance is worse than Firefox’s or Safari’s, and part of the reason seems to be that Chrome defaults to a durable-by-default mode. But rather than sacrifice durability across-the-board, the Chrome team wanted to expose an explicit API for developers to decide which mode to use. (After all, only the web developer knows if IndexedDB is being used as an ephemeral cache or a store of priceless family photos.) So now we have three durability options: default, relaxed, and strict.

Using the “write” benchmark, we can test out relaxed durability in Chrome and see the improvement:

Chart image, see table below

Click for table
Durability 100 1000 10000 50000
Default 26.4 125.9 1373.7 7171.9
Relaxed 17.1 112.9 1359.3 6969.8

As you can see, the results are not as dramatic as with the pagination technique. The effect is most visible in the smaller database sizes, and the reason turns out to be that relaxed durability is better at speeding up multiple small transactions than one big transaction.

Modifying the benchmark to do one transaction per item in the database, we can see a much clearer impact of relaxed durability:

Chart image, see table below

Click for table
Durability 100 1000
Default 1074.6 10456.2
Relaxed 65.4 630.7

(I didn’t measure the larger database sizes, because they were too slow, and the pattern is clear.)

Personally, I find this option to be nice-to-have, but underwhelming. If performance is only really improved for multiple small transactions, then usually there is a simpler solution: use fewer transactions.

It’s also underwhelming given that, even with this option enabled, Chrome is still much slower than Firefox or Safari:

Chart image, see table below

Click for table
Browser 100 1000 10000 50000
Chrome (default) 26.4 125.9 1373.7 7171.9
Chrome (relaxed) 17.1 112.9 1359.3 6969.8
Firefox 8 53 436 1893
Safari 3 28 279 1359

That said, if you’re not storing priceless family photos in IndexedDB, I can’t see a good reason not to use relaxed durability.

Explicit transaction commits

The last technique I’ll cover is explicit transaction commits. I found it to be an even smaller performance improvement than relaxed durability, but it’s worth mentioning.

This API is available in both Chrome and Firefox, and (like relaxed durability) has also been implemented in Safari Technology Preview 130.

The idea is that, instead of allowing the transaction to auto-close based on the normal flow of the JavaScript event loop, you can explicitly call transaction.close() to signal that it’s safe to close the transaction immediately. This results in a very small performance boost because the IndexedDB engine is no longer waiting for outstanding requests to be dispatched. Here is the improvement in Chrome using the “write” benchmark:

Chart image, see table below

Click for table
Relaxed / Commit 100 1000 10000 50000
relaxed=false, commit=false 26.4 125.9 1373.7 7171.9
relaxed=false, commit=true 26 125.5 1373.9 7129.7
relaxed=true, commit=false 17.1 112.9 1359.3 6969.8
relaxed=true, commit=true 16.8 112.8 1356.2 7215

You’d really have to squint to see the improvement, and only for the smaller database sizes. This makes sense, since explicit commits can only shave a bit of time off the end of each transaction. So, like relaxed durability, it has a bigger impact on multiple small transactions than one big transaction.

The results are similarly underwhelming in Firefox:

Chart image, see table below

Click for table
Commit 100 1000 10000 50000
No commit 8 53 436 1893
Commit 8 52 434 1858

That said, especially if you’re doing multiple small transactions, you might as well use it. Since it’s not supported in all browsers, though, you’ll probably want to use a pattern like this:

if (transaction.commit) {
  transaction.commit()
}

If transaction.commit is undefined, then the transaction can just close automatically, and functionally it’s the same.

Update: Daniel Murphy points out that transaction.commit() can have bigger perf gains if the page is busy with other JavaScript tasks, which would delay the auto-closing of the transaction. This is a good point! My benchmark doesn’t measure this.

Conclusion

IndexedDB has a lot of detractors, and I think most of the criticism is justified. The IndexedDB API is awkward, it has bugs and gotchas in various browser implementations, and it’s not even particularly fast, especially compared to a full-featured, battle-hardened, industry-standard tool like SQLite. The new APIs unveiled in IndexedDB v3 don’t even move the needle much. It’s no surprise that many developers just say “forget it” and stick with localStorage, or they create elaborate solutions on top of IndexedDB, such as absurd-sql.

Perhaps I just have Stockholm syndrome from having worked with IndexedDB for so long, but I don’t find it to be so bad. The nomenclature and the APIs are a bit weird, but once you wrap your head around it, it’s a powerful tool with broad browser support – heck, it even works in Node.js via fake-indexeddb and indexeddbshim. For better or worse, IndexedDB is here to stay.

That said, I can definitely see a future where IndexedDB is not the only player in the browser storage game. We had WebSQL, and it’s long gone (even though I’m still maintaining a Node.js port!), but that hasn’t stopped people from wanting a more high-level database API in the browser, as demonstrated by tools like absurd-sql. In the future, I can imagine something like the Storage Foundation API making it more straightforward to build custom databases of top of low-level storage primitives – which is what IndexedDB was designed to do, and arguably failed at. (PouchDB, for one, makes extensive use of IndexedDB’s capabilities, but I’ve seen plenty of storage wrappers that essentially use IndexedDB as a dumb key-value store.)

I’d also like to see the browser vendors (especially Chrome) improve their IndexedDB performance. The Chrome team has said that they’re focused on read performance rather than write performance, but really, both matter. A mobile app developer can ship a prebuilt SQLite .db file in their app; in terms of quickly populating a database, there is nothing even remotely close for IndexedDB. As demonstrated above, cursor performance is also not great

For those web developers sticking it out with IndexedDB, though, I hope I’ve made a case that it’s not completely a lost cause, and that its performance can be improved. Who knows: maybe the browser vendors still have some tricks up their sleeves, especially if we web developers keep complaining about IndexedDB performance. It’ll be interesting to watch this space evolve and to see how both IndexedDB and its alternatives improve over the years.

Does shadow DOM improve style performance?

Update: I wrote a follow-up post on this topic.

Short answer: Kinda. It depends. And it might not be enough to make a big difference in the average web app. But it’s worth understanding why.

First off, let’s review the browser’s rendering pipeline, and why we might even speculate that shadow DOM could improve its performance. Two fundamental parts of the rendering process are style calculation and layout calculation, or simply “style” and “layout.” The first part is about figuring out which DOM nodes have which styles (based on CSS), and the second part is about figuring out where to actually place those DOM nodes on the page (using the styles calculated in the previous step).

Screenshot of Chrome DevTools showing a performance trace with JavaScript stacks followed by a purple Style/Layout region and green Paint region

A performance trace in Chrome DevTools, showing the basic JavaScript → Style → Layout → Paint pipeline.

Browsers are complex, but in general, the more DOM nodes and CSS rules on a page, the longer it will take to run the style and layout steps. One of the ways we can improve the performance of this process is to break up the work into smaller chunks, i.e. encapsulation.

For layout encapsulation, we have CSS containment. This has already been covered in other articles, so I won’t rehash it here. Suffice it to say, I think there’s sufficient evidence that CSS containment can improve performance (I’ve seen it myself), so if you haven’t tried putting contain: content on parts of your UI to see if it improves layout performance, you definitely should!

For style encapsulation, we have something entirely different: shadow DOM. Just like how CSS containment can improve layout performance, shadow DOM should (in theory) be able to improve style performance. Let’s consider why.

What is style calculation?

As mentioned before, style calculation is different from layout calculation. Layout calculation is about the geometry of the page, whereas style calculation is more explicitly about CSS. Basically, it’s the process of taking a rule like:

div > button {
  color: blue;
}

And a DOM tree like:

<div>
  <button></button>
</div>

…and figuring out that the <button> should have color: blue because its parent is a <div>. Roughly speaking, it’s the process of evaluating CSS selectors (div > button in this case).

Now, in the worst case, this is an O(n * m) operation, where n is the number of DOM nodes and m is the number of CSS rules. (I.e. for each DOM node, and for each rule, figure out if they match each other.) Clearly, this isn’t how browsers do it, or else any decently-sized web app would become grindingly slow. Browsers have a lot of optimizations in this area, which is part of the reason that the common advice is not to worry too much about CSS selector performance (see this article for a good, recent treatment of the subject).

That said, if you’ve worked on a non-trivial codebase with a fair amount of CSS, you may notice that, in Chrome performance profiles, the style costs are not zero. Depending on how big or complex your CSS is, you may find that you’re actually spending more time in style calculation than in layout calculation. So it isn’t a completely worthless endeavor to look into style performance.

Shadow DOM and style calculation

Why would shadow DOM improve style performance? Again, it’s because of encapsulation. If you have a CSS file with 1,000 rules, and a DOM tree with 1,000 nodes, the browser doesn’t know in advance which rules apply to which nodes. Even if you authored your CSS with something like CSS Modules, Vue scoped CSS, or Svelte scoped CSS, ultimately you end up with a stylesheet that is only implicitly coupled to the DOM, so the browser has to figure out the relationship at runtime (e.g. using class or attribute selectors).

Shadow DOM is different. With shadow DOM, the browser doesn’t have to guess which rules are scoped to which nodes – it’s right there in the DOM:

<my-component>
  #shadow-root
    <style>div {color: green}</style>
    <div></div>
<my-component>
<another-component>
  #shadow-root
    <style>div {color: blue}</style>
    <div></div>
</another-component>

In this case, the browser doesn’t need to test the div {color: green} rule against every node in the DOM – it knows that it’s scoped to <my-component>. Ditto for the div {color: blue} rule in <another-component>. In theory, this can speed up the style calculation process, because the browser can rely on explicit scoping through shadow DOM rather than implicit scoping through classes or attributes.

Benchmarking it

That’s the theory, but of course things are always more complicated in practice. So I put together a benchmark to measure the style calculation performance of shadow DOM. Certain CSS selectors tend to be faster than others, so for decent coverage, I tested the following selectors:

  • ID (#foo)
  • class (.foo)
  • attribute ([foo])
  • attribute value ([foo="bar"])
  • “silly” ([foo="bar"]:nth-of-type(1n):last-child:not(:nth-of-type(2n)):not(:empty))

Roughly, I would expect IDs and classes to be the fastest, followed by attributes and attribute values, followed by the “silly” selector (thrown in to add something to really make the style engine work).

To measure, I used a simple requestPostAnimationFrame polyfill, which measures the time spent in style, layout, and paint. Here is a screenshot in the Chrome DevTools of what’s being measured (note the “total” under the Timings section):

Screenshot of Chrome DevTools showing a "total" measurement in Timings which corresponds to style, layout, and other purple "rendering" blocks in the "Main" section

To run the actual benchmark, I used Tachometer, which is a nice tool for browser microbenchmarks. In this case, I just took the median of 51 iterations.

The benchmark creates several custom elements, and either attaches a shadow root with its own <style> (shadow DOM “on”) , or uses a global <style> with implicit scoping (shadow DOM “off”). In this way, I wanted to make a fair comparison between shadow DOM itself and shadow DOM “polyfills” – i.e. systems for scoping CSS that don’t rely on shadow DOM.

Each CSS rule looks something like this:

#foo {
  color: #000000;
}

And the DOM structure for each component looks like this:

<div id="foo">hello</div>

(Of course, for attribute and class selectors, the DOM node would have an attribute or class instead.)

Benchmark results

Here are the results in Chrome for 1,000 components and 1 CSS rule for each component:

Chart of Chrome with 1000 components and 1 rule. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 67.90 67.20 67.30 67.70 69.90
No Shadow DOM 57.50 56.20 120.40 117.10 130.50

As you can see, classes and IDs are about the same with shadow DOM on or off (in fact, it’s a bit faster without shadow DOM). But once the selectors get more interesting (attribute, attribute value, and the “silly” selector), shadow DOM stays roughly constant, whereas the non-shadow DOM version gets more expensive.

We can see this effect even more clearly if we bump it up to 10 CSS rules per component:

Chart of Chrome with 1000 components and 10 rules. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 70.80 70.60 71.10 72.70 81.50
No Shadow DOM 58.20 58.50 597.10 608.20 740.30

The results above are for Chrome, but we see similar numbers in Firefox and Safari. Here’s Firefox with 1,000 components and 1 rule each:

Chart of Firefox with 1000 components and 1 rule. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 27 25 25 25 25
No Shadow DOM 18 18 32 32 32

And Firefox with 1,000 components, 10 rules each:

Chart of Firefox with 1000 components and 10 rules. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 30 30 30 30 34
No Shadow DOM 22 22 143 150 153

And here’s Safari with 1,000 components and 1 rule each:

Chart of Safari with 1000 components and 1 rule. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 57 58 61 63 64
No Shadow DOM 52 52 126 126 177

And Safari with 1,000 components, 10 rules each:

Chart of Safari with 1000 components and 10 rules. See tables for full data

Click for table
id class attribute attribute-value silly
Shadow DOM 60 61 81 81 92
No Shadow DOM 56 56 710 716 1157

All benchmarks were run on a 2015 MacBook Pro with the latest version of each browser (Chrome 92, Firefox 91, Safari 14.1).

Conclusions and future work

We can draw a few conclusions from this data. First off, it’s true that shadow DOM can improve style performance, so our theory about style encapsulation holds up. However, ID and class selectors are fast enough that actually it doesn’t matter much whether shadow DOM is used or not – in fact, they’re slightly faster without shadow DOM. This indicates that systems like Svelte, CSS Modules, or good old-fashioned BEM are using the best approach performance-wise.

This also indicates that using attributes for style encapsulation does not scale well compared to classes. So perhaps scoping systems like Vue would be better off switching to classes.

Another interesting question is why, in all three browser engines, classes and IDs are slightly slower when using shadow DOM. This is probably a better question for the browser vendors themselves, and I won’t speculate. I will say, though, that the differences are small enough in absolute terms that I don’t think it’s worth it to favor one or the other. The clearest signal from the data is just that shadow DOM helps to keep the style costs roughly constant, whereas without shadow DOM, you would want to stick to simple selectors like classes and IDs to avoid hitting a performance cliff.

As for future work: this is a pretty simple benchmark, and there are lots of ways to expand it. For instance, the benchmark only has one inner DOM node per component, and it only tests flat selectors – no descendant or sibling selectors (e.g. div div, div > div, div ~ div, and div + div). In theory, these scenarios should also favor shadow DOM, especially since these selectors can’t cross shadow boundaries, so the browser doesn’t need to look outside of the shadow root to find the relevant ancestors or siblings. (Although the browser’s Bloom filter makes this more complicated – see these notes for an good explanation of how this optimization works.)

Overall, though, I’d say that the numbers above are not big enough that the average web developer should start worrying about optimizing their CSS selectors, or migrating their entire web app to shadow DOM. These benchmark results are probably only relevant if 1) you’re building a framework, so any pattern you choose is magnified multiple times, or 2) you’ve profiled your web app and are seeing lots of high style calculation costs. But for everyone else, I hope at least that these results are interesting, and reveal a bit about how shadow DOM works.

Update: Thomas Steiner wondered about tag selectors as well (e.g. div {}), so I modified the benchmark to test it out. I’ll only report the results for the Shadow DOM version, since the benchmark uses divs, and in the non-shadow case it wouldn’t be possible to use tags alone to distinguish between different divs. In absolute terms, the numbers look pretty close to those for IDs and classes (or even a bit faster in Chrome and Firefox):

Click for table
Chrome Firefox Safari
1,000 components, 1 rule 53.9 19 56
1,000 components, 10 rules 62.5 20 58

Improving responsiveness in text inputs

For me, one of the most aggravating performance issues on the web is when it’s slow to type into a text input. I’m a fairly fast typist, so if there’s even a tiny delay in a <textarea> or <input>, I can feel it slowing me down, and it drives me nuts.

I find this problem especially irksome because it’s usually solvable with a few simple tricks. There’s no reason for a chat app or a social media app to be slow to type into, except that web developers often take the naïve approach, and that’s where the delay comes from.

To understand the source of input delays, let’s take a concrete example. Imagine a Twitter-like UI with a text field and a “remaining characters” count. As you type, the number gradually decreases down to zero.

Screenshot of a text area with the text "Hello I'm typing!" and the text "Characters remaining: 263"

Here’s the naïve way to implement this:

  1. Attach an input event listener to the <textarea>.
  2. Whenever the event fires, update some global state (e.g. in Redux).
  3. Update the “remaining characters” display based on that global state.

And here’s a live example. Really mash on the keyboard if you don’t notice the input delay:

Note: This example contains an artificial 70-millisecond delay to simulate a heavy real-world app, and to make the demo consistent across devices. Bear with me for a moment.

The problem with the naïve approach is that it usually ends up doing far too much work relative to the benefit that the user gets out of the “remaining characters” display. In the worst case, changing the global state could cause the entire UI to re-render (e.g. in a poorly-optimized React app), meaning that as the user types, every keypress causes a full global re-render.

Also, because we are directly listening to the input event, there will be a delay between the actual keypress and the character appearing in the <textarea>. Because the DOM is single-threaded, and because we’re doing blocking work on the main thread, the browser can’t render the new input until that work finishes. This can lead to noticeable typing delays and therefore user frustration.

My preferred solution to this kind of problem is to use requestIdleCallback to wait for the UI thread to be idle before running the blocking code. For instance, something like this:

let queued = false
textarea.addEventListener('input', () => {
  if (!queued) {
    queued = true
    requestIdleCallback(() => {
      updateUI(textarea.value)
      queued = false
    })
  }
})

This technique has several benefits:

  1. We are not directly blocking the input event with anything expensive, so there shouldn’t be a delay between typing a character and seeing that character appear in the <textarea>.
  2. We are not updating the UI for every keypress. requestIdleCallback will batch the UI updates when the user pauses between typing characters. This is sensible, because the user probably doesn’t care if the “remaining characters” count updates for every single keypress – their attention is on the text field, not on the remaining characters.
  3. On a slower machine, requestIdleCallback will naturally do fewer batches-per-keypress than on a faster machine. So a user on a faster device will have the benefit of a faster-updating UI, but neither user will experience poor input responsiveness.

And here’s a live example of the optimized version. Feel free to mash on the keyboard: you shouldn’t see (much) of a delay!

In the past, you might have used something like debouncing to solve this problem. But I like requestIdleCallback because of the third point above: it naturally adapts to the characteristics of the user’s device, rather than forcing us to choose a hardcoded delay.

Note: Running your state logic in a web worker is also a way to avoid this problem. But the vast majority of web apps aren’t architected this way, so I find requestIdleCallback to be better as a bolt-on solution.

To be fair, this technique isn’t foolproof. Some UIs really need to respond immediately to every keypress: for instance, to disallow certain characters or resize the <textarea> as it grows. (In those cases, though, I would throttle with requestAnimationFrame.) Also, some UIs may still lag if the work they’re doing is large enough that it’s perceptible even when batched. (In the live examples above, I set an artificial delay of 70 milliseconds, which you can still “feel” with the optimized version.) But for the most part, using requestIdleCallback is enough to get rid of any major responsiveness issues.

If you want to test this on your own website, I’d recommend putting the Chrome DevTools at 6x CPU slowdown and then mashing the keyboard as fast as you can. On a vanilla <textarea> or <input> with no JavaScript handlers, you won’t see any delay. Whereas if your own website feels sluggish, then maybe it’s time to optimize your text inputs!

Why it’s okay for web components to use frameworks

Should standalone web components be written in vanilla JavaScript? Or is it okay if they use (or even bundle) their own framework? With Vue 3 announcing built-in support for building web components, and with frameworks like Svelte and Lit having offered this functionality for some time, it seems like a good time to revisit the question.

First off, I should state my own bias. When I released emoji-picker-element, I made the decision to bundle its framework (Svelte) directly into the component. Clearly I don’t think this is a bad idea (despite my reputation as a perf guy!), so I’d like to explain why it doesn’t shock me for a web component to rely on a framework.

Size concerns

Many web developers might bristle at the idea of a standalone web component relying on its own framework. If I want a date picker, or a modal dialog, or some other utility component, why should I pay the tax of including its entire framework in my bundle? But I think this is the wrong way to look at things.

First off, JavaScript frameworks have come a long way from the days when they were huge, kitchen-sink monoliths. Today’s frameworks like Svelte, Lit, Preact, Vue, and others tend to be smaller, more focused, and more tree-shakeable. A Svelte “hello world” is 1.18 kB (minified and compressed), a Lit “hello world” is 5.7 kB, and petite-vue aims for a 5.8 kB compressed size. These are not huge by any stretch of the imagination.

If you dig deeper, the situation gets even more interesting. As Evan You points out, some frameworks (such as Vue) have a relatively high baseline cost that is amortized by a small per-component size, whereas other frameworks (such as Svelte) have a lower baseline cost but a higher per-component size. The days when you could confidently say “Framework X costs Y kilobytes” are over – the conversation has become much more complex and nuanced.

Second, with code-splitting becoming more common, the individual cost of a dependency has become less important than whether it can be lazy-loaded. For instance, if you use a date picker or modal dialog that bundles its own framework, why not dynamically import() it when it actually needs to be shown? There’s no reason to pay the cost on initial page load for a component that the user may never even need.

Third, bundle size is not the only performance metric that matters. There are also considerations like runtime cost, memory overhead, and energy usage that web developers rarely consider.

Looking at runtime cost, a framework can be small, but that’s not necessarily the same thing as being fast. Sometimes it takes more code to make an algorithm faster! For example, Inferno aims for faster runtime performance at the cost of a higher bundle size when compared to something like Preact. So it’s worth considering whether a component is fast in other metrics beside bundle size.

Caveats

That said, I don’t think “bring your own framework” is without its downsides. So let’s go over some problems you may run into when you mix-and-match frameworks.

You can imagine that, if every web component came with its own framework, then you might end up with multiple copies of the same framework on the same page. And this is definitely a concern! But assuming that the component externalizes its framework dependency (e.g. import 'my-framework'), then multiple components should be able to share the same framework code under the hood.

I used this technique in my own emoji-picker-element. If you’re already using Svelte in your project, then you can import 'emoji-picker-element/svelte' and get a version that doesn’t bundle its own framework, ensuring de-duplication. This saves a paltry 1.4 kB out of 13.9 kB total (compressed), but hey, it’s there. (Potentially I could make this the default behavior, but I like the bundled version for the benefit of folks who use <script> tags instead of bundlers. Maybe something like Skypack could make this simpler in the future.)

Another potential downside of bring-your-own-framework is when frameworks mutate global state, which can lead to conflicts between frameworks. For instance, React has historically attached global event listeners to the document (although thankfully this changed in React v17). Also, Angular’s Zone.js overrides the global Object.defineProperty (although there is a workaround). When mixing-and-matching frameworks, it’s best to avoid frameworks that mutate global state, or to carefully ensure that they don’t conflict with one another.

If you look at the compiled output for a framework like Svelte, though, you’ll see that it’s basically just a collection of pure functions that don’t modify the global state. Combining such frameworks in the same codebase is no more harmful than bundling different versions of Lodash or Underscore.

Now, to be clear: in an ideal world, your web app would only contain one framework. Otherwise it’s shipping duplicate code that essentially does the same thing. But web development is all about tradeoffs, and I don’t believe that it’s worth rejecting a component out-of-hand just to avoid a few extra kBs from a tiny framework like Preact or Lit. (Of course, for a larger framework, this may be a different story. But this is true of any component dependency, not just a framework.)

Framework chauvinism

In general, I don’t think the question should be whether a component uses its own framework or not. Instead, the question should be: Is this component small enough/fast enough for my use case? After all, a component can be huge without using a framework, and it can be slow even when written in vanilla JS. The framework is part of the story, but it’s not the whole story.

I also think that focusing too much on frameworks plays against the strengths of web components. The whole point of web components is to have a standard, interoperable way to add a component to a page without worrying about what framework it’s using under the hood (or if it’s using a framework at all).

Web components also serve as a fantastic glue layer between frameworks. If there’s a great React component out there that you want to use in your Vue codebase, why not wrap it in Remount (2.4 kB) and Preact (4 kB) and call it a day? Even if you spent the time to laboriously create your own Vue version of the component, are you really sure you’ll improve upon the battle-tested version that already exists on npm?

Part of the reason I wrote emoji-picker-element as a web component (and not, for instance, as a Svelte component) is that I think it’s silly to re-implement something like an emoji picker in multiple frameworks. The core business logic of an emoji picker has nothing to do with frameworks – in fact, I think my main contribution to the emoji picker landscape was in innovating around IndexedDB, accessibility, and data loading. Should we really re-implement all of those things just to satisfy developers who want their codebase to be pure Vue, or pure Lit, or pure React, or pure whatever? Do we need an entirely new ecosystem every time a new framework comes out?

The belief that it’s unacceptable for a web app to contain more than one framework is something I might call “framework chauvinism.” And honestly, if you feel this way, then you may as well choose the framework that has the most market share and biggest ecosystem – i.e. you may as well choose React. After all, if you chose Vue or Svelte or some other less-popular framework, then you might find that when you reach for some utility component on npm, nobody has written it in your framework of choice.

Now, if you like living in a React-only world: that’s great. You can definitely do so, given how enormous the React ecosystem is. But personally, I like playing around with different frameworks, comparing their strengths and weaknesses, and letting developers use whichever one tickles their fancy. The vision of a React-only future fills me with a deep boredom. I would much rather see frameworks continue to compete and innovate and push the boundaries of what’s possible in web development than to see one framework “solve” web development forever. (Or to see frameworks locked in a perpetual ecosystem race against each other.)

To me, the main benefit of web components is that they liberate us from the tyranny of frameworks. Rather than focusing on cosmetic questions of how a component is written (did you use React? did you use Vue? who cares!), we can focus on more important questions of performance, accessibility, correctness, and things that have nothing to do with whether you use HTML templates or a render() function. Balking at web components that use frameworks is, in my opinion, missing the entire point of web components.

Thanks to Thomas Steiner and Thomas Wilburn for their thoughtful feedback on a draft of this blog post.