How to write about web performance

I’ve been writing about performance for a long time. I like to think I’ve gotten pretty good at it, but sometimes I look back on my older blog posts and cringe at the mistakes I made.

This post is an attempt to distill some of what I’ve learned over the years to offer as advice to other aspiring tinkerers, benchmarkers, and anyone curious about how browsers actually work when you put them to the test.

Why write about web performance?

The first and maybe most obvious question is: why bother? Why write about web performance? Isn’t this something that’s better left to the browser vendors themselves?

In some ways, this is true. Browser vendors know how their product actually works. If some part of the system is performing slowly, you can go knock on the door of your colleague who wrote the code and ask them why it’s slow. (Or send them a DM, I suppose, in our post-pandemic world.)

But in other ways, browser vendors really aren’t in a good position to talk frankly about web performance. Browsers are in the business of selling browsers. Performance numbers are often used in marketing, with claims like “Browser X is 25% faster than Browser Y” – claims that might need to get approved by the marketing department, the legal department, not to mention various owners and stakeholders…

And that’s only if your browser is the fast one. If you run a benchmark and it turns out that your browser is the slow one, or it’s a mixed bag, then browser vendors will keep pretty quiet about it. This is why whenever a browser vendor releases a new benchmark, surprise surprise! Their browser wins. So the browser vendors’ hands are pretty much tied when it comes to writing accurately about how their product actually works.

Of course, there are exceptions to this rule. Occasionally you will find someone’s personal blog, or a comment on a public bugtracker, which betrays that their browser is actually not so great in some benchmark. But nobody is going to go out of their way to sing from the mountaintops about how lousy their browser is in a benchmark. If anything, they’ll talk about it after they’ve done the work to make things faster, meaning the benchmark already went through several rounds of internal discussion, and was maybe used to evaluate some internal initiative to improve performance – a process that might last years before the public actually hears about it.

Other times, browser vendors will release a new feature, announce it with some fanfare, and then make vague claims about how it improves performance without delving into any specifics. If you actually look into these claims, though, you might find that the performance improvement is pretty meager, or it only manifests in a specific context. (Don’t expect the team who built the feature to eagerly tell you this, though.)

By the way, I don’t blame the browser vendors at all for this situation. I worked on the performance team at Microsoft Edge (back in the EdgeHTML days, before the switch to Chromium), and I did the same stuff. I wrote about scrolling performance because, at the time, our engine was the best at scrolling. I wrote about input responsiveness after we had already made it faster. (Not before! Definitely not before.) I designed benchmarks that explicitly showed off the improvements we had made. I worked on marketing videos that showed our browser winning in experiments where we already knew we’d win.

And if you think I’m picking on Microsoft, I could easily find examples of the other browser vendors doing the same thing. But I choose not to, because I’d rather pick on myself. (If you work for a browser vendor and are reading this, I’m sure some examples come to mind.)

Don’t expect a car company to tell you that their competitor has better mileage. Don’t expect them to admit that their new model has a lousy safety rating. That’s what Consumer Reports is for. In the same way, if you don’t work at a browser vendor (I don’t, anymore), then you are blessedly free to say whatever you want about browsers, and to honestly assess their claims and compare them to each other in fair, unbiased benchmarks.

Plus, as a web developer, you might actually be in a better position to write a benchmark that is more representative of real-world code. Browser developers spend most of their day writing C, C++, and Rust, not necessarily HTML, CSS, and JavaScript. So they aren’t always familiar with the day-to-day concerns of working web developers.

The subtle science of benchmarking

Okay, so that was my long diatribe about why you’d bother writing about web performance. So how do you actually go about doing it?

First off, I’d say to write the benchmark before you start writing your blog post. Your conclusions and high-level takeaways may be vastly different depending on the results of the benchmark. So don’t assume you already know what the results are going to be.

I’ve made this mistake in the past! Once, I wrote an entire blog post before writing the benchmark, and then the benchmark completely upended what I was going to say in the post. I had to scrap the whole thing and start from scratch.

Benchmarking is science, and you should treat it with the seriousness of a scientific endeavor. Expect peer review, which means – most importantly! – publishing your benchmark publicly and providing instructions for others to test it. Because believe me, they will! I’ve had folks notify me of a bug in my benchmark after I published a post, forcing me to go back and edit the post to correct the results. (This is annoying, and embarrassing, but it’s better than willfully spreading misinformation.)

Since you may end up running your benchmark multiple times, and even generating your charts and tables multiple times, make an effort to streamline the process of gathering the data. If some step is manual, try to automate it.

These days, I like Tachometer because it automates a lot of the boring parts of benchmarking – launching a browser, taking a measurement, taking multiple measurements, taking enough measurements to achieve statistical significance, etc. Unfortunately it doesn’t automate the part where you generate charts and graphs, but I usually write small scripts to output the data in a format where I can easily import it into a spreadsheet app.

This also leads to an important point: take accurate measurements. A common mistake is to use Date.now() – instead, you should use performance.now(), since this gives you a high-resolution timestamp. Or even better, use performance.mark() and performance.measure() – these are also high-resolution, but with the added benefit that you can actually see your measurements laid out visually in the Chrome DevTools. This is a great way to double-check that you’re actually measuring what you think you’re measuring.
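
For example, here’s a minimal sketch of instrumenting a benchmark with the User Timing API (doSomeWork() is a hypothetical stand-in for whatever code you’re testing):

performance.mark('start')
doSomeWork() // hypothetical stand-in for the code under test
performance.mark('end')

// This measurement shows up in the User Timing section of a Chrome trace
performance.measure('total', 'start', 'end')
const [measure] = performance.getEntriesByName('total')
console.log(`took ${measure.duration}ms`)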

[Screenshot: a performance trace in Chrome DevTools, with an annotation in the User Timing section for the “total” span saying “What the benchmark is measuring,” and stack traces in the main thread annotated “What the browser is doing”]

Note: Sadly, the Firefox and Safari DevTools still don’t show performance marks/measures in their profiler traces. They really should; IE11 had this feature years ago.

As mentioned above, it’s also a good idea to take multiple measurements. Benchmarks will always show variance, and you can prove just about anything if you only take one sample. For best results, I’d say take at least three measurements and then calculate the median, or better yet, use a tool like Tachometer that will use a bunch of fancy statistics to find the ideal number of samples.
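
As a rough sketch, the “median of several samples” approach might look like this (runIteration() is a hypothetical function wrapping one run of the benchmark):

function median(numbers) {
  const sorted = [...numbers].sort((a, b) => a - b)
  return sorted[Math.floor(sorted.length / 2)]
}

const samples = []
for (let i = 0; i < 5; i++) {
  const start = performance.now()
  runIteration() // hypothetical: one run of the benchmark
  samples.push(performance.now() - start)
}
console.log('median:', median(samples))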

Humility

Writing about web performance is really hard, so it’s important to be humble. Browsers are incredibly complex, so you have to accept that you will probably be wrong about something. And if you’re not wrong, then you will be wrong in 5 years when browsers update their engines to make your results obsolete.

There are a few ways you can limit your likelihood of wrongness, though. Here are a few strategies that have worked well for me in the past.

First off, test in multiple browser engines. This is a good way to figure out if you’ve identified a quirk in a particular browser, or a fundamental truth about how the web works. Heck, if you write a benchmark where one browser performs much more poorly than the other ones, then congratulations! You’ve found a browser bug, and now you have a reproducible test case that you can file on that browser.

(And if you think they won’t be grateful or won’t fix the problem, then prepare to be surprised. I’ve filed several such bugs on browsers, and they usually at least acknowledge the issue if not outright fix it within a few releases. Sometimes browser developers are grateful when you file a bug like this, because they might already know something is a problem, but without bug reports from customers, they weren’t able to convince management to prioritize it.)

Second, reduce the variables. Test on the same hardware, if possible. (For testing the three major browser engines – Blink, Gecko, and WebKit – this sadly means you’re always testing on macOS. Do not trust WebKit on Windows/Linux; I’ve found its performance to be vastly different from Safari’s.) Browsers can differ based on whether the device is plugged into power or has low battery, so make sure that the device is plugged in and charged. Don’t run other applications or browser windows or tabs while you’re running the benchmark. If networking is involved, use a local server if possible to eliminate latency. (Or configure the server to always respond with a particular delay, or use throttling, as necessary.) Update all the browsers before running the test.

Third, be aware of caching. It’s easy to fool yourself if you run 1,000 iterations of something, and it turns out that the last 999 iterations are all cached. JavaScript engines have JIT compilers, meaning that the first iteration can be different from the second iteration, which can be different from the third, etc. If you think you can figure out something low-level like “Is const faster than let?”, you probably can’t, because the JIT will outsmart you. Browsers also have bytecode caching, which means that the first page load may be different from the second, and the second may even be different from the third. (Tachometer works around this by using a fresh browser tab each iteration, which is clever.)

My point here is that, for all of your hard work to do rigorous, scientific benchmarking, you may just turn out to be wrong. You’ll publish your blog post, you’ll feel very proud of yourself, and then a browser engineer will contact you privately and say, “You know, it only works like this on a 60FPS monitor.” Or “only on Intel CPUs.” Or “only on macOS Big Sur.” Or “only if your DOM size is greater than 1,000 and the layer depth is above 10 and you’re using a trackball mouse and it’s a Tuesday and the moon is in the seventh house.”

There are so many variables in browser performance, and you can’t possibly capture them all. The best you can do is document your methodology, explain what your benchmark doesn’t test, and try not to make grand sweeping statements like, “You should always use const instead of let; my benchmark proves it’s faster.” At best, your benchmark proves that one number is higher than another in your very specific benchmark in the very specific way you tested it, and you have to be satisfied with that.

Conclusion

Writing about browser performance is hard, but it’s not fruitless. I’ve had enough successes over the years (and enough stubbornness and curiosity, I guess) that I keep doing it.

For instance, I wrote about how bundlers like Rollup produced faster JavaScript files than bundlers like Webpack, and Webpack eventually improved its implementation. I filed a bug on Firefox and Chrome showing that Safari had an optimization they didn’t, and both browsers fixed it, so now all three browsers are fast on the benchmark. I wrote a silly JavaScript “optimizer” that the V8 team used to improve their performance.

I bring up all these examples less to brag, and more to show that it is possible to improve things by simply writing about them. In all three of the above cases, I actually made mistakes in my benchmarks (pretty dumb ones, in some cases), and had to go back and fix them later. But if you can get enough traction and get the right people’s attention, then the browsers and bundlers and frameworks can change, without you having to actually write the code to do it. (To this day, I can’t write a line of C, C++, or Rust, but I’ve influenced browser vendors to write it for me, which aligns with my goal of spending more time playing Tetris than learning new programming languages.)

My point in writing all this is to try to convince you (if you’ve read this far) that it is indeed valuable for you to write about web performance. Even if you don’t feel like you really understand how browsers work. Even if you’re just getting started as a web developer. Even if you’re just curious, and you want to poke around at browsers to see how they tick. At worst you’ll be wrong (which I’ve been many times), and at best you might teach others about performant programming patterns, or even influence the ecosystem to change and make things better for everyone.

There are plenty of upsides, and all you need is an HTML file and a bit of patience. So if that sounds interesting to you, get started and have fun benchmarking!

Speeding up IndexedDB reads and writes

Recently I read James Long’s article “A future for SQL on the web”. It’s a great post, and if you haven’t read it, you should definitely go take a look!

I don’t want to comment on the specifics of the tool James created, except to say that I think it’s a truly amazing feat of engineering, and I’m excited to see where it goes in the future. But one thing in the post that caught my eye was the benchmark comparisons of IndexedDB read/write performance (compared to James’s tool, absurd-sql).

The IndexedDB benchmarks are fair enough, in that they demonstrate the idiomatic usage of IndexedDB. But in this post, I’d like to show how raw IndexedDB performance can be improved using a few tricks that are available as of IndexedDB v2 and v3:

  • Pagination (v2)
  • Relaxed durability (v3)
  • Explicit transaction commits (v3)

Let’s go over each of these in turn.

Pagination

Years ago when I was working on PouchDB, I hit upon an IndexedDB pattern that, at the time, improved performance in Firefox and Chrome by roughly 40-50%. I’m probably not the first person to come up with this idea, but I’ll lay it out here.

In IndexedDB, a cursor is basically a way of iterating through the data in a database one item at a time. And that’s the core problem: one at a time. This tends to be slow, because at every step of the iteration, the JavaScript main thread has to respond to a single item from the cursor and decide whether to continue or stop the iteration.
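
For reference, idiomatic cursor iteration looks something like this (assuming store is an object store from an open transaction, and processItem() is a hypothetical per-item callback):

store.openCursor().onsuccess = e => {
  const cursor = e.target.result
  if (cursor) {
    processItem(cursor.value) // hypothetical per-item work
    cursor.continue() // one JavaScript task per item
  }
}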

Effectively this means that there’s a back-and-forth between the JavaScript main thread and the IndexedDB engine (running off-main-thread). You can see it in this screenshot of the Chrome DevTools performance profiler:

[Screenshot: Chrome DevTools profiler showing multiple small tasks, each separated by a small amount of idle time]

Or in Chrome tracing, which shows a bit more detail:

[Screenshot: Chrome tracing tool showing multiple separate tasks, separated by a bit of idle time. The top of each task says RunNormalPriorityTask, and near the bottom each one says IDBCursor continue.]

Notice that each call to cursor.continue() gets its own little JavaScript task, and the tasks are separated by a bit of idle time. That’s a lot of wasted time for each item in a database!

Luckily, in IndexedDB v2, we got two new APIs to help out with this problem: getAll() and getAllKeys(). These allow you to fetch multiple items from an object store or index in a single go. They can also start from a given key range and return a given number of items, meaning that we can implement a paginated cursor:

const batchSize = 100
let keys, values, keyRange = null

// `store` is an IDBObjectStore from an open transaction
function fetchMore() {
  // Only proceed once both requests have returned, and only if a
  // full batch came back (i.e. there may be more results to fetch)
  if (keys && values && values.length === batchSize) {
    // Find keys greater than the last key
    keyRange = IDBKeyRange.lowerBound(keys.at(-1), true)
    keys = values = undefined
    next()
  }
}

function next() {
  store.getAllKeys(keyRange, batchSize).onsuccess = e => {
    keys = e.target.result
    fetchMore()
  }
  store.getAll(keyRange, batchSize).onsuccess = e => {
    values = e.target.result
    fetchMore()
  }
}

next()

In the example above, we iterate through the object store, fetching 100 items at a time rather than just 1. Using a modified version of the absurd-sql benchmark, we can see that this improves performance considerably. Here are the results for the “read” benchmark in Chrome:

[Chart: see table below]

Batch size (rows) vs. DB size (columns), times in milliseconds:

| Batch size | 100 | 1000 | 10000 | 50000  |
| ---------- | --- | ---- | ----- | ------ |
| 1          | 8.9 | 37.4 | 241   | 1194.2 |
| 100        | 7.3 | 34   | 145.1 | 702.8  |
| 1000       | 6.5 | 27.9 | 100.3 | 488.3  |

(Note that a batch size of 1 means a cursor, whereas 100 and 1000 use a paginated cursor.)

And here’s Firefox:

[Chart: see table below]

Batch size (rows) vs. DB size (columns), times in milliseconds:

| Batch size | 100 | 1000 | 10000 | 50000 |
| ---------- | --- | ---- | ----- | ----- |
| 1          | 2   | 15   | 125   | 610   |
| 100        | 2   | 9    | 70    | 468   |
| 1000       | 2   | 8    | 51    | 271   |

And Safari:

[Chart: see table below]

Batch size (rows) vs. DB size (columns), times in milliseconds:

| Batch size | 100 | 1000 | 10000 | 50000 |
| ---------- | --- | ---- | ----- | ----- |
| 1          | 11  | 106  | 957   | 4673  |
| 100        | 1   | 5    | 44    | 227   |
| 1000       | 1   | 3    | 26    | 127   |

All benchmarks were run on a 2015 MacBook Pro, using Chrome 92, Firefox 91, and Safari 14.1. Tachometer was configured with 15 minimum iterations, a 1% horizon, and a 10-minute timeout. I’m reporting the median of all iterations.

As you can see, the paginated cursor is particularly effective in Safari, but it improves performance in all browser engines.

Now, this technique isn’t without its downsides. For one, you have to choose an explicit batch size, and the ideal number will depend on the size of the data and the usage patterns. You may also want to consider the downsides of overfetching – i.e. if the cursor should stop at a given value, you may end up fetching more items from the database than you really need. (Although ideally, you can use the upper bound of the key range to guard against that.)

The main downside of this technique is that it only works in one direction: you cannot build a paginated cursor in descending order. This is a flaw in the IndexedDB specification, and there are ideas to fix it, but currently it’s not possible.

Of course, instead of implementing a paginated cursor, you could also just use getAll() and getAllKeys() as-is and fetch all the data at once. This probably isn’t a great idea if the database is large, though, as you may run into memory pressure, especially on constrained devices. But it could be useful if the database is small.
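
In that case, the code is about as simple as it gets – a sketch, again assuming store is an object store from an open transaction:

store.getAll().onsuccess = e => {
  const allValues = e.target.result // every value in the object store
  // ... do something with allValues
}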

getAll() and getAllKeys() both have great browser support, so this technique can be widely adopted for speeding up IndexedDB read patterns, at least in ascending order.

Relaxed durability

The paginated cursor can speed up database reads, but what about writes? In this case, we don’t have an equivalent to getAll()/getAllKeys() that we can lean on. Apparently there was some effort put into building a putAll(), but currently it’s abandoned because it didn’t actually improve write performance in Chrome.

That said, there are other ways to improve write performance. Unfortunately, none of these techniques are as effective as the paginated cursor, but they are worth investigating, so I’m reporting my results here.

The most significant way to improve write performance is with relaxed durability. This API is currently only available in Chrome, but it has also been implemented in WebKit as of Safari Technology Preview 130.

The idea behind relaxed durability is to resolve some disagreement between the browser vendors as to whether IndexedDB transactions should optimize for durability (writes succeed even in the event of a power failure or crash) or performance (writes succeed quickly, even if not fully flushed to disk).

It’s been well documented that Chrome’s IndexedDB performance is worse than Firefox’s or Safari’s, and part of the reason seems to be that Chrome is more durable by default. But rather than sacrifice durability across the board, the Chrome team wanted to expose an explicit API for developers to decide which mode to use. (After all, only the web developer knows if IndexedDB is being used as an ephemeral cache or a store of priceless family photos.) So now we have three durability options: default, relaxed, and strict.
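
Opting in is a one-line change: you pass a durability option when creating a transaction. A minimal sketch, assuming a database db with an object store named 'store':

const transaction = db.transaction('store', 'readwrite', {
  durability: 'relaxed' // or 'strict', or 'default'
})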

Using the “write” benchmark, we can test out relaxed durability in Chrome and see the improvement:

[Chart: see table below]

Durability mode (rows) vs. DB size (columns), times in milliseconds:

| Durability | 100  | 1000  | 10000  | 50000  |
| ---------- | ---- | ----- | ------ | ------ |
| Default    | 26.4 | 125.9 | 1373.7 | 7171.9 |
| Relaxed    | 17.1 | 112.9 | 1359.3 | 6969.8 |

As you can see, the results are not as dramatic as with the pagination technique. The effect is most visible in the smaller database sizes, and the reason turns out to be that relaxed durability is better at speeding up multiple small transactions than one big transaction.

Modifying the benchmark to do one transaction per item in the database, we can see a much clearer impact of relaxed durability:

[Chart: see table below]

Durability mode (rows) vs. DB size (columns), times in milliseconds:

| Durability | 100    | 1000    |
| ---------- | ------ | ------- |
| Default    | 1074.6 | 10456.2 |
| Relaxed    | 65.4   | 630.7   |

(I didn’t measure the larger database sizes, because they were too slow, and the pattern is clear.)

Personally, I find this option to be nice-to-have, but underwhelming. If performance is only really improved for multiple small transactions, then usually there is a simpler solution: use fewer transactions.

It’s also underwhelming given that, even with this option enabled, Chrome is still much slower than Firefox or Safari:

[Chart: see table below]

Browser (rows) vs. DB size (columns), times in milliseconds:

| Browser          | 100  | 1000  | 10000  | 50000  |
| ---------------- | ---- | ----- | ------ | ------ |
| Chrome (default) | 26.4 | 125.9 | 1373.7 | 7171.9 |
| Chrome (relaxed) | 17.1 | 112.9 | 1359.3 | 6969.8 |
| Firefox          | 8    | 53    | 436    | 1893   |
| Safari           | 3    | 28    | 279    | 1359   |

That said, if you’re not storing priceless family photos in IndexedDB, I can’t see a good reason not to use relaxed durability.

Explicit transaction commits

The last technique I’ll cover is explicit transaction commits. I found it to be an even smaller performance improvement than relaxed durability, but it’s worth mentioning.

This API is available in both Chrome and Firefox, and (like relaxed durability) has also been implemented in Safari Technology Preview 130.

The idea is that, instead of allowing the transaction to auto-close based on the normal flow of the JavaScript event loop, you can explicitly call transaction.commit() to signal that it’s safe to close the transaction immediately. This results in a very small performance boost, because the IndexedDB engine no longer has to wait for outstanding requests to be dispatched. Here is the improvement in Chrome using the “write” benchmark:

[Chart: see table below]

Options (rows) vs. DB size (columns), times in milliseconds:

| Relaxed / Commit            | 100  | 1000  | 10000  | 50000  |
| --------------------------- | ---- | ----- | ------ | ------ |
| relaxed=false, commit=false | 26.4 | 125.9 | 1373.7 | 7171.9 |
| relaxed=false, commit=true  | 26   | 125.5 | 1373.9 | 7129.7 |
| relaxed=true, commit=false  | 17.1 | 112.9 | 1359.3 | 6969.8 |
| relaxed=true, commit=true   | 16.8 | 112.8 | 1356.2 | 7215   |

You’d really have to squint to see the improvement, and only for the smaller database sizes. This makes sense, since explicit commits can only shave a bit of time off the end of each transaction. So, like relaxed durability, it has a bigger impact on multiple small transactions than one big transaction.

The results are similarly underwhelming in Firefox:

[Chart: see table below]

Commit option (rows) vs. DB size (columns), times in milliseconds:

| Commit    | 100 | 1000 | 10000 | 50000 |
| --------- | --- | ---- | ----- | ----- |
| No commit | 8   | 53   | 436   | 1893  |
| Commit    | 8   | 52   | 434   | 1858  |

That said, especially if you’re doing multiple small transactions, you might as well use it. Since it’s not supported in all browsers, though, you’ll probably want to use a pattern like this:

if (transaction.commit) {
  transaction.commit()
}

If transaction.commit is undefined, then the transaction can just close automatically, and functionally it’s the same.
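
Putting it all together, a write transaction might look like this – a sketch, assuming a database db with an object store named 'store' whose keyPath is 'id':

const transaction = db.transaction('store', 'readwrite', { durability: 'relaxed' })
const store = transaction.objectStore('store')
store.put({ id: 1, name: 'foo' })
if (transaction.commit) {
  transaction.commit() // signal that no further requests are coming
}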

Update: Daniel Murphy points out that transaction.commit() can have bigger perf gains if the page is busy with other JavaScript tasks, which would delay the auto-closing of the transaction. This is a good point! My benchmark doesn’t measure this.

Conclusion

IndexedDB has a lot of detractors, and I think most of the criticism is justified. The IndexedDB API is awkward, it has bugs and gotchas in various browser implementations, and it’s not even particularly fast, especially compared to a full-featured, battle-hardened, industry-standard tool like SQLite. The new APIs unveiled in IndexedDB v3 don’t even move the needle much. It’s no surprise that many developers just say “forget it” and stick with localStorage, or they create elaborate solutions on top of IndexedDB, such as absurd-sql.

Perhaps I just have Stockholm syndrome from having worked with IndexedDB for so long, but I don’t find it to be so bad. The nomenclature and the APIs are a bit weird, but once you wrap your head around it, it’s a powerful tool with broad browser support – heck, it even works in Node.js via fake-indexeddb and indexeddbshim. For better or worse, IndexedDB is here to stay.

That said, I can definitely see a future where IndexedDB is not the only player in the browser storage game. We had WebSQL, and it’s long gone (even though I’m still maintaining a Node.js port!), but that hasn’t stopped people from wanting a more high-level database API in the browser, as demonstrated by tools like absurd-sql. In the future, I can imagine something like the Storage Foundation API making it more straightforward to build custom databases on top of low-level storage primitives – which is what IndexedDB was designed to do, and arguably failed at. (PouchDB, for one, makes extensive use of IndexedDB’s capabilities, but I’ve seen plenty of storage wrappers that essentially use IndexedDB as a dumb key-value store.)

I’d also like to see the browser vendors (especially Chrome) improve their IndexedDB performance. The Chrome team has said that they’re focused on read performance rather than write performance, but really, both matter. A mobile app developer can ship a prebuilt SQLite .db file in their app; in terms of quickly populating a database, there is nothing even remotely close for IndexedDB. And as demonstrated above, cursor performance is not great either.

For those web developers sticking it out with IndexedDB, though, I hope I’ve made a case that it’s not completely a lost cause, and that its performance can be improved. Who knows: maybe the browser vendors still have some tricks up their sleeves, especially if we web developers keep complaining about IndexedDB performance. It’ll be interesting to watch this space evolve and to see how both IndexedDB and its alternatives improve over the years.

Does shadow DOM improve style performance?

Short answer: Kinda. It depends. And it might not be enough to make a big difference in the average web app. But it’s worth understanding why.

First off, let’s review the browser’s rendering pipeline, and why we might even speculate that shadow DOM could improve its performance. Two fundamental parts of the rendering process are style calculation and layout calculation, or simply “style” and “layout.” The first part is about figuring out which DOM nodes have which styles (based on CSS), and the second part is about figuring out where to actually place those DOM nodes on the page (using the styles calculated in the previous step).

[Screenshot: Chrome DevTools showing a performance trace with JavaScript stacks followed by a purple Style/Layout region and a green Paint region]

A performance trace in Chrome DevTools, showing the basic JavaScript → Style → Layout → Paint pipeline.

Browsers are complex, but in general, the more DOM nodes and CSS rules on a page, the longer it will take to run the style and layout steps. One of the ways we can improve the performance of this process is to break up the work into smaller chunks, i.e. encapsulation.

For layout encapsulation, we have CSS containment. This has already been covered in other articles, so I won’t rehash it here. Suffice it to say, I think there’s sufficient evidence that CSS containment can improve performance (I’ve seen it myself), so if you haven’t tried putting contain: content on parts of your UI to see if it improves layout performance, you definitely should!

For style encapsulation, we have something entirely different: shadow DOM. Just like how CSS containment can improve layout performance, shadow DOM should (in theory) be able to improve style performance. Let’s consider why.

What is style calculation?

As mentioned before, style calculation is different from layout calculation. Layout calculation is about the geometry of the page, whereas style calculation is more explicitly about CSS. Basically, it’s the process of taking a rule like:

div > button {
  color: blue;
}

And a DOM tree like:

<div>
  <button></button>
</div>

…and figuring out that the <button> should have color: blue because its parent is a <div>. Roughly speaking, it’s the process of evaluating CSS selectors (div > button in this case).

Now, in the worst case, this is an O(n * m) operation, where n is the number of DOM nodes and m is the number of CSS rules. (I.e. for each DOM node, and for each rule, figure out if they match each other.) Clearly, this isn’t how browsers do it, or else any decently-sized web app would become grindingly slow. Browsers have a lot of optimizations in this area, which is part of the reason that the common advice is not to worry too much about CSS selector performance (see this article for a good, recent treatment of the subject).

That said, if you’ve worked on a non-trivial codebase with a fair amount of CSS, you may notice that, in Chrome performance profiles, the style costs are not zero. Depending on how big or complex your CSS is, you may find that you’re actually spending more time in style calculation than in layout calculation. So it isn’t a completely worthless endeavor to look into style performance.

Shadow DOM and style calculation

Why would shadow DOM improve style performance? Again, it’s because of encapsulation. If you have a CSS file with 1,000 rules, and a DOM tree with 1,000 nodes, the browser doesn’t know in advance which rules apply to which nodes. Even if you authored your CSS with something like CSS Modules, Vue scoped CSS, or Svelte scoped CSS, ultimately you end up with a stylesheet that is only implicitly coupled to the DOM, so the browser has to figure out the relationship at runtime (e.g. using class or attribute selectors).

Shadow DOM is different. With shadow DOM, the browser doesn’t have to guess which rules are scoped to which nodes – it’s right there in the DOM:

<my-component>
  #shadow-root
    <style>div {color: green}</style>
    <div></div>
</my-component>
<another-component>
  #shadow-root
    <style>div {color: blue}</style>
    <div></div>
</another-component>

In this case, the browser doesn’t need to test the div {color: green} rule against every node in the DOM – it knows that it’s scoped to <my-component>. Ditto for the div {color: blue} rule in <another-component>. In theory, this can speed up the style calculation process, because the browser can rely on explicit scoping through shadow DOM rather than implicit scoping through classes or attributes.

Benchmarking it

That’s the theory, but of course things are always more complicated in practice. So I put together a benchmark to measure the style calculation performance of shadow DOM. Certain CSS selectors tend to be faster than others, so for decent coverage, I tested the following selectors:

  • ID (#foo)
  • class (.foo)
  • attribute ([foo])
  • attribute value ([foo="bar"])
  • “silly” ([foo="bar"]:nth-of-type(1n):last-child:not(:nth-of-type(2n)):not(:empty))

Roughly, I would expect IDs and classes to be the fastest, followed by attributes and attribute values, followed by the “silly” selector (thrown in to add something to really make the style engine work).

To measure, I used a simple requestPostAnimationFrame polyfill, which measures the time spent in style, layout, and paint. Here is a screenshot in the Chrome DevTools of what’s being measured (note the “total” under the Timings section):

[Screenshot: Chrome DevTools showing a “total” measurement in the Timings section, which corresponds to style, layout, and other purple “rendering” blocks in the “Main” section]
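
The polyfill itself is tiny. Here’s a rough sketch of the idea – a task queued from inside a requestAnimationFrame callback runs after the browser has finished its rendering work:

function requestPostAnimationFrame(callback) {
  requestAnimationFrame(() => {
    // A message task posted during the rAF callback executes
    // after style, layout, and paint have completed
    const channel = new MessageChannel()
    channel.port1.onmessage = () => callback()
    channel.port2.postMessage(undefined)
  })
}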

To run the actual benchmark, I used Tachometer, which is a nice tool for browser microbenchmarks. In this case, I just took the median of 51 iterations.

The benchmark creates several custom elements, and either attaches a shadow root with its own <style> (shadow DOM “on”), or uses a global <style> with implicit scoping (shadow DOM “off”). In this way, I wanted to make a fair comparison between shadow DOM itself and shadow DOM “polyfills” – i.e. systems for scoping CSS that don’t rely on shadow DOM.

Each CSS rule looks something like this:

#foo {
  color: #000000;
}

And the DOM structure for each component looks like this:

<div id="foo">hello</div>

(Of course, for attribute and class selectors, the DOM node would have an attribute or class instead.)

Benchmark results

Here are the results in Chrome for 1,000 components and 1 CSS rule for each component:

[Chart: Chrome with 1,000 components and 1 rule; see table below]

| (ms)          | id    | class | attribute | attribute-value | silly  |
| ------------- | ----- | ----- | --------- | --------------- | ------ |
| Shadow DOM    | 67.90 | 67.20 | 67.30     | 67.70           | 69.90  |
| No Shadow DOM | 57.50 | 56.20 | 120.40    | 117.10          | 130.50 |

As you can see, classes and IDs are about the same with shadow DOM on or off (in fact, it’s a bit faster without shadow DOM). But once the selectors get more interesting (attribute, attribute value, and the “silly” selector), shadow DOM stays roughly constant, whereas the non-shadow DOM version gets more expensive.

We can see this effect even more clearly if we bump it up to 10 CSS rules per component:

[Chart: Chrome with 1,000 components and 10 rules; see table below]

| (ms)          | id    | class | attribute | attribute-value | silly  |
| ------------- | ----- | ----- | --------- | --------------- | ------ |
| Shadow DOM    | 70.80 | 70.60 | 71.10     | 72.70           | 81.50  |
| No Shadow DOM | 58.20 | 58.50 | 597.10    | 608.20          | 740.30 |

The results above are for Chrome, but we see similar numbers in Firefox and Safari. Here’s Firefox with 1,000 components and 1 rule each:

[Chart: Firefox with 1,000 components and 1 rule; see table below]

| (ms)          | id | class | attribute | attribute-value | silly |
| ------------- | -- | ----- | --------- | --------------- | ----- |
| Shadow DOM    | 27 | 25    | 25        | 25              | 25    |
| No Shadow DOM | 18 | 18    | 32        | 32              | 32    |

And Firefox with 1,000 components, 10 rules each:

[Chart: Firefox with 1,000 components and 10 rules; see table below]

| (ms)          | id | class | attribute | attribute-value | silly |
| ------------- | -- | ----- | --------- | --------------- | ----- |
| Shadow DOM    | 30 | 30    | 30        | 30              | 34    |
| No Shadow DOM | 22 | 22    | 143       | 150             | 153   |

And here’s Safari with 1,000 components and 1 rule each:

[Chart: Safari with 1,000 components and 1 rule; see table below]

| (ms)          | id | class | attribute | attribute-value | silly |
| ------------- | -- | ----- | --------- | --------------- | ----- |
| Shadow DOM    | 57 | 58    | 61        | 63              | 64    |
| No Shadow DOM | 52 | 52    | 126       | 126             | 177   |

And Safari with 1,000 components, 10 rules each:

[Chart: Safari with 1,000 components and 10 rules; see table below]

| (ms)          | id | class | attribute | attribute-value | silly |
| ------------- | -- | ----- | --------- | --------------- | ----- |
| Shadow DOM    | 60 | 61    | 81        | 81              | 92    |
| No Shadow DOM | 56 | 56    | 710       | 716             | 1157  |

All benchmarks were run on a 2015 MacBook Pro with the latest version of each browser (Chrome 92, Firefox 91, Safari 14.1).

Conclusions and future work

We can draw a few conclusions from this data. First off, it’s true that shadow DOM can improve style performance, so our theory about style encapsulation holds up. However, ID and class selectors are fast enough that actually it doesn’t matter much whether shadow DOM is used or not – in fact, they’re slightly faster without shadow DOM. This indicates that systems like Svelte, CSS Modules, or good old-fashioned BEM are using the best approach performance-wise.

This also indicates that using attributes for style encapsulation does not scale well compared to classes. So perhaps scoping systems like Vue would be better off switching to classes.

Another interesting question is why, in all three browser engines, classes and IDs are slightly slower when using shadow DOM. This is probably a better question for the browser vendors themselves, and I won’t speculate. I will say, though, that the differences are small enough in absolute terms that I don’t think it’s worth it to favor one or the other. The clearest signal from the data is just that shadow DOM helps to keep the style costs roughly constant, whereas without shadow DOM, you would want to stick to simple selectors like classes and IDs to avoid hitting a performance cliff.

As for future work: this is a pretty simple benchmark, and there are lots of ways to expand it. For instance, the benchmark only has one inner DOM node per component, and it only tests flat selectors – no descendant or sibling selectors (e.g. div div, div > div, div ~ div, and div + div). In theory, these scenarios should also favor shadow DOM, especially since these selectors can’t cross shadow boundaries, so the browser doesn’t need to look outside of the shadow root to find the relevant ancestors or siblings. (Although the browser’s Bloom filter makes this more complicated – see these notes for a good explanation of how this optimization works.)

Overall, though, I’d say that the numbers above are not big enough that the average web developer should start worrying about optimizing their CSS selectors, or migrating their entire web app to shadow DOM. These benchmark results are probably only relevant if 1) you’re building a framework, so any pattern you choose is magnified multiple times, or 2) you’ve profiled your web app and are seeing lots of high style calculation costs. But for everyone else, I hope at least that these results are interesting, and reveal a bit about how shadow DOM works.

Update: Thomas Steiner wondered about tag selectors as well (e.g. div {}), so I modified the benchmark to test it out. I’ll only report the results for the Shadow DOM version, since the benchmark uses divs, and in the non-shadow case it wouldn’t be possible to use tags alone to distinguish between different divs. In absolute terms, the numbers look pretty close to those for IDs and classes (or even a bit faster in Chrome and Firefox):

| (ms)                       | Chrome | Firefox | Safari |
| -------------------------- | ------ | ------- | ------ |
| 1,000 components, 1 rule   | 53.9   | 19      | 56     |
| 1,000 components, 10 rules | 62.5   | 20      | 58     |

Improving responsiveness in text inputs

For me, one of the most aggravating performance issues on the web is when it’s slow to type into a text input. I’m a fairly fast typist, so if there’s even a tiny delay in a <textarea> or <input>, I can feel it slowing me down, and it drives me nuts.

I find this problem especially irksome because it’s usually solvable with a few simple tricks. There’s no reason for a chat app or a social media app to be slow to type into, except that web developers often take the naïve approach, and that’s where the delay comes from.

To understand the source of input delays, let’s take a concrete example. Imagine a Twitter-like UI with a text field and a “remaining characters” count. As you type, the number gradually decreases down to zero.

[Screenshot: a text area with the text “Hello I’m typing!” and the text “Characters remaining: 263”]

Here’s the naïve way to implement this:

  1. Attach an input event listener to the <textarea>.
  2. Whenever the event fires, update some global state (e.g. in Redux).
  3. Update the “remaining characters” display based on that global state.

And here’s a live example. Really mash on the keyboard if you don’t notice the input delay:

Note: This example contains an artificial 70-millisecond delay to simulate a heavy real-world app, and to make the demo consistent across devices. Bear with me for a moment.

The problem with the naïve approach is that it usually ends up doing far too much work relative to the benefit that the user gets out of the “remaining characters” display. In the worst case, changing the global state could cause the entire UI to re-render (e.g. in a poorly-optimized React app), meaning that as the user types, every keypress causes a full global re-render.

Also, because we are directly listening to the input event, there will be a delay between the actual keypress and the character appearing in the <textarea>. Because the DOM is single-threaded, and because we’re doing blocking work on the main thread, the browser can’t render the new input until that work finishes. This can lead to noticeable typing delays and therefore user frustration.

My preferred solution to this kind of problem is to use requestIdleCallback to wait for the UI thread to be idle before running the blocking code. For instance, something like this:

let queued = false
textarea.addEventListener('input', () => {
  if (!queued) {
    queued = true
    // Wait for the main thread to be idle before doing the expensive work
    requestIdleCallback(() => {
      updateUI(textarea.value) // the app's (expensive) state/render update
      queued = false
    })
  }
})

This technique has several benefits:

  1. We are not directly blocking the input event with anything expensive, so there shouldn’t be a delay between typing a character and seeing that character appear in the <textarea>.
  2. We are not updating the UI for every keypress. requestIdleCallback will batch the UI updates when the user pauses between typing characters. This is sensible, because the user probably doesn’t care if the “remaining characters” count updates for every single keypress – their attention is on the text field, not on the remaining characters.
  3. On a slower machine, requestIdleCallback will naturally do fewer batches-per-keypress than on a faster machine. So a user on a faster device will have the benefit of a faster-updating UI, but neither user will experience poor input responsiveness.

And here’s a live example of the optimized version. Feel free to mash on the keyboard: you shouldn’t see (much of) a delay!

In the past, you might have used something like debouncing to solve this problem. But I like requestIdleCallback because of the third point above: it naturally adapts to the characteristics of the user’s device, rather than forcing us to choose a hardcoded delay.

Note: Running your state logic in a web worker is also a way to avoid this problem. But the vast majority of web apps aren’t architected this way, so I find requestIdleCallback to be better as a bolt-on solution.

To be fair, this technique isn’t foolproof. Some UIs really need to respond immediately to every keypress: for instance, to disallow certain characters or resize the <textarea> as it grows. (In those cases, though, I would throttle with requestAnimationFrame.) Also, some UIs may still lag if the work they’re doing is large enough that it’s perceptible even when batched. (In the live examples above, I set an artificial delay of 70 milliseconds, which you can still “feel” with the optimized version.) But for the most part, using requestIdleCallback is enough to get rid of any major responsiveness issues.
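
For completeness, here’s what that requestAnimationFrame-based throttle might look like – a sketch, with resizeTextarea() as a hypothetical update that needs to run before the next paint:

let scheduled = false
textarea.addEventListener('input', () => {
  if (!scheduled) {
    scheduled = true
    requestAnimationFrame(() => {
      resizeTextarea(textarea) // hypothetical: must happen before the next paint
      scheduled = false
    })
  }
})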

If you want to test this on your own website, I’d recommend putting the Chrome DevTools at 6x CPU slowdown and then mashing the keyboard as fast as you can. On a vanilla <textarea> or <input> with no JavaScript handlers, you won’t see any delay. Whereas if your own website feels sluggish, then maybe it’s time to optimize your text inputs!

Handling properties in custom element upgrades

It’s been well-documented that one of the most awkward parts of working with custom elements is handling properties and attributes. In this post, I want to go a step further and talk about a tricky situation with properties and the component lifecycle.

The problem

First off, see if you can find the bug in this code:

<hello-world></hello-world>
<script src="./hello.js" type="module"></script>
<script>
  document.querySelector('hello-world').mode = 'dark'
</script>

And here’s the component we’re loading, which is just a “hello world” that switches between dark and light mode:

// hello.js
customElements.define('hello-world', class extends HTMLElement {
  constructor() {
    super()
    this.innerHTML = '<div>Hello world!</div>'
  }

  set mode (mode) {
    this.querySelector('div')
      .setAttribute('style', mode === 'light'
        ? 'background: white; color: black;'
        : 'background: black; color: white;'
    )
  }
})

Do you see it? Don’t worry if you missed it; it’s extremely subtle and took me by surprise, too.

The problem is the timing. There are two <script>s – one loading hello.js as a module, and the other setting the mode property on the <hello-world> element. The problem is that the first <script> is type="module", meaning it’s deferred by default, whereas the second is an inline script, which runs immediately. So the first script will always run after the second script.

In terms of custom elements, this means that the set mode setter will never actually get called! The HTML element goes through the custom element upgrade process after its mode has already been set, so the setter has no impact. The component is still in light mode.

Note: Curiously, this is not the case for attributes. As long as we have observedAttributes and attributeChangedCallback defined in the custom element, we’ll be able to handle any attributes that existed before the upgrade. But, in the tradition of funky differences between properties and attributes, this isn’t true of properties.

The fix

To work around this issue, the first option is to just do nothing. After all, this is kind of an odd timing issue, and you can put the onus on consumers to load the custom element script before setting any properties on it.

I find this a bit unsatisfying, though. It feels like it should work, so why shouldn’t it? And as it turns out, there is a fix.

When the custom element is defined, all existing HTML elements are upgraded. This means they go through the constructor() callback, and we can check for any existing properties in that block:

constructor() {
  /* ... */
  if (Object.prototype.hasOwnProperty.call(this, 'mode')) {
    const mode = this.mode
    delete this.mode
    this.mode = mode
  }
}

Let’s break it down step-by-step:

Object.prototype.hasOwnProperty.call(this, 'mode')

Here we check if we already have a property defined called mode. The hasOwnProperty is necessary because we’re checking if the object has its own mode as opposed to the one it gets from the class (i.e. its prototype).

The Object.prototype dance is just an ESLint-recommended safety measure. Using this.hasOwnProperty directly is probably fine too.

const mode = this.mode
delete this.mode

Next, we cache and delete the mode that was set on the object. This way, the object no longer has its own mode property.

this.mode = mode

At this point, we can just set the mode, and the setter from the prototype (set mode) will be invoked.

Here is a full working example if you’re curious.

Conclusion

Properties and attributes are an awkward part of working with web components, and this is a particularly tricky situation. But it’s not impossible to work around, with just a bit of extra constructor code.

Also, you shouldn’t have to deal with this unless you’re writing your own vanilla custom element, or a wrapper around a framework. Many frameworks have built-in support for building custom elements, which means they should handle this logic automatically.

For more reading on this topic, you can check out Google’s Web Fundamentals or take a look at how Lit and Stencil handle this situation.

Why it’s okay for web components to use frameworks

Should standalone web components be written in vanilla JavaScript? Or is it okay if they use (or even bundle) their own framework? With Vue 3 announcing built-in support for building web components, and with frameworks like Svelte and Lit having offered this functionality for some time, it seems like a good time to revisit the question.

First off, I should state my own bias. When I released emoji-picker-element, I made the decision to bundle its framework (Svelte) directly into the component. Clearly I don’t think this is a bad idea (despite my reputation as a perf guy!), so I’d like to explain why it doesn’t shock me for a web component to rely on a framework.

Size concerns

Many web developers might bristle at the idea of a standalone web component relying on its own framework. If I want a date picker, or a modal dialog, or some other utility component, why should I pay the tax of including its entire framework in my bundle? But I think this is the wrong way to look at things.

First off, JavaScript frameworks have come a long way from the days when they were huge, kitchen-sink monoliths. Today’s frameworks like Svelte, Lit, Preact, Vue, and others tend to be smaller, more focused, and more tree-shakeable. A Svelte “hello world” is 1.18 kB (minified and compressed), a Lit “hello world” is 5.7 kB, and petite-vue aims for a 5.8 kB compressed size. These are not huge by any stretch of the imagination.

If you dig deeper, the situation gets even more interesting. As Evan You points out, some frameworks (such as Vue) have a relatively high baseline cost that is amortized by a small per-component size, whereas other frameworks (such as Svelte) have a lower baseline cost but a higher per-component size. The days when you could confidently say “Framework X costs Y kilobytes” are over – the conversation has become much more complex and nuanced.

Second, with code-splitting becoming more common, the individual cost of a dependency has become less important than whether it can be lazy-loaded. For instance, if you use a date picker or modal dialog that bundles its own framework, why not dynamically import() it when it actually needs to be shown? There’s no reason to pay the cost on initial page load for a component that the user may never even need.
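
As a sketch, lazy-loading such a component can be as simple as this (where 'fancy-date-picker' is a hypothetical package that registers a custom element):

button.addEventListener('click', async () => {
  // Load the component (and its bundled framework) on first use
  await import('fancy-date-picker') // hypothetical package name
  document.body.appendChild(document.createElement('fancy-date-picker'))
})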

Third, bundle size is not the only performance metric that matters. There are also considerations like runtime cost, memory overhead, and energy usage that web developers rarely consider.

Looking at runtime cost, a framework can be small, but that’s not necessarily the same thing as being fast. Sometimes it takes more code to make an algorithm faster! For example, Inferno aims for faster runtime performance at the cost of a higher bundle size when compared to something like Preact. So it’s worth considering whether a component is fast in other metrics beside bundle size.

Caveats

That said, I don’t think “bring your own framework” is without its downsides. So let’s go over some problems you may run into when you mix-and-match frameworks.

You can imagine that, if every web component came with its own framework, then you might end up with multiple copies of the same framework on the same page. And this is definitely a concern! But assuming that the component externalizes its framework dependency (e.g. import 'my-framework'), then multiple components should be able to share the same framework code under the hood.
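
With Rollup, for instance, externalizing the framework is a one-line config option. A sketch (the file paths are hypothetical):

// rollup.config.js
export default {
  input: 'src/my-component.js',
  // Leave the framework as a bare import for the consumer's bundler to resolve
  external: ['svelte', /^svelte\//],
  output: { file: 'dist/my-component.js', format: 'es' }
}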

I used this technique in my own emoji-picker-element. If you’re already using Svelte in your project, then you can import 'emoji-picker-element/svelte' and get a version that doesn’t bundle its own framework, ensuring de-duplication. This saves a paltry 1.4 kB out of 13.9 kB total (compressed), but hey, it’s there. (Potentially I could make this the default behavior, but I like the bundled version for the benefit of folks who use <script> tags instead of bundlers. Maybe something like Skypack could make this simpler in the future.)

Another potential downside of bring-your-own-framework is when frameworks mutate global state, which can lead to conflicts between frameworks. For instance, React has historically attached global event listeners to the document (although thankfully this changed in React v17). Also, Angular’s Zone.js overrides the global Object.defineProperty (although there is a workaround). When mixing-and-matching frameworks, it’s best to avoid frameworks that mutate global state, or to carefully ensure that they don’t conflict with one another.

If you look at the compiled output for a framework like Svelte, though, you’ll see that it’s basically just a collection of pure functions that don’t modify the global state. Combining such frameworks in the same codebase is no more harmful than bundling different versions of Lodash or Underscore.

Now, to be clear: in an ideal world, your web app would only contain one framework. Otherwise it’s shipping duplicate code that essentially does the same thing. But web development is all about tradeoffs, and I don’t believe that it’s worth rejecting a component out-of-hand just to avoid a few extra kBs from a tiny framework like Preact or Lit. (Of course, for a larger framework, this may be a different story. But this is true of any component dependency, not just a framework.)

Framework chauvinism

In general, I don’t think the question should be whether a component uses its own framework or not. Instead, the question should be: Is this component small enough/fast enough for my use case? After all, a component can be huge without using a framework, and it can be slow even when written in vanilla JS. The framework is part of the story, but it’s not the whole story.

I also think that focusing too much on frameworks plays against the strengths of web components. The whole point of web components is to have a standard, interoperable way to add a component to a page without worrying about what framework it’s using under the hood (or if it’s using a framework at all).

Web components also serve as a fantastic glue layer between frameworks. If there’s a great React component out there that you want to use in your Vue codebase, why not wrap it in Remount (2.4 kB) and Preact (4 kB) and call it a day? Even if you spent the time to laboriously create your own Vue version of the component, are you really sure you’ll improve upon the battle-tested version that already exists on npm?

Part of the reason I wrote emoji-picker-element as a web component (and not, for instance, as a Svelte component) is that I think it’s silly to re-implement something like an emoji picker in multiple frameworks. The core business logic of an emoji picker has nothing to do with frameworks – in fact, I think my main contribution to the emoji picker landscape was in innovating around IndexedDB, accessibility, and data loading. Should we really re-implement all of those things just to satisfy developers who want their codebase to be pure Vue, or pure Lit, or pure React, or pure whatever? Do we need an entirely new ecosystem every time a new framework comes out?

The belief that it’s unacceptable for a web app to contain more than one framework is something I might call “framework chauvinism.” And honestly, if you feel this way, then you may as well choose the framework that has the most market share and biggest ecosystem – i.e. you may as well choose React. After all, if you chose Vue or Svelte or some other less-popular framework, then you might find that when you reach for some utility component on npm, nobody has written it in your framework of choice.

Now, if you like living in a React-only world: that’s great. You can definitely do so, given how enormous the React ecosystem is. But personally, I like playing around with different frameworks, comparing their strengths and weaknesses, and letting developers use whichever one tickles their fancy. The vision of a React-only future fills me with a deep boredom. I would much rather see frameworks continue to compete and innovate and push the boundaries of what’s possible in web development than to see one framework “solve” web development forever. (Or to see frameworks locked in a perpetual ecosystem race against each other.)

To me, the main benefit of web components is that they liberate us from the tyranny of frameworks. Rather than focusing on cosmetic questions of how a component is written (did you use React? did you use Vue? who cares!), we can focus on more important questions of performance, accessibility, correctness, and things that have nothing to do with whether you use HTML templates or a render() function. Balking at web components that use frameworks is, in my opinion, missing the entire point of web components.

Thanks to Thomas Steiner and Thomas Wilburn for their thoughtful feedback on a draft of this blog post.

JavaScript performance beyond bundle size

There’s an old story about a drunk trying to find his keys in the streetlight. Why? Well, because that’s where it’s the brightest. It’s a funny story, but also relatable, because as humans we all tend to take the path of least resistance.

I think we have the same problem in the web performance community. There’s a huge focus recently on JavaScript bundle size: how big are your dependencies? Could you use a smaller one? Could you lazy-load it? But I believe we focus on bundle size first and foremost because it’s easy to measure.

That’s not to say that bundle size isn’t important! Your keys might really turn out to be under the streetlight, after all. And heck, you might as well check there first, since it’s the quickest place to look. But here are some other things that are harder to measure, but can be just as important:

  • Parse/compile time
  • Execution time
  • Power usage
  • Memory usage
  • Disk usage

A JavaScript dependency can affect all of these metrics. But they’re less discussed than bundle size, and I suspect it’s because they’re less straightforward to measure. In this post, I want to talk about how I approach bundle size, and how I approach the other metrics too.

Bundle size

When talking about the size of JavaScript code, you have to be precise. Some folks will say “my library is 10 kilobytes.” Is that minified? Gzipped? Tree-shaken? Did you use the highest Gzip setting (9)? What about Brotli compression?

This may sound like hair-splitting, but the distinction actually matters, especially between compressed and uncompressed size. The compressed size affects how fast it is to send bytes over the wire, whereas the uncompressed size affects how long it takes the browser to parse, compile, and execute the JavaScript. (The two tend to correlate – bigger code usually takes longer to parse and compile – but size isn’t a perfect predictor of runtime cost.)

The most important thing, though, is to be consistent. You don’t want to measure Library A using unminified, uncompressed size versus Library B using minified and compressed size (unless there’s a real difference in how you’re serving them).

Bundlephobia

For me, Bundlephobia is the Swiss Army knife of bundle size analysis. You can look up any dependency from npm and it will tell you both the minified size (what the browser parses and executes) as well as the minified and compressed size (what the browser downloads).

For instance, we can use this tool to see that react-dom weighs 121.1kB minified, but preact weighs 10.2kB. So we can confirm that Preact really is the honest goods – a React-compatible framework at a fraction of the size!

In this case, I don’t get hung up on exactly which minifier or exactly what Gzip compression level Bundlephobia is using, because at least it’s using the same system everywhere. So I know I’m comparing apples to apples.

Now that said, there are some caveats with Bundlephobia:

  1. It doesn’t tell you the tree-shaken cost. If you’re only importing one part of a module, the other parts may be tree-shaken out.
  2. It won’t tell you about subdirectory dependencies. So for instance, I know how expensive it is to import 'preact', but import 'preact/compat' could be literally anything – compat.js could be a huge file, and I’d have no way to know.
  3. If there are polyfills involved (e.g. your bundler injecting a polyfill for Node’s Buffer API, or for the JavaScript Object.assign() API), you won’t necessarily see it here.

In all the above cases, you really just have to run your bundler and check the output. Every bundler is different, and depending on the configuration or other factors, you might end up with a huge bundle or a tiny one. So next, let’s move on to the bundler-specific tools.

Webpack Bundle Analyzer

I love Webpack Bundle Analyzer. It offers a nice visualization of every chunk in your Webpack output, as well as which modules are inside of those chunks.

Screenshot of Webpack Bundle Analyzer showing a list of modules and sizes on the left and a visual tree map of modules and sizes on the right, where the module is larger if it has a greater size, and modules-within-modules are also shown proportionally

In terms of the sizes it shows, the two most useful ones are “parsed” (the default) and “Gzipped”. “Parsed” essentially means “minified,” so these two measurements are roughly comparable with what Bundlephobia would tell us. But the difference here is that we’re actually running our bundler, so we know that the sizes are accurate for our particular application.

Rollup Plugin Analyzer

For Rollup, I would really love to have a graphical interface like Webpack Bundle Analyzer. But the next best thing I’ve found is Rollup Plugin Analyzer, which will output your module sizes to the console while building.

Unfortunately, this tool doesn’t give us the minified or Gzipped size – just the size as seen by Rollup before such optimizations occur. It’s not perfect, but it’s great in a pinch.

Other bundle size tools

There are other bundle size tools I’ve dabbled with and found useful over the years, and I’m sure you can find more to add to the list!

Beyond the bundle

As I mentioned, though, I don’t think JavaScript bundle size is everything. It’s great as a first approximation, because it’s (comparatively) easy to measure, but there are plenty of other metrics that can impact page performance.

Runtime CPU cost

The first and most important one is the runtime cost. This can be broken into a few buckets:

  • Parsing
  • Compilation
  • Execution

These three phases are basically the end-to-end cost of calling require("some-dependency") or import "some-dependency". They may correlate with bundle size, but it’s not a one-to-one mapping.

For a trivial example, here is a (tiny!) JavaScript snippet that consumes a ton of CPU:

const start = Date.now()
while (Date.now() - start < 5000) {}

This snippet would get a great score on Bundlephobia, but unfortunately it will block the main thread for 5 seconds. This is a somewhat absurd example, but in the real world, you can find small libraries that nonetheless hammer the main thread. Traversing through all elements in the DOM, iterating through a large array in LocalStorage, calculating digits of pi… unless you’ve hand-inspected all your dependencies, it’s hard to know what they’re doing in there.

Parsing and compilation are both really hard to measure. It’s easy to fool yourself, because browsers have lots of optimizations around bytecode caching. For instance, browsers might not run the parse/compile step on second page load, or third page load (!), or when the JavaScript is cached in a Service Worker. So you might think a module is cheap to parse/compile, when really the browser has just cached it in advance.

Screenshot from Chrome DevTools showing main thread with Compilation followed by execution of some JavaScript anonymous call stacks

Compilation and execution in Chrome DevTools. Note that Chrome does some parsing and compilation off-main-thread.

The only way to be 100% safe is to completely clear the browser cache and measure first page load. I don’t like to mess around, so typically I will do this in a private/guest browsing window, or in a completely separate browser. You’ll also want to make sure that any browser extensions are disabled (private mode typically does this), since those extensions can impact page load time. You don’t want to get halfway into analyzing a Chrome trace and realize that you’re measuring your password manager!

Another thing I usually do is set Chrome’s CPU throttling to 4x or 6x. I think of 4x as “similar enough to a mobile device,” and 6x as “a super-duper slowed-down machine that makes the traces much easier to read, because everything is bigger.” Use whichever one you want; either will be more representative of real users than your (probably) high-end developer machine.

If I’m concerned about network speed, this is the point where I would turn on network throttling as well. “Fast 3G” is usually a good one that hits the sweet spot between “more like the real world” and “not so slow that I start yelling at my computer.”

So putting it all together, my steps for getting an accurate trace are typically:

  1. Open a private/guest browsing window.
  2. Navigate to about:blank if necessary (you don’t want to measure the unload event for your browser home page).
  3. Open the DevTools in Chrome.
  4. Go to the Performance tab.
  5. In the settings, turn on CPU throttling and/or network throttling.
  6. Click the Record button.
  7. Type the URL and press Enter.
  8. Stop recording when the page has loaded.

Screenshot of Chrome DevTools showing a page on about:blank, the CPU Throttling set to 6x, Network Throttling set to Fast 3G, and in a guest browsing window with no extensions

Now you have a performance trace (also known as a “timeline” or “profile”), which will show you the parse/compile/execution times for the JavaScript code in your initial page load. Unfortunately this part can end up being pretty manual, but there are some tricks to make it easier.

Most importantly, use the User Timing API (aka performance marks and measures) to mark parts of your web application with names that are meaningful to you. Focus on parts that you worry will be expensive, such as the initial render of your root application, a blocking XHR call, or bootstrapping your state object.
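
For instance, something like this – renderApp() here is just a stand-in name for whatever expensive work your app actually does:

// stand-in for your app's expensive initial work
performance.mark('renderStart')
renderApp()
performance.mark('renderEnd')
// this shows up under "User Timing" in the DevTools Performance tab
performance.measure('render', 'renderStart', 'renderEnd')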

You can strip out performance.mark/performance.measure calls in production if you’re worried about the (small) overhead of these APIs. I like to turn it on or off based on query string parameters, so that I can easily turn on user timings in production if I want to analyze the production build. Terser’s pure_funcs option can also be used to remove performance.mark and performance.measure calls when you minify. (Heck, you can remove console.logs here too. It’s very handy.)
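
As a sketch, the relevant Terser configuration might look something like this (adjust for however you invoke Terser, e.g. via a Webpack or Rollup plugin):

{
  compress: {
    // calls to these functions are assumed side-effect-free, so Terser
    // drops them whenever their return values are unused
    pure_funcs: ['performance.mark', 'performance.measure', 'console.log']
  }
}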

Another useful tool is mark-loader, which is a Webpack plugin that automatically wraps your modules in mark/measure calls so that you can see each dependency’s runtime cost. Why try to puzzle over a JavaScript call stack, when the tool can tell you exactly which dependencies are consuming exactly how much time?

Screenshot of Chrome DevTools showing User Timing section with bars marked for Three, Moment, and React. The JavaScript callstacks underneath mostly say "anonymous"

Loading Three.js, Moment, and React in production mode. Without the User Timings, would you be able to figure out where the time is being spent?

One thing to be aware of when measuring runtime performance is that the costs can vary between minified and unminified code. Unused functions may be stripped out, code will be smaller and more optimized, and libraries may define process.env.NODE_ENV === 'development' blocks that don’t run in production mode.

My general strategy for dealing with this situation is to treat the minified, production build as the source of truth, and to use marks and measures to make it comprehensible. As mentioned, though, performance.mark and performance.measure have their own small overhead, so you may want to toggle them with query string parameters.

Power usage

You don’t have to be an environmentalist to think that minimizing power use is important. We live in a world where people are increasingly browsing the web on devices that aren’t plugged into a power outlet, and the last thing they want is to run out of juice because of a misbehaving website.

I tend to think of power usage as a subset of CPU usage. There are some exceptions to this, like waking up the radio for a network connection, but most of the time, if a website is consuming excessive power, it’s because it’s consuming excessive CPU on the main thread.

So everything I’ve said above about improving JavaScript parse/compile/execute time will also reduce power consumption. But for long-lived web applications especially, the most insidious form of power drain comes after first page load. This might manifest as a user suddenly noticing that their laptop fan is whirring or their phone is growing hot, even though they’re just looking at an (apparently) idle webpage.

Once again, the tool of choice in these situations is the Chrome DevTools Performance tab, using essentially the same steps described above. What you’ll want to look for, though, is repeated CPU usage, usually due to timers or animations. For instance, a poorly-coded custom scrollbar, an IntersectionObserver polyfill, or an animated loading spinner may decide that they need to run code in every requestAnimationFrame or in a setInterval loop.
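
For illustration, a hypothetical misbehaving widget might boil down to something like this (updateSpinner() is a made-up name):

function tick () {
  updateSpinner() // twiddles the DOM on every single frame…
  requestAnimationFrame(tick) // …forever, even when nothing has changed
}
requestAnimationFrame(tick)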

Screenshot of Chrome DevTools showing little peaks of yellow JavaScript usage periodically in the timeline

A poorly-behaved JavaScript widget. Notice the little peaks of JavaScript usage, showing constant CPU usage even while the page is idle.

Note that this kind of power drain can also occur due to unoptimized CSS animations – no JavaScript required! (In that case, it would be purple peaks rather than yellow peaks in the Chrome UI.) For long-running CSS animations, be sure to always prefer GPU-accelerated CSS properties.

Another tool you can use is Chrome’s Performance Monitor tab, which is actually different from the Performance tab. I see this as a sort of heartbeat monitor of how your website is doing perf-wise, without the hassle of manually starting and stopping a trace. If you see constant CPU usage here on an otherwise inert webpage, then you probably have a power usage problem.

Screenshot of Chrome Performance Monitor showing steady 8.4% cpu usage on a chart, along with a chart of memory usage in a sawtooth pattern, going up and down

The same poorly-behaved JavaScript widget in Performance Monitor. Note the constant low hum of CPU usage, as well as the sawtooth pattern in the memory usage, indicating memory constantly being allocated and de-allocated.

Also: hat tip to the WebKit folks, who added an explicit Energy Impact panel to the Safari Web Inspector. Another good tool to check out!

Memory usage

Memory usage is something that used to be much harder to analyze, but the tooling has improved a lot recently.

I already wrote a post about memory leaks last year, but it’s important to remember that memory usage and memory leaks are two separate problems. A website can have high memory usage without explicitly leaking memory, whereas another website could start small but eventually balloon to a huge size due to runaway leaks.

You can read the above blog post for how to analyze memory leaks. But in terms of memory usage, we have a new browser API that helps quite a bit with measuring it: performance.measureUserAgentSpecificMemory (formerly performance.measureMemory, which sadly was much less of a mouthful). There are several advantages of this API:

  1. It returns a promise that automatically resolves after garbage collection. (No more need for weird hacks to force GC!)
  2. It measures more than just JavaScript VM size – it also includes DOM memory as well as memory in web workers and iframes.
  3. In the case of cross-origin iframes, which are process-isolated due to Site Isolation, it will break down the attribution. So you can know exactly how memory-hungry your ads and embeds are!

Here is a sample output from the API:

{
  "breakdown": [
    {
      "attribution": ["https://pinafore.social/"],
      "bytes": 755360,
      "types": ["Window", "JS"]
    },
    {
      "attribution": [],
      "bytes": 804322,
      "types": ["Window", "JS", "Shared"]
    }
  ],
  "bytes": 1559682
}

In this case, bytes is the banner metric you’ll want to use for “how much memory am I using?” The breakdown is optional, and the spec explicitly notes that browsers can decide not to include it.

That said, it can still be finicky to use this API. First off, it’s only available in Chrome 89+. (In slightly older releases, you can set the “enable experimental web platform features” flag and use the old performance.measureMemory API.) More problematic, though, is that due to the potential for abuse, this API has been limited to cross-origin isolated contexts. This effectively means that you have to set some special headers, and if you rely on any cross-origin resources (external CSS, JavaScript, images, etc.), they’ll need to set some special headers too.
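
For reference, “cross-origin isolated” currently boils down to serving your page with these two headers (and having every cross-origin resource opt in via CORS or CORP):

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp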

If that sounds like too much trouble, though, and if you only plan to use this API for automated testing, then you can run Chrome with the --disable-web-security flag. (At your own risk, of course!) Note, though, that measuring memory currently doesn’t work in headless mode.

Of course, this API also doesn’t give you a great level of granularity. You won’t be able to figure out, for instance, that React takes up X number of bytes, and Lodash takes up Y bytes, etc. A/B testing may be the only effective way to figure that kind of thing out. But this is still much better than the older tooling we had for measuring memory (which is so flawed that it’s really not even worth describing).

Disk usage

Limiting disk usage is most important in web application scenarios, where it’s possible to reach browser quota limits depending on the amount of available storage on the device. Excessive storage usage can come in many forms, such as stuffing too many large images into the ServiceWorker cache, but JavaScript can add up too.

You might think that the disk usage of a JavaScript module is a direct correlate of its bundle size (i.e. the cost of caching it), but there are some cases where this isn’t true. For instance, with my own emoji-picker-element, I make heavy use of IndexedDB to store the emoji data. This means I have to be cognizant of database-related disk usage, such as storing unnecessary data or creating excessive indexes.

Screenshot of Chrome DevTools Application tab under "Clear Storage" with a pie chart showing megabytes taken up in Cache Storage as well as IndexedDB, and a button saying "Clear Storage"

The Chrome DevTools has an “Application” tab which shows the total storage usage for a website. This is pretty good as a first approximation, but I’ve found that this screen can be a little bit inconsistent, and also the data has to be gathered manually. Plus, I’m interested in more than just Chrome, since IndexedDB has vastly different implementations across browsers, so the storage size could vary wildly.

The solution I landed on is a small script that launches Playwright, which is a Puppeteer-like tool that has the advantage of being able to launch more browsers than just Chrome. Another neat feature is that it can launch browsers with a fresh storage area, so you can launch a browser, write storage to /tmp, and then measure the IndexedDB usage for each browser.
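
My actual script is a bit more involved, but here’s a rough sketch of the idea. Note the assumptions: the localhost URL and the fixed timeout are stand-ins, and for simplicity this sums the whole profile directory rather than the engine-specific IndexedDB subdirectory inside it:

import { chromium, firefox, webkit } from 'playwright'
import { readdir, stat } from 'node:fs/promises'
import { join } from 'node:path'

// naive recursive directory size, in bytes (a crude "du -s")
async function dirSize (dir) {
  let total = 0
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name)
    total += entry.isDirectory() ? await dirSize(path) : (await stat(path)).size
  }
  return total
}

for (const browserType of [chromium, firefox, webkit]) {
  const userDataDir = `/tmp/${browserType.name()}-profile` // fresh storage area
  const context = await browserType.launchPersistentContext(userDataDir)
  const page = await context.newPage()
  await page.goto('http://localhost:3000/') // a page that populates IndexedDB
  await page.waitForTimeout(5000) // crude: give the writes time to flush
  await context.close()
  console.log(browserType.name(), await dirSize(userDataDir), 'bytes')
}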

To give you an example, here is what I get for the current version of emoji-picker-element:

Browser     IndexedDB directory size
Chromium    2.13 MB
Firefox     1.37 MB
WebKit      2.17 MB

Of course, you would have to adapt this script if you wanted to measure the storage size of the ServiceWorker cache, LocalStorage, etc.

Another option, which might work better in a production environment, would be the StorageManager.estimate() API. However, this is designed more for figuring out if you’re approaching quota limits rather than performance analysis, so I’m not sure how accurate it would be as a disk usage metric. As MDN notes: “The returned values are not exact; between compression, deduplication, and obfuscation for security reasons, they will be imprecise.”
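
Still, it’s a one-liner to try, so it may be worth a look:

// rough numbers only, per the MDN caveat above
const { usage, quota } = await navigator.storage.estimate()
console.log(`using roughly ${usage} out of ${quota} bytes`)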

Conclusion

Performance is a multi-faceted thing. It would be great if we could reduce it down to a single metric such as bundle size, but if you really want to cover all the bases, there are a lot of different angles to consider.

Sometimes this can feel overwhelming, which is why I think initiatives like the Core Web Vitals, or a general focus on bundle size, aren’t such a bad thing. If you tell people they need to optimize a dozen different metrics, they may just decide not to optimize any of them.

That said, for JavaScript dependencies in particular, I would love if it were easier to see all of these metrics at a glance. Imagine if Bundlephobia had a “Nutrition Facts”-type view, with bundle size as the headline metric (sort of like calories!), and all the other metrics listed below. It wouldn’t have to be precise: the numbers might depend on the browser, the size of the DOM, how the API is used, etc. But you could imagine some basic stats around initial CPU execution time, memory usage, and disk usage that wouldn’t be impossible to measure in an automated way.

If such a thing existed, it would be a lot easier to make informed decisions about which JavaScript dependencies to use, whether to lazy-load them, etc. But in the meantime, there are lots of different ways of gathering this data, and I hope this blog post has at least encouraged you to look a little bit beyond the streetlight.

Thanks to Thomas Steiner and Jake Archibald for feedback on a draft of this blog post.

Managing focus in the shadow DOM

One of the trickiest things about the shadow DOM is that it subverts web developers’ expectations about how the DOM works. In the normal rules of the game, document.querySelectorAll('*') grabs all the elements in the DOM. With the shadow DOM, though, it doesn’t work that way: shadow elements are encapsulated.

Other classic DOM APIs, such as element.children and element.parentElement, are similarly unable to traverse shadow boundaries. Instead, you have to use more esoteric APIs like element.shadowRoot and getRootNode(), which didn’t exist before shadow DOM came onto the scene.

In practice, this means that a lot of JavaScript libraries designed for the pre-shadow DOM era might not work well if you’re using web components. This comes up more often than you might think.

The problem

For example, sometimes you want to iterate through all the tabbable elements on a page. Maybe you’re doing this because you want to build a focus trap for a modal dialog, or because you’re implementing arrow key navigation for KaiOS devices.

Now, without doing anything, elements inside of the shadow DOM are already focusable or tabbable just like any other element on the page. For instance, with my own emoji-picker-element, you can tab through its <input>, <button>s, etc.:

Video: tabbing through the emoji picker’s input and buttons

When implementing a focus trap or arrow key navigation, we want to preserve this existing behavior. So the first challenge is to emulate whatever the browser normally does when you press Tab or Shift+Tab. In this case, shadow DOM makes things a bit more complicated because you can’t just use a straightforward querySelectorAll() (or other pre-shadow DOM iteration techniques) to find all the tabbable elements.

Pedantic note: an element can be focusable but not tabbable. For instance, when using tabindex="-1", an element can be focused when clicking, but not when tabbing through the page.

While researching this, I found that a lot of off-the-shelf JavaScript libraries for focus management don’t handle the shadow DOM properly. For example, focusable provides a query selector string that you’re intended to use like so:

import focusable from 'focusable'

document.querySelectorAll(focusable)

Unfortunately, this can’t reach inside the shadow DOM, so it won’t work for something like emoji-picker-element. Bummer.

To be fair to focusable, though, many other libraries in the same category (focus traps, “get all tabbable elements,” accessible dialogs, etc.) have the same problem. So in this post, I’d like to explain what these libraries would need to do to support shadow DOM.

The solution

I’ve written a couple of JavaScript packages that deal with shadow DOM: kagekiri, which implements querySelectorAll() in a way that can traverse shadow boundaries, and arrow-key-navigation, which makes the left and right keys change focus.

To understand how these libraries work, let’s first understand the problem we’re trying to solve. In a non-shadow DOM context, what does this do?

document.querySelectorAll('*')

If you answered “grab all the elements in the DOM,” you’re absolutely right. But more importantly: what order are the elements returned in? It turns out that they’re returned in a depth-first tree traversal order, which is crucial because this is the same order as when the user presses Tab or Shift+Tab to change focus. (Let’s ignore positive tabindex values for the moment, which are an anti-pattern anyway.)

In the case of shadow DOM, we want to maintain this depth-first order, while also piercing into the shadow DOM for any shadow roots we encounter. Essentially, we want to pretend that the shadow DOM doesn’t exist.

There are a few ways you can do this. In kagekiri, I implemented a depth-first search myself, whereas in arrow-key-navigation, I used a TreeWalker, which is a somewhat obscure API that traverses elements in depth-first order. Either way, the main insight is that you need a way to enumerate a node’s “shadow children” as well as its actual children (which can be mixed together in the case of slotted elements). You also need to be able to run the reverse logic: finding the “light” parent of a shadow tree. And of course, this has to be recursive, since shadow roots can be nested within other shadow roots.

Rather than bore you with the details, suffice it to say that you need roughly a dozen lines of code, both for enumerating an element’s children and finding an element’s parent. In the non-shadow DOM world, these would be equivalent to a simple element.children and element.parentElement respectively.
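
To give you a flavor of it, though, here is a drastically simplified sketch – real implementations also have to account for slotted elements, among other wrinkles:

// enumerate "shadow children": pierce into the shadow root if there is one
function getChildren (element) {
  return element.shadowRoot ? element.shadowRoot.children : element.children
}

// find the "light" parent, hopping out of a shadow tree at its root
function getParent (node) {
  if (node.parentElement) {
    return node.parentElement
  }
  const root = node.getRootNode()
  return root instanceof ShadowRoot ? root.host : null
}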

Why the browser should handle this

Here’s the thing: I don’t particularly want to explain every line of code required for this exercise. I just want to impress upon you that this is a lot of heavy lifting for something that should probably be exposed as a web API. It feels silly that the browser knows perfectly well which element it would focus if I typed Tab or Shift+Tab, but as a web developer I have to reverse-engineer this behavior.

You might say that I’m missing the whole point of shadow DOM: after all, encapsulation is one of its major selling points. But I’d counter that a lot of folks are using shadow DOM because it’s the only way to get native CSS encapsulation (similar to “scoped” CSS in frameworks like Vue and Svelte), not necessarily DOM API encapsulation. So the fact that it breaks querySelectorAll() is a downside rather than an upside.

Here’s a sketch of my dream API:

element.getNextTabbableElement()
element.getPreviousTabbableElement()

Perhaps, like getRootNode(), these APIs could also offer an option for whether or not you want to pierce the shadow boundary. In any case, an API like this would obviate the need for the hacks described in this post.

I’d argue that browsers should provide such an API not only because of shadow DOM, but also because of built-in elements like <video> and <audio>. These behave like closed shadow roots, in that they contain tabbable elements (i.e. the pause/play/track controls), but you can’t reach inside to manipulate them.

Screenshot of GNOME Web (WebKit) browser on an MDN video element demo page showing the dev tools open with closed user agent shadow content for the controls of the video

WebKit’s developer tools helpfully show the video controls as “shadow content (user agent).” You can look, but you can’t touch!

As far as I know, there’s no way to implement a WAI-ARIA compliant modal dialog with a standard <video controls> or <audio controls> inside. Instead, you would have to build your own audio/video player from scratch.

Brief aside: dialog element

There is the native <dialog> element now implemented in Chrome, and it does come with a built-in focus trap if you use showModal(). And this focus trap actually handles shadow DOM correctly, including closed shadow roots like <video controls>!

Unfortunately, though, it doesn’t quite follow the WAI-ARIA guidelines. The problems are that 1) closing the dialog doesn’t return focus to the previously focused element in the document, and 2) the focus trap doesn’t “cycle” through tabbable elements in the modal – instead, focus escapes to the browser chrome itself.

The first issue is irksome but not impossible to solve: you just have to listen for dialog open/close events and keep track of document.activeElement. It’s even possible to patch the correct behavior onto the native <dialog> element. (Shadow DOM, of course, makes this more complicated because activeElement can be nested inside shadow roots. I.e., you have to keep drilling into document.activeElement.shadowRoot.activeElement, etc.).
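
That drilling logic looks roughly like this:

// keep drilling into nested shadow roots to find the "true" active element
function deepActiveElement () {
  let element = document.activeElement
  while (element && element.shadowRoot && element.shadowRoot.activeElement) {
    element = element.shadowRoot.activeElement
  }
  return element
}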

As for the second issue, it might not be considered a dealbreaker – at least the focus is trapped, even if it’s not completely compliant with WAI-ARIA. But it’s still disappointing that we can’t just use the <dialog> element as-is and get a fully accessible modal dialog, per the standard definition of “accessible.”

Update: After publishing this post, Chris Coyier clued me in to the inert attribute. Although it’s not shipped in any browser yet, I did write a demo of building a modal dialog with this API. After testing in Chrome and Firefox with the right flags enabled, though, it looks like the behavior is similar to <dialog> – focus is correctly trapped, but escapes to the browser chrome itself.

Second update: After an informal poll of users of assistive technologies, the consensus seems to be that having focus escape to the browser chrome is not ideal, but not a show-stopper as long as you can Shift+Tab to get back into the dialog. So it looks like when inert or <dialog> are more widely available in browsers, that will be the only way to deal with <video controls> and <audio controls> in a focus trap.

Last update (I promise!): Native <dialog> also seems to be the only way to have the Esc key dismiss the modal while focus is inside the <video>/<audio> controls.

Conclusion

Handling focus inside of the shadow DOM is not easy. Managing focus in the DOM has never been particularly easy (see the source code for any accessible dialog component for an example), and shadow DOM just makes things that much trickier by complicating a basic routine like DOM traversal.

Normally, DOM traversal is the kind of straightforward exercise you’d expect to see in a web dev job interview. But once you throw shadow DOM into the mix, I’d expect most working web developers to be unable to come up with the correct algorithm off the tops of their heads. (I know I can’t, and I’ve written it twice.)

As I’ve said in a previous post, though, I think it’s still early days for web components and shadow DOM. Blog posts like this are my attempt to sketch out the current set of problems and working solutions, and to try to point toward better solutions. Hopefully the ecosystem and browser APIs will eventually adapt to support shadow DOM and focus management more broadly.

More discussion about native <dialog> and Tab behavior can be found in this issue.

Thanks to Thomas Steiner and Sam Thorogood for feedback on a draft of this post.

Options for styling web components

When I released emoji-picker-element last year, it was my first time writing a general-purpose web component that could be dropped in to any project or framework. It was also my first time really kicking the tires on shadow DOM.

In the end, I think it was a natural fit. Web components are a great choice when you want something to be portable and self-contained, and an emoji picker fits the bill: you drop it onto a page, maybe with a button to launch it, and when the user clicks an emoji, you insert it into a text box somewhere.

Screenshot of an emoji picker with a search box, a row of emoji categories, and a grid of emoji

What wasn’t obvious to me, though, was how to allow users to style it. What if they wanted a different background color? What if they wanted the emoji to be bigger? What if they wanted a different font for the input field?

This led me to researching all the different ways that a standalone web component can expose a styling API. In this post, I’ll go over each strategy, as well as its strengths and weaknesses from my perspective.

Shadow DOM basics

(Feel free to skip this section if you already know how shadow DOM works.)

The main benefit of shadow DOM, especially for a standalone web component, is that all of your styling is encapsulated. Like JavaScript frameworks that automatically do scoped styles (such as Vue or Svelte), any styles in your web component won’t bleed out into the page, and vice versa. So you’re free to pop in your favorite resets, like so:

* {
  box-sizing: border-box;
}

button {
  cursor: pointer;
}

Another impact of shadow DOM is that DOM APIs cannot “pierce” the shadow tree. So for instance, document.querySelectorAll('button') won’t list any buttons inside of the emoji picker.

Brief interlude: open vs closed shadow DOM

There are two types of shadow DOM: open and closed. I decided to go with open, and all of the examples below assume an open shadow DOM.

In short, closed shadow DOM does not seem to be heavily used, and the drawbacks seem to outweigh the benefits. Basically, “open” mode allows some limited JavaScript API access via element.shadowRoot (for instance, element.shadowRoot.querySelectorAll('button') will find the buttons), whereas “closed” mode blocks off all JavaScript access (element.shadowRoot is null). This may sound like a security win, but really there are plenty of workarounds even for closed mode.
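
In code, the difference is just the mode passed to attachShadow() – elementA and elementB here are hypothetical custom elements:

const openRoot = elementA.attachShadow({ mode: 'open' })
console.log(elementA.shadowRoot === openRoot) // true – limited JS access

const closedRoot = elementB.attachShadow({ mode: 'closed' })
console.log(elementB.shadowRoot) // null – no JS access from outside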

So closed mode leaves you with an API surface that is harder to test, doesn’t give users an escape hatch (see below), and doesn’t offer any security benefits. Also, the framework I chose to use, Svelte, uses open mode by default and doesn’t have an option to change it. So it didn’t seem worth the hassle.

Styling strategies

With that brief tangent out of the way, let’s get back to styling. You may have noticed in the above discussion that shadow DOM seems pretty isolated – no styles go in, no styles get out. But there are actually some well-defined ways that styles can be tweaked, and these give you the opportunity to offer an ergonomic styling API to your users. Here are the main options:

  1. CSS variables (aka custom properties)
  2. Classes
  3. Shadow parts
  4. The “escape hatch” (aka inject whatever CSS you want)

One thing to note is that these strategies aren’t “either/or.” You can happily mix all of them in the same web component! It just depends on what makes the most sense for your project.

Option 1: CSS variables (aka custom properties)

For emoji-picker-element, I chose this option as my main approach, as it had the best browser support at the time and actually worked surprisingly well for a number of use cases.

The basic idea is that CSS variables can actually pierce the shadow DOM. No, really! You can have a variable defined at the :root, and then use that variable inside any shadow DOM you like. CSS variables at the :root are effectively global across the entire document (sort of like window in JavaScript).

Here is a CodePen to demonstrate.
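
The gist is something like this (the class name is made up):

/* on the host page */
:root {
  --background: hotpink;
}

/* inside the web component's shadow DOM – the variable pierces right through */
.picker {
  background: var(--background);
}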

Early on, I started to think of some useful CSS variables for my emoji picker:

  • --background to style the background color
  • --emoji-padding to style the padding around each emoji
  • --num-columns to choose the number of columns in the grid

These actually work! In fact, these are some of the variables I actually exposed in emoji-picker-element. Even --num-columns works, thanks to the magic of CSS grid.

However, it would be pretty awful if you had multiple web components on your page, and each of them had generic-sounding variables like --background that you were supposed to define at the :root. What if they conflicted?

Conflicting CSS variables

One strategy for dealing with conflicts is to prefix the variables. This is how Lightning Web Components, the framework we build at Salesforce, does it: everything is prefixed by --lwc-.

I think this makes sense for a design system, where multiple components on a page may want to reference the same variable. But for a standalone component like the emoji picker, I opted for a different strategy, which I picked up from Ionic Framework.

Take a look at Ionic’s button component and modal component. Both of them can be styled with the generic CSS property --background. But what if you want a different background for each? Not a problem!

Here is a simplified example of how Ionic is doing it. Below, I have a foo-component and a bar-component. They each have a different background color, but both are styled with the --background variable:

From the user’s perspective, the CSS is quite intuitive:

foo-component {
  --background: hotpink;
}

bar-component {
  --background: lightsalmon;
}

And if these variables are defined anywhere else, for instance at the :root, then they don’t affect the components at all!

:root {
  --background: black; /* does nothing */
}

Instead, the components revert back to the default background colors they each defined (in this case, lightgreen and lightblue).

How is this possible? Well, the trick is to declare the default value for the variable using the :host() pseudo-class from within the shadow DOM. For instance:

/* inside the shadow DOM */
:host {
  --background: lightblue; /* default value */
}

And then outside the shadow DOM, this can be overridden by targeting the foo-component, because it trumps :host in terms of CSS specificity:

/* outside the shadow DOM */
foo-component {
  --background: hotpink; /* overridden value */
}

It’s a bit hard to wrap your head around, but for anyone using your component, it’s very straightforward! You’re also unlikely to run into any conflicts, since you’d have to be targeting the custom element itself, not any of its ancestors or the :root.

Of course, this may not be enough reassurance for you. Users can still shoot themselves in the foot by doing something like this:

* {
  --background: black; /* this will override the default */
}

So if you’re worried, you can prefix the CSS variables as described above. I personally feel that the risk is pretty low, but it remains to be seen how the web component ecosystem will shake out.

When you don’t need CSS variables

As it turns out, CSS variables aren’t the only case where styles can leak into the shadow DOM: inheritable properties like font-family and color will also seep in.

For something like fonts, you can use this to your advantage by… doing nothing. Yup, just leave your spans and inputs unstyled, and they will match whatever font the surrounding page is using. For my own case, this was just one less thing I had to worry about.

If inheritable properties are a problem for you, though, you can always reset them.

Option 2: classes

Building on the previous section, we get to the next strategy for styling web components: classes. This is another one I used in emoji-picker-element: if you want to toggle dark mode or light mode, it’s as simple as adding a CSS class:

<emoji-picker class="dark"></emoji-picker>
<emoji-picker class="light"></emoji-picker>

(It will also default to the right one based on prefers-color-scheme, but I figured people might want to customize the default behavior.)

Once again, the trick here is to use the :host pseudo-class. In this case, we can pass another selector into the :host() pseudo-class itself:

:host(.dark) {
  background: black;
}
:host(.light) {
  background: white;
}

There is also a CodePen showing it in action.

Of course, you can also mix this approach with CSS variables: for instance, defining --background within the :host(.dark) block. Since you can put arbitrary CSS selectors inside of :host(), you can also use attributes instead of classes, or whatever other approach you’d like.

One potential downside of classes is that they can also run into conflicts – for instance, if the user has dark and light classes already defined elsewhere in their CSS. So you may want to avoid this technique, or use prefixes, if you’re concerned by the risk.

Option 3: shadow parts

CSS shadow parts are a newer spec, so the browser support is not as widespread as CSS variables (still pretty good, though, and improving daily).

The idea of shadow parts is that, as a web component author, you can define “parts” of your component that users can style. CSS Tricks has a good breakdown, but the basic gist is this:

/* outside the shadow DOM */
custom-component::part(foo) {
  background: lightgray;
}

<!-- inside the shadow DOM -->
<span part="foo">
  My background will be lightgray!
</span>

There is a CodePen demo of this as well.

I think this strategy is fine, but I actually didn’t end up using it for emoji-picker-element (not yet, anyway). Here is my thought process.

Downsides of shadow parts

First off, it’s hard to decide which “parts” of a web component should be styleable. In the case of the emoji picker, should it be the emoji themselves? What about the skin tone picker, which also contains emoji? What are the right boundaries and names here? (This is not an unsolvable problem, but naming things is hard!)

To be fair, this same criticism could also be applied to CSS variables: naming variables is still hard! But as it turns out, I already use CSS variables to organize my code internally; it just jibes with my own mental model. So exposing them publicly didn’t involve a lot of extra naming for me.

Second, by offering a ::part API, I actually lock myself in to certain design decisions, which isn’t necessarily the case with CSS variables. For instance, consider the --emoji-padding variable I use to control the padding around an emoji. The equivalent way of doing this with shadow parts might be:

emoji-picker::part(emoji) {
  padding: 2em;
}

But now, if I ever decide to define the padding some other way (e.g. through width or implicit positioning), or if I decide I actually want a wrapper div to handle the padding, I could potentially break anyone who is styling with the ::part API. Whereas with CSS variables, I can always redefine what --emoji-padding means using my own internal logic.

In fact, this is exactly what I do in emoji-picker-element! The --emoji-padding is not a padding at all, but rather part of a calc() statement that sets the width. I did this for performance reasons – it turned out to be faster (in Chrome anyway) to have fixed cell sizes in a CSS grid. But the user doesn’t have to know this; they can just use --emoji-padding without caring how I implemented it.
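
In other words, the rule looks something like this – a simplification, and --emoji-size is a made-up variable for illustration:

.emoji {
  /* not actually a padding – the "padding" feeds into a fixed cell width */
  width: calc(var(--emoji-size) + 2 * var(--emoji-padding));
}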

Finally, shadow parts expand the API surface in ways that make me a bit uncomfortable. For instance, the user could do something like:

emoji-picker::part(emoji) {
  margin: 1em;
  animation: 1s infinite some-animation;
  display: flex;
  position: relative;
}

With CSS shadow parts, there are just a lot of unexpected ways I could break somebody’s code by changing one of these properties. Whereas with CSS variables, I can explicitly define what I want users to style (such as the padding) and what I don’t (such as display). Of course, I could use semantic versioning to try to communicate breaking changes, but at this point any CSS change on any ::part is potentially breaking.

In (mild) defense of shadow parts

That said, I can definitely see where shadow parts have their place. If you look at my colleague Greg Whitworth’s Open UI definition of a <select> element, it has well-defined parts for everything that makes up a <select>: the button, the listbox, the options, etc. In fact, one of the main goals of the project is to standardize these parts across frameworks and specs. For this kind of situation, shadow parts are a natural fit.

Shadow parts also increase the expressiveness of the user’s CSS: the same thing that makes me squeamish about breaking changes is also what allows users to go hog-wild with styling any ::part however they choose. Personally, I like to restrict the API surface of the code that I ship, but there is an inherent tradeoff here between customizability and breakability, so I don’t believe there is one right answer.

Also, although I organized my own internal styling with CSS variables, it’s actually possible to do this with shadow parts as well. Inside the shadow DOM, you can use :host::part(foo), and it works as expected. This can make your shadow parts less brittle, since they’re not just a public API but also used internally.
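
Here’s a sketch of what that looks like, reusing the part name from earlier:

/* inside the shadow DOM: style our own exposed part */
:host::part(foo) {
  background: lightgray;
}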

Update: as of this writing (November 2021), this only works in Firefox and WebKit Nightly. Here are the tracking bugs for Chromium and WebKit. In the meantime, another option is the attribute selector: [part="foo"].

Once again: these strategies aren’t “either/or.” If you’d like to use a mix of variables, parts, and classes, then by all means, go ahead! You should use whatever feels most natural for the project you’re working on.

Option 4: the escape hatch

The final strategy I’ll mention is what I’ll call “the escape hatch.” This is basically a way for users to bolt whatever CSS they want onto your custom element, regardless of any other styling techniques you’re using. It looks like this:

const style = document.createElement('style')
style.innerHTML = 'h1 { font-family: "Comic Sans"; }'
element.shadowRoot.appendChild(style)

Because of the way open shadow DOM works, users aren’t prevented from appending <style> tags to the shadowRoot. So using this technique, they always have an “escape hatch” in case the styling API you expose doesn’t meet their needs.

This strategy is probably not the one you want to lean on as your primary styling interface, but it is kind of nice that it always exists. This means that users are never frustrated that they can’t style something – there’s always a workaround.

Of course, this technique is also fragile. If anything changes in your component’s DOM structure or CSS classes, then the user’s code may break. But since it’s obvious to the user that they’re using a loophole, I think this is acceptable. Appending your own <style> tag is clearly a “you broke it, you bought it” kind of situation.

Conclusion

There are lots of ways to expose a styling API on your web component. As the specs mature, we’ll probably have even more possibilities with constructable stylesheets or themes (the latter is apparently defunct, but who knows, maybe it will inspire another spec?). Like a lot of things on the web, there is more than one way to do it.

Web components have a bit of a bad rap, and I think the criticisms are mostly justified. But I also think it’s early days for this technology, and we’re still figuring out where web components work well and where they need to improve. No doubt the web components of 2021 will be even better than those of 2020, due to improved browser specs as well as the community’s understanding of how best to use this tool.

Thanks to Thomas Steiner for feedback on a draft of this blog post.

Building an accessible emoji picker

In my previous blog post, I introduced emoji-picker-element, a custom element that acts as an emoji picker. In the post, I said that accessibility “shouldn’t be an afterthought,” and made it clear that I took accessibility seriously.

But I didn’t explain how I actually made the component accessible! In this post I’ll try to make up for that oversight.

Reduce motion

If you have motion-based animations in your web app, one of the easiest things you can do to improve accessibility is to disable them with the prefers-reduced-motion CSS media query. (Note that this applies to transform animations but not opacity animations, as it’s my understanding that opacity changes don’t cause nausea for those with vestibular disorders.)

There were two animations in emoji-picker-element that I had to consider: 1) a small “indicator” under the currently-selected tab button, which moves as you click tab buttons, and 2) the skin tone dropdown, which slides down when you click it.

Video: the tab indicator sliding between categories, and the skin tone dropdown expanding

For the tab indicator, the fix was fairly simple:

.indicator {
  will-change: opacity, transform;
  transition: opacity 0.1s linear, transform 0.25s ease-in-out;
}

@media (prefers-reduced-motion: reduce) {
  .indicator {
    will-change: opacity;
    transition: opacity 0.1s linear;
  }
}

Note that there is also an opacity transition on this element (which plays when search results appear or disappear), so there is a bit of unfortunate repetition here. But the core idea is to remove the transform animation.

For the skin tone dropdown, the fix was a bit more complicated. The reason is that I have a JavaScript transitionend event listener on the element:

element.addEventListener('transitionend', listener);

If I were to remove the transform animation completely, then this listener would never fire. So I borrowed a technique from the cssremedy project, which looks like this:

@media (prefers-reduced-motion: reduce) {
  .skintone-list {
    transition-duration: 0.001s;
  }
}

Based on my testing in Safari, Firefox, and Chrome, this effectively removes the animation while ensuring that transitionend still fires. (There are other tricks mentioned in that thread, but I found that this solution was sufficient for my use case.)

With these fixes in place, the potentially-nauseating animations are removed for those who prefer things that way. The easiest way to test this is in the Chrome DevTools “Rendering” tab, under “Emulate CSS media feature prefers-reduced-motion”:

Screenshot of Chrome DevTools "prefers reduced motion" Rendering setting

Here it is with motion reduced:

Video: the emoji picker with reduced motion enabled

As you can see, the elements no longer move around. Instead, they instantly pop in or out of place.

Screen reader and keyboard accessibility

When testing screen reader accessibility, I use four main tools:

  1. NVDA in Firefox on Windows (with SpeechViewer enabled so I can see the text)
  2. VoiceOver in Safari on macOS
  3. Chrome’s Accessibility panel in DevTools (Firefox also has nice accessibility tools! I use them occasionally for a second opinion.)
  4. The axe extension (also available in Lighthouse)

I like testing in actual screen readers, because they sometimes have bugs or differing behavior (just like browsers!). Testing in both NVDA and VoiceOver gives me more confidence that I didn’t mess anything up.

In the following sections, I’ll go over the basic semantic patterns I used in emoji-picker-element, and how those work for screen reader or keyboard users.

Tab buttons and tab panel

For the main emoji picker UI, I decided to use the tab panel pattern with manual activation. This means that each emoji category (“Smileys and emoticons,” “People and body”) is a tab button (role=tab), and the list of emoji underneath is a tab panel (role=tabpanel).

Screenshot showing the emoji categories as tabs and the main grid of emoji as the tab panel

The only thing I had to do to make this pattern work was to add ← and → keydown listeners to move focus left and right between tabs. (Technically I should also add Home and End – I have that as a todo!)
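
A rough sketch of that logic – tablist here is a stand-in for the tab buttons’ container element:

tablist.addEventListener('keydown', event => {
  const tabs = [...tablist.querySelectorAll('[role="tab"]')]
  const index = tabs.indexOf(event.target)
  if (index === -1) return
  if (event.key === 'ArrowRight') {
    tabs[(index + 1) % tabs.length].focus() // wrap around at the end
  } else if (event.key === 'ArrowLeft') {
    tabs[(index - 1 + tabs.length) % tabs.length].focus()
  }
})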

Aside from that, clearly each tab button should be a <button> (so that Enter and Spacebar fire correctly), and it should have an aria-label for assistive text. I also went ahead and added titles that echo the content of aria-label, but I’m considering replacing those since they aren’t accessible to keyboard users. (I.e. title appears when you hover with a mouse, but not when you focus with the keyboard. Plus it sometimes adds extra spoken text in NVDA, which is less than ideal.)

Skin tone dropdown

The skin tone dropdown is modeled on the collapsible dropdown listbox pattern. In other words, it’s basically a fancy <select> element.

Annotated screenshot showing the collapsed skin tone button as a button, and the expanded list of skin tones as a listbox

The button that triggers the dropdown is just a regular <button>, whereas the listbox has role=listbox and its child <button>s have role=option. To implement this, I just used the DevTools to analyze the W3C demo linked above, then tested in both NVDA and VoiceOver to ensure that my implementation had the same behavior as the W3C example.

One pro tip: since the listbox disappears on the blur event, which would happen when you click inside the DevTools itself, you can use the DevTools to remove the blur event and make it easier to inspect the listbox in its “expanded” state.

Screenshot of Chrome DevTools on the W3C collapsible listbox example, showing an arrow pointing at the "remove" button in DevTools next to the "blur" listener in the Event Listeners section

Removing this blur listener may make debugging a bit easier.

Search input and search results

For the search input, I decided to do something a bit clever (which may or may not burn me later!).

By default, the emoji in the tabpanel are simple <button>s aligned in a CSS Grid. They’re given role=menuitem and placed inside of a role=menu container, modeled after the menu pattern.

However, when you start typing in the search input, the tabpanel emoji are instantly replaced with the search results emoji:

Video: typing in the search input instantly swaps the emoji grid for search results

Visually, this is pretty straightforward, and it aligns with the behavior of most other emoji pickers. I also found it to be good for performance, because I can have one single Svelte #each expression, and let Svelte handle the list updates (as opposed to clearing and re-creating the entire list whenever something changes).

If I had done nothing else, this would have been okay for accessibility. The user can type some text, and then change focus to the search results to see what the results are. But that sounded kind of awkward, so I wanted to do one better.

So instead, I implemented the combobox with listbox popup pattern. When you start typing into the input (which is <input type=search> with role=combobox), the menu with menuitems immediately transforms into a listbox with options instead. Then, using aria-activedescendant and aria-controls, I link the listbox with the combobox, allowing the user to press ↑ or ↓ to cycle through the search results. (I also used an aria-describedby to explain this behavior.)
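
In other words, the markup ends up looking roughly like this (ids and attributes simplified for illustration):

<input type="search" role="combobox" aria-expanded="true"
       aria-controls="search-results" aria-activedescendant="emoji-1">
<div id="search-results" role="listbox">
  <button id="emoji-1" role="option" aria-selected="true">😀</button>
  <button id="emoji-2" role="option">😃</button>
</div>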

Video: NVDA reading out the combobox state and search results

In the above video, you can see NVDA reading out the expanded/collapsed state of the combobox, as well as the currently-selected emoji as I cycle through the list by pressing ↑ and ↓. (My thanks to Assistiv Labs for providing a free account for OSS testing! This saved me a trip to go boot up my Windows machine.)

So here’s the upshot: from the perspective of a screen reader user, the search input works exactly like a standard combobox with a dropdown! They don’t have to know that the menu/menuitem elements are being replaced at all, or that they’re aligned in a grid.

Now, this isn’t a perfect pattern: for a sighted user, they might find it more intuitive to press ← and → to move horizontally through the grid of emoji (and ↑ and ↓ to move vertically). However, I found it would be tricky to properly handle the ←/→ keys, as the search input itself allows you to move the cursor left and right when it has focus. Whereas ↑ and ↓ are unambiguous in this situation, so they’re safe to use.

Plus, this is really a progressive enhancement – mouse or touch users don’t have to know that these keyboard shortcuts exist. So I’m happy with this pattern for now.

Testing

To test this project, I decided to use Jest with testing-library. One of my favorite things about testing-library is that it forces you to put accessibility front-and-center. By design, it makes it difficult to use simple CSS selectors, and encourages you to query by ARIA roles instead.

This made it a bit harder to debug, since I had to inspect the implicit accessibility tree of my component rather than the explicit DOM structure. But it helped keep me honest, and ensure that I was adding the proper ARIA roles during development.

If I had one criticism of this approach, I would say that I find it inherently harder to test using Jest and JSDom rather than a real browser. Rather than having access to the browser DevTools, I had to set debugger statements and use node --inspect-brk to walk through the code in the Chrome Node inspector.

It also wasn’t always clear when one of my ARIA role queries wasn’t working properly. Perhaps when the Accessibility Object Model gains wider adoption, it will become as easy to test the accessibility tree as it is to test CSS selectors.

Conclusion

To get accessibility right, it helps to consider it from the get-go. Like performance, it can be easy to paint yourself into a corner if you quickly build a solution based only on the visuals, without testing the semantics as well.

While building emoji-picker-element, I found that I often had to make tweaks to improve performance (e.g. using one big Svelte #each expression) or accessibility (e.g. changing the aria-hidden and tabindex attributes on elements that are visually hidden but not display: none). I also had to think hard about how to merge the menu component with the listbox to allow the search to work properly.

I don’t consider myself an accessibility expert, so I’m happy to hear from others who have ideas for improving accessibility. For instance, I don’t think the favorites bar is easily reachable to screen reader or keyboard users, and I’m still noodling on how I might improve that. Please feel free to open a GitHub issue if you have any ideas for what I can do better!