Posts Tagged ‘performance’

Memory leaks: the forgotten side of web performance

I’ve researched and learned enough about client-side memory leaks to know that most web developers aren’t worrying about them too much. If a web app leaks 5 MB on every interaction, but it still works and nobody notices, then does it matter? (Kinda sounds like a “tree in the forest” koan, but bear with me.)

Even those who have poked around in the browser DevTools to dabble in the arcane art of memory leak detection have probably found the experience… daunting. The effort-to-payoff ratio is disappointingly high, especially compared to the hundreds of other things that are important in web development, like security and accessibility.

So is it really worth the effort? Do memory leaks actually matter?

I would argue that they do matter, if only because the lack of care (as shown by public-facing SPAs leaking up to 186 MB per interaction) is a sign of the immaturity of our field, and an opportunity for growth. Similarly, five years ago, there was much less concern among SPA authors for accessibility, security, runtime performance, or even ensuring that the back button maintained scroll position (or that the back button worked at all!). Today, I see a lot more discussion of these topics among SPA developers, and that’s a great sign that our field is starting to take our craft more seriously.

So why should you, and why shouldn’t you, care about memory leaks? Obviously I’m biased because I have an axe to grind (and a tool I wrote, fuite), but let me try to give an even-handed take.

Memory leaks and software engineering

In terms of actual impact on the business of web development, memory leaks are a funny thing. If you speed up your website by 2 seconds, everyone agrees that that’s a good thing with a visible user impact. If you reduce your website’s memory leak by 2 MB, can we still agree it was worth it? Maybe not.

Here are some of the unique characteristics of memory leaks that I’ve observed, in terms of how they actually fit into the web development process. Memory leaks are:

  1. Low-impact until critical
  2. Hard to diagnose
  3. Trivial to fix once diagnosed

Low-impact…

Most web apps can leak memory and no one will ever notice. Not the user, not the website author – nobody. There are a few reasons for this.

First off, browsers are well aware that the web is a leaky mess and are already ruthless about killing background tabs that consume too much memory. (My former colleague on the Microsoft Edge performance team, Todd Reifsteck, told me way back in 2016 that “the web leaks like a sieve.”) A lot of users are tab hoarders (essentially using tabs as bookmarks), and there’s a tacit understanding between browser and user that you can’t really have 100 tabs open at once (in the sense that the tab is actively running and instantly available). So you click on a tab that’s a few weeks old, boom, there’s a flash of white while the page loads, and nobody seems to mind much.

Second off, even for long-lived SPAs that the user may habitually check in on (think: GMail, Evernote, Discord), there are plenty of opportunities for a page refresh. The browser needs to update. The user doesn’t trust that the data is fresh and hits F5. Something goes wrong because programmers are terrible at managing state, and users are well aware that the old turn-it-off-and-back-on-again solves most problems. All of this means that even a multi-MB leak can go undetected, since a refresh will almost always occur before an Out Of Memory crash.

Screenshot of Chrome browser window with sad tab and "aw snap something went wrong" message

Chrome’s Out Of Memory error page. If you see this, something has gone very wrong.

Third, it’s a tragedy-of-the-commons situation, and people tend to blame the browser. Chrome is a memory hog. Firefox gobbles up RAM. Safari is eating all my memory. For reasons I can’t quite explain, people with 100+ open tabs are quick to blame the messenger. Maybe this goes back to the first point: tab hoarders expect the browser to automatically transition tabs from “thing I’m actively using” to “background thing that is basically a bookmark,” seamlessly and without a hitch. Browsers have different heuristics about this, some heuristics are better than others, and so in that sense, maybe it is the browser’s “fault” for failing to adapt to the user’s tab-hoarding behavior. In any case, the website author tends to escape the blame, especially if their site is just 1 out of 100 naughty tabs that are all leaking memory. (Although this may change as more browsers call out tabs individually in Task Manager, e.g. Edge and Safari.)

…Until critical

What’s interesting, though, is that every so often a memory leak will get so bad that people actually start to notice. Maybe someone opens up Task Manager and wonders why a note-taking app is consuming more RAM than DOTA. Maybe the website slows to a crawl after a few hours of usage. Maybe the users are on a device with low available memory (and of course the developers, with their 32GB workstations, never noticed).

Here’s what often happens in this case: a ticket lands on some web developer’s desk that says “Memory usage is too high, fix it.” The developer thinks to themselves, “I’ve never given much thought to memory usage, well let’s take a stab at this.” At some point they probably open up DevTools, click “Memory,” click “Take snapshot,” and… it’s a mess. Because it turns out that the SPA leaks, has always leaked, and in fact has multiple leaks that have accumulated over time. The developer assumes this is some kind of sudden-onset disease, when in fact it’s a pre-existing condition that has gradually escalated to stage-4.

The funny thing is that the source of the leak – the event listener, the subscriber, whatever – might not even be the proximate cause of the recent crisis. It might have been there all along, and was originally a tiny 1 MB leak nobody noticed, until suddenly someone attached a much bigger object to the existing leak, and now it’s a 100 MB leak that no one can ignore.

Unfortunately to get there, you’re going to have to hack your way through the jungle of the half-dozen other leaks that you ignored up to this point. (We fixed the leak! Oh wait, no we didn’t. We fixed the other leak! Oh wait, there’s still one more…) But that’s how it goes when you ignore a chronic but steadily worsening illness until the moment it becomes a crisis.

Hard to diagnose

This brings us to the second point: memory leaks are hard to diagnose. I’ve already written a lot about this, and I won’t rehash old content. Suffice it to say, the tooling is not really up to the task (despite some nice recent innovations), even if you’re a veteran with years of web development experience. Some gotchas that tripped me up include the fact that you have to ignore WeakMaps and circular references, and that the DevTools console itself can leak memory.

Oh and also, browsers themselves can have memory leaks! For instance, see these ResizeObserver/IntersectionObserver leaks in Chromium, Firefox, and Safari (fixed in all but Firefox), or this Chromium leak in lazy-loading images (not fixed), or this discussion of a leak in Safari. Of course, the tooling will not help you distinguish between browser leaks and web page leaks, so you just kinda have to know this stuff. In short: good luck!

Even with the tool that I’ve written, fuite, I won’t claim that we’ve reached a golden age of memory leak debugging. My tool is better than what’s out there, but that’s not saying much. It can catch the dumb stuff, such as leaking event listeners and DOM nodes, and for the more complex stuff like leaking collections (Arrays, Maps, etc.), it can at least point you in the right direction. But it’s still up to the web developer to decide which leaks are worth chasing (some are trivial, others are massive), and to track them down.

I still believe that the browser DevTools (or perhaps professional testing tools, such as Cypress or Sentry), should be the ones to handle this kind of thing. The browser especially is in a much better position to figure out why memory is leaking, and to point the web developer towards solutions. fuite is the best I could do with userland tooling (such as Puppeteer), but overall I’d still say we’re in the Stone Age, not the Space Age. (Maybe fuite pushed us to the Bronze Age, if I’m being generous to myself.)

Trivial to fix once diagnosed

Here’s the really surprising thing about memory leaks, though, and perhaps the reason I find them so addictive and keep coming back to them: once you figure out where the leak is coming from, they’re usually trivial to fix. For instance:

  • You called addEventListener but forgot to call removeEventListener.
  • You called setInterval, but forgot to call clearInterval when the component unloaded.
  • You added a DOM node, but forgot to remove it when the page transitions away.
  • Etc.
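To make that concrete, here's a minimal sketch (my illustration, not code from any particular app) of the listener case; `bus` stands in for a long-lived target like `window` or `document`:

```javascript
// The classic listener leak and its one-line fix.
// `bus` stands in for a long-lived target like `window` or `document`.
const bus = new EventTarget();

class Widget {
  constructor() {
    this.renderCount = 0;
    this.onPing = () => this.renderCount++;
    bus.addEventListener('ping', this.onPing); // the leak, if never removed
  }
  destroy() {
    // The one-line fix: without this, `bus` retains the handler (and the
    // whole Widget it closes over) for the lifetime of the page.
    bus.removeEventListener('ping', this.onPing);
  }
}
```

Forget to call `destroy()`, and every `Widget` ever constructed stays reachable from `bus` – that's the entire leak, and the fix is one line.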

You might have a multi-MB leak, and the fix is one line of code. That’s a massive bang-for-the-buck! That is, if you discount the days of work it might have taken to find that line of code.

This is where I would like to go with fuite. It would be amazing if you could just point a tool at your website and have it tell you exactly which line of code caused a leak. (It’d be even better if it could open a pull request to fix the leak, but hey, let’s not get ahead of ourselves.)

I’ve taken some baby steps in this direction by adding stacktraces for leaking collections. So for instance, if you have an Array that is growing by 1 on every user interaction, fuite can tell you which line of code actually called Array.push(). This is a huge improvement over v1.0 of fuite (which just told you the Array was leaking, but not why), and although there are edge cases where it doesn’t work, I’m pretty proud of this feature. My goal is to expand this to other leaks (event listeners, DOM nodes, etc.), although since this is just a tool I’m building in my spare time, we’ll see if I get to it.

Screenshot of console output showing leaking collections and stacktraces for each

fuite showing stacktraces for leaking collections.

After releasing this tool, I also learned that Facebook has built a similar tool and is planning to open-source it soon. That’s great! I’m excited to see how it works, and I’m hoping that having more tools in this space will help us move past the Stone Age of memory leak debugging.

Conclusion

So to bring it back around: should you care about memory leaks? Well, if your boss is yelling at you because customers are complaining about Out Of Memory crashes, then yeah, you absolutely should. Are you leaking 5 MB, and nobody has complained yet? Well, maybe an ounce of prevention is worth a pound of cure in this case. If you start fixing your memory leaks now, it might avoid that crisis in the future when 5 MB suddenly grows to 50 MB.

Alternatively, are you leaking a measly ~1 kB because your routing library is appending some metadata to an Array? Well, maybe you can let that one slide. (fuite will still report this leak, but I would argue that it’s not worth fixing.)

On the other hand, all of these leaks are important in some sense, because even thinking about them shows a dedication to craftsmanship that is (in my opinion) too often lacking in web development. People write a web app, they throw something buggy over the wall, and then they rewrite their frontend four years later after users are complaining too much. I see this all the time when I observe how my wife uses her computer – she’s constantly telling me that some app gets slower or buggier the longer she uses it, until she gives up and refreshes. Whenever I help her with her computer troubles, I feel like I have to make excuses for my entire industry, for why we feel it’s acceptable to waste our users’ time with shoddy, half-baked software.

Maybe I’m just a dreamer and an idealist, but I really enjoy putting that final polish on something and feeling proud of what I’ve created. I notice, too, when the software I use has that extra touch of love and care – and it gives me more confidence in the product and the team behind it. When I press the back button and it doesn’t work, I lose a bit of trust. When I press Esc on a modal and it doesn’t close, I lose a bit of trust. And if an app keeps slowing down until I’m forced to refresh, or if I notice the memory steadily creeping up, I lose a bit of trust. I would like to think that fixing memory leaks is part of that extra polish that won’t necessarily win you a lot of accolades, but your users will subtly notice, and it will build their confidence in your software.

Thanks to Jake Archibald and Todd Reifsteck for feedback on a draft of this post.

Introducing fuite: a tool for finding memory leaks in web apps

Debugging memory leaks in web apps is hard. The tooling exists, but it’s complicated, cumbersome, and often doesn’t answer the simple question: Why is my app leaking memory?

Because of this, I’d wager that most web developers are not actively monitoring for memory leaks. And of course, if you’re not testing something, it’s easy for bugs to slip through.

When I first started looking into memory leaks, I assumed it was a rare thing. How could JavaScript – a language with an automatic garbage collector – be a big source of memory leaks? But the more I learned, the more I suspected that memory leaks were actually quite common in Single Page Apps (SPAs) – it’s just that nobody is testing for it!

Since most web developers aren’t fiddling with the Chrome memory tools for the fun of it, they probably won’t notice a leak until the browser tab crashes with an Out Of Memory error, or the page slows down, or someone happens to open up the Task Manager and notice that a website is using many megabytes (or even gigabytes!) of memory. But at that point, it’s gotten bad enough that there may be multiple leaks on the same page.

I’ve written about memory leaks in the past, but my advice basically boils down to: “Use the Chrome DevTools, follow these dozen tedious steps, and then maybe you can figure out why your page is leaking.” This is not a great developer experience, and I’m sure many readers just shook their heads in despair and moved on. It would be much better if a tool could find memory leaks automatically.

That’s why I wrote fuite (French for “leak”). fuite is a CLI tool that you can point at any URL, and it will analyze the page for memory leaks:

npx fuite https://example.com

That’s it! By default, it assumes that the site is a client-rendered SPA, and it will crawl the page for internal links (such as /about or /contact). Then, for each link, it runs the following steps:

  1. Click the link
  2. Press the browser back button
  3. Repeat to see if memory grows

If fuite finds any leaks, it will show which objects are suspected of causing the leak:

Test         : Go to /foo and back
Memory change: +10 MB
Leak detected: Yes

Leaking objects:

| Object            | # added | Retained size increase |
| ----------------- | ------- | ---------------------- |
| HTMLIFrameElement | 1       | +10 MB                 |

Leaking event listeners:

| Event        | # added | Nodes  |
| ------------ | ------- | ------ |
| beforeunload | 2       | Window |

Leaking DOM nodes:

DOM size grew by 6 node(s)  

To do this, fuite uses the basic strategy outlined in my blog post. It will launch Chrome, run some scenario n times (7 by default), and see if any objects are leaking a multiple of n times (7, 14, 21, etc.).


fuite will also analyze any Arrays, Objects, Maps, Sets, event listeners, and the overall DOM to see if any of those are leaking. For instance, if an Array grows by exactly 7 after 7 iterations, then it’s probably leaking.
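As a rough sketch of that counting heuristic (this is my illustration, not fuite's actual source): after running a scenario n times, an object whose count grew by an exact positive multiple of n is a leak suspect.

```javascript
// Hedged sketch of the multiple-of-n heuristic, not fuite's real code:
// growth that is an exact positive multiple of the iteration count
// suggests the scenario leaks that object once per iteration.
function isLeakSuspect(countBefore, countAfter, numIterations) {
  const growth = countAfter - countBefore;
  return growth > 0 && growth % numIterations === 0;
}
```

So an Array that went from 3 to 17 entries over 7 iterations (growth of 14) is flagged, while one that grew by 5 is not.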

Testing real-world websites

Somewhat surprisingly, the “basic” scenario of clicking internal links and pressing the back button is enough to find memory leaks in many SPAs. I tested fuite against the home pages for 10 popular frontend frameworks, and found leaks in all of them:

| Site    | Leak detected | Internal links | Average growth | Max growth |
| ------- | ------------- | -------------- | -------------- | ---------- |
| Site 1  | yes           | 8              | 27.2 kB        | 43 kB      |
| Site 2  | yes           | 10             | 50.4 kB        | 78.9 kB    |
| Site 3  | yes           | 27             | 98.8 kB        | 135 kB     |
| Site 4  | yes           | 8              | 180 kB         | 212 kB     |
| Site 5  | yes           | 13             | 266 kB         | 1.07 MB    |
| Site 6  | yes           | 8              | 638 kB         | 1.15 MB    |
| Site 7  | yes           | 7              | 1.37 MB        | 2.25 MB    |
| Site 8  | yes           | 15             | 3.49 MB        | 4.28 MB    |
| Site 9  | yes           | 43             | 5.57 MB        | 7.37 MB    |
| Site 10 | yes           | 16             | 14.9 MB        | 186 MB     |

In this case, “internal links” refers to the number of internal links tested, “average growth” refers to the average memory growth for every link (i.e. clicking it and then pressing the back button), and “max growth” refers to whichever internal link was leaking the most. Note that these numbers don’t include one-time setup costs, as fuite does one preflight iteration before the normal 7 iterations.

To confirm these results yourself, you can use the Chrome DevTools Memory tab. Here is a screenshot of the worst-performing site from my set, where I click a link, press the back button, take a heap snapshot, and repeat:

Screenshot of the Chrome DevTools memory heap snapshots list, showing memory starting at 18.7 MB and increasing by roughly 6 MB every iteration until reaching 41 MB on iteration 5

On this particular site, memory grows by about 6 MB every time you click a link and go back.

To avoid naming and shaming, I haven’t listed the actual websites. The point is just to show a representative sample of some popular SPAs – the authors of those websites are free to run fuite themselves and track down these leaks. (Please do!)

Caveats

Note, though, that not every leak in an SPA is an egregious problem that needs to be addressed. SPAs need to, for example, maintain the focus and scroll state to properly support accessibility, which means that there may be some small metadata that is stored for every page navigation. fuite will dutifully report such leaks (because they are leaks), but it’s up to the developer to decide if a tiny leak is worth chasing or not.

Some memory growth may also be due to browser-internal changes (such as JITing), which the web page can’t really control. So the memory growth numbers are an imperfect measure of what you stand to gain by fixing leaks – it could very well be that a few kBs of growth are unavoidable. (Although fuite tries to ignore browser-internal growth, and will only say “leaks detected” if there is actionable advice for the web developer.)

In rare cases, some memory growth may also be due to outright browser bugs. While analyzing the sites above, I actually found one (Site #4) that seems to be suffering from this Chrome bug due to <img loading="lazy"> not being unloaded. Unfortunately it’d be hard for fuite to detect browser bugs, so if you’re mystified by a leak, it’s good to cross-check against other browsers!

Also note that it’s almost impossible for a Multi-Page App (MPA) to leak, because the browser clears memory on every page navigation. (Assuming no browser bugs, of course.) During my testing, I found two frontend frameworks whose home pages were MPAs, and unsurprisingly, fuite couldn’t find any leaks in them. These were excluded from the results above.

Memory leaks are more of a concern for SPAs, where memory isn’t cleared automatically on each navigation. fuite is primarily designed for SPAs, although you can run it on MPAs too.

fuite currently only measures the JavaScript heap memory in the main frame of the page, so cross-origin iframes, Web Workers, and Service Workers are not measured. Something like performance.measureUserAgentSpecificMemory() would be more accurate, but it’s only available in cross-origin isolated contexts, so it’s not practical for a general-purpose tool right now.

Other memory leak scenarios

The “crawl for internal links” scenario is just the default one – you can also build your own. fuite is built on top of Puppeteer, so for whatever scenario you want to test, you essentially just need to write a Puppeteer script to tell the browser what to do. Some common scenarios you might test are:

  • Open a modal dialog and then close it
  • Hover over an element to show a tooltip, then mouse away to dismiss it
  • Scroll through an infinite-loading list, then navigate away and back
  • Etc.
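As a sketch, the first scenario above might look something like this as a plain Puppeteer script (the selectors are hypothetical placeholders for your own app's markup):

```javascript
// Hedged sketch of a custom "open a modal and close it" scenario.
// The '#open-modal' / '#close-modal' / '.modal-dialog' selectors are
// hypothetical placeholders – substitute your app's actual markup.
async function openAndCloseModal(page) {
  await page.click('#open-modal');
  await page.waitForSelector('.modal-dialog'); // wait for it to appear
  await page.click('#close-modal');
  await page.waitForSelector('.modal-dialog', { hidden: true }); // and vanish
}
```

Run a scenario like that n times, and memory before and after should be the same – if it grows by a multiple of n, something in the modal is leaking.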

In each of these scenarios, you would expect memory to be the same before and after. But of course, it’s not always so simple with web apps! You may be surprised how many of your dialogs and tooltips are harboring memory leaks.

To analyze leaks, fuite captures heap snapshot files, which you can load in the Chrome DevTools to inspect. It also has a --debug mode that you can use for more fine-grained analysis: stepping through the test as it’s running, debugging the browser in real-time, analyzing the leaking objects, etc.

Under the hood, fuite is a fairly basic tool, and I won’t claim that it can do 100% of the work of fixing memory leaks. There is still the human component of figuring out why your objects were allocated and retained, and then finding a reasonable fix. But my goal is to automate ~95% of the work, so that it actually becomes achievable to fix memory leaks in web apps.

You can find fuite on GitHub. Happy leak hunting!

Update: I made a video tutorial showing how to debug memory leaks with fuite.

One weird trick to improve your website’s performance

Every so often, I come across a web performance post from what I like to call the “one weird trick” genre. It goes something like this:

“I improved my page load time by 50% by adding one line of CSS!”

or

“It’s 2x faster to use this JavaScript API than this other one!”

The thing is, I love a good performance post. I love when someone finds some odd little unexplored corner of browser performance and shines a light on it. It might actually provide some good data that can influence framework authors, library developers, and even browser vendors to improve their performance.

But more often than not, the “one weird trick” genre drives me nuts, because of what’s not included in the post:

  • Did you test on multiple browsers?
  • Did you profile to try to understand why something is slower or faster?
  • Did you publish your benchmark so that others can verify your results?

That’s why I wrote “How to write about web performance”, where I tried to summarize everything that I think makes for a great web perf post. But of course, not everyone reads my blog religiously (how dare they?), so the “one weird trick” genre continues unabated.

Look, I get it. Writing about performance is hard. And we’re not all experts. I’ve made the same mistakes myself, in posts like “High performance web worker messages” (2016) – where I found the “one weird trick” that it’s faster to stringify an object before sending it to a web worker. Of course this makes little sense (the browser should be able to serialize the object faster than you can do it yourself), and Surma has demonstrated that there’s no need to do this stringify dance in modern versions of Chrome. (As I’ve said before: if you’re not wrong about web perf today, you’ll be wrong tomorrow when browsers change!)

That said, I do occasionally find a post that really exemplifies what’s great about the web perf genre. For instance, this post by Eoin Hennessy about improving Webpack performance really ticks all the boxes. The author wasn’t satisfied with finding “one weird trick” – they had to understand why the trick worked. So they actually went to the trouble of building Node from source (!) to find the true root cause, and they even submitted a patch to Webpack to fix it.

A post like this, like a good mystery novel, has everything that makes for a satisfying story: the problem, the search, the resolution, the ending. Unlike the “one weird trick” posts, this one doesn’t leave me craving more. Instead, it leaves me feeling like I truly learned something about how browser engines work.

So if you’ve found “one weird trick,” that’s great! There might actually be something really interesting there. But unless you do the extra research, it’s hard to say more than just “Well, this technique worked for me, on my website, in Chrome, in this scenario…” (etc.). If you want to extrapolate from your results to something more widely-applicable, you have to put in the work.

So here are some things you can do. Test in multiple browsers. File a browser bug if one is slower than the others. Ask around if you know any web perf experts or folks who work at browser vendors. Take a performance profile. And if you put in just a bit of extra effort, you might find more than “one weird trick” – you might find a valuable learning opportunity for web developers, browser vendors, or anyone interested in how the web works.

How to write about web performance

I’ve been writing about performance for a long time. I like to think I’ve gotten pretty good at it, but sometimes I look back on my older blog posts and cringe at the mistakes I made.

This post is an attempt to distill some of what I’ve learned over the years to offer as advice to other aspiring tinkerers, benchmarkers, and anyone curious about how browsers actually work when you put them to the test.

Why write about web performance?

The first and maybe most obvious question is: why bother? Why write about web performance? Isn’t this something that’s better left to the browser vendors themselves?

In some ways, this is true. Browser vendors know how their product actually works. If some part of the system is performing slowly, you can go knock on the door of your colleague who wrote the code and ask them why it’s slow. (Or send them a DM, I suppose, in our post-pandemic world.)

But in other ways, browser vendors really aren’t in a good position to talk frankly about web performance. Browsers are in the business of selling browsers. Performance is often used in marketing, with claims like “Browser X is 25% faster than Browser Y” – claims that might need to get approved by the marketing department, the legal department, not to mention various owners and stakeholders…

And that’s only if your browser is the fast one. If you run a benchmark and it turns out that your browser is the slow one, or it’s a mixed bag, then browser vendors will keep pretty quiet about it. This is why whenever a browser vendor releases a new benchmark, surprise surprise! Their browser wins. So the browser vendors’ hands are pretty tied when it comes to accurately writing about how their product actually works.

Of course, there are exceptions to this rule. Occasionally you will find someone’s personal blog, or a comment on a public bugtracker, which betrays that their browser is actually not so great in some benchmark. But nobody is going to go out of their way to sing from the mountaintops about how lousy their browser is in a benchmark. If anything, they’ll talk about it after they’ve done the work to make things faster, meaning the benchmark already went through several rounds of internal discussion, and was maybe used to evaluate some internal initiative to improve performance – a process that might last years before the public actually hears about it.

Other times, browser vendors will release a new feature, announce it with some fanfare, and then make vague claims about how it improves performance without delving into any specifics. If you actually look into these claims, though, you might find that the performance improvement is pretty meager, or it only manifests in a specific context. (Don’t expect the team who built the feature to eagerly tell you this, though.)

By the way, I don’t blame the browser vendors at all for this situation. I worked on the performance team at Microsoft Edge (back in the EdgeHTML days, before the switch to Chromium), and I did the same stuff. I wrote about scrolling performance because, at the time, our engine was the best at scrolling. I wrote about input responsiveness after we had already made it faster. (Not before! Definitely not before.) I designed benchmarks that explicitly showed off the improvements we had made. I worked on marketing videos that showed our browser winning in experiments where we already knew we’d win.

And if you think I’m picking on Microsoft, I could easily find examples of the other browser vendors doing the same thing. But I choose not to, because I’d rather pick on myself. (If you work for a browser vendor and are reading this, I’m sure some examples come to mind.)

Don’t expect a car company to tell you that their competitor has better mileage. Don’t expect them to admit that their new model has a lousy safety rating. That’s what Consumer Reports is for. In the same way, if you don’t work at a browser vendor (I don’t, anymore), then you are blessedly free to say whatever you want about browsers, and to honestly assess their claims and compare them to each other in fair, unbiased benchmarks.

Plus, as a web developer, you might actually be in a better position to write a benchmark that is more representative of real-world code. Browser developers spend most of their day writing C, C++, and Rust, not necessarily HTML, CSS, and JavaScript. So they aren’t always familiar with the day-to-day concerns of working web developers.

The subtle science of benchmarking

Okay, so that was my long diatribe about why you’d bother writing about web performance. So how do you actually go about doing it?

First off, I’d say to write the benchmark before you start writing your blog post. Your conclusions and high-level takeaways may be vastly different depending on the results of the benchmark. So don’t assume you already know what the results are going to be.

I’ve made this mistake in the past! Once, I wrote an entire blog post before writing the benchmark, and then the benchmark completely upended what I was going to say in the post. I had to scrap the whole thing and start from scratch.

Benchmarking is science, and you should treat it with the seriousness of a scientific endeavor. Expect peer review, which means – most importantly! – publish your benchmark publicly and provide instructions for others to test it. Because believe me, they will! I’ve had folks notify me of a bug in my benchmark after I published a post, so I had to go back and edit it to correct the results. (This is annoying, and embarrassing, but it’s better than willfully spreading misinformation.)

Since you may end up running your benchmark multiple times, and even generating your charts and tables multiple times, make an effort to streamline the process of gathering the data. If some step is manual, try to automate it.

These days, I like Tachometer because it automates a lot of the boring parts of benchmarking – launching a browser, taking a measurement, taking multiple measurements, taking enough measurements to achieve statistical significance, etc. Unfortunately it doesn’t automate the part where you generate charts and graphs, but I usually write small scripts to output the data in a format where I can easily import it into a spreadsheet app.

This also leads to an important point: take accurate measurements. A common mistake is to use Date.now() – instead, you should use performance.now(), since this gives you a high-resolution timestamp. Or even better, use performance.mark() and performance.measure() – these are also high-resolution, but with the added benefit that you can actually see your measurements laid out visually in the Chrome DevTools. This is a great way to double-check that you’re actually measuring what you think you’re measuring.
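For instance, a measurement might look like this (the mark/measure names and the workload are arbitrary placeholders):

```javascript
// Stand-in workload – replace with the code you actually want to measure.
function doTheThing() {
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += i;
  return total;
}

performance.mark('bench-start');
doTheThing();
performance.mark('bench-end');
performance.measure('bench-total', 'bench-start', 'bench-end');

// The measure shows up in DevTools traces, and you can also read it back:
const [entry] = performance.getEntriesByName('bench-total');
console.log(`took ${entry.duration} ms`);
```

The `bench-total` span will appear in the User Timing track of a Chrome DevTools performance trace, which is exactly the double-check described above.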

Screenshot of a performance trace in Chrome DevTools with an annotation in the User Timing section for the "total" span saying "What the benchmark is measuring" and stacktraces in the main thread with the annotation "What the browser is doing"

Note: Sadly, the Firefox and Safari DevTools still don’t show performance marks/measures in their profiler traces. They really should; IE11 had this feature years ago.

As mentioned above, it’s also a good idea to take multiple measurements. Benchmarks will always show variance, and you can prove just about anything if you only take one sample. For best results, I’d say take at least three measurements and then calculate the median, or better yet, use a tool like Tachometer that will use a bunch of fancy statistics to find the ideal number of samples.
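If you go the simple route, the median is a one-liner worth getting right – here's a hypothetical helper (not from any particular tool):

```javascript
// Take the median of a set of timing samples; with an odd number of
// samples, this is just the middle value after sorting
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b)
  return sorted[Math.floor(sorted.length / 2)]
}

// One slow outlier doesn't skew the result the way a mean would:
median([105.2, 98.7, 341.9]) // → 105.2
```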

Humility

Writing about web performance is really hard, so it’s important to be humble. Browsers are incredibly complex, so you have to accept that you will probably be wrong about something. And if you’re not wrong, then you will be wrong in 5 years when browsers update their engines to make your results obsolete.

There are a few ways you can limit your likelihood of wrongness, though. Here are some strategies that have worked well for me in the past.

First off, test in multiple browser engines. This is a good way to figure out if you’ve identified a quirk in a particular browser, or a fundamental truth about how the web works. Heck, if you write a benchmark where one browser performs much more poorly than the other ones, then congratulations! You’ve found a browser bug, and now you have a reproducible test case that you can file on that browser.

(And if you think they won’t be grateful or won’t fix the problem, then prepare to be surprised. I’ve filed several such bugs on browsers, and they usually at least acknowledge the issue if not outright fix it within a few releases. Sometimes browser developers are grateful when you file a bug like this, because they might already know something is a problem, but without bug reports from customers, they weren’t able to convince management to prioritize it.)

Second, reduce the variables. Test on the same hardware, if possible. (For testing the three major browser engines – Blink, Gecko, and WebKit – this sadly means you’re always testing on macOS. Do not trust WebKit on Windows/Linux; I’ve found its performance to be vastly different from Safari’s.) Browsers can differ based on whether the device is plugged into power or has low battery, so make sure that the device is plugged in and charged. Don’t run other applications or browser windows or tabs while you’re running the benchmark. If networking is involved, use a local server if possible to eliminate latency. (Or configure the server to always respond with a particular delay, or use throttling, as necessary.) Update all the browsers before running the test.

Third, be aware of caching. It’s easy to fool yourself if you run 1,000 iterations of something, and it turns out that the last 999 iterations are all cached. JavaScript engines have JIT compilers, meaning that the first iteration can be different from the second iteration, which can be different from the third, etc. If you think you can figure out something low-level like “Is const faster than let?”, you probably can’t, because the JIT will outsmart you. Browsers also have bytecode caching, which means that the first page load may be different from the second, and the second may even be different from the third. (Tachometer works around this by using a fresh browser tab each iteration, which is clever.)
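If you do want warm (rather than cold) numbers, one way to make that explicit is to discard the first few iterations. Here's a sketch – the warm-up and iteration counts are arbitrary, and this deliberately measures steady-state performance, whereas Tachometer's fresh-tab approach measures the cold case:

```javascript
// Run fn untimed first so the JIT has settled, then time the rest
function bench(fn, warmup = 10, iterations = 100) {
  for (let i = 0; i < warmup; i++) fn() // untimed warm-up runs
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  return (performance.now() - start) / iterations // mean ms per iteration
}

const msPerParse = bench(() => JSON.parse('{"a": 1}'))
```

Whichever you measure, say so in your writeup – "warm" and "cold" numbers for the same code can differ wildly.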

My point here is that, for all of your hard work to do rigorous, scientific benchmarking, you may just turn out to be wrong. You’ll publish your blog post, you’ll feel very proud of yourself, and then a browser engineer will contact you privately and say, “You know, it only works like this on a 60FPS monitor.” Or “only on Intel CPUs.” Or “only on macOS Big Sur.” Or “only if your DOM size is greater than 1,000 and the layer depth is above 10 and you’re using a trackball mouse and it’s a Tuesday and the moon is in the seventh house.”

There are so many variables in browser performance, and you can’t possibly capture them all. The best you can do is document your methodology, explain what your benchmark doesn’t test, and try not to make grand sweeping statements like, “You should always use const instead of let; my benchmark proves it’s faster.” At best, your benchmark proves that one number is higher than another in your very specific benchmark in the very specific way you tested it, and you have to be satisfied with that.

Conclusion

Writing about browser performance is hard, but it’s not fruitless. I’ve had enough successes over the years (and enough stubbornness and curiosity, I guess) that I keep doing it.

For instance, I wrote about how bundlers like Rollup produced faster JavaScript files than bundlers like Webpack, and Webpack eventually improved its implementation. I filed a bug on Firefox and Chrome showing that Safari had an optimization they didn’t, and both browsers fixed it, so now all three browsers are fast on the benchmark. I wrote a silly JavaScript “optimizer” that the V8 team used to improve their performance.

I bring up all these examples less to brag, and more to show that it is possible to improve things by simply writing about them. In all three of the above cases, I actually made mistakes in my benchmarks (pretty dumb ones, in some cases), and had to go back and fix it later. But if you can get enough traction and get the right people’s attention, then the browsers and bundlers and frameworks can change, without you having to actually write the code to do it. (To this day, I can’t write a line of C, C++, or Rust, but I’ve influenced browser vendors to write it for me, which aligns with my goal of spending more time playing Tetris than learning new programming languages.)

My point in writing all this is to try to convince you (if you’ve read this far) that it is indeed valuable for you to write about web performance. Even if you don’t feel like you really understand how browsers work. Even if you’re just getting started as a web developer. Even if you’re just curious, and you want to poke around at browsers to see how they tick. At worst you’ll be wrong (which I’ve been many times), and at best you might teach others about performant programming patterns, or even influence the ecosystem to change and make things better for everyone.

There are plenty of upsides, and all you need is an HTML file and a bit of patience. So if that sounds interesting to you, get started and have fun benchmarking!

Things I’ve been wrong about, things I’ve been right about

The end of the year is a good time for reflection, and this year I’m going to do something a bit different. I’d like to list some of the calls I’ve made over the years, and how well those predictions have turned out.

So without further ado, here’s a list of things I’ve been wrong about or right about over the years.

Wrong: web workers will take over the world

Around 2015, I got really excited by web workers. I gave talks, I wrote a blog post, and I wrote an app that got a fair amount of attention. Unfortunately it turned out web workers were not going to take over the world in the way I imagined.

My enthusiasm for web workers mostly came from my experience with Android development. In Android development, if you don’t want your app to be slow, you move work to a background thread. After I became a web developer, I discovered it was suddenly very hard to make apps that weren’t janky. Oh, the web browser only has one thread? Well, there’s your problem.

What I didn’t know at the time, though, was that browsers already had a lot of tricks for moving work off the main thread; they’re just not necessarily very obvious, or directly exposed to web developers. You can see my ignorance in this video, where I’m purportedly showing the performance advantages of my worker-powered Pokémon app by scrolling the screen on mobile Chrome.

As I learned later, though, scrolling runs off the main thread in modern browsers (and more importantly, is composited). So the only thing that’s going to make this scrolling smoother is to not block it with unnecessary touchstart/touchmove listeners, or for the Chrome team to improve their scrolling implementation (as in fact, they have been doing). There are also differences between subscrollers and main-document scrollers, as I eventually discovered.

All of these things are non-obvious to web developers, because they’re not directly exposed in an API. So in my ignorance, I pointed to the one case where threading is exposed in a web API, i.e. web workers.

While it is true, though, that blocking the main thread is a major cause of slowdowns in web pages, web workers aren’t the panacea I imagined. The reasons are laid out very succinctly by Evan You in this talk, but to summarize his points: moving work from the main thread to a background worker is very difficult, and the payoff is not so great.

The main reason it’s difficult is that you always have to come back to the main thread to do work on the DOM anyway. This is what libraries like worker-dom do. Also, some APIs can only be invoked synchronously on the main thread, such as getBoundingClientRect. Furthermore, as pointed out by Pete Hunt, web workers cannot handle preventDefault or stopPropagation (e.g. in a click handler), because those must be handled synchronously.

So on the one hand, you can’t just take existing web code and port it to a web worker; there are some things that have to be tweaked, and other things that are just impossible. Then on the other hand, moving things to a worker creates its own costs. The cost of cloning data between threads can be expensive (note: to be fair, Chrome has improved their cloning performance since I wrote that post). There is also a built-in latency when sending messages between the two threads, so you don’t necessarily want to pay that round-trip cost for every interaction. Also, some work has to be done on the main thread anyway, and it’s not guaranteed that those costs are the small ones.

Since then, I’ve come to believe that the best way to avoid the cost of main thread work (such as virtual DOM diffing in a library like React) is not by moving it to a background thread, but rather by skipping it entirely. SvelteJS does this, as explained in this talk by Rich Harris. Also, you can use APIs like requestIdleCallback to delay work on the main thread. Future APIs like hasPendingUserInput may help as well.

Of course, I’m not saying that web workers should never be used. For long-running computations that don’t rely on the DOM, they’re definitely useful. And perhaps sometime in the future it will become more viable to run your “entire app” in a worker thread, as sketched out here by Jason Miller. APIs like SharedArrayBuffer, blöcks, and some kind of asynchronous preventDefault/stopPropagation may tip the balance in web workers’ favor. But for now I’ll say that I was wrong on this call.

Wrong: Safari is the new IE

“Safari is the new IE” is probably my most-read post of all time, since it got picked up by Ars Technica. Unfortunately I’ve learned a lot about how the browser industry works since I wrote it, and nowadays I regard it with a bit of embarrassment.

As it turns out, Safari is not really the new IE. At the time I wrote that post (2015), the WebKit team was dragging their feet a bit, but since then they’ve really picked up the pace. If you look at metrics like HTML5Test, CanIUse, or ES2015+ compatibility tables, you’ll see they’ve made a lot of progress since 2015. They’re still behind Chrome and Firefox, but they’re giving Edge a run for its money (although Edge is now switching to Chromium, so that’s less relevant).

Also, the WebKit team does a lot of work that is less obvious to web developers, but which they still deserve credit for. Safari is a beast when it comes to performance, and they often set the standard that other engines aim to beat. It’s no surprise that the Speedometer benchmark came from the WebKit team (with Safari originally winning), and quickly became a point of focus for Chrome, Edge, and Firefox. The MotionMark and JetStream benchmarks also originally came from WebKit.

The WebKit team also does some interesting privacy work, including intelligent tracking protection and double-keying of cross-origin storage. (I.e. if example.com stores data in an iframe inside of another website, that data will not be shared with example.com itself or other example.com iframes on other websites. This limits the ability of sites to do third-party tracking.)

To be clear, though, I don’t regret writing that blog post. It was a cry of anger from a guy who was tired of dealing with a lot of IndexedDB bugs, which the WebKit team eventually got around to fixing. Heck, I’ve been told that my blog post may have even motivated Apple to make those fixes, and to release excellent developer-facing features like Safari Technology Preview. So kudos, WebKit team: you proved me wrong!

In any case, it’s unlikely that we’ll ever have a situation like IE6 again, with one browser taking up 95% of all usage. The best contender for that title is currently Chrome, and although it ticks some of the IE6 boxes (outsized influence on the ecosystem, de-facto standardization of implementation details), it doesn’t tick some other ones (lack of investment from the company building it, falling behind on web standards). The state of browser diversity is certainly worrying, though, which makes it all the more important to support non-Chromium browsers like Safari and Firefox, and give them credit for the good work they’re doing.

So in that regard, I am sorry for creating a meme that still seems to stick to this day.

Right: developer experience is trumping user experience

This blog post didn’t get a lot of attention when I published it in early 2016, but I think I tapped into something that was happening in the web community: web developers were starting to focus on obscure developer-facing features of frontend frameworks rather than tangible benefits for end-users.

Since then, this point has been articulated particularly well by folks on the Chrome team and at Google, such as Malte Ubl (“Developer experience and user experience,” 2017) and Alex Russell (“The developer experience bait-and-switch,” 2018). Paul Lewis also touched on it a bit in late 2015 in “The cost of frameworks”.

Developer experience (DX) vs user experience (UX) is now a topic of hot debate among web developers, and although I may not have put my finger on the problem very well, I did start writing about it in early 2016. So I’ll chalk this up as something I was right about.

Right: I’m better off without a Twitter account

I deleted my Twitter account a little over a year ago, and I have no regrets about that.

Twitter has made some good moves in the past year, such as bringing back the chronological timeline, but overall it’s still a cesspool of negativity, preening, and distractions. Also, I don’t believe any one company should have a monopoly on microblogging, so I’m putting my money where my mouth is by denying them my attention and ad revenue.

These days I use Mastodon and RSS, and I’m much happier. Mastodon in particular has served as a kind of nicotine patch for my Twitter addiction, and for that I’m grateful.

The fediverse does have some of the same negative characteristics as Twitter (brigading, self-righteousness, lack of nuance), but overall it’s much smaller and quieter than Twitter, and more importantly less addictive, so I use social media less these days than I used to. I tend to spend more time on my hobbies instead, one of which is (ironically) building a Mastodon client!

Right: the cost of small modules

“The cost of small modules” was one of my most-read posts of 2016, and in terms of the overall conclusions, I was right. JavaScript compilation and initial execution are expensive, as has been covered quite well by Addy Osmani in “The cost of JavaScript”.

Furthermore, a lot of the bloat was coming from the bundlers themselves. In the post, I identified Browserify and Webpack as the main offenders, with Closure Compiler and Rollup showing how to do it right. Since I wrote that post, though, Webpack and Browserify have stepped up their game, and now module concatenation is a standard practice for JavaScript bundlers.

One thing I didn’t understand at the time was why JavaScript compilation was so expensive for the “function-wrapping” format of bundlers like Webpack and Browserify. I only realized it later when researching some quirks about how JavaScript engines parse function bodies. The conclusions from that research were interesting, but the larger takeaway of “only include the code you need” was the important one.

Mixed: progressive enhancement isn’t dead, but it smells funny

For better or worse, progressive enhancement doesn’t seem to be doing very well these days. In retrospect, this blog post was more about Twitter shaming (see above for my thoughts on Twitter), but I think the larger point about progressive enhancement losing its cachet is right.

As we slowly enter a world where there is one major browser engine (Chromium), which is frequently updated and leading on web standards, supporting old or weird browsers just becomes less important. Developers have already voted with their feet to target mostly Chrome and Chrome derivatives, putting pressure on other browsers to either adopt Chromium themselves or else bleed users (and therefore relevance). It’s a self-perpetuating cycle – the less developers care about progressive enhancement, the less it matters.

I also believe the term “progressive enhancement” has been somewhat co-opted by the Chrome devrel team as a euphemism for giving the best experience to Chrome and a poorer experience to “older browsers” (aka non-Chrome browsers). It’s a brilliant re-branding that feeds into web developers’ deepest wish, which is to live in a Chrome-only world where they only have to focus on Chrome.

That’s not to say progressive enhancement is without its virtues. Insofar as it encourages people to actually think about accessibility, performance, and web standards, it’s a good thing. But these days it’s becoming less about “build with HTML, layer on CSS, sprinkle on JavaScript” and more about “support a slightly older version of Chrome, target the latest version of Chrome.”

The other point I made in that blog post, which was about JavaScript-heavy webapps being better for the “next billion” Internet users, may turn out to be wrong. I’m not sure. Static websites are certainly easier on the user’s battery, and with a Service Worker they can still have the benefits of offline capabilities.

Perhaps with the new Portals proposal, we won’t even need to build SPAs to have fancy transitions between pages. I have a hunch that SPAs are being overused these days, and that user experience is suffering as a consequence, but that’s another bet that will have to be evaluated at a later date.

Conclusions

So that’s all for my roundup of bad takes, good takes, and the stuff in between. Hope you found it interesting, and happy 2019!

Scrolling the main document is better for performance, accessibility, and usability

When I first wrote Pinafore, I thought pretty deeply about some aspects of the scrolling, but not enough about others.

For instance, I implemented a custom virtual list in Svelte.js, as well as an infinite scroll to add more content as you scroll the timeline. When it came to where to put the scrollable element, though, I didn’t think too hard about it.

Screenshot of Pinafore UI showing a top nav with a scrollable content below

A fixed-position nav plus a scrollable section below. Seems simple, right? I went with what seemed to me like an obvious solution: an absolute-position element below the nav bar, which could be scrolled up and down.

Then Sorin Davidoi opened this issue, pointing out that using the entire document (i.e. the <body>) as the scrolling element would allow mobile browsers to hide the address bar while scrolling down. I wasn’t aware of this, so I went ahead and implemented it.

This indeed allowed the URL bar to gracefully shrink or hide, across a wide range of mobile browsers. Here’s Safari for iOS:

And Chrome for Android:

And Firefox for Android:

As it turned out, though, this fix solved more than just the address bar problem – it also improved the framerate of scrolling in Chrome for Android. This is a longstanding issue in Pinafore that had puzzled me up till now. But with the “document as scroller” change, the framerate is magically improved:

Of course, as the person who wrote one of the more comprehensive analyses of cross-browser scrolling performance, this really shouldn’t have surprised me. My own analysis showed that some browsers (notably Chrome) hadn’t optimized subscrolling to nearly the same degree as main-document scrolling. Somehow, though, I didn’t put two-and-two together and realize that this is why Pinafore’s scrolling was janky in Chrome for Android. (It was fine in Firefox for Android and Safari for iOS, which is also perhaps why I didn’t feel pressed to fix it.)

In retrospect, the Chrome Dev Tools’s “scrolling performance issues” tool should have been enough to tip me off, but I wasn’t sure what to do when it says “repaints on scroll.” Nor did I know that moving the scrolling element to the main document would do the trick. Most of the advice online suggests using will-change: transform, but in this case it didn’t help. (Although in the past, I have found that will-change can improve mobile Chrome’s scrolling in some cases.)

Screenshot of Pinafore with a blue overlay saying "repaints on scroll."

The “repaints on scroll” warning. This is gone now that the scrollable element is the document body.

As if the mobile UI and performance improvements weren’t enough, this change also improved accessibility. When users first open Pinafore, they often want to start scrolling by pressing the “down” or “PageDown” key on the keyboard. However, this doesn’t work if you’re using a subscroller, because unlike the main document, the subscroller isn’t focused by default. So we had to add custom behavior to focus the scrollable element when the page first loads. Once I got rid of the subscroller, though, this code could be removed.
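For reference, the kind of workaround a subscroller needs looks roughly like this (a sketch – the real Pinafore code differed, and the selector below is hypothetical):

```javascript
// A subscroller isn't focused by default, so keyboard scrolling
// (Down/PageDown) does nothing until we focus it manually
function focusScroller(scroller) {
  scroller.setAttribute('tabindex', '-1') // focusable, but not in the tab order
  scroller.focus({ preventScroll: true }) // don't jump the scroll position
}

// e.g. on page load, with a hypothetical selector:
// focusScroller(document.querySelector('.timeline'))
```

With the main document as the scroller, none of this is necessary – the keyboard just works.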

Another nice fix was that it’s no longer necessary to add -webkit-overflow-scrolling: touch so that iOS Safari will use smooth scrolling. The main document already scrolls smoothly on iOS.

This subscroller fix may be obvious to more experienced web devs, but to me it was a bit surprising. From a design standpoint, the two options seemed roughly equivalent, and it didn’t occur to me that one or the other would have such a big impact, especially on mobile browsers. Given the difference in performance, accessibility, and usability though, I’ll definitely think harder in the future about exactly which element I want to be the scrollable one.

Note that what I’m not saying in this blog post is that you should avoid subscrollers at all costs. There are some cases where the design absolutely calls for a subscroller, and the fact that Chrome hasn’t optimized for this scenario (whereas other browsers like Firefox, Edge, and Safari have) is a real bug, and I hope they’ll fix it.

However, if the visual design of the page calls for the entire document to be scrollable, then by all means, make the entire document scrollable! And check out document.scrollingElement for a good cross-browser API for managing the scrollTop and scrollHeight.
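For example, here's a small sketch (taking the document as a parameter purely for testability; in a real page you'd just use the global):

```javascript
// document.scrollingElement is the <html> element in standards mode and
// <body> in quirks mode, so it works consistently across browsers
function scrollProgress(doc = document) {
  const { scrollTop, scrollHeight, clientHeight } = doc.scrollingElement
  return scrollTop / (scrollHeight - clientHeight) // 0 at top, 1 at bottom
}
```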

Update: Steve Genoud points out that there’s an additional benefit to scrolling the main document on iOS: you can tap the status bar to scroll back up to the top. Another usability win!

Update: Michael Howell notes that this technique can cause problems for fragment navigation, e.g. index.html#fragment, because the fixed nav could cover up the target element. Amusingly, I’ve noticed this problem in WordPress.com (where my blog is hosted) if you navigate to a fragment while logged in. I also ran into this in Pinafore in the case of element.scrollIntoView(), which I worked around by updating the scrollTop to account for the nav height right after calling scrollIntoView(true). Good to be aware of!
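That workaround looks roughly like this (a sketch – the nav height is whatever your fixed header happens to measure, and the document is a parameter only for testability):

```javascript
// Scroll the target to the top of the viewport, then back off by the
// height of the fixed nav so the nav doesn't cover the target
function scrollIntoViewBelowNav(element, navHeight, doc = document) {
  element.scrollIntoView(true) // align target with the top of the viewport
  doc.scrollingElement.scrollTop -= navHeight
}
```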

Accurately measuring layout on the web

Update (August 2019): the technique described below, in particular how to schedule an event to fire after style/layout calculations are complete, is now captured in a web API proposal called requestPostAnimationFrame. There is also a good polyfill called afterframe.

We all want to make faster websites. The question is just what to measure, and how to use that information to determine what’s “slow” and what could be made faster.

The browser rendering pipeline is complicated. For that reason, it’s tricky to measure the performance of a webpage, especially when components are rendered client-side and everything becomes an intricate ballet between JavaScript, the DOM, styling, layout, and rendering. Many folks stick to what they understand, and so they may under-measure or completely mis-measure their website’s frontend performance.

So in this post, I want to demystify some of these concepts, and offer techniques for accurately measuring what’s going on when we render things on the web.

The web rendering pipeline

Let’s say we have a component that is rendered client-side, using JavaScript. To keep things simple, I wrote a demo component in vanilla JS, but everything I’m about to say would also apply to React, Vue, Angular, etc.

When we use the handy Performance profiler in the Chrome Dev Tools, we see something like this:

Screenshot of Chrome Dev Tools showing work on the UI thread divided into JavaScript, then Style, then Layout, then Render

This is a view of the CPU costs of our component, in terms of milliseconds on the UI thread. To break things down, here are the steps required:

  1. Execute JavaScript – executing (but not necessarily compiling) JavaScript, including any state manipulation, “virtual DOM diffing,” and modifying the DOM.
  2. Calculate style – taking a CSS stylesheet and matching its selector rules with elements in the DOM. This is also known as “style recalculation.”
  3. Calculate layout – taking those CSS styles we calculated in step #2 and figuring out where the boxes should be laid out on the screen. This is also known as “reflow.”
  4. Render – the process of actually putting pixels on the screen. This often involves painting, compositing, GPU acceleration, and a separate rendering thread.

All of these steps incur CPU costs, and therefore all of them can impact the user experience. If any one of them takes a long time, it can lead to the appearance of a slow-loading component.

The naïve approach

Now, the most common mistake that folks make when trying to measure this process is to skip steps 2, 3, and 4 entirely. In other words, they just measure the time spent executing JavaScript, and completely ignore everything after that.

Screenshot of Chrome Dev Tools, showing an arrow pointing after JavaScript but before Style and Layout with the text 'Most devs stop measuring here'

When I worked as a browser performance engineer, I would often look at a trace of a team’s website and ask them which mark they used to measure “done.” More often than not, it turned out that their mark landed right after JavaScript, but before style and layout, meaning the last bit of CPU work wasn’t being measured.

So how do we measure these costs? For the purposes of this post, let’s focus on how we measure style and layout in particular. As it turns out, the render step is much more complicated to measure, and indeed it’s impossible to measure accurately, because rendering is often a complex interplay between separate threads and the GPU, and therefore isn’t even visible to userland JavaScript running on the main thread.

Style and layout calculations, however, are 100% measurable because they block the main thread. And yes, this is true even with something like Firefox’s Stylo engine – even if multiple threads can be employed to speed up the work, ultimately the main thread has to wait on all the other threads to deliver the final result. This is just the way the web works, as specc’ed.

What to measure

So in practical terms, we want to put a performance mark before our JavaScript starts executing, and another one after all the additional work is done:

Screenshot of Chrome Dev Tools, with arrow pointing before JavaScript execution saying 'Ideal start' and arrow pointing after Render (Paint) saying 'Ideal end'

I’ve written previously about various JavaScript timers on the web. Can any of these help us out?

As it turns out, requestAnimationFrame will be our main tool of choice, but there’s a problem. As Jake Archibald explains in his excellent talk on the event loop, browsers disagree on where to fire this callback:

Screenshot of Chrome Dev Tools showing arrow pointing before style/layout saying "Chrome, FF, Edge >= 18" and arrow pointing after style/layout saying "Safari, IE, Edge < 18"

Now, per the HTML5 event loop spec, requestAnimationFrame is indeed supposed to fire before style and layout are calculated. Edge has already fixed this in v18, and perhaps Safari will fix it in the future as well. But that would still leave us with inconsistent behavior in IE, as well as in older versions of Safari and Edge.

Also, if anything, the spec-compliant behavior actually makes it more difficult to measure style and layout! In an ideal world, the spec would have two timers – one for requestAnimationFrame, and another for requestAnimationFrameAfterStyleAndLayout (or something like that). In fact, there has been some discussion at the WHATWG about adding an API for this, but so far it’s just a gleam in the spec authors’ eyes.

Unfortunately, we live in the real world with real constraints, and we can’t wait for browsers to add this timer. So we’ll just have to figure out how to crack this nut, even with browsers disagreeing on when requestAnimationFrame should fire. Is there any solution that will work cross-browser?

Cross-browser “after frame” callback

There’s no solution that will work perfectly to place a callback right after style and layout, but based on the advice of Todd Reifsteck, I believe this comes closest:

requestAnimationFrame(() => {
  setTimeout(() => {
    performance.mark('end')
  })
})

Let’s break down what this code is doing. In the case of spec-compliant browsers, such as Chrome, it looks like this:

Screenshot of Chrome Dev Tools showing 'Start' before JavaScript execution, requestAnimationFrame before style/layout, and setTimeout falling a bit after Paint/Render

Note that rAF fires before style and layout, but the next setTimeout fires just after those steps (including “paint,” in this case).

And here’s how it works in non-spec-compliant browsers, such as Edge 17:

Screenshot of Edge F12 Tools showing 'Start' before JavaScript execution, and requestAnimationFrame/setTimeout both almost immediately after style/layout

Note that rAF fires after style and layout, and the next setTimeout happens so soon that the Edge F12 Tools actually render the two marks on top of each other.

So essentially, the trick is to queue a setTimeout callback inside of a rAF, which ensures that the second callback happens after style and layout, regardless of whether the browser is spec-compliant or not.
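Wrapped up as a utility, the whole measurement might look like this (an illustrative sketch, not a particular library's API):

```javascript
// Resolve after the next frame's style and layout work is done,
// in both spec-compliant and non-spec-compliant browsers
function afterFrame() {
  return new Promise(resolve => {
    requestAnimationFrame(() => setTimeout(resolve))
  })
}

async function measureRender(name, renderFn) {
  performance.mark(`${name}-start`)
  renderFn() // synchronously kick off the DOM updates
  await afterFrame() // style and layout have now been calculated
  performance.mark(`${name}-end`)
  performance.measure(name, `${name}-start`, `${name}-end`)
}
```

Calling `measureRender('component', () => renderMyComponent())` then gives you a measure you can read out of the performance timeline, or simply eyeball in the DevTools User Timing track.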

Downsides and alternatives

Now to be fair, there are a lot of problems with this technique:

  1. setTimeout is somewhat unpredictable in that it may be clamped to 4ms (or more in some cases).
  2. If there are any other setTimeout callbacks that have been queued elsewhere in the code, then ours may not be the last one to run.
  3. In the non-spec-compliant browsers, doing the setTimeout is actually a waste, because we already have a perfectly good place to set our mark – right inside the rAF!

However, if you’re looking for a one-size-fits-all solution for all browsers, rAF + setTimeout is about as close as you can get. Let’s consider some alternative approaches and why they wouldn’t work so well:

rAF + microtask

requestAnimationFrame(() => {
  Promise.resolve().then(() => {
    performance.mark('after')
  })
})

This one doesn’t work at all, because microtasks (e.g. Promises) run immediately after JavaScript execution has completed. So it doesn’t wait for style and layout at all:

Screenshot of Chrome Dev Tools showing microtask firing before style/layout

rAF + requestIdleCallback

requestAnimationFrame(() => {
  requestIdleCallback(() => {
    performance.mark('after')
  })
})

Calling requestIdleCallback from inside of a requestAnimationFrame will indeed capture style and layout:

Screenshot of Chrome Dev Tools showing requestIdleCallback firing a bit after render/paint

However, if the microtask version fires too early, I would worry that this one would fire too late. The screenshot above shows it firing fairly quickly, but if the main thread is busy doing other work, rIC could be delayed a long time waiting for the browser to decide that it’s safe to run some “idle” work. This one is far less of a sure bet than setTimeout.

rAF + rAF

requestAnimationFrame(() => {
  requestAnimationFrame(() => {
    performance.mark('after')
  })
})

This one, also called a “double rAF,” is a perfectly fine solution, but compared to the setTimeout version, it probably captures more idle time – roughly 16.7ms on a 60Hz screen, as opposed to the standard 4ms for setTimeout – and is therefore slightly more inaccurate.

Screenshot of Chrome Dev Tools showing a second requestAnimationFrame firing a bit after render/paint

You might wonder about that, given that I’ve already talked about setTimeout(0) not really firing in 0 (or even necessarily 4) milliseconds in a previous blog post. But keep in mind that, even though setTimeout() may be clamped by as much as a second, this only occurs in a background tab. And if we’re running in a background tab, we can’t count on rAF at all, because it may be paused altogether. (How to deal with noisy telemetry from background tabs is an interesting but separate question.)

So rAF+setTimeout, despite its flaws, is probably still better than rAF+rAF.
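As for the background-tab noise mentioned above, one simple hedge is to discard measurements taken while the tab is hidden, using the Page Visibility API. A sketch:

```javascript
// Sketch: only report a measurement if the tab is visible. (The
// `document` check keeps this safe in non-browser environments.)
function shouldRecordMeasurement() {
  return typeof document === 'undefined' || !document.hidden
}

// In practice you'd also listen for 'visibilitychange' and invalidate
// any in-flight measurements when the tab goes to the background.
```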

Not fooling ourselves

In any case, whether we choose rAF+setTimeout or double rAF, we can rest assured that we’re capturing any event-loop-driven style and layout costs. With this measure in place, it’s much less likely that we’ll fool ourselves by only measuring JavaScript and direct DOM API performance.

As an example, let’s consider what would happen if our style and layout costs weren’t just invoked by the event loop – that is, if our component were calling one of the many APIs that force style/layout recalculation, such as getBoundingClientRect(), offsetTop, etc.

If we call getBoundingClientRect() just once, notice that the style and layout calculations shift over into the middle of JavaScript execution:

Screenshot of Chrome Dev Tools showing style/layout costs moved to the left inside of JavaScript execution under getBoundingClientRect with red triangles on each purple rectangle

The important point here is that we’re not doing anything any slower or faster – we’ve merely moved the costs around. If we don’t measure the full costs of style and layout, though, we might deceive ourselves into thinking that calling getBoundingClientRect() is slower than not calling it! In fact, though, it’s just a case of robbing Peter to pay Paul.

It’s worth noting, though, that the Chrome Dev Tools have added little red triangles to our style/layout calculations, with the message “Forced reflow is a likely performance bottleneck.” This can be a bit misleading in this case, because again, the costs are not actually any higher – they’ve just moved to earlier in the trace.

(Now it’s true that, if we call getBoundingClientRect() repeatedly and change the DOM in the process, then we might invoke layout thrashing, in which case the overall costs would indeed be higher. So the Chrome Dev Tools are right to warn folks in that case.)
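To make the forced-reflow pattern concrete, here's a minimal sketch (the element and style values are illustrative):

```javascript
// Sketch: a DOM write followed by a layout-dependent read. The read
// forces the browser to recalculate style/layout synchronously, moving
// that cost into the middle of our JavaScript execution.
function writeThenMeasure(el) {
  el.style.width = '300px'                // write: invalidates layout
  return el.getBoundingClientRect().width // read: forces synchronous layout
}
```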

In any case, my point is that it’s easy to fool yourself if you only measure explicit JavaScript execution, and ignore any event-loop-driven style and layout costs that come afterward. The two costs may be scheduled differently, but they both impact performance.

Conclusion

Accurately measuring layout on the web is hard. There’s no perfect metric to capture style and layout – or indeed, rendering – even though all three can impact the user experience just as much as JavaScript.

However, it’s important to understand how the HTML5 event loop works, and to place performance marks at the appropriate points in the component rendering lifecycle. This can help avoid any mistaken conclusions about what’s “slower” or “faster” based on an incomplete view of the pipeline, and ensure that style and layout costs are accounted for.

I hope this blog post was useful, and that the art of measuring client-side performance is a little less mysterious now. And maybe it’s time to push browser vendors to add requestAnimationFrameAfterStyleAndLayout (we’ll bikeshed on the name though!).

Thanks to Ben Kelly, Todd Reifsteck, and Alex Russell for feedback on a draft of this blog post.

A tour of JavaScript timers on the web

Pop quiz: what is the difference between these JavaScript timers?

  • Promises
  • setTimeout
  • setInterval
  • setImmediate
  • requestAnimationFrame
  • requestIdleCallback

More specifically, if you queue up all of these timers at once, do you have any idea which order they’ll fire in?

If not, you’re probably not alone. I’ve been doing JavaScript and web programming for years, I’ve worked for a browser vendor for two of those years, and it’s only recently that I really came to understand all these timers and how they play together.

In this post, I’m going to give a high-level overview of how these timers work, and when you might want to use them. I’ll also cover the Lodash functions debounce() and throttle(), because I find them useful as well.

Promises and microtasks

Let’s get this one out of the way first, because it’s probably the simplest. A Promise callback is also called a “microtask,” and it runs at the same frequency as MutationObserver callbacks. Assuming queueMicrotask() ever makes it out of spec-land and into browser-land, it will also be the same thing.

I’ve already written a lot about promises. One quick misconception about promises that’s worth covering, though, is that they don’t give the browser a chance to breathe. Just because you’re queuing up an asynchronous callback, that doesn’t mean that the browser can render, or process input, or do any of the stuff we want browsers to do.

For example, let’s say we have a function that blocks the main thread for 1 second:

function block() {
  var start = Date.now()
  while (Date.now() - start < 1000) { /* wheee */ }
}

If we were to queue up a bunch of microtasks to call this function:

for (var i = 0; i < 100; i++) {
  Promise.resolve().then(block)
}

This would block the browser for about 100 seconds. It’s basically the same as if we had done:

for (var i = 0; i < 100; i++) {
  block()
}

Microtasks execute immediately after any synchronous execution is complete, leaving no chance for the browser to fit in any work between them. So if you think you can break up a long-running task by splitting it into microtasks, it won’t do what you expect.
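If the goal really is to give the browser room to breathe between chunks of work, you need a macrotask rather than a microtask. A sketch, using setTimeout as the lowest common denominator:

```javascript
// Sketch: run `count` chunks of expensive work, yielding to the browser
// between each one via setTimeout (a macrotask), so rendering and input
// can happen in between.
function runInChunks(doChunk, count, done) {
  if (count === 0) return done()
  doChunk()
  setTimeout(() => runInChunks(doChunk, count - 1, done), 0)
}
```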

setTimeout and setInterval

These two are cousins: setTimeout queues a task to run in x number of milliseconds, whereas setInterval queues a recurring task to run every x milliseconds.

The thing is… browsers don’t really respect that milliseconds thing. You see, historically, web developers have abused setTimeout. A lot. To the point where browsers have had to add mitigations for setTimeout(/* ... */, 0) to avoid locking up the browser’s main thread, because a lot of websites tended to throw around setTimeout(0) like confetti.

This is the reason that a lot of the tricks in crashmybrowser.com don’t work anymore, such as queuing up a setTimeout that calls two more setTimeouts, which call two more setTimeouts, etc. I covered a few of these mitigations from the Edge side of things in “Improving input responsiveness in Microsoft Edge”.

Broadly speaking, a setTimeout(0) doesn’t really run in zero milliseconds. Usually, it runs in 4. Sometimes, it may run in 16 (this is what Edge does when it’s on battery power, for instance). Sometimes it may be clamped to 1 second (e.g., when running in a background tab). These are the sorts of tricks that browsers have had to invent to prevent runaway web pages from chewing up your CPU doing useless setTimeout work.

So that said, setTimeout does allow the browser to run some work before the callback fires (unlike microtasks). But if your goal is to allow input or rendering to run before the callback, setTimeout is usually not the best choice because it only incidentally allows those things to happen. Nowadays, there are better browser APIs that can hook more directly into the browser’s rendering system.

setImmediate

Before moving on to those “better browser APIs,” it’s worth mentioning this thing. setImmediate is, for lack of a better word … weird. If you look it up on caniuse.com, you’ll see that only Microsoft browsers support it. And yet it also exists in Node.js, and has lots of “polyfills” on npm. What the heck is this thing?

setImmediate was originally proposed by Microsoft to get around the problems with setTimeout described above. Basically, setTimeout had been abused, and so the thinking was that we can create a new thing to allow setImmediate(0) to actually be setImmediate(0) and not this funky “clamped to 4ms” thing. You can see some discussion about it from Jason Weber back in 2011.

Unfortunately, setImmediate was only ever adopted by IE and Edge. Part of the reason it’s still in use is that it has a sort of superpower in IE, where it allows input events like keyboard and mouseclicks to “jump the queue” and fire before the setImmediate callback is executed, whereas IE doesn’t have the same magic for setTimeout. (Edge eventually fixed this, as detailed in the previously-mentioned post.)

Also, the fact that setImmediate exists in Node means that a lot of “Node-polyfilled” code is using it in the browser without really knowing what it does. It doesn’t help that the differences between Node’s setImmediate and process.nextTick are very confusing, and even the official Node docs say the names should really be reversed. (For the purposes of this blog post though, I’m going to focus on the browser rather than Node because I’m not a Node expert.)

Bottom line: use setImmediate if you know what you’re doing and you’re trying to optimize input performance for IE. If not, then just don’t bother. (Or only use it in Node.)

requestAnimationFrame

Now we get to the most important setTimeout replacement, a timer that actually hooks into the browser’s rendering loop. By the way, if you don’t know how the browser event loops works, I strongly recommend this talk by Jake Archibald. Go watch it, I’ll wait.

Okay, now that you’re back, requestAnimationFrame basically works like this: it’s sort of like a setTimeout, except instead of waiting for some unpredictable amount of time (4 milliseconds, 16 milliseconds, 1 second, etc.), it executes before the browser’s next style/layout calculation step. Now, as Jake points out in his talk, there is a minor wrinkle in that it actually executes after this step in Safari, IE, and Edge <18, but let's ignore that for now since it's usually not an important detail.

The way I think of requestAnimationFrame is this: whenever I want to do some work that I know is going to modify the browser's style or layout – for instance, changing CSS properties or starting up an animation – I stick it in a requestAnimationFrame (abbreviated to rAF from here on out). This ensures a few things:

  1. I'm less likely to layout thrash, because all of the changes to the DOM are being queued up and coordinated.
  2. My code will naturally adapt to the performance characteristics of the browser. For instance, if it's a low-cost device that is struggling to render some DOM elements, rAF will naturally slow down from the usual 16.7ms intervals (on 60 Hertz screens) and thus it won't bog down the machine in the same way that running a lot of setTimeouts or setIntervals might.

This is why animation libraries that don't rely on CSS transitions or keyframes, such as GreenSock or React Motion, will typically make their changes in a rAF callback. If you're animating an element between opacity: 0 and opacity: 1, there's no sense in queuing up a billion callbacks to animate every possible intermediate state, including opacity: 0.0000001 and opacity: 0.9999999.

Instead, you're better off just using rAF to let the browser tell you how many frames you're able to paint during a given period of time, and calculate the "tween" for that particular frame. That way, slow devices naturally end up with a slower framerate, and faster devices end up with a faster framerate, which wouldn't necessarily be true if you used something like setTimeout, which operates independently of the browser's rendering speed.
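Here's a sketch of what that tween calculation might look like – a minimal opacity animation driven by rAF, not any particular library's implementation:

```javascript
// Sketch: tween an element's opacity from 0 to 1 over `duration` ms.
// Each frame computes progress from the timestamp rAF hands us, so slow
// devices simply render fewer intermediate frames.
function animateOpacity(el, duration) {
  const start = performance.now()
  function frame(now) {
    const progress = Math.min((now - start) / duration, 1)
    el.style.opacity = String(progress)
    if (progress < 1) {
      requestAnimationFrame(frame)
    }
  }
  requestAnimationFrame(frame)
}
```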

requestIdleCallback

rAF is probably the most useful timer in the toolkit, but requestIdleCallback is worth talking about as well. The browser support isn't great, but there's a polyfill that works just fine (and it uses rAF under the hood).

In many ways requestIdleCallback is similar to rAF. (I'll abbreviate requestIdleCallback to rIC from now on. Starting to sound like a pair of troublemakers from West Side Story, huh? "There go Rick and Raff, up to no good!")

Like rAF, rIC will naturally adapt to the browser's performance characteristics: if the device is under heavy load, rIC may be delayed. The difference is that rIC fires on the browser "idle" state, i.e. when the browser has decided it doesn't have any tasks, microtasks, or input events to process, and you're free to do some work. It also gives you a "deadline" to track how much of your budget you're using, which is a nice feature.
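A sketch of how you might use that deadline – draining a queue of low-priority tasks without overstaying your welcome:

```javascript
// Sketch: process low-priority tasks during idle periods, checking the
// deadline so we stop before the browser needs the main thread back.
function processWhenIdle(queue) {
  requestIdleCallback(deadline => {
    while (queue.length > 0 && deadline.timeRemaining() > 0) {
      const task = queue.shift()
      task()
    }
    if (queue.length > 0) {
      processWhenIdle(queue) // reschedule whatever's left
    }
  })
}
```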

Dan Abramov has a good talk from JSConf Iceland 2018 where he shows how you might use rIC. In the talk, he has a webapp that calls rIC for every keyboard event while the user is typing, and then it updates the rendered state inside of the callback. This is great because a fast typist can cause many keydown/keyup events to fire very quickly, but you don't necessarily want to update the rendered state of the page for every keypress.

Another good example of this is a “remaining character count” indicator on Twitter or Mastodon. I use rIC for this in Pinafore, because I don't really care if the indicator updates for every single key that I type. If I'm typing quickly, it's better to prioritize input responsiveness so that I don't lose my sense of flow.

Screenshot of Pinafore with some text entered in the text box and a digit counter showing the number of remaining characters

In Pinafore, the little horizontal bar and the “characters remaining” indicator update as you type.

One thing I’ve noticed about rIC, though, is that it’s a little finicky in Chrome. In Firefox it seems to fire whenever I would, intuitively, think that the browser is “idle” and ready to run some code. (Same goes for the polyfill.) In mobile Chrome for Android, though, I’ve noticed that whenever I scroll with touch scrolling, it might delay rIC for several seconds even after I’m done touching the screen and the browser is doing absolutely nothing. (I suspect the issue I’m seeing is this one.)

Update: Alex Russell from the Chrome team informs me that this is a known issue and should be fixed soon!

In any case, rIC is another great tool to add to the tool chest. I tend to think of it this way: use rAF for critical rendering work, use rIC for non-critical work.

debounce and throttle

These two functions aren’t built in to the browser, but they’re so useful that they’re worth calling out on their own. If you aren’t familiar with them, there’s a good breakdown in CSS Tricks.

My standard use for debounce is inside of a resize callback. When the user is resizing their browser window, there’s no point in updating the layout for every resize callback, because it fires too frequently. Instead, you can debounce for a few hundred milliseconds, which will ensure that the callback eventually fires once the user is done fiddling with their window size.
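A sketch of that pattern – here with a hand-rolled debounce for illustration (Lodash's version has more options, like leading/trailing-edge control):

```javascript
// Minimal debounce: the callback only fires after `wait` ms of silence.
function debounce(fn, wait) {
  let timer = null
  return function (...args) {
    clearTimeout(timer)
    timer = setTimeout(() => fn.apply(this, args), wait)
  }
}

const onResizeSettled = debounce(() => {
  // Runs once, ~250ms after the user stops resizing
  console.log('recalculate layout here')
}, 250)

// In the browser: window.addEventListener('resize', onResizeSettled)
```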

throttle, on the other hand, is something I use much more liberally. For instance, a good use case is inside of a scroll event. Once again, it’s usually senseless to try to update the rendered state of the app for every scroll callback, because it fires too frequently (and the frequency can vary from browser to browser and from input method to input method… ugh). Using throttle normalizes this behavior, and ensures that it only fires every x number of milliseconds. You can also tweak Lodash’s throttle (or debounce) function to fire at the start of the delay, at the end, both, or neither.

In contrast, I wouldn’t use debounce for the scrolling scenario, because I don’t want the UI to only update after the user has explicitly stopped scrolling. That can get annoying, or even confusing, because the user might get frustrated and try to keep scrolling in order to update the UI state (e.g. in an infinite-scrolling list). throttle is better in this case, because it doesn’t wait for the scroll event to stop firing.
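For comparison, a minimal leading-edge throttle (again a sketch – Lodash's throttle adds trailing-edge behavior and cancellation):

```javascript
// Minimal throttle: the callback fires at most once every `wait` ms,
// starting with the very first call (the leading edge).
function throttle(fn, wait) {
  let last = 0
  return function (...args) {
    const now = Date.now()
    if (now - last >= wait) {
      last = now
      fn.apply(this, args)
    }
  }
}

const onScroll = throttle(() => {
  // Runs at most every 100ms, no matter how fast scroll events fire
  console.log('update rendered state here')
}, 100)

// In the browser: window.addEventListener('scroll', onScroll)
```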

throttle is a function I use all over the place for all kinds of user input, and even for some regularly-scheduled tasks like IndexedDB cleanups. It’s extremely useful. Maybe it should just be baked into the browser some day!

Conclusion

So that’s my whirlwind tour of the various timer functions available in the browser, and how you might use them. I probably missed a few, because there are certainly some exotic ones out there (postMessage or lifecycle events, anyone?). But hopefully this at least provides a good overview of how I think about JavaScript timers on the web.

Smaller Lodash bundles with Webpack and Babel

One of the benefits of working with smart people is that you can learn a lot from them through osmosis. As luck would have it, a recent move placed my office next to John-David Dalton‘s, with the perk being that he occasionally wanders into my office to talk about cool stuff he’s working on, like Lodash and ES modules in Node.

Recently we chatted about Lodash and the various plugins for making its bundle size smaller, such as lodash-webpack-plugin and babel-plugin-lodash. I admitted that I had used both projects but only had a fuzzy notion of what they actually did, or why you’d want to use one or the other. Fortunately J.D. set me straight, and so I thought it’d be a good opportunity to take what I’ve learned and turn it into a short blog post.

TL;DR

Use the import times from 'lodash/times' format over import { times } from 'lodash' wherever possible. If you do, then you don’t need the babel-plugin-lodash. Update: or use lodash-es instead.

Be very careful when using lodash-webpack-plugin to check that you’re not omitting any features you actually need, or stuff can break in production.

Avoid Lodash chaining (e.g. _(array).map(...).filter(...).take(...)), since there’s currently no way to reduce its size.

babel-plugin-lodash

The first thing to understand about Lodash is that there are multiple ways you can use the same method, but some of them are more expensive than others:

import { times } from 'lodash'   // 68.81kB  :(
import times from 'lodash/times' //  2.08kB! :)

times(3, () => console.log('whee'))

You can see the difference using something like webpack-bundle-analyzer. Here’s the first version:

Screenshot of lodash.js taking up almost the entire bundle size

Using the import { times } from 'lodash' idiom, it turns out that lodash.js is so big that you can’t even see our tiny index.js! Lodash takes up a full parsed size of 68.81kB. (In the bundle analyzer, hover your mouse over the module to see the size.)

Now here’s the second version (using import times from 'lodash/times'):

Screenshot showing many smaller Lodash modules not taking up so much space

In the second screenshot, Lodash’s total size has shrunk down to 2.08kB. Now we can finally see our index.js!

However, some people prefer the first syntax to the second, especially since it can get more terse the more you import.

Consider:

import { map, filter, times, noop } from 'lodash'

compared to:

import map from 'lodash/map'
import filter from 'lodash/filter'
import times from 'lodash/times'
import noop from 'lodash/noop'

What the babel-plugin-lodash proposes is to automatically rewrite your Lodash imports to use the second pattern rather than the first. So it would rewrite

import { times } from 'lodash'

as

import times from 'lodash/times'

One takeaway from this is that, if you’re already using the import times from 'lodash/times' idiom, then you don’t need babel-plugin-lodash.

Update: apparently if you use the lodash-es package, then you also don’t need the Babel plugin. It may also have better tree-shaking outputs in Webpack due to setting "sideEffects": false in package.json, which the main lodash package does not do.

lodash-webpack-plugin

What lodash-webpack-plugin does is a bit more complicated. Whereas babel-plugin-lodash focuses on the syntax in your own code, lodash-webpack-plugin changes how Lodash works under the hood to make it smaller.

The reason this cuts down your bundle size is that it turns out there are a lot of edge cases and niche functionality that Lodash provides, and if you’re not using those features, they just take up unnecessary space. There’s a full list in the README, but let’s walk through some examples.

Iteratee shorthands

What in the heck is an “iteratee shorthand”? Well, let’s say you want to map() an Array of Objects like so:

import map from 'lodash/map'
map([{id: 'foo'}, {id: 'bar'}], obj => obj.id) // ['foo', 'bar']

In this case, Lodash allows you to use a shorthand:

import map from 'lodash/map'
map([{id: 'foo'}, {id: 'bar'}], 'id') // ['foo', 'bar']

This shorthand syntax is nice to save a few characters, but unfortunately it requires Lodash to use more code under the hood. So lodash-webpack-plugin can just remove this functionality.

For example, let’s say I use the full arrow function instead of the shorthand. Without lodash-webpack-plugin, we get:

Screenshot showing multiple lodash modules under .map

In this case, Lodash takes up 18.59kB total.

Now let’s add lodash-webpack-plugin:

Screenshot of lodash with a very small map.js dependency

And now Lodash is down to 117 bytes! That’s quite the savings.

Collection methods

Another example is “collection methods” for Objects. This means being able to use standard Array methods like forEach() and map() on an Object, in which case Lodash gives you a callback with both the key and the value:

import forEach from 'lodash/forEach'

forEach({foo: 'bar', baz: 'quux'}, (value, key) => {
  console.log(key, value)
  // prints 'foo bar' then 'baz quux'
})

This is handy, but once again it has a cost. Let’s say we’re only using forEach for Arrays:

import forEach from 'lodash/forEach'

forEach(['foo', 'bar'], obj => {
  console.log(obj) // prints 'foo' then 'bar'
})

In this case, Lodash will take up a total of 5.06kB:

Screenshot showing Lodash forEach() taking up quite a few modules

Whereas once we add in lodash-webpack-plugin, Lodash trims down to a svelte 108 bytes:

Screenshot showing a very small Lodash forEach.js module

Chaining

Another common Lodash feature is chaining, which exposes functionality like this:

import _ from 'lodash'
const array = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
console.log(_(array)
  .map(i => parseInt(i, 10))
  .filter(i => i % 2 === 1)
  .take(5)
  .value()
) // prints '[ 1, 3, 5, 7, 9 ]'

Unfortunately there is currently no good way to reduce the size required for chaining. So you’re better off importing the Lodash functions individually:

import map from 'lodash/map'
import filter from 'lodash/filter'
import take from 'lodash/take'
const array = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

console.log(
  take(
    filter(
      map(array, i => parseInt(i, 10)),
    i => i % 2 === 1),
  5)
) // prints '[ 1, 3, 5, 7, 9 ]'

Using the lodash-webpack-plugin with the chaining option enabled, the first example takes up the full 68.81kB:

Screenshot showing large lodash.js dependency

This makes sense, since we’re still importing all of Lodash for the chaining to work.

Whereas the second example with chaining disabled gives us only 590 bytes:

Screenshot showing a handful of small Lodash modules

The second piece of code is a bit harder to read than the first, but it’s certainly a big savings in file size! Luckily J.D. tells me there may be some work in progress on a plugin that could rewrite the second syntax to look more like the first (similar to babel-plugin-lodash).

Edit: it was brought to my attention in the comments that this functionality should be coming soon to babel-plugin-lodash!

Gotchas

Saving bundle size is great, but lodash-webpack-plugin comes with some caveats. All of these features – shorthands for the iteratee shorthands, collections for the Object collection methods, and others – are disabled by default. Furthermore, they may break or even silently fail if you try to use them when they’re disabled.

This means that if you only use lodash-webpack-plugin in production, you may be in for a rude surprise when you test something in development mode and then find it’s broken in production. In my previous examples, if you use the iteratee shorthand:

map([{id: 'foo'}, {id: 'bar'}], 'id') // ['foo', 'bar']

And if you don’t enable shorthands in lodash-webpack-plugin, then this will actually throw a runtime error:

map.js:16 Uncaught TypeError: iteratee is not a function

In the case of the Object collection methods, it’s more insidious. If you use:

forEach({foo: 'bar', baz: 'quux'}, (value, key) => {
  console.log(key, value)
})

And if you don’t enable collections in lodash-webpack-plugin, then the forEach() method will silently fail. This can lead to some very hard-to-uncover bugs!
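The safest approach is to explicitly enable the features you depend on in your webpack config. A sketch (the shorthands and collections option names come from the plugin's README):

```javascript
// webpack.config.js (sketch)
const LodashModuleReplacementPlugin = require('lodash-webpack-plugin')

module.exports = {
  // ...
  plugins: [
    new LodashModuleReplacementPlugin({
      shorthands: true,  // keep iteratee shorthands like map(arr, 'id')
      collections: true  // keep Object support in forEach(), map(), etc.
    })
  ]
}
```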

Conclusion

The babel-plugin-lodash and lodash-webpack-plugin packages are great. They’re an easy way to reduce your bundle size by a significant amount and with minimal effort.

The lodash-webpack-plugin is particularly useful, since it actually changes how Lodash operates under the hood and can remove functionality that almost nobody uses. Support for edge cases like sparse arrays (guards) and typed arrays (exotics) is unlikely to be something you’ll need.

While the lodash-webpack-plugin is extremely useful, though, it also has some footguns. If you’re only enabling it for production builds, you may be surprised when something works in development but then fails in production. It might also be hard to add to a large existing project, since you’ll have to meticulously audit all your uses of Lodash.

So be sure to carefully read the documentation before installing the lodash-webpack-plugin. And if you’re not sure whether you need a certain feature, then you may be better off enabling that feature (or disabling the plugin entirely) and just taking the ~20kB hit.

Note: if you’d like to experiment with this yourself, I put these examples into a small GitHub repo. If you uncomment various bits of code in src/index.js, and enable or disable the Babel and Webpack plugins in .babelrc and webpack.config.js, then you can play around with these examples yourself.

High-performance Web Worker messages

Update: this blog post was based on the latest browsers as of early 2016. Things have changed, and in particular the benchmark shows that recent versions of Chrome do not exhibit the performance cliff for non-stringified postMessage() messages as described in this post.

In recent posts and talks, I’ve explored how Web Workers can vastly improve the responsiveness of a web application, by moving work off the UI thread and thereby reducing DOM-blocking. In this post, I’ll delve a bit more deeply into the performance characteristics of postMessage(), which is the primary interface for communicating with Web Workers.

Since Web Workers run in a separate thread (although not necessarily a separate process), and since JavaScript environments don’t share memory across threads, messages have to be explicitly sent between the main thread and the worker. As it turns out, the format you choose for this message can have a big impact on performance.

TLDR: always use JSON.stringify() and JSON.parse() to communicate with a Web Worker. Be sure to fully stringify the message.
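Concretely, “fully stringify” means the payload you hand to postMessage() is itself a string. A sketch of the main-thread side (the message shape is illustrative):

```javascript
// Main-thread side: send and receive strings, never raw objects.
function sendToWorker(worker, message) {
  worker.postMessage(JSON.stringify(message))
}

function listenToWorker(worker, onMessage) {
  worker.onmessage = event => {
    onMessage(JSON.parse(event.data)) // event.data is a plain string
  }
}

// The worker side (worker.js) would mirror this:
//   self.onmessage = event => {
//     const message = JSON.parse(event.data)
//     // ... do the work ...
//     self.postMessage(JSON.stringify({ result }))
//   }
```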

I first came across this tip from IndexedDB spec author and Chrome developer Joshua Bell, who mentioned offhand:

We know that serialization/deserialization is slow. It’s actually faster to JSON.stringify() then postMessage() a string than to postMessage() an object.

This insight was further confirmed by Parashuram N., who demonstrated experimentally that stringify was a key factor in making a worker-based React implementation that improved upon vanilla React. He says:

By “stringifying” all messages between the worker and the main thread, React implemented on a Web Worker [is] faster than the normal React version. The perf benefit of the Web Worker approach starts to increase as the number of nodes increases.

Malte Ubl, tech lead of the AMP project, has also been experimenting with postMessage() in Web Workers. He had this to say:

On phones, [stringifying] is quickly relevant, but not with just 3 or so fields. Just measured the other day. It is bad.

This made me curious as to where, exactly, the tradeoffs lie with stringifying messages. So I decided to create a simple benchmark and run it on a variety of browsers. My tests confirmed that stringifying is indeed faster than sending raw objects, and that the message size has a dramatic impact on the speed of worker communication.

Furthermore, the only real benefit comes if you stringify the entire message. Even a small object that wraps the stringified message (e.g. {msg: JSON.stringify(message)}) performs worse than the fully-stringified case. (These results differ between Chrome, Firefox, and Safari, but keep reading for the full analysis.)

Test results

In this test, I ran 50,000 iterations of postMessage() (both to and from the worker) and used console.time() to measure the total time spent posting messages back and forth. I also varied the number of keys in the object between 0 and 30 (keys and values were both just Math.random()).

Clarification: the test does include the overhead of JSON.parse() and JSON.stringify(). The worker even re-stringifies the message when echoing it back.
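For the curious, the shape of the benchmark is roughly this (a sketch, not the actual test code):

```javascript
// Sketch: time N round-trips of a stringified message to an echo worker.
function benchmarkRoundTrips(worker, message, iterations, done) {
  let remaining = iterations
  console.time('round-trips')
  worker.onmessage = () => {
    if (--remaining === 0) {
      console.timeEnd('round-trips')
      return done()
    }
    worker.postMessage(JSON.stringify(message))
  }
  worker.postMessage(JSON.stringify(message))
}
```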

First, here are the results in Chrome 48 (running on a 2013 MacBook Air with Yosemite):

Chrome 48 test results

And in Chrome 48 for Android (running on a Nexus 5 with Android 5.1):

Nexus 5 Chrome test results

What’s clear from these results is that full stringification beats both partial stringification and no-stringification across all message sizes. The difference is fairly stark on desktop Chrome for small message sizes, but it starts to narrow as message size increases. On the Nexus 5, there’s no such dramatic swing.

In Firefox 46 (also on the MacBook Air), stringification is still the winner, although by a smaller margin:

Firefox test results

In Safari 9, it gets more interesting. For Safari, at least, stringification is actually slower than posting raw messages:

Safari test results

Based on these results, you might be tempted to think it’s a good idea to UA-sniff for Safari, and avoid stringification in that browser. However, it’s worth considering that Safari is consistently faster than Chrome (with or without stringification), and that it’s also faster than Firefox, at least for small message sizes. Here are the stringified results for all three browsers:

Stringification results for all browsers

So the fact that Safari is already fast for small messages would reduce the attractiveness of any UA-sniffing hack. Also notice that Firefox, to its credit, maintains a fairly consistent response time regardless of message size, and starts to actually beat both Safari and Chrome at the higher levels.

Now, assuming we were to use the UA-sniffing approach, we could swap in the raw results for Safari (i.e. showing the fastest times for each browser), which gives us this:

Results with the best time for each browser

So it appears that avoiding stringification in Safari allows it to handily beat the other browsers, although it does start to converge with Firefox for larger message sizes.

On a whim, I also tested Transferables, i.e. using ArrayBuffers as the data format to transfer the stringified JSON. In theory, Transferables can offer some performance gains when sending large data, because the ArrayBuffer is instantly zapped from one thread to the other, without any cloning or copying. (After transfer, the ArrayBuffer is unavailable to the sender thread.)
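That variant looks roughly like the sketch below. It assumes a modern TextEncoder/TextDecoder for the string-to-buffer conversion, which may differ from how the original benchmark encoded the data:

```javascript
// Encode stringified JSON into an ArrayBuffer so it can be transferred
// (moved, not copied) to the worker.
function toArrayBuffer(str) {
  return new TextEncoder().encode(str).buffer;
}

function fromArrayBuffer(buf) {
  return new TextDecoder().decode(buf);
}

const buffer = toArrayBuffer(JSON.stringify({ hello: 'world' }));

// The second argument is the transfer list; after this call the buffer is
// detached in the sending thread (its byteLength becomes 0).
// worker.postMessage(buffer, [buffer]);
```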

As it turned out, though, this didn’t perform well in either Chrome or Firefox. So I didn’t explore it any further.

Chrome test results, with arraybuffer

Firefox results with arraybuffer

Transferables might be useful for sending binary data that’s already in that format (e.g. Blobs, Files, etc.), but for JSON data it seems like a poor fit. On the bright side, they do have wide browser support, including Chrome, Firefox, Safari, IE, and Edge.

Speaking of Edge, I would have run these tests in that browser, but unfortunately my virtual machine kept crashing due to the intensity of the tests, and I didn’t have an actual Windows device handy. Contributions welcome!

Correction: this post originally stated that Safari doesn’t support Transferables. It does.

Update: Boulos Dib has graciously run the numbers for Edge 13, and they look very similar to Safari (in that raw objects are faster than stringification):

Edge 13 results

Conclusion

Based on these tests, my recommendation would be to use stringification across the board, or to UA-sniff for Safari and avoid stringification in that browser (but only if you really need maximum performance!).
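If you did go the UA-sniffing route, it might look something like the sketch below. The Safari check here is an illustrative assumption (UA-sniffing is inherently brittle), not a recommendation of any particular detection library:

```javascript
// Detect Safari specifically: Chrome also includes "Safari" in its UA
// string, so it has to be excluded explicitly.
function shouldStringify(ua) {
  const isSafari = /Safari/.test(ua) && !/Chrome|Chromium/.test(ua);
  return !isSafari;
}

// Stringify for most browsers; send the raw object in Safari.
function send(worker, message, ua) {
  worker.postMessage(shouldStringify(ua) ? JSON.stringify(message) : message);
}
```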

Another takeaway is that, in general, message sizes should be kept small. Firefox seems to be able to maintain a relatively speedy delivery regardless of the message size, but Safari and Chrome tend to slow down considerably as the message size increases. For very large messages, it may even make sense to save the data to IndexedDB from the worker, and then simply fetch the saved data from the main thread, but I haven’t verified this idea with a benchmark.

The full results for my tests are available in this spreadsheet. I encourage anybody who wants to reproduce these results to check out the test suite and offer a pull request or the results from their own browser.

And if you’d like a simple Web Worker library that makes use of stringification, check out promise-worker.

Update: Chris Thoburn has offered another Web Worker performance test that adds some additional ways of sending messages, like MessageChannels. Here are his own browser results.