Accurately measuring layout on the web

We all want to make faster websites. The question is just what to measure, and how to use that information to determine what’s “slow” and what could be made faster.

The browser rendering pipeline is complicated. For that reason, it’s tricky to measure the performance of a webpage, especially when components are rendered client-side and everything becomes an intricate ballet between JavaScript, the DOM, styling, layout, and rendering. Many folks stick to what they understand, and so they may under-measure or completely mis-measure their website’s frontend performance.

So in this post, I want to demystify some of these concepts, and offer techniques for accurately measuring what’s going on when we render things on the web.

The web rendering pipeline

Let’s say we have a component that is rendered client-side, using JavaScript. To keep things simple, I wrote a demo component in vanilla JS, but everything I’m about to say would also apply to React, Vue, Angular, etc.

When we use the handy Performance profiler in the Chrome Dev Tools, we see something like this:

Screenshot of Chrome Dev Tools showing work on the UI thread divided into JavaScript, then Style, then Layout, then Render

This is a view of the CPU costs of our component, in terms of milliseconds on the UI thread. To break things down, here are the steps required:

  1. Execute JavaScript – executing (but not necessarily compiling) JavaScript, including any state manipulation, “virtual DOM diffing,” and modifying the DOM.
  2. Calculate style – taking a CSS stylesheet and matching its selector rules with elements in the DOM. This is also known as “formatting.”
  3. Calculate layout – taking those CSS styles we calculated in step #2 and figuring out where the boxes should be laid out on the screen. This is also known as “reflow.”
  4. Render – the process of actually putting pixels on the screen. This often involves painting, compositing, GPU acceleration, and a separate rendering thread.

All of these steps incur CPU costs, and therefore all of them can impact the user experience. If any one of them takes a long time, the component can appear slow to load.

The naïve approach

Now, the most common mistake that folks make when trying to measure this process is to skip steps 2, 3, and 4 entirely. In other words, they just measure the time spent executing JavaScript, and completely ignore everything after that.

Screenshot of Chrome Dev Tools, showing an arrow pointing after JavaScript but before Style and Layout with the text 'Most devs stop measuring here'

When I worked as a browser performance engineer, I would often look at a trace of a team’s website and ask them which mark they used to measure “done.” More often than not, it turned out that their mark landed right after JavaScript, but before style and layout, meaning the last bit of CPU work wasn’t being measured.
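To make the pitfall concrete, here is a minimal sketch of the kind of measurement described above. The function name `measureJsOnly` and the `render` callback are hypothetical, not taken from any particular codebase. The point is only that the end timestamp is taken while style and layout are typically still pending.

```javascript
// Hypothetical helper illustrating the naive approach: the timer stops as
// soon as the synchronous JavaScript returns, so any style and layout work
// the browser performs afterward is not included in the result.
function measureJsOnly(render) {
  const start = performance.now();
  render(); // state changes, DOM mutations, etc.
  // Style and layout have typically not run yet at this point.
  return performance.now() - start;
}
```

A number returned from a helper like this can look healthy even when the frame that follows is dominated by style and layout.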

So how do we measure these costs? For the purposes of this post, let’s focus on how we measure style and layout in particular. As it turns out, the render step is much more complicated to measure, and indeed it’s impossible to measure accurately, because rendering is often a complex interplay between separate threads and the GPU, and therefore isn’t even visible to userland JavaScript running on the main thread.

Style and layout calculations, however, are 100% measurable because they block the main thread. And yes, this is true even with something like Firefox’s Stylo engine – even if multiple threads can be employed to speed up the work, ultimately the main thread has to wait on all the other threads to deliver the final result. This is just the way the web works, as specified.

What to measure

So in practical terms, we want to put a performance mark before our JavaScript starts executing, and another one after all the additional work is done:

Screenshot of Chrome Dev Tools, with arrow pointing before JavaScript execution saying 'Ideal start' and arrow pointing after Render (Paint) saying 'Ideal end'

I’ve written previously about various JavaScript timers on the web. Can any of these help us out?

As it turns out, requestAnimationFrame will be our main tool of choice, but there’s a problem. As Jake Archibald explains in his excellent talk on the event loop, browsers disagree on where to fire this callback:

Screenshot of Chrome Dev Tools showing arrow pointing before style/layout saying "Chrome, FF, Edge >= 18" and arrow pointing after style/layout saying "Safari, IE, Edge < 18"

Now, per the HTML5 event loop spec, requestAnimationFrame is indeed supposed to fire before style and layout are calculated. Edge has already fixed this in v18, and perhaps Safari will fix it in the future as well. But that would still leave us with inconsistent behavior in IE, as well as in older versions of Safari and Edge.

Also, if anything, the spec-compliant behavior actually makes it more difficult to measure style and layout! In an ideal world, the spec would have two timers – one for requestAnimationFrame, and another for requestAnimationFrameAfterStyleAndLayout (or something like that). In fact, there has been some discussion at the WHATWG about adding an API for this, but so far it’s just a gleam in the spec authors’ eyes.

Unfortunately, we live in the real world with real constraints, and we can’t wait for browsers to add this timer. So we’ll just have to figure out how to crack this nut, even with browsers disagreeing on when requestAnimationFrame should fire. Is there any solution that will work cross-browser?

Cross-browser “after frame” callback

There’s no solution that will work perfectly to place a callback right after style and layout, but based on the advice of Todd Reifsteck, I believe this comes closest:

requestAnimationFrame(() => {
  setTimeout(() => {
    performance.mark('end')
  })
})

Let’s break down what this code is doing. In the case of spec-compliant browsers, such as Chrome, it looks like this:

Screenshot of Chrome Dev Tools showing 'Start' before JavaScript execution, requestAnimationFrame before style/layout, and setTimeout falling a bit after Paint/Render

Note that rAF fires before style and layout, but the next setTimeout fires just after those steps (including “paint,” in this case).

And here’s how it works in non-spec-compliant browsers, such as Edge 17:

Screenshot of Edge F12 Tools showing 'Start' before JavaScript execution, and requestAnimationFrame/setTimeout both almost immediately after style/layout

Note that rAF fires after style and layout, and the next setTimeout happens so soon that the Edge F12 Tools actually render the two marks on top of each other.

So essentially, the trick is to queue a setTimeout callback inside of a rAF, which ensures that the second callback happens after style and layout, regardless of whether the browser is spec-compliant or not.
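One way to package this trick is sketched below. The names `afterFrame` and `measureRender`, and the injectable `raf` and `timeout` options, are assumptions made for illustration; the 16ms fallback exists only so the sketch can run where requestAnimationFrame is unavailable and is not part of the technique itself.

```javascript
// Hypothetical helper: runs `callback` in a setTimeout queued from inside
// requestAnimationFrame. `raf` and `timeout` are injectable (an assumption of
// this sketch) so the function can be exercised outside a browser.
function afterFrame(callback, options = {}) {
  const raf = options.raf
    ?? globalThis.requestAnimationFrame
    ?? ((cb) => setTimeout(cb, 16)); // rough stand-in where rAF is missing
  const timeout = options.timeout ?? setTimeout;
  raf(() => {
    timeout(callback, 0);
  });
}

// Marks the start before rendering and the end in the "after frame" callback,
// then records a measure spanning both marks.
function measureRender(name, render, options) {
  performance.mark(`${name}-start`);
  render();
  afterFrame(() => {
    performance.mark(`${name}-end`);
    performance.measure(name, `${name}-start`, `${name}-end`);
  }, options);
}
```

In a browser, a call such as `measureRender('my-component', renderMyComponent)` (where `renderMyComponent` is a placeholder) would leave a `my-component` measure that can be read back with `performance.getEntriesByName('my-component')`.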

Downsides and alternatives

Now to be fair, there are a lot of problems with this technique:

  1. setTimeout is somewhat unpredictable in that it may be clamped to 4ms (or more in some cases).
  2. If there are any other setTimeout callbacks that have been queued elsewhere in the code, then ours may not be the last one to run.
  3. In the non-spec-compliant browsers, doing the setTimeout is actually a waste, because we already have a perfectly good place to set our mark – right inside the rAF!

However, if you’re looking for a one-size-fits-all solution for all browsers, rAF + setTimeout is about as close as you can get. Let’s consider some alternative approaches and why they wouldn’t work so well:

rAF + microtask

requestAnimationFrame(() => {
  Promise.resolve().then(() => {
    performance.mark('after')
  })
})

This one doesn’t work at all, because microtasks (e.g. Promises) run immediately after JavaScript execution has completed. So it doesn’t wait for style and layout at all:

Screenshot of Chrome Dev Tools showing microtask firing before style/layout
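The ordering can be checked with a small sketch that does not depend on any rendering at all. The function name `traceOrder` is hypothetical. It relies only on the fact that promise callbacks are microtasks, which run as soon as the current script finishes, whereas a setTimeout callback is a separate task that runs later.

```javascript
// Hypothetical demo: records the order in which synchronous code, a
// microtask, and a setTimeout task run.
function traceOrder(log) {
  setTimeout(() => log.push('timeout'), 0);            // queued first, runs last
  Promise.resolve().then(() => log.push('microtask')); // runs right after the script
  log.push('sync');
  return log;
}

const log = traceOrder([]);
// Print once everything has had a chance to run.
setTimeout(() => console.log(log.join(' -> ')), 50); // sync -> microtask -> timeout
```

Inside a rAF callback the same rule applies: the microtask runs as soon as the callback returns, which in a spec-compliant browser is still before style and layout.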

rAF + requestIdleCallback

requestAnimationFrame(() => {
  requestIdleCallback(() => {
    performance.mark('after')
  })
})

Calling requestIdleCallback from inside of a requestAnimationFrame will indeed capture style and layout:

Screenshot of Chrome Dev Tools showing requestIdleCallback firing a bit after render/paint

However, if the microtask version fires too early, I would worry that this one would fire too late. The screenshot above shows it firing fairly quickly, but if the main thread is busy doing other work, rIC could be delayed a long time waiting for the browser to decide that it’s safe to run some “idle” work. This one is far less of a sure bet than setTimeout.

rAF + rAF

requestAnimationFrame(() => {
  requestAnimationFrame(() => {
    performance.mark('after')
  })
})

This one, also called a “double rAF,” is a perfectly fine solution, but compared to the setTimeout version, it probably captures more idle time – roughly 16.7ms on a 60Hz screen, as opposed to the standard 4ms for setTimeout – and is therefore slightly less accurate.

Screenshot of Chrome Dev Tools showing a second requestAnimationFrame firing a bit after render/paint

You might wonder about that, given that I’ve already talked about setTimeout(0) not really firing in 0 (or even necessarily 4) milliseconds in a previous blog post. But keep in mind that, even though setTimeout() may be clamped by as much as a second, this only occurs in a background tab. And if we’re running in a background tab, we can’t count on rAF at all, because it may be paused altogether. (How to deal with noisy telemetry from background tabs is an interesting but separate question.)

So rAF+setTimeout, despite its flaws, is probably still better than rAF+rAF.

Not fooling ourselves

In any case, whether we choose rAF+setTimeout or double rAF, we can rest assured that we’re capturing any event-loop-driven style and layout costs. With this measure in place, it’s much less likely that we’ll fool ourselves by only measuring JavaScript and direct DOM API performance.

As an example, let’s consider what would happen if our style and layout costs weren’t just invoked by the event loop – that is, if our component were calling one of the many APIs that force style/layout recalculation, such as getBoundingClientRect(), offsetTop, etc.

If we call getBoundingClientRect() just once, notice that the style and layout calculations shift over into the middle of JavaScript execution:

Screenshot of Chrome Dev Tools showing style/layout costs moved to the left inside of JavaScript execution under getBoundingClientRect with red triangles on each purple rectangle

The important point here is that we’re not doing anything any slower or faster – we’ve merely moved the costs around. If we don’t measure the full costs of style and layout, though, we might deceive ourselves into thinking that calling getBoundingClientRect() is slower than not calling it! In fact, though, it’s just a case of robbing Peter to pay Paul.
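Because a forced layout only moves the cost rather than adding to it, one option in a synthetic benchmark is to force it on purpose, so that style and layout land inside the timed region. The sketch below assumes a hypothetical `render` function and a `root` element containing whatever the component rendered. It is an illustration under those assumptions, not a recommendation for production code.

```javascript
// Hypothetical benchmark helper: forces style and layout synchronously so
// that their cost falls between the two timestamps.
function measureWithForcedLayout(render, root) {
  const start = performance.now();
  render();
  root.getBoundingClientRect(); // reading geometry flushes pending style/layout
  return performance.now() - start;
}
```

Paint and compositing are still not captured by a measurement like this, since they happen after the function returns.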

It’s worth noting, though, that the Chrome Dev Tools have added little red triangles to our style/layout calculations, with the message “Forced reflow is a likely performance bottleneck.” This can be a bit misleading in this case, because again, the costs are not actually any higher – they’ve just moved to earlier in the trace.

(Now it’s true that, if we call getBoundingClientRect() repeatedly and change the DOM in the process, then we might cause layout thrashing, in which case the overall costs would indeed be higher. So the Chrome Dev Tools are right to warn folks in that case.)
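As a rough illustration of the difference, the sketch below contrasts an interleaved read/write loop with a batched one. The function names and the "height is half the width" rule are made up for the example; the only claim is that a geometry read after a style write may force a fresh layout, while reads grouped before any writes can share one.

```javascript
// Hypothetical example of layout thrashing: each iteration reads geometry
// right after the previous iteration wrote a style, so every read may force
// a new layout.
function resizeInterleaved(elements) {
  for (const el of elements) {
    const { width } = el.getBoundingClientRect(); // read
    el.style.height = `${width / 2}px`;           // write (invalidates layout)
  }
}

// Batched alternative: do all the reads first, then all the writes, so the
// reads can be served by a single layout pass.
function resizeBatched(elements) {
  const widths = elements.map((el) => el.getBoundingClientRect().width);
  elements.forEach((el, i) => {
    el.style.height = `${widths[i] / 2}px`;
  });
}
```

Both versions leave the elements in the same final state; they differ only in how many times layout may have to be recalculated along the way.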

In any case, my point is that it’s easy to fool yourself if you only measure explicit JavaScript execution, and ignore any event-loop-driven style and layout costs that come afterward. The two costs may be scheduled differently, but they both impact performance.

Conclusion

Accurately measuring layout on the web is hard. There’s no perfect metric to capture style and layout – or indeed, rendering – even though all three can impact the user experience just as much as JavaScript.

However, it’s important to understand how the HTML5 event loop works, and to place performance marks at the appropriate points in the component rendering lifecycle. This can help avoid any mistaken conclusions about what’s “slower” or “faster” based on an incomplete view of the pipeline, and ensure that style and layout costs are accounted for.

I hope this blog post was useful, and that the art of measuring client-side performance is a little less mysterious now. And maybe it’s time to push browser vendors to add requestAnimationFrameAfterStyleAndLayout (we’ll bikeshed on the name though!).

Thanks to Ben Kelly, Todd Reifsteck, and Alex Russell for feedback on a draft of this blog post.

38 responses to this post.

  1. Nice work. Would love to get this added into boomerang’s continuity metrics, specifically to enhance our measure of time to visually ready.

  2. Wow Nolan! What a great post! Thank you so much. You made my day! (OK – enough exclamation points!!)

    I wish Long Tasks API had information beyond just script. That would be the best way to track this – using instrumentation that is spec’ed and supported by the browser. I love that you’re helping motivate that by giving developers an alternative technique.

    One downside of deploying this technique in RUM is that (AFAIK) this technique can’t separate JS execution from layout & render, but that might be important for developers to diagnose the problem. Is that true?

    • Thanks for the feedback! Yes, that is a downside of the Long Tasks API (AIUI), and could lead to the “fooling ourselves” example I gave with getBoundingClientRect() (where we only measure JS execution and ignore the style/layout after).

      If you want to separate JS from style/layout costs, then I suppose the best bet is to put a mark in rAF for spec-compliant browsers, but then for non-spec-compliant browsers… I’m not sure. Perhaps microtasks? UA-sniffing might also be necessary for maximum accuracy. This goes back to the need for two separate rAF APIs…

  3. Hi,
    Wanted to know your thoughts on setImmediate in this case. How would it compare with rAF + rAF?

    • setImmediate is only supported in IE and Edge, and the main benefit is that it’s not clamped to 4ms. However, it does also allow input events to “jump the queue” in IE, whereas setTimeout does not. (Although to be fair, all other browsers should allow input to jump the queue in front of setTimeout, so I suppose IE’s setImmediate is more similar to other browsers’ setTimeout in that way.)

      All other things being equal, though, since setImmediate doesn’t clamp whereas setTimeout does, then yes, using setImmediate || setTimeout inside of rAF does seem reasonable to me. I’d have to test in Edge 18 though, to confirm that it does indeed fire after style/layout. Good suggestion!

  4. Posted by julienwajsberg on September 27, 2018 at 12:11 AM

    Hey, thanks for the great post!
    I wonder if a single setTimeout wouldn’t work here?
    (although I think that in a synthetic test I’d force a layout like you describe in the end)

    • A single setTimeout wouldn’t consistently work: it’s clamped to 4ms, but a frame is typically 16.7ms (or longer, depending on the monitor), so it could fire before rAF would. Hence rAF+setTimeout.

  5. Posted by Akira on September 27, 2018 at 4:51 AM

    “calling getBoundingClientRect() is slower than not calling it!” did you mean faster?

    • Nope, I meant “slower.” :) The idea is if you only measure explicit JS execution, you’ll measure gBCR but none of the later style/layout costs. So moving costs from the right to the left “looks” slower, even though it’s the same.

  6. […] Accurately measuring layout on the web […]

  9. Posted by fabiordp on September 28, 2018 at 5:19 PM

    Thanks for sharing this Nolan. I made sure people on the DevTools team read it. We miss having your expertise in the house but glad you’re still sharing your knowledge publicly!

  10. Thanks for writing this great post.

    We measure the JS execution cost of our front-end scripts and have created a proxy metric for measuring interactivity. We have run into a similar problem using rAF. One problem that is not mentioned here is the following:

    When there are multiple scripts on the page, and the first few scripts to execute do not cause any reflow or paint while the later scripts render something on the page, we end up measuring the wrong JS execution cost for the earlier scripts, since rAF fires later on with the render.

    We ended up not measuring the execution using rAF since it was inaccurate.

  11. Superb reading. Thanks a lot!

  12. […] The complete guide to lazy loading images • Accurate layout measurements on the web. On how to measure website performance and how […]

  13. Posted by Dmitry on October 8, 2018 at 5:11 PM

    The rAF + setTimeout approach is interesting but wouldn’t the end marker be potentially delayed if there were other macro tasks queued?

  14. […] pipeline and how to make websites faster? (Everyone does, right?) This article by developer Nolan Lawson explains it in one of the least complex ways possible so you can more accurately measure what’s happening […]
