Mika Raento's Tech Blog: DOM and CSS performance with Mobile Safari

This blog post is the handout for my talk at HelsinkiJS. It's a bit long for a blog post and not very polished... I may get time to polish it later.

Problem setting

We at Brightside are building fairly data-heavy (supporting sheets that have several thousand rows by tens of columns) mobile web apps. We are especially not building web sites or HTML5 games.

We want our applications to feel snappy to the user.

We generate and manipulate the DOM with AngularJS and d3.js (as opposed to, say, jQuery). Both of these have specific performance implications and goals. AngularJS is expressedly meant to render one screenful of DOM at a time. D3 is extremely performant in generating DOM but the enter-update-exit set style doesn't lend itself to manipulating only some properties, and animating a large number of DOM nodes separately is not feasible. AngularJS doesn't query the DOM like you would probably do with jQuery - all the DOM manipulations are already tied to the right DOM nodes.

Basics of DOM and CSS performance

Changing the appearance of the page happens in four distinct phases:

DOM manipulation via javascript
Style recalculation
Layout
Painting

This post is about steps 2 to 4.

A good in-depth explanation of the browser's rendering model is given in Tali Garsiel and Paul Irish's classic 'How Browsers Work: Behind the scenes of modern web browsers'.

Browsers are continuously improving the performance of each of these phases.

Mobile browser performance, some simple figures

At the time of writing the performance of Mobile Safari on an iPhone 5 running iOS 6 performs at 25% to 50% of desktop Chrome in DOM manipulation, style recalculation, layout and painting (in our app and when testing with dromaeo's DOM tests.

DOM and CSS performance is thus closer to desktop than pure javascript performance, which is more like 10%.

What does snappy mean?

Jakob Nielsen famously wrote of the three main time limits:

0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.

1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.

10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

To this we can add that animations and direct manipulation should have a refresh rate of 60 or 30 Hz (fps).

In general at Brightside we think in three categories of response times (in addition to initial page load, which I'm not going to discuss here):

view switching: < 300ms (to be hidden with animation, smooth animation)
feedback on clicks < 50ms
animation/touch manipulation: < 30ms (< 16ms preferred)

Here's an example Brightside application showing all three: click feedback, view switching and programmatic scroll in response to touch:

(The recording is from a slowed-down simulator, since there doesn't seem to be away to record touches from actual devices).

Click feedback, and View switching part 1

We started by the typical simple approach: render the requested list of items. Before optimizations, we were easily hitting 1--2s switching times with 50 rows of 3 columns. With optimization we were able to go from that 1--2s to 250m, with immediate feedback and smooth animation to hide the remaining latency.

The single most important optimization for getting immediate feedback, smooth animation and reasonable render times is to get rid of layout trashing in general making sure you only render those intermediate states you want to be visible.

Layout trashing

Layout trashing, in it's simplest form, is caused by javascript code that repeatedly dirties the layout tree (by e.g., creating nodes or changing their sizes) and queries the layout (by e.g., asking nodes for their size or scroll position).

For a longer explanation, see Arthur Evans and Tim Statler's 'Chrome DevTools Revolutions 2013'.

Simple example:

for (var i = 0; i < 500; i++) {
container.append(to_append.clone());
w = container.height();
}

Takes 10 times as long as

for (var i = 0; i < 500; i++) {
container.append(to_append.clone());
}
w = container.height();

(You can run this at http://jsperf.com/layout-trashing-example).

Put this way it seems like the fix is 'to not do that then'. In real life layout trashing is typically caused by combining independent components that are interested in layout, which may all be structured correctly by themselves but not in combination.

A slighly less obvious form of layout trashing happens when you manipulate the DOM tree asynchronously several times and the browser calculates the layout several times. A slighly less expensive for is 'style trashing' where manipulating the DOM and querying CSS properties are done repeatedly - style recalculation is typically only about 10% of the cost of layout.

Controlling layouts at Brightside

Angular's animation was composed of independent 'leave' and 'enter' animations, which were started and run asynchronously. We added a consolidated timeout function which coalesced the DOM manipulations and reduced the number of layouts.
We split the view switching into two parts, separated by a timeout: the first part gives visual feedback and the second part creates the new DOM. This way the feedback could be shown before the heavy DOM manipulation.
We animate using '-webkit-transform: translate3d(...)'. Those animations happen on the GPU and run smoothly even if we are causing layouts (by rendering more of the new view content).
We use window.innerWidth instead of $(elem).width() in calculating the animated positions. This hardcodes some assumptions about our styling but can be run without needing to layout the new DOM.

Tools for diagnosing layout trashing

Chrome DevTools's Timeline shows where layout trashing occurs (it's called 'Forced synchronous layout'). You can also emulate the iOS user agents and screen sizes to make it easier to see the same results in Chrome as on an iOS device.

Safari doesn't (at least yet) show synchronous layouts in it's Timeline.

We've written a tool that repeatedly navigates to a page under test on an iPhone/iPad, gathers the timeline data and can both tell you if it contains synchronous layouts and export the data to a format that can be loaded into Chrome's Timeline. The code (very rough, to be used as an example) can be found on https://github.com/brightside/dom-css-perf/tree/master/perf-tools. It uses the marvellous ios-webkit-debug-proxy from wrightt@google.com.

Optimizing your rendering

Although layout trashing/layout scheduling tends to be the biggest obstacle to snappiness, you may want to also optimize your CSS, javascript and DOM.

Making tweaks to dynamic and asynchronous manipulation of the DOM can be hard because the performance is not deterministic. Javascript's garbage collection can cause significant differences in timing from run to run. If you are trying to make incremental improvements to your rendering, it's easy to get false positives or negatives if you just keep staring at the DevTools Timeline.

The tools mentioned above (https://github.com/brightside/dom-css-perf/tree/master/perf-tools) can help by automatically running the same navigation sequence several times and telling you the average and standard deviation of the runs. With that approach you can see if you are making significant changes to your rendering or not.

View switching part 2: solving the rendering performance for good

Although we were able to get medium-sized amounts of data to render in a reasonable time, we could easily hit 10--20s when trying to render thousands of rows.

Some of the cost comes from using a complex DOM tree, with many Angular directives. We were not able to show 1000s of rows even on desktop, whereas others are happily rendering tables with 500k rows.

However, optimizing the DOM only takes you so far: 1) large DIVs and TABLEs easily make Mobile Safari run out of (GPU) memory and crash, and 2) you still have to deal with 25% of the desktop performance.

At least for us the real solution lies in rendering a smaller DOM tree. We do this by lazily rendering only visible elements, see example code on githuib.

The somewhat annoying part is that on Mobile Safari the only way to handle lazy rendering of long lists is to use programmatic scrolling (Mobile Safari's normal scrolling is hardware-accelerated and doesn't result in javascript scroll events until the end of the scroll).

You can take a look at the lazy scrolling/rendering code at https://github.com/brightside/dom-css-perf/tree/master/web (to see it in action, check out the code and load demo_plain_lazy.html in desktop Chrome).

Direct manipulation

The first example in 'What does snappy mean?' shows programmatic scrolling: direct manipulation of the scroll position. In the list each list item is absolutely positioned and then rendered at the desired location with '-webkit-transform: translate3d(...)'.

Another use for touch-based manipulation is pan-and-zoom in our charts:

Here we have several layers of SVGs inside DIVs rendered on top of each other (to support z-order of the axis vs. content vs. overlays) whose '-webkit-transform' we manipulate in response to touch (both translation and scale). This gives roughly 30 fps on iOS7 (on iOS6 we get flicker at the end when redrawing at the end of pan-and-zoom as -webkit-transform is applied asynchronously - we use non-hw-accelerated transforms on iOS6).

Key takeaways

Pay close attention layout trashing, it's too expensive for pretty much anybody
Use automated tools to measure the impact of DOM/CSS optimizations to account for nondeterminism caused by garbage collection
For data-heavy applications, you pretty much are forced to render lazily
Direct manipulation typically requires using hardware-accelerated CSS operations only

The Mobile HTML5 Rendering Profiler

I've now packaged the command line tools used to create the measurements in this blog into a desktop application. You can also run the measurements on Android (in addition to iPhone).

The Profiler will set you back 55 EUR (+ VAT), but do first download the 7-day trial and see what makes your app go fast (or slow...).

3 comments:

Maxisam said...: May I ask you what kinda simulator you used ?; 8:54 AM
Mika Raento said...: Sure.

The animations in the post have been done with the iOS Simulator that comes with XCode (and screen recording with Quicktime Player). The interaction was manual.

The measurements I talk about are done on real devices, using ios-webkit-debug-proxy.; 11:18 PM
Adiba Alam said...: It calls attention to the regions where JavaScript exceeds expectations as an online programming dialect and furthermore depicts circumstances where its utilization can really bring down the execution of a site. css; 12:25 AM

Thursday, January 30, 2014

DOM and CSS performance with Mobile Safari