Mika Raento's Tech Blog: 2014

Sunday, December 28, 2014

Post-its on the board are a symptom of the team knowing what they are supposed to do, not the cause of it

TL;DR: if people know what they are supposed to do from the text that fits on a post-it, things are going well. If they don't, the post-its aren't helping: you need more prose than that.

Recently I had the (mis?)fortune of working on a medium-sized (4 man-years) project with a very inexperienced team. I was supposed to be the coding architect supporting them but ended up as the project-manager and tech lead.

When I started in the project (4 months into project calendar time), we had a typical Agile/scrum whiteboard with a bunch of post-its.

There were two problems with that board: 1) even if the post-its moved forward, the project didn't and 2) the developers didn't know what the post-its meant.

To be a bit more concrete, we were supposed to build the back-end of a networked personal video recorder (npvr) for a company that already had one, but one that didn't support variable bandwidth (i.e., mobile). This required knowledge of the existing back-end, systems programming, devops, video transcoding, streaming video protocols and video player functionality - none of which the team had before starting the project. To move the project from the realm of the unhappy to the absurd, none of the developers were even familiar with most of the chosen technology stack (Linux, python, RHEL, rpms, zabbix, REST, mpeg4).

So what I ended up doing was abandoning the post-its and moving the project management to a conventional ticket system (JIRA) with significant amounts of prose for each ticket - several paragraphs at least, with clear Definitions-of-Done.

If that sounds stupid to you, you have developers who actually know what the customer/client/user wants without excessive prose. Congrats. If they don't, what would you have done?

Tuesday, October 14, 2014

AngularJS 1.3 is 40% faster than 1.2 on an iPhone. Or is it 20% faster?

AngularJS 1.3 was released today!

I ran my simple list generation test on both AngularJS 1.2 and 1.3 using the Profiler.

A simple answer is: AngularJS 1.3 is taking 40% less time in javascript to manipulate the DOM than 1.2.

A different but equally simple answer is: AngularJS 1.3 is taking 20% less time to get the contents of the page to your user than 1.2.
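Both answers can hold at the same time, because javascript is only one part of the total time. Here is a small worked example with made-up numbers. They are not the measurements from the test; they simply assume that javascript accounts for half of the total time on 1.2 and that nothing else changes:

```javascript
// Hypothetical timings in milliseconds, for illustration only.
function percentLess(before, after) {
  return Math.round((1 - after / before) * 100);
}

var v12 = { javascript: 500, rest: 500 };  // assumed split on AngularJS 1.2
var v13 = { javascript: 300, rest: 500 };  // only the javascript part shrinks

var jsGain = percentLess(v12.javascript, v13.javascript);
var totalGain = percentLess(v12.javascript + v12.rest,
                            v13.javascript + v13.rest);

console.log(jsGain + '% less time in javascript, ' +
            totalGain + '% less time in total');
// prints: 40% less time in javascript, 20% less time in total
```

The smaller javascript's share of the total, the further apart the two percentages drift.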

Blurb: the Profiler now runs on Windows as well as OS-X!

Friday, October 10, 2014

Writing asynchronous nginx plugins with python and tornado for 1000 qps

TL;DR: by returning X-Accel-Redirects and using multiple tornado processes you can use python as an async plugin language for nginx.

I've been using an architecture at work for building video streaming servers that I've not seen anybody else describe. I'm combining the ease of writing, reading and testing python with the performance of nginx. I've been pretty happy with the maintainability of the result. It's been taken from separate cache and origin to cache+origin and from Smooth Streaming to HLS; and I can hit 1000 qps and 8 Gbps of video with it (with pretty beefy hardware though).

See

http://linuxfiddle.net/f/5f96743c-3d62-40f7-89ef-f64460e904c1

This setup wouldn't serve 1000 qps as it stands. What's missing:

You'd want to use more tornado front-end processes so that python isn't the bottleneck
If there is any cache of any kind (e.g., vfs) in the backend, you want to use content-aware backend selection
You'd cache the mysql results rather than getting them out of the db every time
In the front-end, you'd want to merge all simultaneous requests for the same URL into one
Nginx should cache the service responses
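The request merging mentioned above could look roughly like the sketch below. It is written in JavaScript rather than python/tornado, to match the other code on this blog, and `coalesce` and `fetchFn` are hypothetical names that do not come from the linked setup. The idea is to remember the pending request for each URL and hand the same promise to every caller that arrives before it finishes:

```javascript
// Wrap a promise-returning fetch function so that simultaneous requests for
// the same URL share one upstream request. Names are illustrative only.
function coalesce(fetchFn) {
  var inFlight = Object.create(null);  // url -> promise of pending request
  return function (url) {
    if (!inFlight[url]) {
      inFlight[url] = fetchFn(url).then(
        function (result) { delete inFlight[url]; return result; },
        function (err) { delete inFlight[url]; throw err; });
    }
    return inFlight[url];
  };
}
```

Once a request completes its entry is removed, so this only merges concurrent requests; it is not a cache.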

Wednesday, October 01, 2014

Experiences in using python and ffmpeg to transcode broadcast video

This is a rough guide to implementing a transcoding pipeline from broadcast video to multi-bitrate mobile-friendly video, using ffmpeg. No code (the code's not mine to give away).

I've successfully implemented a transcoding pipeline for producing multi-bitrate fragmented mp4 files from broadcast DVB input. More concretely, I'm taking in broadcast TS captures with either mpeg2 video and mp2 audio (SD, varying aspect ratio) or h264 video and aac audio, both potentially with dvbsub. From that I'm producing ISMV output with multiple bitrate h264 video at fixed 16:9 aspect ratio and multiple bitrate aac audio, with burned subtitles. The ISMV outputs are post-processed with tools/ismindex and with the hls muxer.

There are a number of limitations in ffmpeg that I've had to work around:

I haven't gotten sub2video or async working without reasonably monotonic DTS. Broadcast TS streams can easily contain backward jumps in time (e.g., a cheapo source that plays mp4 files and starts each file at 0). The fix is to cut the TS into pieces at timestamp jump locations, use '-itsoffset' to rewrite the timestamps and then concatenate. I'm using the segment muxer and the concat demuxer for that.
Sub2video doesn't work with negative timestamps, so I use '-itsoffset' to get positive timestamps
For HD streams, I need to scale up the sub2video results from SD to HD. Sub2video doesn't handle the HD subtitle geometries. I'm not enough of an expert to know whether that's the issue, or whether that's just the way it's supposed to work with SD subs (typical) with HD video.
For columnboxing, I use the scale, pad and setdar video filters. These work fine, but their parameters are only evaluated at start, so I need to cut the video into pieces with a single aspect ratio first and concatenate later.
Audio sync (using aresample) gets confused if the input contains errors, so I need to first re-encode audio (with '-copyts') and only after that synchronize.
The TS PIDs are not kept over the segment muxer, so I give them on the command line with '-streamid'.
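The first workaround (cutting at backward timestamp jumps and shifting each piece) boils down to a small planning step before any ffmpeg invocation. The sketch below is not the pipeline's code, which is python and not public. It is a JavaScript illustration, to match the other code on this blog, and `planPieces`, `dts` and `frameGap` are hypothetical names. It assumes you already have the decode timestamps in stream order (for example parsed from ffprobe output) in some integer unit:

```javascript
// Split a list of decode timestamps at backward jumps and compute, for each
// piece, the offset that makes the rewritten timeline continuous.
// frameGap is the assumed distance between the last timestamp of one piece
// and the first timestamp of the next (same unit as dts).
function planPieces(dts, frameGap) {
  var pieces = [];
  var start = 0;
  var nextStart = null;  // where the next piece should begin after rewriting
  for (var i = 1; i <= dts.length; i++) {
    if (i === dts.length || dts[i] < dts[i - 1]) {
      var offset = nextStart === null ? 0 : nextStart - dts[start];
      pieces.push({ first: start, last: i - 1, itsoffset: offset });
      nextStart = dts[i - 1] + offset + frameGap;
      start = i;
    }
  }
  return pieces;
}

// Hypothetical input: the source restarts at 0 after the third timestamp.
var pieces = planPieces([1000, 1040, 1080, 0, 40], 40);
console.log(JSON.stringify(pieces));
```

Each piece's offset, converted to seconds, would then be passed to '-itsoffset' when that piece is re-read. Negative start timestamps are not handled in this sketch.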

I've automated the above steps with a bunch of python, using ffprobe to parse the video stream. The result manages to correctly transcode about 98% of the TS inputs that our commercial transcoding and packaging pipe fails to handle (it would probably do even better on the rest).

Hope this helps somebody trying to accomplish the same thing.

Monday, September 22, 2014

Flow - do you speak it?

There's a lot of talk about flow and interruptions in our industry. I think everybody agrees that you don't get much programming done if you don't get into the flow.

But do you?

I've been working as a coder/architect/project manager for a few months at a largish corporation. It's been quite eye-opening.

You arrive at 9:00, have the daily stand-up at 10:00, lunch at 11:45 and a meeting at 14:00. In between there's a couple of people asking you for help. And you get absolutely bugger-all done.

Today I pair-programmed from 14:00 to 17:15. No e-mail, no Skype, no meetings. That was very much an outlier. I was pretty exhausted afterwards too - I guess if I'm not tired after the day I haven't been working that hard.

Every now and then I retire into the test lab or a quiet room or work from home. 5-6 hours of actual coding beats a week in the office.

But if you are living the intended 9:00-17:00 life with those few major interruptions per day - you don't produce squat.

Saturday, September 20, 2014

People don't run the same code on different platforms, except when it's too hard not to

TL;DR - Google uses GWT and j2objc to implement Operational Transforms on all platforms - so I shouldn't expect to be able to implement them for myself.

Most of us write code that only runs on one part of our whole product. I write python code for the server and javascript for the client. Facebook famously gave up on HTML5 for the iPhone. Cross-platform UI toolkits like wxWidgets, Qt or GTK are by definition doomed to the lowest common denominator (you could, in theory, reimplement the best features of each, but that's not been happening). Several attempts have been made to run Microsoft technologies on Unix (IE for UNIX, Wine and Mono), which have had mainly marginal importance - though the jury is out on Mono.

I and some friends spent pretty much all of last year trying to fit a data-heavy app into mobile HTML5. That was one of the many mistakes we made.

As a young sysadmin I settled on perl, since it ran on DOS and Linux (and later Windows). That went pretty well, but you wouldn't really build client applications with it.

But there is a big payout waiting for those who manage radical reuse.

The biggest case is HTML and javascript, especially on the desktop. Mobile HTML works for reading a newspaper, but not for building a rich client. I've recently seen people struggle with HTML and javascript on set-top-boxes, which supposedly run Webkit but in practice tie you to a very proprietary widget environment.

I find that some of the most interesting reuse is when there is a significant technological component that's being reused.

Both iOS's and Android's success builds on reusing the operating system and lower-level libraries (OS-X/FreeBSD and Linux, respectively). Symbian tried to ignore the need to re-use lower-level code, and failed partially due to that.

Oracle's success builds on letting people run its databases and clients on all sorts of OS's.

And now we have Google and Google Docs (Drive). Docs implements the world's best collaborative editing through Operational Transforms (OT). OT are a technological innovation similar to the relational model: a reasonably simple theory, which requires super-human engineering effort to make into a working product. That effort is large enough that you don't want to do it multiple times on multiple platforms. In both Oracle's and Docs's case you also want the results on different platforms to be exactly the same - for OT in real time.

Google is writing the OT code in Java for their servers and Android, and using GWT and j2objc to convert the code to the web and iPhone. And that's atypical even for Google. GWT's not really been a success for Google or others in other areas, especially not for UI.

So Operating Systems, Databases and Operational Transforms are too hard to reimplement. What else?

(This post brought to you by me trying to come up with a reasonable solution to syncing data between mobile clients without wasting all my time on that instead of building actual applications, like Shozu did.)

Saturday, August 30, 2014

TDD vs static typing, HTML vs optimizing away undefined behaviour - which side do you stand on?

(I'm going to try and write some shorter posts while thinking through things for myself, let's see how it goes)

There is a school of thought that says you should understand the contract for any API you are programming against and only write code that stays within that contract. This is exemplified by some of the commenters on oldnewthing, by 'the right thing', by modern compilers' aggressive optimization in the face of undefined behaviour, by Design-by-contract, by static typing and by our dear linters, static checkers and valgrinds.

Then there is another school of thought that you should (just) test your code and if the tests pass it's good. This is exemplified by Linus and oldnewthing themselves, by Worse-is-better, by HTML and its compatibility modes, by KISS and refactoring, by dynamic typing and by all our testing frameworks, red-green-refactor tools, browserstacks and manual QA.

I think this is a bit like Yegge's liberal vs. conservative - I feel conflicted about this dichotomy. I definitely want to have tests for my code and my colleagues' code, and I'm writing more dynamically typed code than statically typed, but I love my linting and valgrind and understanding the APIs I program to.

I think I want to prefer writing to contracts but my behaviour doesn't match.

Monday, May 19, 2014

innerHTML vs appendNode vs DocumentFragment - Optimizing bulk DOM operations for mobile

There are cases when you want to actually create and show a large chunk of new HTML from javascript. The simplest one is when you are loading a wholly new view in a single-page app. Should you worry about how to implement that?

TL;DR 1) on early-2014 mobile browsers the raw method used to create and show a couple of thousand nodes doesn't matter. 2) creating more than a couple of thousand nodes takes too long anyway. 3) the game has changed significantly in the past couple of years. 4) you must measure the right thing (create and show).

innerHTML, appendNode or DocumentFragment?

It's not easy to get good answers by Googling:

The 1st Google result (by Andrew Hedges) for 'innerhtml speed' heavily recommends using innerHTML (instead of DOM operations) on iPhone. To be fair, it's talking about iOS 2.2. It also specifically notes that you should measure not only the javascript execution time but the time it takes to show the results.

The 1st Google result (by John Resig) for 'documentfragment' tells us that DocumentFragments give us a 2-3x performance improvement over direct appendNode(). This too is old, from 2008, and talks about desktop browsers.

For 'innerhtml vs dom speed' the 1st and 2nd results are Andrew Hedges's pages. The 3rd result is a stackoverflow discussion, where the highest-ranking answer does say that it shouldn't matter, but does also say that for large numbers of elements you should use DocumentFragments and links to John Resig's page.

If you measure just javascript, DOM methods are faster

(Link to test code, I'll explain the test setup in detail shortly.)

But when you factor in rendering time, the difference mostly disappears

Getting your HTML in front of the user requires not only the new DOM, but calculating styles, calculating layout and painting the results on-screen. (If this is unfamiliar ground, Tali Garsiel and Paul Irish wrote an excellent explanation called 'How Browsers Work'). The recalculations are not done synchronously as you manipulate the DOM, but in bulk, later, after your javascript is done.

With total rendering time, all variants come in within 10-15%. The results look similar, but aren't quite the same. On Android the absolute difference between DOM manipulation and innerHTML actually grows from 50ms to 80ms (I don't know why, maybe some calculations can be shared when reusing DOM nodes via cloneNode?), but on iPhone the absolute difference goes down from 70ms to 30ms. The Webkit on the iPhone is actually doing the style recalculation (and render tree creation) synchronously when using innerHTML (Blink recentishly changed innerHTML to use the same mechanism as appendNode). Timings below, the top result is with innerHTML and shows no Style calculation time, whereas the bottom result is with appendNode and shows 50ms spent in style calculation:

Commercial break: The Mobile HTML5 Rendering Profiler

The measurements in this blog post have been created with (and exported into a spreadsheet from) the Mobile HTML5 Rendering Profiler. The Profiler lets you test the speed of your whole DOM manipulation chain: javascript, style calculation, layout and paint.

You can measure some of the rendering time by delaying your 'finished' time to a setTimeout(), as PPK pointed out in 2009. That's definitely better than just measuring javascript execution time. The Profiler can also measure painting times and split recalculation into styles and layout; and it runs your tests several times to measure whether changes are significant or not (how much variance there is from run to run).
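A minimal sketch of the setTimeout() approach, assuming a hypothetical `update` function that performs your DOM changes (`timeUpdate` and the field names are made up for this example). It records the javascript-only time when `update` returns, and a second time from a zero-delay timeout, which gives the browser a chance to run its deferred recalculations first:

```javascript
// Time a DOM update twice: once when the script finishes and once from a
// zero-delay timeout. The second figure includes some (not necessarily all)
// of the deferred rendering work; painting in particular may not be covered.
function timeUpdate(update, done) {
  var start = Date.now();
  update();
  var scriptMs = Date.now() - start;  // javascript execution only
  setTimeout(function () {
    done({ scriptMs: scriptMs, totalMs: Date.now() - start });
  }, 0);
}
```

You would call it as `timeUpdate(setHtmlInnerHtml, function (t) { console.log(t); })` and compare `totalMs`, not `scriptMs`, between variants.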

The Profiler will set you back 55 EUR (+ VAT), but do first download the 7-day trial and see what makes your app go fast (or slow...).

With enough nodes, there are differences

Firstly: you shouldn't be doing this. The times are in the seconds: your users aren't going to hang around for seconds. However, these results tell us some interesting things about the browser internals :-)

So with 10000 sibling nodes, innerHTML is suddenly faster on the iPhone. What gives? It's the interaction of two different properties of the WebKit used in the browser at the moment (iOS 7.1): 1) innerHTML calculates styles synchronously, appendNode lazily and 2) the lazy calculation is O(n^2) in the number of siblings. This is the motivation behind the 'appendNode, nested' variant: that one cuts down on the number of sibling nodes by inserting some intermediate nodes in the tree (NB: this does not result in the same DOM tree, but may well work for you, depending on your styling and code).
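To see why grouping helps so much, here is a back-of-the-envelope calculation under an idealised cost model. The model is an assumption made for illustration, not something measured: the lazy style calculation is taken to cost exactly n*n units for n siblings under one parent. It compares 10000 flat siblings with 50 intermediate nodes holding 200 items each, which is the split the nested variant uses:

```javascript
// Idealised model (an assumption): style calculation for n siblings under
// one parent costs n * n units.
function siblingCost(n) {
  return n * n;
}

var COUNT = 10000;
var GROUPS = 50;  // number of intermediate nodes in the nested variant

var flat = siblingCost(COUNT);
// 50 intermediate siblings, each with COUNT / GROUPS children of its own
var nested = siblingCost(GROUPS) + GROUPS * siblingCost(COUNT / GROUPS);

console.log(flat, nested, Math.round(flat / nested));
// prints: 100000000 2002500 50
```

Real browsers won't follow the model this neatly, but it shows the shape of the saving.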

On Android we finally see DocumentFragment making headway. I don't really know why.

Some conclusions

Browsers are trying to become better and better in dynamic manipulation of the content. The bugs and merges linked to above show how the internal mechanisms for the different APIs (innerHTML, appendNode, DocumentFragment) are converging, so it won't matter what you use. They also show that at any single point in time there might be issues with any of them...

I'm somewhat surprised that Chrome on the Samsung S5 is consistently about 100% slower than Mobile Safari on the iPhone 5 (not iPhone5S).

There are a lot of things these tests don't tell you anything about: the effect of more or less complex styles, how modifying existing nodes compares to creating new ones, etc.

Test setup details

The devices used are an iPhone 5 running iOS 7.1 and a Samsung Galaxy S5 running Android 4.4.2 and Chrome 34.

You can just go and read the code if you want to.

The test code creates a given number of items like:

<div ng-repeat="item in items" class="item row">
  <div class="name col-xs-6" ng-bind="item.name">item {{i}}</div>
  <div class="description col-xs-6" ng-bind="item.description"
    >A description of item {{i}} - Lorem ipsum dolor sit amet,
     consectetur adipisicing elit, sed do eiusmod tempor
     incididunt ut labore et dolore magna aliqua</div>
</div>

It's not using AngularJS; the attributes are there to make sure some other test comparisons are apples-to-apples. The {{i}} are replaced with actual item indices. Here are the code variants (edited for line length and brevity):

/* cloneNode */
/* COUNT, container and longtext are defined outside these functions */
function setHtmlCloneNodeDom() {
  var i = 1;
  var div = document.createElement("div");
  div.setAttribute("ng-repeat", "item in items");
  div.setAttribute("class", "item row");
  div.innerHTML = ''
    + '  <div class="name col-xs-6" ng-bind="item.name">'
    +     '</div>'
    + '  <div class="description col-xs-6" '
    +     'ng-bind="item.description">A description of item '
    +     i + ' - ' + longtext + '</div>'
  ;
  var item = div.getElementsByTagName("div")[0];
  var description = div.getElementsByTagName("div")[1];
  for (i = 0; i < COUNT; i++) { 
    item.textContent = "item " + i;
    description.textContent = "A description of item " + i +
        " - " + longtext;
    container.appendChild(div.cloneNode(true));
  } 
}

/* innerHTML */
function setHtmlInnerHtml() {
  var html = [];
  for (var i = 0; i < COUNT; i++) {
    html.push('<div ng-repeat="item in items" class="item row">');
    html.push('  <div class="name col-xs-6" ng-bind="item.name">');
    html.push('item ');
    html.push(i.toString()); html.push('</div>');
    html.push('  <div class="description col-xs-6" ');
    html.push('ng-bind="item.description">A description of item ');
    html.push(i.toString()); html.push(' - ');
    html.push(longtext); html.push('</div>');
    html.push('</div>');
  }
  container.innerHTML = html.join("");
}

/* DocumentFragment */
function setHtmlCloneNodeFragment() {
  var fragment = document.createDocumentFragment();
  /* like setHtmlCloneNodeDom */
  for (i = 0; i < COUNT; i++) {
    /* like setHtmlCloneNodeDom */
    fragment.appendChild(div.cloneNode(true));
  }
  container.appendChild(fragment);
}

/* appendNode, nested */
function setHtmlCloneNodeTree() {
  /* like setHtmlCloneNodeDom */
  var each = COUNT / 50;
  var tree_templ = document.createElement("div");
  for (i = 0; i < COUNT; i++) {
    if (i % each === 0) {
      var tree = tree_templ.cloneNode();
      container.appendChild(tree);
    }
    /* like setHtmlCloneNodeDom */
    tree.appendChild(div.cloneNode(true));
  }
}

Tests were run with the Mobile HTML5 Profiler. Full data and graphs available as a Google Spreadsheet (exported via 'Copy TSV' from the Profiler).

Sunday, May 18, 2014

Rerunning Protractor tests when files change and reusing the same browser instance

Running ng-scenario tests with karma had the nice feature of keeping browsers open and rerunning the tests when files change. I at least wrote my end-to-end tests very incrementally, having one test enabled, adding a pause() at the end and writing test code as I checked out what the browser looked like. So I was a bit annoyed with Protractor's explicit lack of a permanent runner. This hit me especially when running against node-webkit, which takes about 8s to start on my machine (vs. 1s for regular Chrome).

So I wrote a hacky script that patches enough of protractor to enable running it multiple times (you can see it on github too).


// keep_running.js
//
// _Example_ for using a single browser for rerunning protractor
// tests when files change (inspired by ng-scenario+karma's normal behaviour)
//
// By Mika Raento, Karhea Oy
// Put in the public domain, do whatever you wish with it
//
// Restrictions of the example:
// - only chromedriver
// - only jasmine
// - tests are assumed to live under acceptance/
// - watches current directory
// - simplistic assumption about process.kill
// - monkey-patches protractor, jasmine and node internals, will
//   easily break with new versions of any
// - there's a reason protractor doesn't support this: it's much easier to
//   get deterministic tests if you restart with a fresh browser
//
// node-requirements: protractor q minijasminenode chokidar
//
// run with:
// $ node keep_running.js   <protractor options go here>
//
var q = require('q');

// Monkey-patch the chrome provider to reuse the same browser instance
var chrome = Object.getPrototypeOf(
  require('protractor/lib/driverProviders/chrome.dp.js')());
var orig_getDriver = chrome.getDriver;
var driver;
chrome.getDriver = function() {
  if (driver) {
    this.driver_ = driver;
  } else {
    driver = orig_getDriver.apply(this);
  }
  return driver;
};
chrome.teardownEnv = function() {
  // could we do something here to clean at least some things up?
  return q.fcall(function() {});
};
chrome.setupEnv = function() {
  return q.fcall(function() {});
};

var maybe_run;
var should_run = true;
process.exit = function() {
  is_running = false;
  maybe_run();
};
var is_running = false;
var pending;

require('minijasminenode');
maybe_run = function() {
  if (is_running) return;
  if (!should_run) {
    if (pending) return;
    pending = setTimeout(function() {
      pending = null;
      maybe_run();
    }, 300);
    return;
  }
  is_running = true;
  should_run = false;
  // protractor and (mini)jasmine use node's require() for side effects,
  // reset the modules here
  delete require.cache[require.resolve('protractor/lib/cli.js')];
  delete require.cache[require.resolve('protractor/lib/runner.js')];
  for (var k in require.cache) {
    // especially our test specs
    if (k.indexOf("/acceptance/") > 0) {
      // TODO: parse the command line and configuration file to
      // see what our specs are
      delete require.cache[k];
    }
  }
  // Don't reuse the same jasmine environment for new tests
  jasmine.currentEnv_ = undefined;

  // Kick the run off
  // Here we could call the underlying protractor/lib/launcher.js instead
  // and give it a list of changed files
  require('protractor/lib/cli.js');
};

require('chokidar').watch('.').on('all', function() {
  should_run = true;
});

process.on('exit', function() { if (driver) driver.quit(); });

maybe_run();

Saturday, April 26, 2014

Why is AngularJS slower with jQuery?

(All data in this blog post is based on profiling on an iPhone5 with iOS 7.1)

TL;DR: AngularJS's JQLite has significantly faster .data() and .text()

In jQuery makes AngularJS 50% slower on an iPhone I showed the performance difference between AngularJS with jQuery and AngularJS without jQuery. That 50% was with AngularJS 1.2 and jQuery 1.11 - with AngularJS 1.3 (beta6) and jQuery 2.1 the difference was 35%.

I did a bit of testing to see if there was a simple reason for the slowdown, and turns out that there is. With two changes I got the difference down to 8% (created using The Mobile HTML5 Rendering Profiler; click for a large view):

The changes were to comment out most uses of .data() from AngularJS and changing ng-bind to use a simple inlined code to change the element text instead of jQuery's .text(). Note that these are not 'fixes' that can be applied to AngularJS - they don't preserve all the functionality. What they are is a way to see what is causing the slowdown. Here's the diff:

--- angular-1.3.0-beta.6.js 2014-04-26 16:02:35.000000000 +0300
+++ angular-1.3.0-beta.6-hacked.js 2014-04-26 15:57:23.000000000 +0300
@@ -6007,7 +6007,7 @@
           : $compileNodes;
 
         forEach(transcludeControllers, function(instance, name) {
-          $linkNode.data('$' + name + 'Controller', instance);
+          //$linkNode.data('$' + name + 'Controller', instance);
         });
 
         // Attach scope only to non-text nodes.
@@ -6015,7 +6015,7 @@
           var node = $linkNode[i],
               nodeType = node.nodeType;
           if (nodeType === 1 /* element */ || nodeType === 9 /* document */) {
-            $linkNode.eq(i).data('$scope', scope);
+            // $linkNode.eq(i).data('$scope', scope);
           }
         }
 
@@ -6105,7 +6105,7 @@
           if (nodeLinkFn) {
             if (nodeLinkFn.scope) {
               childScope = scope.$new();
-              $node.data('$scope', childScope);
+              // $node.data('$scope', childScope);
             } else {
               childScope = scope;
             }
@@ -6572,9 +6572,9 @@
           isolateScope = scope.$new(true);
 
           if (templateDirective && (templateDirective === newIsolateScopeDirective.$$originalDirective)) {
-            $linkNode.data('$isolateScope', isolateScope) ;
+            // $linkNode.data('$isolateScope', isolateScope) ;
           } else {
-            $linkNode.data('$isolateScopeNoTemplate', isolateScope);
+            // $linkNode.data('$isolateScopeNoTemplate', isolateScope);
           }
 
 
@@ -6677,7 +6677,7 @@
             // later, once we have the actual element.
             elementControllers[directive.name] = controllerInstance;
             if (!hasElementTranscludeDirective) {
-              $element.data('$' + directive.name + 'Controller', controllerInstance);
+              // $element.data('$' + directive.name + 'Controller', controllerInstance);
             }
 
             if (directive.controllerAs) {
@@ -18914,12 +18914,13 @@
    </example>
  */
 var ngBindDirective = ngDirective(function(scope, element, attr) {
-  element.addClass('ng-binding').data('$binding', attr.ngBind);
+  element.addClass('ng-binding'); // .data('$binding', attr.ngBind);
   scope.$watch(attr.ngBind, function ngBindWatchAction(value) {
     // We are purposefully using == here rather than === because we want to
     // catch when value is "null or undefined"
     // jshint -W041
-    element.text(value == undefined ? '' : value);
+    element[0].textContent = (value == undefined ? '' : value);
+    // element.text(value == undefined ? '' : value);
   });
 });

The test is comparing the version without jQuery to the version that loads jQuery before AngularJS; both with the above changes to the use of .text() and .data(). Commenting out .data() took the difference from 35% to 20%, and changing .text() took it from 20% to 8%.

This simple test only really uses ng-repeat and ng-bind. Although these are important parts of AngularJS, a typical application will quite likely use ng-class, ng-if, ng-switch, ng-show etc. These might also show similar slowdowns with jQuery.

It would be interesting to try and optimize these two methods in jQuery, implementing them along the lines of what AngularJS does. I'm not really familiar enough with jQuery to know if there are some real semantic differences in how they are supposed to work (in comparison to AngularJS).

Oh, and I did first think there might be a difference in Garbage Collection, but that was clearly not the case (this is a Timeline image from running with jQuery, there is no GC drop in the memory usage during the rendering, only on the page reload):

Note that the Profiler lets you see Timelines from the iPhone in the (better) Chrome Developer Tools.

Friday, April 25, 2014

On Android, jQuery makes AngularJS (only) 20% slower

The previous blog post showed that adding jQuery to AngularJS made it 50% slower on an iPhone. I ran the same test on the now-released! Android version of the Profiler (topmost result comes from loading jQuery before AngularJS and bottom from not loading it - the loading time is not part of the times shown; click for larger view):

On Android the slowdown is only 20%, which is nice. What's not so nice, and a bit puzzling, is that Android 4.4 / Chrome 34 / Samsung S5 is over 100% slower than iPhone5/iOS7:

I have to admit that I haven't looked at other performance comparisons between iPhone5 and the S5.

The Mobile Safari Rendering Profiler

To delight users, your mobile app needs to feel snappy and respond to gestures immediately. Doing that with the mobile web can be tough. The Mobile Safari Rendering Profiler helps you by measuring whether your changes are helping. It runs your test setup several times so that you can distinguish real changes from noise caused by timing, javascript optimization and garbage collection.

Test setup details

The iPhone test was run on an iPhone5 running iOS 7.1 (and Mobile Safari). The Android device is a Samsung Galaxy S5, running Android 4.4.2 and Chrome 34. The test runs after the page has been loaded and the initial contents rendered; it then measures the time to generate 2000 items using ng-repeat. The test was run with AngularJS 1.2.16 and jQuery 1.11.

I did also run the comparison on desktop Chrome 34. There the difference between plain AngularJS and AngularJS-with-jQuery is only 6%.

Thursday, April 24, 2014

jQuery makes AngularJS 50% slower on an iPhone

AngularJS can optionally use jQuery instead of its built-in jqLite DOM library. This is useful if you want to integrate jQuery-based components into your app. From previous testing I knew jQuery was somewhat slower, but I have to admit I was surprised when I found out how much slower it was.

The test is a very simple AngularJS app with a button that creates 2000 items rendered with ng-repeat. Below is a screenshot of profiling this with my Mobile HTML5 Rendering Profiler - the version with jQuery 1.11 spends over twice as much time in Javascript and a bit more time in Style calculation (click for larger view):

The Mobile HTML5 Rendering Profiler

To delight users, your mobile app needs to feel snappy and respond to gestures immediately. Doing that with the mobile web can be tough. The Profiler helps you by measuring whether your changes are helping. It runs your test setup several times so that you can distinguish real changes from noise caused by timing, javascript optimization and garbage collection.

Updating the visible UI of your web app requires changing the DOM (with javascript), recalculating the styles and layout and painting the results on-screen. The Profiler will show you the time taken by each phase, as well as how many recalculations are needed for the end results. These will guide you in making changes that improve the visible performance.

The Profiler will set you back 55 EUR (+ VAT), but do first download the 7-day trial and see what makes your app go fast (or slow...).

Test setup details

This uses angular 1.3.0-beta6 and jQuery 1.11.0, running on an iPhone5 with iOS 7.1.1. The profiling is done for the button press, so after the javascript has been fully loaded (but probably before significant optimizations of the javascript). The pages under test are plain AngularJS and with jQuery - the only difference is in loading or not loading jQuery (before Angular).

jQuery 2.1: better

jQuery 2.1 improves things a bit, but is still 35% slower:

AngularJS 1.2: the same story

(Update at 2014-04-24 21:17 EEST)

@m_gol on Twitter rightly pointed out that AngularJS 1.3 and jQuery 1.11 isn't the most relevant combination as AngularJS 1.3 has dropped IE8 support whereas jQuery 1.11 hasn't (but jQuery 2.1 has). Now that isn't the whole story, as the Angular folk are saying that 1.3 might still work with IE8, but it's a good point.

Here's then the same test but using AngularJS 1.2 (1.2.16) instead of 1.3. Now jQuery is 56% slower:

Android: not as bad, and yet, worse.

(Update on 2014-04-25) See the next blog post.

Why is AngularJS slower with jQuery?

(Update on 2014-04-26) See the followup.

Wednesday, February 19, 2014

Some numbers on Mobile Safari rendering performance

Update 2014-04-24: I've created a very-simple-to-use OS-X app that helps you gather rendering numbers like these from iDevices. See the Mobile Safari Rendering Profiler.

In the last post I presented an overview of measuring and improving response times with Mobile Safari. I alluded to some actual performance numbers - here you have one set, taken from a heavy view switch with AngularJS.

Remember: changing the browser UI happens in 4 steps: DOM manipulation (javascript, HTML), Style calculation, Layout and Paint. Only after all of them have happened will the user see the new thing, so it's their combination that you have to keep an eye on.

Total time to render

So an iPhone5 takes roughly 3 times as long to render the view as the latest Chrome on my mid-2010 mbp. Which isn't actually that bad - it is quite possible to make a site that works well on desktop perform reasonably on a recent iPhone. Older iPhones are clearly a different matter.

All of the numbers I present are the average of 5 runs, with 2 burn-in runs before that. The standard deviation was between 10 and 20% - you are well advised to get enough data if trying for micro-optimizations. There is no layout trashing in the test render sequence.
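
The procedure described above (throw away the burn-in runs, then report the average and the spread) can be sketched roughly like this. It is only an illustration, not the code that produced the charts: the function name summarize_runs and the timings are made up.

```python
from statistics import mean, stdev


def summarize_runs(timings_ms, burn_in=2):
    """Drop the first `burn_in` runs, then return (mean, relative stdev)."""
    measured = timings_ms[burn_in:]
    if len(measured) < 2:
        raise ValueError("need at least two measured runs")
    avg = mean(measured)
    # Sample standard deviation, expressed as a fraction of the mean.
    return avg, stdev(measured) / avg


# Hypothetical timings in milliseconds: 2 burn-in runs, then 5 measured runs.
runs = [900, 700, 560, 420, 530, 440, 550]
avg, rel_sd = summarize_runs(runs)
print("mean %.0f ms, stdev %.0f%% of mean" % (avg, rel_sd * 100))
```

With a spread of this size, a difference of a few percent between two configurations is well inside the noise, which is why more runs are needed for micro-optimizations.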

DOM manipulation

Somewhat surprisingly, the actual DOM manipulation (as done via AngularJS) is only twice as slow on the iPhone5. This isn't just a javascript benchmark, as it includes javascript, HTML parsing and DOM API calls. AngularJS is reasonably efficient in the way it generates the DOM, though there's quite a bit of javascript overhead for the two-way data binding. Again, the older phones are much worse.

Style (re)calculation

Here we see another surprise: Desktop Chrome is 10 times faster than Mobile Safari on iPhone5. I think there's been an algorithmic improvement in Chrome compared to the now somewhat old version of Webkit in iOS6. There are no synchronous style calculations in the traces.

Layout

Roughly as expected, iPhone layout takes 2.5 times as long as Desktop Chrome's.

Paint

I'm not going to bother showing the numbers. The paint times were only about 5% of the totals. The phone typically paints a lot fewer pixels so it performs reasonably well. I think that you'll only have to worry about paint performance in animations or if you are stacking a lot of hw-accelerated layers and changing their contents so that you have to paint all those textures.

Some parting thoughts

Mobile Safari can perform surprisingly well compared to a slightly older laptop, running at 1/3rd of the speed. Although javascript performance is important, even with AngularJS we are spending about half of the time in going from the new DOM to having something new on the screen.

Thursday, January 30, 2014

DOM and CSS performance with Mobile Safari

This blog post is the handout for my talk at HelsinkiJS. It's a bit long for a blog post and not very polished... I may get time to polish it later.

Problem setting

We at Brightside are building fairly data-heavy (supporting sheets that have several thousand rows by tens of columns) mobile web apps. We are especially not building web sites or HTML5 games.

We want our applications to feel snappy to the user.

We generate and manipulate the DOM with AngularJS and d3.js (as opposed to, say, jQuery). Both of these have specific performance implications and goals. AngularJS is expressly meant to render one screenful of DOM at a time. D3 is extremely performant in generating DOM but the enter-update-exit set style doesn't lend itself to manipulating only some properties, and animating a large number of DOM nodes separately is not feasible. AngularJS doesn't query the DOM like you would probably do with jQuery - all the DOM manipulations are already tied to the right DOM nodes.

Basics of DOM and CSS performance

Changing the appearance of the page happens in four distinct phases:

1. DOM manipulation via javascript
2. Style recalculation
3. Layout
4. Painting

This post is about steps 2 to 4.

A good in-depth explanation of the browser's rendering model is given in Tali Garsiel and Paul Irish's classic 'How Browsers Work: Behind the scenes of modern web browsers'.

Browsers are continuously improving the performance of each of these phases.

Mobile browser performance, some simple figures

At the time of writing, Mobile Safari on an iPhone 5 running iOS 6 performs at 25% to 50% of desktop Chrome in DOM manipulation, style recalculation, layout and painting (in our app and when testing with dromaeo's DOM tests).

DOM and CSS performance is thus closer to desktop than pure javascript performance, which is more like 10%.

What does snappy mean?

Jakob Nielsen famously wrote of the three main time limits:

0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.

1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.

10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

To this we can add that animations and direct manipulation should have a refresh rate of 60 or 30 Hz (fps).
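
As a quick sanity check, the refresh rates above translate into per-frame time budgets as follows (a trivial sketch; the function name frame_budget_ms is made up for illustration):

```python
def frame_budget_ms(refresh_hz):
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


for hz in (60, 30):
    print("%d Hz -> %.1f ms per frame" % (hz, frame_budget_ms(hz)))
```

Everything that has to happen for a frame (javascript, style, layout, paint) must fit inside that budget.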

In general at Brightside we think in three categories of response times (in addition to initial page load, which I'm not going to discuss here):

view switching: < 300ms (to be hidden with animation, smooth animation)
feedback on clicks: < 50ms
animation/touch manipulation: < 30ms (< 16ms preferred)

Here's an example Brightside application showing all three: click feedback, view switching and programmatic scroll in response to touch:

(The recording is from a slowed-down simulator, since there doesn't seem to be a way to record touches from actual devices.)

Click feedback, and View switching part 1

We started with the typical simple approach: render the requested list of items. Before optimizations, we were easily hitting 1-2s switching times with 50 rows of 3 columns. With optimization we were able to go from that 1-2s to 250ms, with immediate feedback and smooth animation to hide the remaining latency.

The single most important optimization for getting immediate feedback, smooth animation and reasonable render times is to get rid of layout trashing and, more generally, to make sure you only render those intermediate states you want to be visible.

Layout trashing

Layout trashing, in its simplest form, is caused by javascript code that repeatedly dirties the layout tree (by e.g., creating nodes or changing their sizes) and queries the layout (by e.g., asking nodes for their size or scroll position).

For a longer explanation, see Arthur Evans and Tim Statler's 'Chrome DevTools Revolutions 2013'.

Simple example:

// Slow: reading the height inside the loop forces a layout on every iteration.
for (var i = 0; i < 500; i++) {
  container.append(to_append.clone());
  w = container.height();
}

Takes 10 times as long as

// Fast: all nodes are appended first, so reading the height forces only one layout.
for (var i = 0; i < 500; i++) {
  container.append(to_append.clone());
}
w = container.height();

(You can run this at http://jsperf.com/layout-trashing-example).

Put this way it seems like the fix is 'to not do that then'. In real life layout trashing is typically caused by combining independent components that are interested in layout, which may all be structured correctly by themselves but not in combination.

A slightly less obvious form of layout trashing happens when you manipulate the DOM tree asynchronously several times and the browser calculates the layout several times. A slightly less expensive form is 'style trashing', where manipulating the DOM and querying CSS properties are done repeatedly - style recalculation is typically only about 10% of the cost of layout.
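
To make the mechanism concrete, here is a toy model in python (not browser code) that only counts how many layouts get forced. The class ToyLayoutEngine and its methods are invented for this sketch. Real browsers are far more subtle, but the dirty-then-query pattern is the same.

```python
class ToyLayoutEngine:
    """Toy model: DOM writes dirty the layout, reads force a layout if dirty."""

    def __init__(self):
        self.node_count = 0
        self.dirty = False
        self.layouts = 0

    def append_node(self):
        # A write: cheap by itself, but it invalidates the current layout.
        self.node_count += 1
        self.dirty = True

    def height(self, row_height=20):
        # A read: the layout must be brought up to date before answering.
        if self.dirty:
            self.layouts += 1
            self.dirty = False
        return self.node_count * row_height


def interleaved(n):
    engine = ToyLayoutEngine()
    for _ in range(n):
        engine.append_node()
        engine.height()  # read after every write
    return engine.layouts


def batched(n):
    engine = ToyLayoutEngine()
    for _ in range(n):
        engine.append_node()
    engine.height()  # single read after all the writes
    return engine.layouts


print(interleaved(500), batched(500))
```

In this model the interleaved version forces 500 layouts and the batched version forces one. Two independent components that each write and then read behave like the interleaved loop once they are combined.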

Controlling layouts at Brightside

Angular's animation was composed of independent 'leave' and 'enter' animations, which were started and run asynchronously. We added a consolidated timeout function which coalesced the DOM manipulations and reduced the number of layouts.
We split the view switching into two parts, separated by a timeout: the first part gives visual feedback and the second part creates the new DOM. This way the feedback could be shown before the heavy DOM manipulation.
We animate using '-webkit-transform: translate3d(...)'. Those animations happen on the GPU and run smoothly even if we are causing layouts (by rendering more of the new view content).
We use window.innerWidth instead of $(elem).width() in calculating the animated positions. This hardcodes some assumptions about our styling but can be run without needing to layout the new DOM.

Tools for diagnosing layout trashing

Chrome DevTools's Timeline shows where layout trashing occurs (it's called 'Forced synchronous layout'). You can also emulate the iOS user agents and screen sizes to make it easier to see the same results in Chrome as on an iOS device.

Safari doesn't (at least yet) show synchronous layouts in its Timeline.

We've written a tool that repeatedly navigates to a page under test on an iPhone/iPad, gathers the timeline data and can both tell you if it contains synchronous layouts and export the data to a format that can be loaded into Chrome's Timeline. The code (very rough, to be used as an example) can be found on https://github.com/brightside/dom-css-perf/tree/master/perf-tools. It uses the marvellous ios-webkit-debug-proxy from wrightt@google.com.

Optimizing your rendering

Although layout trashing/layout scheduling tends to be the biggest obstacle to snappiness, you may want to also optimize your CSS, javascript and DOM.

Making tweaks to dynamic and asynchronous manipulation of the DOM can be hard because the performance is not deterministic. Javascript's garbage collection can cause significant differences in timing from run to run. If you are trying to make incremental improvements to your rendering, it's easy to get false positives or negatives if you just keep staring at the DevTools Timeline.

The tools mentioned above (https://github.com/brightside/dom-css-perf/tree/master/perf-tools) can help by automatically running the same navigation sequence several times and telling you the average and standard deviation of the runs. With that approach you can see if you are making significant changes to your rendering or not.
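
One way to decide whether a change is bigger than the noise is to compare the difference of the means against the combined spread of the two sets of runs. The sketch below uses Welch's t statistic for that. This is my illustration of the idea, not what the perf-tools code does; the threshold of 3.0 and the timings are assumptions, and the function names are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before_ms, after_ms):
    """Welch's t statistic for two independent sets of timings."""
    # Squared standard error of each mean (needs some variation in the runs).
    se_before = variance(before_ms) / len(before_ms)
    se_after = variance(after_ms) / len(after_ms)
    return (mean(before_ms) - mean(after_ms)) / sqrt(se_before + se_after)


def looks_like_real_change(before_ms, after_ms, threshold=3.0):
    # threshold is an assumed rule of thumb, chosen conservatively for few runs
    return abs(welch_t(before_ms, after_ms)) > threshold


# Hypothetical timings in milliseconds, 5 runs each.
before = [560, 420, 530, 440, 550]
small_gain = [520, 400, 540, 430, 510]
big_gain = [300, 260, 310, 270, 290]
print(looks_like_real_change(before, small_gain))
print(looks_like_real_change(before, big_gain))
```

The first comparison shows a 4% lower average that is indistinguishable from noise with only 5 runs, while the second is a clear improvement.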

View switching part 2: solving the rendering performance for good

Although we were able to get medium-sized amounts of data to render in a reasonable time, we could easily hit 10-20s when trying to render thousands of rows.

Some of the cost comes from using a complex DOM tree, with many Angular directives. We were not able to show 1000s of rows even on desktop, whereas others are happily rendering tables with 500k rows.

However, optimizing the DOM only takes you so far: 1) large DIVs and TABLEs easily make Mobile Safari run out of (GPU) memory and crash, and 2) you still have to deal with 25% of the desktop performance.

At least for us the real solution lies in rendering a smaller DOM tree. We do this by lazily rendering only visible elements, see example code on GitHub.

The somewhat annoying part is that on Mobile Safari the only way to handle lazy rendering of long lists is to use programmatic scrolling (Mobile Safari's normal scrolling is hardware-accelerated and doesn't result in javascript scroll events until the end of the scroll).

You can take a look at the lazy scrolling/rendering code at https://github.com/brightside/dom-css-perf/tree/master/web (to see it in action, check out the code and load demo_plain_lazy.html in desktop Chrome).
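
The core of lazy rendering is deciding which rows need to exist in the DOM for a given scroll position. A minimal sketch of that calculation, assuming fixed-height rows and a few extra rows rendered above and below the viewport (the function visible_rows and the overscan parameter are my naming, the linked demo may well do it differently):

```python
def visible_rows(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return the half-open range [first, last) of rows that should be in the DOM."""
    first = max(0, scroll_top // row_height - overscan)
    # Ceiling division: include a row that is only partially visible at the bottom.
    last_visible = -(-(scroll_top + viewport_height) // row_height)
    last = min(total_rows, last_visible + overscan)
    return first, last


# Hypothetical list: 5000 rows of 50px each in a 500px tall viewport.
print(visible_rows(0, 500, 50, 5000))
print(visible_rows(10000, 500, 50, 5000))
```

On every programmatic scroll step you would then create the rows that entered the range and remove the ones that left it, so the DOM stays at a few dozen rows regardless of the size of the data.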

Direct manipulation

The first example in 'What does snappy mean?' shows programmatic scrolling: direct manipulation of the scroll position. In the list each list item is absolutely positioned and then rendered at the desired location with '-webkit-transform: translate3d(...)'.

Another use for touch-based manipulation is pan-and-zoom in our charts:

Here we have several layers of SVGs inside DIVs rendered on top of each other (to support z-order of the axis vs. content vs. overlays) whose '-webkit-transform' we manipulate in response to touch (both translation and scale). This gives roughly 30 fps on iOS7 (on iOS6 we get flicker at the end when redrawing at the end of pan-and-zoom as -webkit-transform is applied asynchronously - we use non-hw-accelerated transforms on iOS6).

Key takeaways

Pay close attention to layout trashing, it's too expensive for pretty much anybody
Use automated tools to measure the impact of DOM/CSS optimizations to account for nondeterminism caused by garbage collection
For data-heavy applications, you pretty much are forced to render lazily
Direct manipulation typically requires using hardware-accelerated CSS operations only

The Mobile HTML5 Rendering Profiler

I've now packaged the command line tools used to create the measurements in this blog into a desktop application. You can also run the measurements on Android (in addition to iPhone).

The Profiler will set you back 55 EUR (+ VAT), but do first download the 7-day trial and see what makes your app go fast (or slow...).

Tuesday, January 14, 2014

Struggles and successes with ios-driver

or how to get scrolling, the inspector and different SDK versions to work with ios-driver

I've been writing some code for Mobile Safari that is sensitive to scrolling, viewport (scaling) and browser chrome visibility (basically I want to show an as-fullscreen-as-possible popup that uses gestures). It ended up being a lot more code than anticipated (hah, what else is new) so I really wanted some tests for it.

My requirements for the testing environment are thus: runs Mobile Safari (not just UIWebView), can simulate touch gestures and native scroll, can test on iPhone, iPad; iOS6.1 and iOS7 with different orientations and resolutions - note that programmatic scrolling through the browser javascript doesn't hide the browser chrome on iOS7. The only thing that I found that can support these is ios-driver (a Selenium driver for iOS simulator and devices).

Getting everything working together turned out to be somewhat involved, somewhat under-documented and somewhat buggy:

Note: all code examples are in python. We at Brightside are a python shop.

The combination of versions I got to work was XCode 5.0.2/4.6.3 on OS-X 10.9.1 using the 'refactor' branch of ios-driver and maven from macports.

Ios-driver is actually not just a browser-driver: it can drive the browser through WebKit remote debugging and the native app through UI Automation. You must switch the driver between the two modes for things to work. This is done through driver.switch_to_window("Native") and driver.switch_to_window("Web"). E.g.:

UI Automation (touch, native widgets) uses Native mode
URL navigation uses Web mode
You must switch the driver to Native mode before trying to access the Inspector
You can run javascript code in either, but in Native mode you are talking to the UI Automation javascript environment and in Web mode to the browser
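
Since forgetting to switch back is an easy mistake, the mode switch can be wrapped in a small context manager. This is a sketch, assuming the driver object has switch_to_window and execute_script as used in this post (the older Selenium python bindings do; newer ones use driver.switch_to.window instead). The names driver_mode and RecordingDriver are made up, and RecordingDriver only exists so that the example runs without a simulator.

```python
from contextlib import contextmanager


@contextmanager
def driver_mode(driver, mode, restore="Web"):
    """Switch ios-driver to `mode` ("Native" or "Web"), then switch back."""
    driver.switch_to_window(mode)
    try:
        yield driver
    finally:
        # Always return to the previous mode, even if the body raised.
        driver.switch_to_window(restore)


class RecordingDriver:
    """Stand-in for a real driver: it just records what it was asked to do."""

    def __init__(self):
        self.calls = []

    def switch_to_window(self, name):
        self.calls.append(("switch", name))

    def execute_script(self, script):
        self.calls.append(("script", script))


driver = RecordingDriver()
with driver_mode(driver, "Native"):
    # A UI Automation call, so it has to run in Native mode.
    driver.execute_script(
        "UIATarget.localTarget().setDeviceOrientation("
        "UIA_DEVICE_ORIENTATION_LANDSCAPELEFT);")
print([name for kind, name in driver.calls if kind == "switch"])
```

With a real driver you would pass the webdriver instance instead of the RecordingDriver.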

Note that since UI Automation needs to be enabled for the app you are driving, ios-driver can't run against Mobile Safari on a (non-jailbroken) device. It automatically modifies the bundle on the simulator.

Scroll gestures are in theory really simple: driver.execute_script("UIATarget.localTarget().dragFromToForDuration({ x: 10, y: 250 }, { x: 10, y: 50 }, 0.5);"). However, they don't work on iOS7 (or at least I couldn't get them to work, and neither have other people, see for example the workaround in Subliminal). They work great with iOS6 though.

There are a couple of ways to scroll on iOS7. You can scroll the scrollview with something like driver.execute_script("UIATarget.localTarget().frontMostApp().mainWindow().scrollViews()[0].scrollViews()[0].scrollDown()"), but this (in my experience) scrolls all the way to the bottom. You can also scroll to specific native elements with e = driver.find_element_by_name("your div's text goes here") and driver.execute_script("arguments[0].scrollToVisible();", e) (this I couldn't get to work with iOS6, it gave 'stale element' errors).

Switching orientation is really easy: driver.execute_script("UIATarget.localTarget().setDeviceOrientation(UIA_DEVICE_ORIENTATION_LANDSCAPELEFT);").

The ios-driver documentation talks (amongst the TODOs) about being able to use different SDK/simulator versions based on the desired capabilities the client asks for. I didn't get any of this to work (the best I managed was to get 6.1 Safari installed on the 7.0 simulator, which does not work). What does work is installing XCode 4.6 side-by-side with 5.0 and running DEVELOPER_DIR=/Applications/Xcode4.6.app/Contents/Developer/ java -jar ios-server-0.6.5-jar-with-dependencies.jar -simulators. Sadly Instruments 4.6 will prompt you for access every time on OS-X 10.9.

Ios-driver doesn't support getting the browser console logs as such. I wrote a shim that replaces window.console and stores the logged messages in an array. I then use messages = driver.execute_script("return window.bside_get_messages ? window.bside_get_messages() : [];") to get them to the client.
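
A shim along those lines could look roughly like the following. This is a reconstruction for illustration, not the shim I actually use: only window.bside_get_messages is the real name, the rest (CONSOLE_SHIM_JS, install_console_shim, fetch_console_messages, StubDriver) is made up. The javascript has to be run in Web mode after each page load, so messages logged before it is installed are lost.

```python
# Javascript that wraps the console methods and keeps a copy of each message.
CONSOLE_SHIM_JS = """
(function () {
  if (window.bside_get_messages) { return; }
  var messages = [];
  var original = window.console || {};
  function wrap(level) {
    var wrapped = original[level];
    return function () {
      var text = Array.prototype.slice.call(arguments).join(" ");
      messages.push(level + ": " + text);
      if (wrapped) { wrapped.apply(original, arguments); }
    };
  }
  window.console = {
    log: wrap("log"), info: wrap("info"),
    warn: wrap("warn"), error: wrap("error")
  };
  window.bside_get_messages = function () {
    var out = messages;
    messages = [];
    return out;
  };
})();
"""


def install_console_shim(driver):
    driver.execute_script(CONSOLE_SHIM_JS)


def fetch_console_messages(driver):
    # Returns the messages logged since the previous call (or [] without shim).
    return driver.execute_script(
        "return window.bside_get_messages ? window.bside_get_messages() : [];")


class StubDriver:
    """Stand-in for a real driver so the sketch runs without a simulator."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)
        return []


driver = StubDriver()
install_console_shim(driver)
messages = fetch_console_messages(driver)
print(len(driver.scripts), messages)
```

In a test you can then assert on the returned messages, for example that nothing was logged at the error level.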

Thanks to the ios-driver developers! I got everything I wanted working in the end. Hope these notes prove useful to the next person trying to do the same.