I had the pleasure of speaking at BrazilJS 2013. My talk was about performance. While I was using Windows, IE11 and tools on Windows, the perf advice and the way that you need to think about performance is completely agnostic. Chrome, Safari, Firefox and all of the browsers adhere to the same specs as we do and so do many things very similarly.
This is the third blog post in the series about BrazilJS. The first was about the experience in Brazil, called Obrigado BrazilJS. The second was A little about BrazilJS. And this one is about the talk itself.
The slide deck can be found on SlideShare.
The talk is my take on a talk that Tobin Titus did at //build this year.
I started off with an introduction of me. Feel free to drop me emails, my email address is in the deck. Next I talked about some of my experiences so far in Porto Alegre, which you can read more about in the first two posts.
The talk itself actually started on slide 12.
For this talk, we’re going to start off with a little exercise to test the audience on how much they know about performance. The theory is that if they’ve done web development, they’ve at least had to think about it a little bit.
For full disclosure, this is an example from Tobin’s talk that he borrowed from one of the other guys on the engineering team (I think Israel but I’m not sure) but it illustrates the point very crisply.
I had to fly in for the conference. It’s a long flight…
When I started shopping for this flight, I looked at some of the top web sites for booking flights to compare prices. This included Priceline, Kayak, Travelocity, Expedia and Orbitz. All of these sites seem to have similar functionality. They each have logos in the upper left hand corner, banner ads, input fields for where I’m going to, where I’m coming from and so on. As such, they should have similar performance characteristics, right? Well, I would have thought so too, but the reality is that these sites have fairly different performance. One of them loads very quickly while the slowest takes several seconds.
Why is that?
Starting with the size of the download, you can see that there’s a pretty big difference between these different sites ranging from 1,061k to 3,697k. This is inclusive of the images, script files, CSS, markup and everything else that was downloaded. One of the things that I thought was fascinating is that #3 has the smallest download size but the most images (more on that in a moment).
The next thing is the number of DOM elements that are created by the page. This is at least an indication of how complicated the page is for the browser to parse and manipulate. And again, there is a big difference between 900 elements on the low end and 4,252 elements on the high end.
Another indication of how complicated the page can be is the number of CSS rules. And again, there’s a massive difference ranging from 1,105 all the way up to 5,352. One of the things that I thought was fascinating here is that #2, with the most CSS rules, was one of the lowest in terms of the number of elements created, telling me that a fair number of those rules were probably not being used.
The next thing that we need to look at is the number of image files that are downloaded. Some of these are large images, some of them are just little things like social media tags. As we can see, there’s a huge difference between the 6 on the low end and the 66 on the high end but as stated before, the number of images doesn’t necessarily correlate to the download size of the page in general.
So all of this brings us back to the question that started all of this. (And don’t read ahead if you can help it…)
How much do you know about performance?
If you analyze the numbers shown so far, can you predict which one is the fastest?
Which site is the slowest?
#2 is the slowest. This is surprising to a lot of folks because they see the size of #1 with the largest number of lines of script. However, #1 has far fewer CSS rules and image files, and the libraries that it uses, jQuery and YUI, seem to be fairly well optimized for performance. #2, while not the largest in anything except CSS rules, is towards the top in every category, while #5 is towards the bottom in a lot of the categories.
So what does make a web site fast? All of the things that we’ve talked about contribute to it and are the things that most web developers think about, but most of the time, people are looking for a single quick fix and are not thinking across the spectrum of things that can affect performance.
Network, CPU and GPU.
Hopefully the importance of these three categories is self evident but the GPU is a new(ish) category that we need to think about. We’ll talk more about each of these categories throughout the rest of the talk.
An important issue, though, is that per the spec, browsers only open up 6 simultaneous connections to any given server. That doesn’t sound like a lot of connections when you look at pages such as the #3 travel site, which had 66 image files, or pages such as Facebook. In fact, it sounds quite crippling. It’s actually up from where the spec started, which was 2 simultaneous connections. That’s not a lot of connections, but that’s where we started. The reason, because there is a reason rather than the spec’s authors just being random (not saying that they aren’t random, just not in this case), is to help a server detect a DoS, or Denial of Service, attack where the client opens up as many connections as possible and takes as much of the bandwidth from the server as it possibly can. If a browser can only open up 6 connections and download 6 things at a time, then a client trying to download 100 things at the same time is probably malicious.
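To see why the cap matters, here’s a toy model (my illustration, not browser code). It assumes every resource on one host takes the same time to download, which real pages don’t, but it shows how resources queue up in sequential waves:

```javascript
// Toy model of the per-host connection cap. With a cap of 6,
// each "wave" is a batch of downloads that can run in parallel,
// and the rest of the queue waits for a free connection.
function downloadWaves(resourceCount, maxConnections) {
  return Math.ceil(resourceCount / maxConnections);
}

// The #3 travel site's 66 images on one host: 11 sequential waves
// at 6 connections, versus 33 waves at the spec's original 2.
var modern = downloadWaves(66, 6);
var original = downloadWaves(66, 2);
```

This is also why techniques like spriting images and splitting assets across hostnames became common: both shrink the number of waves.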
Everyone should be familiar with the DOM tree. If you’re not, you’re probably reading the wrong blog. The DOM tree has been both the best friend and the biggest bane of most web developers out there in the world. As optimized as the DOM can get, the reality is that a lot of things depend on it, so when it changes, it affects a lot of moving parts. We’ll see this as we go forward.
The next subsystem is formatting. Formatting is important because the DOM tree is completely ignorant of anything visual. All it knows is the parent-child relationships between the elements and the attributes. It’s the CSS cascade that knows all of that information. During formatting, the two are joined up, giving the DOM elements size, color, backgrounds, font sizes and so on.
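A very rough sketch of what that join looks like. This is my illustration, not how a real engine works: it ignores specificity, origin and inheritance, and just merges matched rules in source order.

```javascript
// Toy version of "formatting": merge the CSS rules that matched an
// element into one computed style. Real engines also weigh selector
// specificity, rule origin, and inherited values; here, later rules
// simply win, which is all the cascade does in the simplest case.
function computeStyle(matchedRules) {
  var computed = {};
  matchedRules.forEach(function (rule) {
    Object.keys(rule).forEach(function (prop) {
      computed[prop] = rule[prop];
    });
  });
  return computed;
}

// Two rules match; the later rule's color wins, the earlier
// rule's font-size survives because nothing overrode it.
var style = computeStyle([
  { color: 'red', 'font-size': '12px' },
  { color: 'blue' }
]);
```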
The next step, now that all of the DOM elements know what they look like, is to figure out how they look together. CSS is inherently a block-based layout, so everything, images, paragraphs, divs, spans, even shapes such as circles, are actually blocks to CSS. And HTML/CSS is inherently a flow-based layout unless something overrides that. As such, the primary job of the layout engine is to put all of the blocks on the screen. This includes positioning objects based on their relative or absolute positioning, wrapping or scaling things that are too wide and all of the other things that go into that lightning round of Tetris.
At the end of the layout phase, we’ve got the display tree. This is an interesting data structure. Some folks ask why we don’t just use the DOM tree, but it’s not a 1-1 relationship between the items in the DOM tree and the items in the display tree. This upsets some folks. But let’s think about it. There are things that are in the DOM tree that are not in the display tree, such as elements that are display:none or input fields that are type=hidden. There are also things that are in the display tree that are not in the DOM tree, such as the numbers on an ordered list ( <ol><li>asdf</li></ol> ).
At this point, we have the display tree and we’re ready to paint. Notice that I didn’t say that we’re painting. We’re ready to paint. What happens next requires waiting on the hardware. Monitors can only refresh so many times a second. Most modern ones do so 60 times a second, thus the 60 hertz refresh rate. This means that on most monitors there are 1000/60 milliseconds between refreshes (milliseconds divided by the refresh rate), which works out to roughly 16.67 milliseconds. That doesn’t sound like a lot of time, but in reality a modern day Core i7 can run roughly 2,138,333,333 instructions in that window (the instructions-per-second figure noted in Wikipedia’s Instructions per second article, divided by 60). Between you and me, that’s a lot of instructions. It’s not infinite but it’s a lot.
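That arithmetic is worth writing down, since it’s the budget every frame of script, layout and paint has to fit inside. The instructions-per-second number below is the rough i7 figure cited above, not a measurement:

```javascript
// The frame-budget math: how much time, and roughly how many CPU
// instructions, fit between two monitor refreshes at 60 Hz.
var refreshRate = 60;                      // Hz, typical monitor
var frameBudgetMs = 1000 / refreshRate;    // ~16.67 ms per frame

// ~128,300 MIPS is the ballpark i7 figure; dividing by the refresh
// rate gives the ~2.1 billion instructions per frame quoted above.
var instructionsPerSecond = 128300000000;
var instructionsPerFrame = Math.round(instructionsPerSecond / refreshRate);
```

If your script, layout and paint work overruns that ~16.67 ms budget, the browser misses the refresh and the frame is dropped, which is what users perceive as jank.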
Painting produces surfaces, and once we have those surfaces, we send them to compositing and you have bits on the screen. This is an exciting time in the life of any web developer… 🙂
This is how we think about how bits get from the WWW to your customer’s monitor.
But better than just having a mental model of the subsystems in a browser is being able to actually peek into those subsystems with a good profiler and see what your site is actually doing at any given point in time.
We’ll spend the rest of this talk digging into two of them. The first one that we’ll look at is built right into IE11 and that’s the F12 developer tools. The second one comes free with the Windows Performance Toolkit and it’s called the Windows Performance Analyzer.
Both of these are built on ETW or Event Tracing for Windows. Every piece of software that Microsoft produces is heavily instrumented to give out ETW log information. All you have to do is turn it on, capture that information and then analyze it.
The great news here is that you can add in your own instrumentation as well if you want.
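One lightweight way to do that from script is the User Timing API (performance.mark and performance.measure), which IE11 supports; marks recorded this way show up alongside the browser’s own events in the profilers. Here’s a sketch; the wrapper function and its name are mine, and it takes the performance object as a parameter purely so it’s easy to test and degrades gracefully where the API is missing:

```javascript
// Sketch: wrap a piece of work in User Timing marks so it shows up
// in an ETW-based profiler. `perf` is whatever implements the API
// (window.performance in a browser).
function measured(perf, label, fn) {
  if (!perf || typeof perf.mark !== 'function') {
    return fn(); // no instrumentation available, just run the work
  }
  perf.mark(label + ':start');
  var result = fn();
  perf.mark(label + ':end');
  perf.measure(label, label + ':start', label + ':end');
  return result;
}
```

In a browser you’d call it as `measured(window.performance, 'buildGrid', renderGrid)` and then look for the 'buildGrid' measure in the profiler timeline.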
The F12 Developer Tools
The F12 Developer tools in IE11 are brand new and written by the same team that does Visual Studio. The DOM inspector and the console have had an overhaul but I get really excited by the debugger and other tools.
The network tab has a tremendous amount of great information. For each download, it details when it started, when it ended, the protocol, the method, the response code, the MIME type, how big it was, how long it took and even what initiated the download. Lastly, it shows you a really easy-to-follow graph of what’s happening along the timeline.
Next is the Memory tab. This tab allows you to watch memory usage in real time as your page works along, but more importantly, you can create snapshots of where memory is at any given point and compare that to a snapshot taken later in the page’s life. The snapshots give you the number of objects, the size of the objects and much more information that’s critical to profiling a site.
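Conceptually, comparing two snapshots is a diff of object counts. This toy function is my illustration of the idea, not the tool’s actual output format:

```javascript
// Toy model of a snapshot comparison: given object counts by type
// before and after, report which types grew. A real heap snapshot
// carries far more detail (sizes, retainers), but leaks show up the
// same way: counts that only ever go up between snapshots.
function diffSnapshots(before, after) {
  var growth = {};
  Object.keys(after).forEach(function (type) {
    var delta = after[type] - (before[type] || 0);
    if (delta > 0) {
      growth[type] = delta;
    }
  });
  return growth;
}

// 50 more Nodes and 3 new Detached objects appeared between
// snapshots; the unchanged Listener count is filtered out.
var growth = diffSnapshots(
  { Node: 100, Listener: 5 },
  { Node: 150, Listener: 5, Detached: 3 }
);
```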
The last tab that I’ll talk about is the UI Responsiveness tab. This one is built on ETW, which means that you have to record a session and then analyze it. In it, you can see the time broken out into the various subsystems, so you can see rendering, image decoding, scripting, garbage collection, styling (which is what it calls formatting) and so on.
This is an incredibly powerful set of profiling tools and will give you what you need in most cases.
I spent the bulk of the time in this talk going through the demo, which I’ll come back and detail here shortly, but it’s time I post this now and there’s a lot of good information in this post already.
What’s new with IE11 right now? There’s a ton of new stuff. First, it’s shipping with Windows 8.1, which is in RTM at the moment and due out in the middle of October. Second, it’s shipping for Windows 7 as well. I don’t have a date at the moment but it’ll be exciting when it comes out. The current preview doesn’t have the F12 developer tools built in, but the RTM will.
There’s a new User Agent string. You shouldn’t be doing browser sniffing anyway; you should be doing feature detection, but that’s for a different blog post.
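As a quick taste, here’s a minimal sketch of feature detection. The particular features checked are just illustrative, and the function takes the global object as a parameter purely so it can be exercised against a fake:

```javascript
// Feature detection: test for the capability itself instead of
// parsing navigator.userAgent. In a browser you'd pass in window.
function detectFeatures(global) {
  var storage = false;
  try {
    // localStorage access can throw (e.g. in some private modes),
    // so even the detection itself needs a guard.
    storage = typeof global.localStorage === 'object' &&
              global.localStorage !== null;
  } catch (e) {
    storage = false;
  }
  return {
    raf: typeof global.requestAnimationFrame === 'function',
    webgl: typeof global.WebGLRenderingContext !== 'undefined',
    storage: storage
  };
}
```

The payoff is that the result stays correct no matter what the User Agent string says, which is exactly why sniffing breaks when a new UA string ships.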
WebGL rocks. Enough said.
Evergreen updates – we’ve turned on auto-update for the browser by default, so going forward everyone who doesn’t opt out will stay on the latest version and we shouldn’t end up in the scenario where we’ve got a ton of people still on IE11 when we’re shipping IE90 or whatever.
Now, you should go to http://modern.IE and check it out. We’ve got a ton of useful things there. The first thing you’ll see is a scanner that you can use to quickly scan your site for common compatibility issues across many of the modern browsers. For example, if you’re using a -webkit prefix for something and there are equivalent -ms and -moz prefixes available, we’ll suggest them. We’ll also look for things like responsive web design, and there’s even a screen-shotting service that will show you screen shots of your page across a large variety of devices.
We’ve also got offers such as 3 months free at BrowserStack which will let you do automated testing across a lot of platforms.
There are also free 120-day VMs with a wide variety of operating system and browser combos for you to use in your testing. Plus there’s a Parallels discount offer and a bunch more stuff.
Last thing to talk about is our userAgents.ie community. This is a group of very bright developers from around the world that are passionate about standards based web development. They do a lot of great things, including sharing ideas amongst themselves and with the community. They give us a tremendous amount of feedback about Internet Explorer. We give them access to training and bits as early as possible. We also work with them to help correct misperceptions about things IE related. But most importantly, they are an independent body of folks that we work with to understand the common challenges and issues facing the modern day web developer.
If you are interested in joining, email me or submit an application at http://userAgents.ie.
If you have any questions, feel free to email me at my email address, which is on the first and last slide.
Also, feel free to reach out on twitter – @joshholmes.
There will be a lot more coming about performance in the very short term. Stay tuned and thanks for reading.