I had the pleasure of speaking at BrazilJS 2013. My talk was about performance. While I was using Windows, IE11 and tools on Windows, the perf advice and the way that you need to think about performance is completely agnostic. Chrome, Safari, Firefox and all of the other browsers adhere to the same specs as we do and so approach many of these things very similarly.
This is the third blog post in the series about BrazilJS. The first was about the experience in Brazil, called Obrigado BrazilJS. The second was A little about BrazilJS. And this one is about the talk itself.
The slide deck can be found on SlideShare.
The talk is my take on a talk that Tobin Titus did at //build this year.
I started off with an introduction of me. Feel free to drop me emails, my email address is in the deck. Next I talked about some of my experiences so far in Porto Alegre, which you can read more about in posts 1 and 2.
The talk itself actually started on slide 12.
For this talk, we’re going to start off with a little exercise to test the audience on how much they know about performance. The theory is that if they’ve done web development, they’ve at least had to think about it a little bit.
For full disclosure, this is an example from Tobin’s talk that he borrowed from one of the other guys on the engineering team (I think Israel but I’m not sure) but it illustrates the point very crisply.
I had to fly in for the conference. It’s a long flight…
When I started shopping for this flight, I looked at some of the top web sites for booking flights for price comparisons. This included Priceline, Kayak, Travelocity, Expedia and Orbitz. All of these sites seem to have similar functionality. They each have logos in the upper left hand corner, banner ads, input fields for where I'm going to, where I'm coming from and so on. As such, they should have similar performance characteristics, right? Well, I would have thought so too but the reality is that these sites have fairly different performance. One of them loads very quickly while the slowest takes several seconds.
Why is that?
Let's start by breaking down the numbers. We're not trying to make fun of any of these sites so we've anonymized the data, but we've broken it out into sites 1-5 across the categories of total download size in kilobytes, the number of elements, the number of CSS rules, the number of image files, the number of lines of JavaScript and the different script libraries that each of the sites used.
Starting with the size of the download, you can see that there’s a pretty big difference between these different sites ranging from 1,061k to 3,697k. This is inclusive of the images, script files, CSS, markup and everything else that was downloaded. One of the things that I thought was fascinating is that #3 has the smallest download size but the most images (more on that in a moment).
The next thing is the number of DOM elements that are created by the page. This is an indication, at least, of how complicated the page is for the browser to parse and manipulate. And again, there is a big difference between 900 elements on the low end and 4,252 elements on the high end.
Another indication of how complicated the page can be is the number of CSS rules. And again, there's a massive difference ranging from 1,105 all the way up to 5,352. One of the things that I thought was fascinating here is that #2, with the most CSS rules, was one of the lowest in terms of the number of elements created, telling me that a fair number of those rules were probably not being used.
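As an aside, if you're curious what rough counts like these look like for your own page, here's a minimal sketch you can paste into the browser console. It assumes the stylesheets are same-origin; cross-origin stylesheets may refuse to expose their rules.

// Rough element and CSS rule counts for the current page (console sketch).
var elementCount = document.getElementsByTagName("*").length;

var ruleCount = 0;
for (var i = 0; i < document.styleSheets.length; i++) {
    try {
        // cssRules throws or is null for stylesheets we aren't allowed to read
        var rules = document.styleSheets[i].cssRules;
        if (rules) {
            ruleCount += rules.length;
        }
    } catch (e) {
        // cross-origin stylesheet; skip it
    }
}

console.log("Elements: " + elementCount + ", CSS rules: " + ruleCount);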
The next thing that we need to look at is the number of image files that are downloaded. Some of these are large images, some of them are just little things like social media tags. As we can see, there’s a huge difference between the 6 on the low end and the 66 on the high end but as stated before, the number of images doesn’t necessarily correlate to the download size of the page in general.
When people think about performance, very often they will think about JavaScript so we definitely have to look at the number of lines of code. There's not a simple and consistent way to measure complexity so this is the best that we can do for the purposes of this talk. Again, we see a massive difference in the number of lines of script between the different sites, ranging from 10,284 to 77,768. This includes the libraries that they are using as well as their custom code.
Lastly, what are the different libraries that they are using for their architecture? It’s not shocking that all of them are using jQuery but then it gets more diverse. One of them is using YUI. One of them is using Scriptaculous and so on. All but one are using at least two different JavaScript frameworks to accomplish their tasks.
So all of this brings us back to the question that started all of this. (And don’t read ahead if you can help it…)
How much do you know about performance?
If you analyze the numbers shown so far, can you predict which one is the fastest?
#5 is the fastest. A lot of people guess #4 because it’s only using one JavaScript framework and has the least amount of lines of JavaScript.
What we’re not so subtly pointing out is that JavaScript is not the only contributor to performance on a site. There are a lot of factors to take into account.
Now, given what we know about the sites and what we know about performance as it relates to download size, number of elements, CSS rules, image files, number of lines of script and the number of JavaScript frameworks…
Which site is the slowest?
#2 is the slowest. This is surprising to a lot of folks because they see that #1 has the largest number of lines of script. However, it's got a lot fewer CSS rules and image files, and the libraries that it uses, jQuery and YUI, seem to be fairly well optimized for performance. #2, while not the largest in anything except CSS rules, is towards the top in every category, while #5 is towards the bottom in a lot of the categories.
So what does make a web site fast? All of the things that we've talked about contribute to it and are the things that most web developers think about, but most of the time, people are looking for a single quick fix and are not thinking across the spectrum of things that can affect performance.
You really need to think about 3 categories of things.
Network, CPU and GPU.
Hopefully the importance of these three categories is self evident but the GPU is a new(ish) category that we need to think about. We’ll talk more about each of these categories throughout the rest of the talk.
To really understand this, we need to go through the Web Runtime Architecture. I'm very consciously saying Web Runtime Architecture because all browsers work this way. They may have different names for their subsystems or have focused on optimizing for a given scenario more or less than IE has, but the reality is that we're all building for the same specifications. We'll talk about some of these at a high level as we go through. This high level look at the web runtime architecture covers everything from when the user enters a URL in the address bar until the bits are painted on the screen, including any JavaScript that executes and any input that happens.
The first thing that happens is Networking. Networking starts with the first URL entered. For today, I'll be talking about starting with an HTML page but the reality is that Networking handles whatever type of resource it is, including XML, JSON, IMG, PDF, etc., and different subsystems come into play for different resources. But let's start with an HTML file. The first thing that happens is that we start downloading the HTML file and, as we are downloading it, we start pre-processing it. What we're doing there is scrubbing the file for anything else that we could go get. This includes CSS files, JavaScript files, images or whatever else is referenced by the file. This is important because we want to get all of the content that we're going to need as quickly as possible because it's all interdependent.
An important issue, though, is that per the spec, we're only allowed to open up 6 simultaneous connections to any given server. That doesn't sound like a lot of connections when you look at pages such as the #3 travel site, which had 66 image files, or pages such as Facebook. In fact, it sounds quite crippling. It's actually up from where the spec started, which was 2 simultaneous connections. That's not a lot of connections but that's where we started. The reason, because there is a reason rather than the W3C just being random (not saying that they aren't random, just not in this case), is to help a server detect a DoS or Denial of Service attack where the client opens up as many connections as possible and takes as much of the bandwidth from the server as it possibly can. If a browser can only open up 6 connections and download 6 things at a time, it means that if there is an app trying to download 100 things at the same time, it's probably malicious.
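If you want to see how your own page fans out across hosts, and therefore how it interacts with that per-host connection limit, the Resource Timing API gives a rough picture. This is just a sketch and assumes a browser that supports performance.getEntriesByType (IE10 and later do).

// Count downloaded resources per host using the Resource Timing API.
var perHost = {};
performance.getEntriesByType("resource").forEach(function (entry) {
    var host = entry.name.split("/")[2]; // crude host extraction from the URL
    perHost[host] = (perHost[host] || 0) + 1;
});
console.log(perHost); // e.g. { "www.example.com": 42, "ads.example.net": 12 }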
Once an item is downloaded, it goes to the parsers. There are dozens of parsers which parse everything including HTML, CSS, XML, XHTML, SVG, JavaScript and variations on all of these. The job of the parsers is to create the internal data structures that we'll be using for the rest of the processing.
Everyone should be familiar with the DOM tree. If you’re not, you’re probably reading the wrong blog. The DOM tree has been the best friend and the biggest bane to most web developers out there in the world. As optimized as the DOM can get, the reality is that there’s a lot of things that depend on it so when it changes, it affects a lot of moving parts. We’ll see this as we go forward.
Another internal data structure that you should be familiar with is the CSS cascade. This is an amalgamation of all of the CSS rules that are referenced in the HTML and all of the CSS files as well as the rules that are set programmatically by JavaScript. It incorporates all of the orders of precedence as to what overrides what and so on.
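As a tiny, hypothetical illustration of what the cascade has to resolve, here's a stylesheet rule losing to a programmatic inline style set from JavaScript, with getComputedStyle showing whatever the cascade decided.

// A stylesheet rule vs. an inline style set from script; the inline style wins.
var style = document.createElement("style");
style.textContent = "#greeting { color: blue; }";
document.head.appendChild(style);

var el = document.createElement("p");
el.id = "greeting";
el.textContent = "Hello, BrazilJS!";
document.body.appendChild(el);

console.log(getComputedStyle(el).color); // rgb(0, 0, 255) from the stylesheet

el.style.color = "red"; // programmatic rule, higher precedence in the cascade
console.log(getComputedStyle(el).color); // rgb(255, 0, 0)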
Next is JavaScript. Poor lonely JavaScript, all by itself down there in its sandbox. The JavaScript sandbox is where JavaScript parsing, byte code generation, native code generation and all of the execution of JavaScript happens. This is why JavaScript is safe to run inside of browsers. It's not that it's a safe language. Quite the contrary, it's an incredibly powerful language that, if left unchecked, would run rampant over everything on your machine.
Rather, it's safe because its only two ways out of the JavaScript sandbox are to call the DOM APIs or to call out over the network with XHR requests. There are a ton of frameworks that help us with both of these tasks. jQuery is one of the most popular that helps with both. The $("selector") syntax is incredibly powerful, as is the $.getJSON() method.
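For example, here are both exits from the sandbox in a few lines of jQuery. This is just a sketch and the endpoint and element id are made up.

// 1. Out through the DOM, via a selector query and manipulation.
$("#flight-results").addClass("loading");

// 2. Out through the network, via an XHR wrapped up as $.getJSON.
$.getJSON("/api/flights?from=SEA&to=POA", function (data) {
    $("#flight-results")
        .removeClass("loading")
        .text(data.length + " flights found");
});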
The next subsystem is formatting. Formatting is important because the DOM tree is completely ignorant of anything visual. All it knows is the parent/child relationships between the elements and the attributes. It's the CSS cascade that knows all of that information. In formatting, this information is joined up, giving the DOM elements their size, color, backgrounds, font sizes and so on.
Next, now that all of the DOM elements know what they look like, we need to figure out how they look together. CSS is inherently a block based layout, so everything (images, paragraphs, divs, spans, even shapes such as circles) is actually a block to CSS. And HTML/CSS is inherently a flow based layout unless something overrides that. As such, the primary job of the layout engine is to put all of the blocks on the screen. This includes positioning objects based on their relative or absolute positioning, wrapping or scaling things that are too wide and all of the other things that go into that lightning round of Tetris that is required.
At the end of the layout phase, we've got the display tree. This is an interesting data structure. Some folks ask why we don't just use the DOM tree, but it's not a 1-1 relationship between the items in the DOM tree and the items in the display tree. This upsets some folks. But let's think about it. There are things that are in the DOM tree that are not in the display tree, such as elements that are display:none or input fields that are type=hidden. There are also things that are in the display tree that are not in the DOM tree, such as the numbers on an ordered list ( <ol><li>asdf</li></ol> ).
At this point, we have the display tree and we're ready to paint. Notice that I didn't say that we're painting. We're ready to paint. What happens next requires waiting on the hardware. Monitors can only paint so many times a second. Most modern day ones do that 60 times a second, thus the 60 hertz refresh rate. This means that on most monitors it's 1000/60 milliseconds between refreshes, which works out to roughly 16.67 milliseconds. That doesn't sound like a lot of time but in reality a modern day i7 processor can run roughly 2,138,333,333 instructions in that window (taking the instructions-per-second figure from the Wikipedia article on Instructions per Second and dividing by 60). Between you and me, that's a lot of instructions. It's not infinite but it's a lot.
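That 16.67 millisecond budget is easy to see for yourself. Here's a minimal sketch using requestAnimationFrame (supported in IE10 and later and the other modern browsers) that logs the time between frames; on a 60Hz monitor it hovers around 16.7ms as long as the main thread isn't overloaded.

// Log the gap between frames; roughly 16.7ms on a 60Hz monitor.
var last = performance.now();

function onFrame(now) {
    console.log("frame gap: " + (now - last).toFixed(2) + "ms");
    last = now;
    requestAnimationFrame(onFrame);
}

requestAnimationFrame(onFrame);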
On the refresh rate, we get the hardware interrupt which lets us paint. At that point, we draw to the surfaces, which for IE are DirectX surfaces.
Once we have those surfaces, we send them to compositing and you have bits on the screen. This is an exciting time in the life of any web developer… 🙂
This is how we think about how bits get from the WWW to your customer’s monitor.
But then there’s input such as touch or a mouse. And we think about those things coming in backwards through the pipeline because we have to look at the layout and the formatting to do the hit testing and figure out if you’ve touched anything important. Then we have to look at the DOM tree to see if there are any JavaScript events to trigger which fires off the JavaScript which touches the DOM tree which invalidates the display tree which causes formatting and layout to happen which creates a new display tree which causes a new painting and compositing cycle to happen and so on.
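Here's a hedged example of that cycle being kicked off more often than it needs to be. The first handler alternates writes and reads, which forces formatting and layout to run inside the loop; the second batches the reads and then the writes so the pipeline only has to run once afterwards.

// Forces layout on every iteration: each offsetHeight read has to flush the
// style and layout work invalidated by the previous write.
function slowResize(items) {
    items.forEach(function (item) {
        item.style.width = "200px";     // write: invalidates layout
        console.log(item.offsetHeight); // read: forces layout right now
    });
}

// Reads first, then writes, so layout only needs to run once at the end.
function fasterResize(items) {
    var heights = items.map(function (item) {
        return item.offsetHeight;       // all reads up front
    });
    items.forEach(function (item, i) {
        item.style.width = "200px";     // then all writes
        item.setAttribute("data-old-height", heights[i]);
    });
}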
Make sense?
So this is the full pipeline. You need to have this mental model in your head as you're creating performant (I know that performant is "not a word" but I've got an English degree so I've got a license to make up new words as needed, or to deem a series of sounds that's commonly uttered with a common understanding of what those sounds mean to be a word, so shut up about it...) sites. It's not just JavaScript that's the culprit.
But better than just having a mental model of the subsystems in a browser is being able to actually peek into those subsystems with a good profiler and see what your site is actually doing at any given point in time.
We'll spend the rest of this talk digging into two of them. The first one that we'll look at is built right into IE11 and that's the F12 developer tools. The second one is free with the Windows Performance Toolkit and it's called the Windows Performance Analyzer.
Both of these are built on ETW or Event Tracing for Windows. Every piece of software that Microsoft produces is heavily instrumented to give out ETW log information. All you have to do is turn it on, capture that information and then analyze it.
The great news here is that you can add in your own instrumentation as well if you want.
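From script, the easiest hooks I know of are the User Timing API (performance.mark and performance.measure, IE10 and later) and console.timeStamp. The marker names and the runFlightSearch function below are made-up placeholders; this is a sketch, not the only way to do it.

// Drop custom markers around a chunk of work so they show up in a profile.
performance.mark("search-start");

runFlightSearch(); // hypothetical expensive work being measured

performance.mark("search-end");
performance.measure("flight-search", "search-start", "search-end");

// console.timeStamp can also add a labeled event to a recorded timeline.
console.timeStamp("search finished");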
The F12 Developer Tools
The F12 Developer tools in IE11 are brand new and written by the same team that does Visual Studio. The DOM inspector and the console have had an overhaul but I get really excited by the debugger and other tools.
The network tab has a tremendous amount of great information. It details each of the downloads as to when it started, when it ended, the protocol, the method, the response code, the MIME type, how big it was, how long it took and even what initiated the download. Lastly, it shows you a really easy to follow graph of what's happening along what timeline.
The next tab that I covered (and yes, these are out of order compared to the tools) was the profiler. What this gives you is an in depth look at what JavaScript is doing, starting with how often a particular function or property was called, how long it took to execute with both inclusive and exclusive time, where it was called from and more. It'll even tell you which worker called it if the work is being done by a worker thread.
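A quick, contrived illustration of inclusive versus exclusive time: in a profile of the sketch below, outer shows a large inclusive time because it contains inner's cost, but a small exclusive time because its own body does very little.

// inner() does the heavy lifting; outer() mostly just calls it.
function inner() {
    var total = 0;
    for (var i = 0; i < 5000000; i++) {
        total += Math.sqrt(i);
    }
    return total;
}

function outer() {
    var small = 0;
    for (var i = 0; i < 1000; i++) {
        small += i;
    }
    return small + inner(); // outer's inclusive time includes this call
}

outer();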
Next is the Memory tab. This tab allows you to watch the memory usage in real time as your page is working along but, more importantly, you can create snapshots of where memory is at any given point and compare that to a snapshot later in the page's life. The snapshots give you the number of objects, the size of the objects and much more information that's critical to profiling a site.
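As a hedged example of what comparing two snapshots will surface: detached DOM nodes that script is still holding onto can never be collected, so their count just keeps climbing between snapshots. The element id here is made up.

// A classic pattern that shows up when diffing memory snapshots: the element
// is removed from the page, but a lingering reference keeps it alive.
var detached = [];

function replacePanel() {
    var old = document.getElementById("results-panel");
    if (old) {
        old.parentNode.removeChild(old);
        detached.push(old); // leak: the detached subtree can never be collected
    }
    var fresh = document.createElement("div");
    fresh.id = "results-panel";
    document.body.appendChild(fresh);
}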
The last tab that I’ll talk about is the UI responsiveness tab. This one is built on ETW which means that you have to record the session and then analyze it. In this, you can see it broken out into the various subsystems so you can see rendering, image decoding, scripting, garbage collection, styling (which is what it calls formatting) and so on.
This is an incredibly powerful set of profiling tools and will give you what you need in most cases.
When these tools are not enough, you can step up to the Windows Performance Analyzer. It will be overkill for a lot of what you want to get done but it’s a great tool to have in your bag.
I spent the bulk of the time in this talk going through the demo, which I'll come back and detail here shortly, but it's time to post this now and there's a lot of good information in this post already.
What's new with IE11 right now? There's a ton of new stuff. First, it's shipping with Windows 8.1, which is at RTM at the moment and due out in the middle of October. Second, it's shipping for Windows 7 as well. I don't have a date at the moment but it'll be exciting when it comes out. The current preview doesn't have the F12 developer tools built in but the RTM will.
There's a new User Agent string. You should not be doing browser sniffing anyway, but rather feature detection, but that's for a different blog post.
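The short version, as a minimal sketch: test for the capability you need instead of parsing navigator.userAgent. The drawFrame and showNearbyAirports functions are stand-ins for your own code.

// Feature detection: ask whether the capability exists, then use it or fall back.
if (window.requestAnimationFrame) {
    requestAnimationFrame(drawFrame);
} else {
    setTimeout(drawFrame, 1000 / 60); // fallback for older browsers
}

if ("geolocation" in navigator) {
    navigator.geolocation.getCurrentPosition(showNearbyAirports);
}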
WebGL rocks. Enough said.
Evergreen updates – we've turned on auto-updates for the browser by default, so going forward everyone who doesn't opt out will stay current and we shouldn't end up in the scenario where we've got a ton of people stuck on IE11 when we're shipping IE90 or whatever.
Now, you should go to http://modern.IE and check it out. We've got a ton of useful things there. The first thing you'll see is a scanner that you can use to quickly scan your site for common compatibility issues across many of the modern day browsers. For example, if you're using a -webkit prefix for something and there's an available -ms and -moz prefix as well, we'll suggest those. We'll also look for things like a responsive web design and there's even a screen-shotting service that will show you screen shots of your page across a large variety of devices.
We’ve also got offers such as 3 months free at BrowserStack which will let you do automated testing across a lot of platforms.
There are also free 120-day VMs with a wide variety of configurations of operating system and browser combos for you to use in your testing. Plus there's a Parallels discount offer and a bunch more stuff.
Last thing to talk about is our userAgents.ie community. This is a group of very bright developers from around the world who are passionate about standards based web development. They do a lot of great things, including sharing ideas amongst themselves and with the community. They give us a tremendous amount of feedback about Internet Explorer. We give them access to training and bits as early as possible. We also work with them to help correct perceptions when there's an incorrect perception about something IE related. But most importantly, they are an independent body of folks that we work with to understand the common challenges and issues facing the modern day web developer.
If you are interested in joining, email me or submit an application at http://userAgents.ie.
If you have any questions, feel free to email me at my email address which is on the first and last slide.
Also, feel free to reach out on twitter – @joshholmes.
There will be a lot more coming about performance in the very short term. Stay tuned and thanks for reading.