Yahoo! on Web Page Performance
A recent post by Tenni Theurer, who works in a performance team at Yahoo!, appeared in the Yahoo! User Interface Blog. The post begins with the claim that ...
... most of web page performance is affected by front-end engineering, that is, the user interface design and development. Theurer introduces the Pareto Principle, commonly known as the 80/20 rule, which states that 80% of the consequences come from 20% of the causes. In the case of Web page download time, she argues that the backend systems which generate an HTML document -- Apache, C++, databases, etc. -- should be regarded as the 80% of causes that account for only 20% of download time. The other 80% of download time is spent fetching the elements that make up the page, such as images, scripts, and stylesheets.
She presents two graphics to illustrate this point. A table summarizing home page measurements of 8 popular Web sites -- Yahoo!, Google, MySpace, MSN, eBay, Amazon, YouTube, and CNN -- shows that retrieving HTML accounts for between 5% and 38% of their download times, averaging just 12%. The other 88% is taken up downloading and rendering other page elements.
To illustrate the reason why HTML accounts for such a small percentage of all page download time, Theurer uses the diagram shown here [click to enlarge]. I call this kind of data visualization a 'waterfall chart', because of its shape, and because it is similar to the way the waterfall model of software development has always been depicted graphically.
Theurer says (in blog comment #22) she used the IBM Page Detailer tool to get the measurement data shown in this diagram. I have used this tool, and it produces a Chart Panel similar to the one above, which is a graphical representation of all of the component downloads that make up a single page.
I have not seen much documentation online, but if you take the time to register with IBM and download the software, its Help file is quite informative. It explains that the ...
... IBM Page Detailer attaches to the workstation TCP/IP socket stack and monitors all of the HTTP/HTTPS protocol based communications between your Web browser and the network. IBM Page Detailer then displays web activities in a series of color-coded bars. It also has quite a long section on Results Analysis and other Considerations regarding Web download times.
This kind of analysis is not new. Since 1998, Keynote has offered a similar Web Content Diagnostics feature in its MyKeynote portal -- here's a datasheet describing the newest version of this service, recently released. For example, the figure on the right [click to enlarge] shows a Keynote diagnostic measurement of the Yahoo Search Marketing page.
Unlike the IBM Page Detailer, which always measures from the desktop where it is installed, Keynote's diagnostic measurements can be taken from anywhere on the company's worldwide network of 2500 measurement computers (or "agents"). This example comes from an agent located on the Verizon network in Houston.
The Lesson for Web Designers
Even without looking at the data in any detail, the message of these waterfall charts is evident from their general shape:
Web page download times rise with the number of bytes to be downloaded and with the number of separate content elements comprising a page. This fact is not news to anyone whose job involves monitoring or managing site performance. Even for a Web page consisting only of a single HTML file, you cannot compute download time simply as page size divided by bandwidth. That's because of the way the TCP slow-start protocol works, sending one or two packets (typically 1.5KB - 3KB), then doubling, and doubling again, up to the receive window size of the client.
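The effect of slow start can be sketched in a few lines of code. This is an illustrative simplification, not a full TCP model: it assumes a 1460-byte segment size, an initial window of two segments, and a window that doubles every round trip up to the client's receive window (the parameter values here are my own assumptions, not taken from Theurer's post).

```python
def slow_start_round_trips(page_bytes, mss=1460, init_segments=2,
                           rwnd_segments=44):
    """Estimate round trips needed to deliver page_bytes under slow start."""
    sent = 0
    cwnd = init_segments      # congestion window, in segments
    rounds = 0
    while sent < page_bytes:
        sent += cwnd * mss    # one round trip delivers cwnd segments
        cwnd = min(cwnd * 2, rwnd_segments)  # double, capped at receive window
        rounds += 1
    return rounds

# A 60KB HTML file needs several round trips, not one:
print(slow_start_round_trips(60 * 1024))  # → 5
```

Because early windows are small, the first few round trips carry very little data, which is why transfer time is dominated by latency rather than bandwidth for all but the largest files.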
But most Web pages these days contain many embedded graphic objects, CSS files, and script files, all of which extend download times. Most of these files are small, requiring just a few TCP packets to transmit. But every transmission requires a separate "turn", adding another round-trip delay time (RTT) from the client (i.e. browser) to the server(s) and back. This behavior has been widely documented and is well-understood by performance specialists:
- In a previous post here, I described the Web Site Response Time Model and linked to the CMG2000 paper (E-Commerce Response Time: A Reference Model) in which I introduced it.
- In 2001, Peter Sevcik and John Bartlett of NetForecast published their paper on Understanding Web Performance, in which they first outlined a formula for computing Web page download times based on total bytes, the number of components, and on round-trip times between browser and server(s).
- That formula was simplified and explained in a highly readable paper by Alberto Savoia -- Web Page Response Time 101 -- published in July 2001 in STQE, the magazine of software testing and quality engineering (now Better Software Magazine).
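The kind of back-of-envelope formula these papers describe can be sketched as follows. This is a minimal illustration of the general approach (transfer time plus a round-trip cost per component, discounted for parallel connections); the function and parameter names are my own, not taken from the papers above.

```python
def estimated_download_seconds(total_bytes, components, rtt_seconds,
                               bandwidth_bps, concurrency=2):
    """Rough page download time: payload transfer time plus per-turn RTTs."""
    transfer_time = (total_bytes * 8) / bandwidth_bps
    # Browsers fetch several components in parallel, so the per-component
    # round-trip cost is divided by the connection concurrency.
    turn_time = (components / concurrency) * rtt_seconds
    return transfer_time + turn_time

# Example: a 300KB page with 40 components, 80ms RTT, 1.5Mbps downstream.
t = estimated_download_seconds(300 * 1024, 40, 0.080, 1_500_000)
print(round(t, 2))  # → 3.24 seconds
```

Note that in this example the turn time (about 1.6s) is as large as the raw transfer time, which is the central point of the waterfall charts: component count and RTT matter as much as total bytes.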
Tags: Yahoo, Pareto principle, 80/20 Rule, performance, Web performance management, IBM Page Detailer, Keynote, page size, round trip time, RTT, TCP, slow start.