A recent post by Tenni Theurer, who works on a performance team at Yahoo!, appeared in the Yahoo! User Interface Blog. The post begins with the claim that ...
... most of web page performance is affected by front-end engineering, that is, the user interface design and development.
Theurer introduces the Pareto Principle, commonly known as the 80/20 rule, which states that 80% of the consequences come from 20% of the causes. In the case of Web page download time, she argues that the backend systems that generate an HTML document -- Apache, C++, databases, etc. -- should be regarded as the 80% of causes that account for only 20% of download time. The other 80% of download time is spent fetching the elements that make up the page, such as images, scripts, and stylesheets.
She presents two graphics to illustrate this point. A table summarizing home page measurements of 8 popular Web sites -- Yahoo!, Google, MySpace, MSN, eBay, Amazon, YouTube, and CNN -- shows that retrieving HTML accounts for between 5% and 38% of their download times, and just 12% on average. The other 88% is taken up by downloading and rendering the other page elements.
Waterfall Charts
To illustrate why HTML accounts for such a small percentage of total page download time, Theurer uses the diagram shown here. I call this kind of data visualization a 'waterfall chart', because of its shape, and because it resembles the way the waterfall model of software development has always been depicted graphically.
Theurer says (in blog comment #22) that she used the IBM Page Detailer tool to collect the measurement data shown in this diagram. I have used this tool, and it produces a Chart Panel similar to the one above, which is a graphical representation of all of the component downloads that make up a single page.
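To make the idea of a waterfall chart concrete, here is a minimal Python sketch that draws one as text. Each row is one page component; its bar begins when the browser started fetching that component and ends when the last byte arrived, so the chart steps down and to the right as the page loads. The `Component` class, the `render_waterfall` and `html_share` helpers, and the timings in `page` are all hypothetical, invented for illustration; they are not data from Theurer's post, and this is not how IBM Page Detailer itself works.

```python
from dataclasses import dataclass


@dataclass
class Component:
    url: str         # hypothetical resource name
    start_ms: float  # fetch start, relative to the page request
    end_ms: float    # arrival of the last byte


def render_waterfall(components, width=50):
    """Return an ASCII waterfall chart, one row per component."""
    total = max(c.end_ms for c in components)
    scale = width / total                       # characters per millisecond
    name_w = max(len(c.url) for c in components)
    rows = []
    for c in components:
        lead = int(round(c.start_ms * scale))   # blank space before the bar
        bar = max(1, int(round((c.end_ms - c.start_ms) * scale)))
        rows.append(f"{c.url:<{name_w}} |{' ' * lead}{'#' * bar}")
    return "\n".join(rows)


def html_share(components):
    """Fraction of total page time spent on the HTML document.

    Assumes (hypothetically) that the first component is the HTML page.
    """
    total = max(c.end_ms for c in components) - min(c.start_ms for c in components)
    html = components[0]
    return (html.end_ms - html.start_ms) / total


# Hypothetical timings for a small page, in milliseconds.
page = [
    Component("index.html", 0, 120),
    Component("style.css", 130, 260),
    Component("app.js", 130, 330),
    Component("logo.gif", 270, 420),
    Component("photo.jpg", 340, 600),
]

chart = render_waterfall(page)
print(chart)
print(f"HTML share of page time: {html_share(page):.0%}")
```

Even in this toy page, the HTML row is only the first short bar; most of the elapsed time belongs to the components fetched after it.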
I have not seen much documentation online, but if you take the time to register with IBM and download the software, its Help file is quite informative. It explains that the ...
... IBM Page Detailer attaches to the workstation TCP/IP socket stack and monitors all of the HTTP/HTTPS protocol based communications between your Web browser and the network. IBM Page Detailer then displays web activities in a series of color-coded bars.
It also has quite a long section on Results Analysis and other Considerations regarding Web download times.
This kind of analysis is not new. Since 1998, Keynote has offered a similar Web Content Diagnostics feature in its MyKeynote portal -- here's a datasheet describing the newest version of this service, recently released. For example, the figure on the right shows a Keynote diagnostic measurement of the Yahoo Search Marketing page.
Unlike the IBM Page Detailer, which always measures from the desktop where it is installed, Keynote's diagnostic measurements can be taken from anywhere on the company's worldwide network of 2500 measurement computers (or "agents"). This example comes from an agent located on the Verizon network in Houston.
The Lesson for Web Designers
Even without looking at the data in any detail, the message of these waterfall charts is evident from their general shape:
Web page download times rise with the number of bytes to be downloaded and with the number of separate content elements comprising a page.
This fact is not news to anyone whose job involves monitoring or managing site performance. Even for a Web page consisting only of a single HTML file, you cannot compute download time simply as page size divided by bandwidth. That's because of the way the TCP slow-start algorithm works: the server sends one or two packets (typically 1.5 KB to 3 KB), then doubles that amount on each round trip, and doubles it again, up to the receive window size of the client.
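The effect of slow start can be sketched with an idealized model. The hypothetical helper `slow_start_rounds` below counts the round trips needed to deliver a response, assuming a 1,460-byte maximum segment size, an initial window of two segments, a 65,535-byte receive window, no packet loss, and a window that doubles every round trip. These parameter values are illustrative assumptions, not measurements, and real TCP stacks differ in detail.

```python
import math


def slow_start_rounds(size_bytes, mss=1460, init_cwnd=2, rwnd_bytes=65535):
    """Count round trips to deliver size_bytes under idealized slow start.

    Assumptions (illustrative only): no loss, the congestion window doubles
    every round trip, and it never exceeds the client's receive window.
    """
    max_segments = max(1, rwnd_bytes // mss)   # receive-window cap, in segments
    cwnd = min(init_cwnd, max_segments)        # segments sent in the first round
    remaining = math.ceil(size_bytes / mss)    # segments still to deliver
    rounds = 0
    while remaining > 0:
        remaining -= cwnd
        rounds += 1
        cwnd = min(cwnd * 2, max_segments)     # double, up to the cap
    return rounds


for size in (3_000, 15_000, 60_000):
    print(f"{size:>6} bytes -> {slow_start_rounds(size)} round trips")
```

Under these assumptions the round-trip count grows roughly with the logarithm of the response size, so on a high-latency path a small file's download time is dominated by round trips rather than by size divided by bandwidth.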
But most Web pages these days contain many embedded graphic objects, CSS files, and script files, all of which extend download times. Most of these files are small, requiring just a few TCP packets to transmit. But every one of them requires a separate "Turn", adding another round-trip delay time (RTT) from the client (i.e., the browser) to the server(s) and back. This behavior has been widely documented and is well-understood by performance specialists:
Since then, many articles and papers have reiterated this message. An entire industry segment -- Application Delivery Systems -- has emerged to help organizations offset delays in delivering their Web pages and satisfy customers' expectations of responsiveness. All the same, as is clear from the comments posted to the Yahoo! User Interface Blog, some Web designers and developers still need to learn this lesson.
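The cost of separate Turns can be made concrete with a deliberately crude model. The hypothetical function `page_time_estimate` below charges each page element one RTT plus its transfer time, and spreads the elements round-robin over a chosen number of parallel connections. It ignores slow start, DNS lookups, connection setup, server time, and rendering, and it assumes each connection gets the full bandwidth; the RTT and bandwidth figures in the usage lines are made-up examples.

```python
def page_time_estimate(object_sizes, rtt_s, bandwidth_bps, connections=1):
    """Crude page download time: one Turn (RTT) plus transfer time per object.

    Hypothetical model: objects are assigned round-robin to `connections`
    parallel connections, each assumed to get the full bandwidth; the page
    is done when the busiest connection finishes.
    """
    per_connection = [0.0] * connections
    for i, size in enumerate(object_sizes):
        per_connection[i % connections] += rtt_s + size * 8 / bandwidth_bps
    return max(per_connection)


# The same 60,000 bytes delivered two ways (illustrative numbers only).
one_file = page_time_estimate([60_000], rtt_s=0.08, bandwidth_bps=1_500_000)
many_files = page_time_estimate([2_000] * 30, rtt_s=0.08, bandwidth_bps=1_500_000)
two_conn = page_time_estimate([2_000] * 30, rtt_s=0.08, bandwidth_bps=1_500_000,
                              connections=2)

print(f"1 file:   {one_file:.2f} s")
print(f"30 files: {many_files:.2f} s")
print(f"30 files, 2 connections: {two_conn:.2f} s")
```

Even this simplified model shows the lesson of the waterfall charts: with the byte count held constant, splitting a page into many small elements multiplies the number of round trips, and parallel connections only partly offset that cost.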
Tags: Yahoo, Pareto principle, 80/20 Rule, performance, Web performance management, IBM Page Detailer, Keynote, page size, round trip time, RTT, TCP, slow start.