NOTE: All posts in this blog have been migrated to Web Performance Matters.
All updated and new content since 1/1/2007 is there. Please update your bookmarks.

Tuesday, December 12, 2006

Yahoo! on Web Page Performance

A recent post by Tenni Theurer, who works in a performance team at Yahoo!, appeared in the Yahoo! User Interface Blog. The post begins with the claim that ...
... most of web page performance is affected by front-end engineering, that is, the user interface design and development.
Theurer introduces the Pareto Principle, commonly known as the 80/20 rule, which states that 80% of the consequences come from 20% of the causes. Applied to Web page download time, she argues that the back-end systems that generate an HTML document -- Apache, C++, databases, and so on -- account for only about 20% of download time. The other 80% is spent fetching the elements that make up the page, such as images, scripts, and stylesheets.

She presents two graphics to illustrate this point. A table summarizing home page measurements of eight popular Web sites -- Yahoo!, Google, MySpace, MSN, eBay, Amazon, YouTube, and CNN -- shows that retrieving the HTML document accounts for between 5% and 38% of their download times, and just 12% on average. The other 88% is spent downloading and rendering the remaining page elements.
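If you want to reproduce this kind of breakdown for your own pages, here is a minimal sketch. It assumes you have a HAR capture of a page load (a JSON archive format that later browser tooling can export -- not what Theurer used) and that the first entry in the archive is the base HTML document:

import json
from datetime import datetime, timedelta

def html_fraction(har_path):
    """Share of total page load time spent fetching the base HTML,
    computed from a HAR capture (hypothetical file name below)."""
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]

    def start(e):
        # HAR timestamps are ISO 8601, e.g. "2006-12-12T10:00:00.000Z"
        return datetime.fromisoformat(e["startedDateTime"].replace("Z", "+00:00"))

    html_ms = entries[0]["time"]  # assume entry 0 is the HTML document
    t0 = min(start(e) for e in entries)
    t_end = max(start(e) + timedelta(milliseconds=e["time"]) for e in entries)
    total_ms = (t_end - t0).total_seconds() * 1000
    return html_ms / total_ms

# e.g. print(f"HTML share of load time: {html_fraction('home.har'):.0%}")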

Waterfall Charts
Waterfall Charts
[Figure: Yahoo! waterfall chart] To illustrate why HTML accounts for such a small percentage of total page download time, Theurer uses the diagram shown here. I call this kind of data visualization a 'waterfall chart', because of its shape, and because it resembles the way the waterfall model of software development has always been depicted graphically.
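To make the shape concrete, here is a minimal sketch that draws a chart of this kind with matplotlib. The object names, start times, and durations are invented for illustration, not taken from Theurer's measurements:

import matplotlib.pyplot as plt

# Toy timing data: (object name, start offset in ms, duration in ms).
objects = [
    ("index.html",   0, 300),
    ("style.css",  310, 180),
    ("app.js",     320, 250),
    ("logo.gif",   580, 150),
    ("photo.jpg",  590, 400),
]

fig, ax = plt.subplots()
for row, (name, start, dur) in enumerate(objects):
    ax.broken_barh([(start, dur)], (row - 0.4, 0.8))
ax.set_yticks(range(len(objects)))
ax.set_yticklabels([name for name, _, _ in objects])
ax.invert_yaxis()            # first request at the top, as in Page Detailer
ax.set_xlabel("Time (ms)")
ax.set_title("Waterfall chart of component downloads")
plt.show()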

Theurer says (in blog comment #22) that she used the IBM Page Detailer tool to get the measurement data shown in this diagram. I have used this tool; it produces a Chart Panel similar to the one above, a graphical representation of all the component downloads that make up a single page.

I have not seen much documentation online, but if you take the time to register with IBM and download the software, its Help file is quite informative. It explains that the ...
... IBM Page Detailer attaches to the workstation TCP/IP socket stack and monitors all of the HTTP/HTTPS protocol based communications between your Web browser and the network. IBM Page Detailer then displays web activities in a series of color-coded bars.
It also has quite a long section on Results Analysis and other Considerations regarding Web download times.

[Figure: Keynote waterfall chart] This kind of analysis is not new. Since 1998, Keynote has offered a similar Web Content Diagnostics feature in its MyKeynote portal -- here's a datasheet describing the newest version of this service, recently released. For example, the figure on the right shows a Keynote diagnostic measurement of the Yahoo! Search Marketing page.

Unlike the IBM Page Detailer, which always measures from the desktop where it is installed, Keynote's diagnostic measurements can be taken from anywhere on the company's worldwide network of 2500 measurement computers (or "agents"). This example comes from an agent located on the Verizon network in Houston.

The Lesson for Web Designers
Even without looking at the data in any detail, the message of these waterfall charts is evident from their general shape:
Web page download times rise with the number of bytes to be downloaded and with the number of separate content elements comprising a page.
This fact is not news to anyone whose job involves monitoring or managing site performance. Even for a Web page consisting of a single HTML file, you cannot compute download time simply as page size divided by bandwidth. That's because of the way TCP slow-start works: the server first sends one or two packets (typically 1.5KB-3KB), then doubles the window, and doubles it again, up to the receive window size of the client.
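A back-of-the-envelope model shows the effect. The sketch below assumes an initial window of two segments, a segment size of about 1460 bytes, and a window that doubles every round trip until it hits the receiver's window; real TCP stacks differ in the details:

import math

def slow_start_rtts(size_bytes, mss=1460, init_segments=2, rwnd_segments=32):
    """Round trips needed to deliver size_bytes under an idealized
    slow-start model: the window starts at init_segments and doubles
    each RTT until it reaches the receiver's window (rwnd_segments)."""
    segments_left = math.ceil(size_bytes / mss)
    window, rtts = init_segments, 0
    while segments_left > 0:
        segments_left -= window
        rtts += 1
        window = min(window * 2, rwnd_segments)
    return rtts

# A 60KB HTML file costs several round trips, not just size/bandwidth:
print(slow_start_rtts(60_000))   # -> 5 RTTs under these assumptions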

But most Web pages these days contain many embedded graphic objects, CSS files, and script files, all of which extend download times. Most of these files are small, requiring just a few TCP packets to transmit. But every transmission requires a separate "turn", adding another round-trip time (RTT) delay from the client (i.e. the browser) to the server(s) and back. This behavior has been widely documented and is well understood by performance specialists, and many articles and papers have reiterated the message. An entire industry segment -- Application Delivery Systems -- has emerged to help organizations offset delays in delivering their Web pages and satisfy customers' expectations of responsiveness. All the same, as is clear from the comments posted to the Yahoo! User Interface Blog, some Web designers and developers still need to learn this lesson.
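Extending the same toy model to whole pages makes the cost of those turns visible. The numbers here -- an 80ms RTT, two parallel connections per host (typical for browsers of this era), and the object counts -- are illustrative assumptions, not measurements:

import math

def page_time_ms(n_objects, rtts_per_object, rtt_ms=80, parallel=2):
    """Crude page-download estimate: every object costs one request
    "turn" plus its transfer round trips, fetched over a fixed number
    of parallel connections."""
    per_object = (1 + rtts_per_object) * rtt_ms
    return per_object * math.ceil(n_objects / parallel)

# 30 small images (1 transfer RTT each) vs. the same bytes in 3 bundles:
print(page_time_ms(30, 1))   # -> 2400 ms
print(page_time_ms(3, 2))    # ->  480 ms: fewer turns dominate the savings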


1 Comment:

Anonymous said...

The Yahoo fellow has hit it right on the mark. Though I would argue that back-end systems can do as much damage to the end-user experience as the actual design itself.

All too often I visit websites where I have to wonder if the company was holding a contest to see how many separate objects they could cram onto the page. The sad part is, this type of web design is encouraged nowadays! I'm a big fan of Microsoft Live (www.live.com), but they have gone overboard on cramming every little object, module, and anything else they can think of onto the pages. And when it's slow... it is unbelievably slow. All it takes is for one module to have problems downloading and you are stuck.

Makes me yearn for the days of the Wild-Wild-West when it came to the web. A simple page, some simple HTML and you had a masterpiece that could download in 10 seconds flat -- even on a 14,400 baud connection!

1/28/2007 09:52:00 PM  
