The New Web Caching Tier
Previously, I wrote about the Rise of The Dynamic Web and how the buildup of dynamic content, social linkages, and increasing usage is straining origin-site data centers, degrading performance and page load times.
The nature of dynamic web applications means that their content (much of it personalized or "long tail" data) cannot be cached at the edge by Content Distribution Networks such as Akamai, Level 3 or Limelight. As a result, clients making requests of these origin sites face a round-trip latency floor of 40-90ms (depending on relative location)... and that does not account for processing time!
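Part of that floor is plain physics: every request must travel to the origin and back, and nothing makes light in fiber go faster. The sketch below computes the propagation-only lower bound, assuming light in fiber covers roughly 200 km per millisecond (about two-thirds of c). The distances are rough illustrative figures, not measurements of any particular route.

```python
# Light in optical fiber travels at roughly two-thirds of c,
# about 200 km per millisecond (an approximation).
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    Ignores routing detours, queuing, TCP/TLS handshakes and server time,
    so real round trips are always slower than this.
    """
    # The request travels out and the response travels back.
    return 2 * one_way_km / FIBER_KM_PER_MS


# Illustrative one-way distances (rough, hypothetical figures).
for label, km in [("regional", 1_000), ("cross-country", 4_000), ("transatlantic", 6_000)]:
    print(f"{label:>14}: at least {min_rtt_ms(km):.0f} ms")
```

A 4,000 km one-way path already costs at least 40 ms per round trip before any routing overhead or server work, which is why observed round trips land in the tens of milliseconds.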

When these dynamic client requests are served by processing database queries or executing application logic (not to mention accessing disk), latency can balloon to hundreds of milliseconds or more. You may be asking, "Does a fraction of a second matter?" Absolutely! Time is money (and user satisfaction). To quantify this, see The Psychology of Web Performance, which lists the following figures:
· Tests at Amazon revealed for every 100 ms increase in page load time, sales decreased 1% (Kohavi and Longbotham 2007);
· Google found that moving from a 10-result page loading in 0.4 seconds to a 30-result page loading in 0.9 seconds decreased traffic and ad revenues by 20% (Linden 2006); and
· Experiments at Microsoft on Live Search showed that when search results pages were slowed by 1 second, ad clicks per user declined 1.5% (Kohavi 2007).
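To connect the Amazon figure to the "hundreds of milliseconds" a slow dynamic request can cost, here is a back-of-envelope sketch. It assumes the 1%-per-100 ms relationship scales linearly, which is an extrapolation made here for illustration and not something the cited study claims. The 250 ms penalty is a hypothetical value.

```python
def estimated_sales_change(added_latency_ms: float, drop_per_100ms: float = 0.01) -> float:
    """Fractional change in sales for extra page-load latency.

    Assumes the Amazon figure (1% per 100 ms) holds linearly at any scale,
    which is an extrapolation, not a measured result.
    """
    return -drop_per_100ms * (added_latency_ms / 100.0)


# Hypothetical case: an uncached request that spends an extra 250 ms in
# database queries and disk access.
print(f"{estimated_sales_change(250):+.1%}")
```

Under that linear assumption, an extra 250 ms corresponds to roughly a 2.5% drop in sales.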
The unassisted three-tiered LAMP or Java origin-site architecture was not designed to handle such high volumes of data retrieval by dynamic web applications. Yet that is exactly what happens when millions of people change their Facebook status 20 times an hour, for example, when geo-location information is constantly updated, or when item ratings or entertainment reviews are added and accessed by hundreds of thousands of people simultaneously.
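A quick calculation shows how fast that load adds up. The sketch takes the "20 times an hour" figure from the text and assumes, purely for illustration, one million active users and a hypothetical fan-out of 100 readers per update. It also assumes updates are spread evenly over the hour, so real peaks would be higher.

```python
SECONDS_PER_HOUR = 3600


def writes_per_second(users: int, updates_per_user_per_hour: float) -> float:
    """Average update rate, assuming updates are spread evenly over the hour.

    Real traffic is bursty, so peak rates will be higher than this average.
    """
    return users * updates_per_user_per_hour / SECONDS_PER_HOUR


def reads_per_second(writes_per_sec: float, readers_per_update: float) -> float:
    """Read rate if every update is fetched by `readers_per_update` people."""
    return writes_per_sec * readers_per_update


# Hypothetical inputs: 1 million active users, 20 status updates per hour each
# (the figure from the text), and each update read by 100 followers.
w = writes_per_second(1_000_000, 20)
r = reads_per_second(w, 100)
print(f"~{w:,.0f} writes/s, ~{r:,.0f} reads/s")
```

Even with these modest assumptions, a single origin database would face roughly 5,600 writes and over half a million reads every second, every one of them a potential disk-bound query.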
At Gear6, we're talking to our customers about the new "Web scale architecture," which introduces a new Web caching tier. The idea is to move database and application functionality into a distributed cache where data can be retrieved in microseconds or less. As Evan Weaver from Twitter commented at QCon earlier this year: "Everything runs from memory in Web 2.0."
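One common way to put such a tier in front of a database is the cache-aside (look-aside) pattern: read from the cache first, fall back to the database on a miss, store the result, and invalidate the cached copy on writes. The sketch below is an assumption-laden illustration: `InMemoryCache` and `FakeDatabase` are hypothetical in-process stand-ins for a networked memcached cluster and a real database, and the function names are invented for this example.

```python
import time
from typing import Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical in-process stand-in for a memcached-style cache tier."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry time on the monotonic clock)
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_s: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_s)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class FakeDatabase:
    """Hypothetical slow origin; counts queries so cache hits are visible."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.queries = 0

    def load(self, key: str) -> str:
        self.queries += 1
        return self.rows[key]

    def save(self, key: str, value: str) -> None:
        self.rows[key] = value


def cache_aside_get(cache: InMemoryCache, db: FakeDatabase, key: str, ttl_s: float = 60.0) -> str:
    value = cache.get(key)
    if value is None:
        # Miss: pay the full database cost once, then keep the result in memory.
        value = db.load(key)
        cache.set(key, value, ttl_s)
    return value


def update(cache: InMemoryCache, db: FakeDatabase, key: str, value: str) -> None:
    db.save(key, value)  # write to the database of record first
    cache.delete(key)    # then invalidate so the next read reloads fresh data


cache, db = InMemoryCache(), FakeDatabase()
db.save("status:42", "at QCon")
for _ in range(3):
    cache_aside_get(cache, db, "status:42")  # 1 miss, then 2 hits
update(cache, db, "status:42", "back home")
latest = cache_aside_get(cache, db, "status:42")  # miss after invalidation
print(db.queries, latest)
```

Deleting the key on write, rather than overwriting it, keeps the write path simple and lets the next reader repopulate the cache from the database of record. The TTL acts as a safety net so stale entries eventually expire even if an invalidation is missed.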
The most popular distributed caching system deployed on the web today is based on Memcached, originally created to boost LiveJournal's performance. It is a two-tiered distributed hash table (clients hash each key to pick a server, and each server keeps its items in its own in-memory hash table) and is the scaling foundation for 85% of today's top web sites. One could say Memcached is one of the best-kept secrets in Web 2.0. We'll cover some Memcached basics, do's and don'ts over the next several posts. Let us know if there's anything in particular you'd like to see as well.
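The two tiers can be sketched in a few lines. In this simplified illustration (class names are hypothetical, and plain Python dicts stand in for real memcached servers reached over the network), the client hashes each key to choose a node, and each node stores the item in its own local hash table. The modulo placement shown here is the simplest scheme; many clients use consistent hashing instead so that adding or removing a server remaps fewer keys.

```python
import hashlib
from typing import Dict, List, Optional


class CacheNode:
    """One cache server. Its dict plays the role of the second-tier hash table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.table: Dict[str, bytes] = {}


class ShardedClient:
    """Client-side first tier: hash the key to choose which node owns it.

    Nodes never talk to each other; every client must agree on the node list
    and hash function so that the same key always lands on the same node.
    """

    def __init__(self, nodes: List[CacheNode]) -> None:
        self.nodes = nodes

    def node_for(self, key: str) -> CacheNode:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        # Simple modulo placement; adding or removing a node remaps most keys.
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]

    def set(self, key: str, value: bytes) -> None:
        self.node_for(key).table[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.node_for(key).table.get(key)


nodes = [CacheNode(f"cache{i}") for i in range(4)]
client = ShardedClient(nodes)
for user_id in range(1000):
    client.set(f"user:{user_id}:status", b"hello")
print({n.name: len(n.table) for n in nodes})
```

Because placement is computed entirely on the client, capacity grows by adding servers, with no coordination between them. That shared-nothing design is a large part of why this approach scales.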

Great post. One question: I've seen a wide range of stats on the use of memcached among the top web sites (50-85%). I'm wondering how this was determined and whether there is a source for this figure. Thanks! - joe