Thursday, August 13, 2009

Effective Web Performance: Choosing a CDN

Content Delivery Networks (CDNs) are a key component to any Web performance strategy. If you examine the content from any large online business or media provider, it won't take long to find the objects that these organizations have entrusted to CDNs to ensure faster delivery and a better user experience.

When working with CDNs, it is critical to understand some terms or concepts that you will be presented with. Each CDN will present them in it's own unique way and using its own unique terminology. Having an understanding of the underlying concepts, you will be able to have discussions with CDNs that are more meaningful, and targeted on your needs.

The Massively Distributed Model

CDNs fall into one of two categories, the first being the massively distributed model. CDNs that use this method will demonstrate how they have hardware and caching content servers in almost every city and town of any size in the world. As well, they have their systems located on every major consumer network in order to ensure that they are as close to the end-user as possible.

The CDN everywhere model, while far-reaching and seemingly extremely effective does have its disadvantages. First, the CDN infrastructure relies on having extremely accurate maps of the Internet in order to direct visitors to the most proximate CDN server location. However, these maps are only truly effective when visitors use DNS servers that are on the same network that they are. Services such as OpenDNS and DNS Advantage can seriously effect the proximity algorithms of the distributed CDN by removing the key piece of localization information that they need to ensure that the best cache location is selected.

Also, as with any proxy caching methodology, this model relies on use. More popular items stay in the cache longer, while less popular items may be pushed aside or stored further upstream at parent caches for retrieval, adding a few extra milliseconds for the initial request. Also, new content has to be pushed out to the edge, and may take a few hours to be completely propagated.

The Massively Concentrated Model

CDNs that use this model rely on a smaller number of locations than the massively distributed model. However, these locations tend to be massive and incredibly well connected, relying on the concept that even if they are a few more hops away, their content is always there and ready for requests.

These sites have massive amounts of storage and rely on private networks to ensure that new content is immediately pushed out to the super-nodes as soon as it is added. And while they may be those extra few hops away, the performance difference may not be enough for the average site visitor to notice.

The obvious disadvantage of the massively concentrated model is that it is great for serving those places where there is a lot of traffic. However, in regions with less traffic, or less developed infrastructures, the fewer boots on the ground may begin to have an effect on performance.

Other CDN Concepts

Application Proxy

CDNs offer many institutions the ability to use their network for all incoming requests, even if they are for dynamic content that will require processing in the client datacenter. In these instances, the CDN acts as an application proxy, using its superior knowledge of routing and traffic patterns to move requests from the edge of the Internet back to the datacenter more effectively.

Remember: Just because the CDN is providing fast routing and delivery to the visitor, your application is still the bottleneck. Poor app design or slow queries will affect the application in exactly the same way that it would if the call was coming straight to your datacenter.

Traffic Acceleration

In certain circumstances, security and regulatory concerns completely eliminate the ability of a business to use the standard CDN model. Banks, government agencies, and health-care providers cannot store data in an environment whose security they cannot vouch for, no matter how many safeguards are put in place.

These organizations still need to be able to deliver a good customer experience, so there has to be a way to help accelerate their content without taking control of it. Traffic acceleration serves this purpose by using proprietary network protocol adaptations that remove some of the overhead associated with standard network protocols.

Content is intercepted at the datacenter and routed across private networks using the streamlined network protocols to an network location that is as close to the visitor as possible. Once it has reached the appropriate location, it is converted back to standard TCP and passed to the visitor.

The method above describes how a standard Web request works, but this can also be extended to true point-to-point VPNs with endpoints separated by great network and/or physical distances.

Validating the Claims

Any component of choosing or using a CDN is quantifying the effectiveness of the solution. The standard for many years has been the bake-off method of comparison. The prospect's origin site is measured against the same site delivered by one or more CDNs. The CDN vendor with the fastest performance and the best price usually wins.

Before walking into a bake-off, come prepared. Turn your CDN bake-off into an episode of Iron Chef. Come to the table with the ingredients, and make the CDNs prepare a solution that meets your needs.

Measure Transactions

The standard base measurement that CDNs will use in a bake-off is single object(s) or page measurement. Your visitors do not just visit a single page, so ensure that the CDN has an effective solution that produces noticeable performance improvements across all the key functions of your site, including the secure components of the site, where the money is made.

Measure from the Edge

Backbone measurements are great for baselining and detecting operational issues that require a consistent and stable dataset. Your customers, however, do not have direct connections to high-priced datacenters with fat pipes.

The two CDN models will react differently to under certain circumstances, and this will appear in edge measurements. Measuring on the ground, from the ISPs that your customers use, will give you a clear sense of how much improvement a CDN will provide when compared to the performance of your origin datacenter.

The edge is messy, chaotic, and what your customers deal with everyday.

Understand the SLAs/SLOs

CDNs will always provide either service level agreement (SLA) with service level objectives (SLOs) stated in it. This topic is at once recognizable and about as well understood as 11 Dimensional Theoretical Physics.

I have written briefly about SLAs and SLOs before [here and here]. Do your research before you wade into this polite version of white-collar trench warfare.

Make sure you understand what the goal of the SLA is. Make sure that the SLOs are clear, measurable, valid, and enforceable. Then ensure that the method used to measure the SLOs is one that your organization can understand and can accept as valid.

Finally, ensure that the SLOs are reviewed monthly.


Understanding the foundational technology that underlies the CDNs you use or are considering using will help you make better decisions.

No comments:

Post a Comment