mPulse

Tuesday, September 30, 2008

FriendFeedHolic - A Social Media Ranking Model for Advertising and Marketing Success

One of the most challenging things in social media is finding the conversation leaders: the people who drive the conversation and create a community.

FriendFeedHolic (ffholic) has taken the base knowledge that exists in FriendFeed and added a ranking mechanism to it based on input and output. In fact, they weight the participation in the FriendFeed community more heavily than participation in other communities.

This is important. Although FriendFeedHolic is separate from FriendFeed, it has found a way to isolate and target those users who are most likely to participate and create conversations. These users, whether Scoble or Mona N, are where advertisers and marketers can target their money.

How would they do this?

Think about it. If someone who is a prolific commenter or conversation-creator on FriendFeed creates new content, they are assigned a higher ranking in the conversation-driven ad-discovery model that advertisers will have to create to succeed.

This new targeted advertising logic will be forced to discover:

  • The content of the conversation

  • The context of the conversation

  • The tone of the conversation

  • The participants in the conversation


This model will be able to identify when a conversation is an inward-facing one that involves mostly super-users, or one that engages a wide spectrum of people.

Conversations among super-users will lead to more passive advertising being shown, as that is a spectator event, with only a few participants.

Conversations created by super-users, or that involve super-users, but have a higher participation from the general community will get more intelligent attention to ensure that the marketing messages and advertising shown fit the four criteria above.
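
To make the weighting idea concrete, here is a minimal sketch of how a conversation classifier along these lines might work. The field names, weights, and thresholds are my own illustrative assumptions, not anything ffholic has published.

```python
# Illustrative sketch only: the weights and thresholds are assumptions,
# not FriendFeedHolic's actual algorithm.

def user_score(ff_comments, ff_likes, external_posts,
               ff_weight=2.0, external_weight=1.0):
    """Weight FriendFeed participation more heavily than imported activity."""
    return ff_weight * (ff_comments + ff_likes) + external_weight * external_posts

def classify_conversation(participants, super_user_cutoff=100.0,
                          inward_threshold=0.5):
    """Label a conversation inward-facing (mostly super-users) or broad."""
    scores = [user_score(**p) for p in participants]
    super_users = sum(1 for s in scores if s >= super_user_cutoff)
    return "inward-facing" if super_users / len(scores) >= inward_threshold else "broad"

# Example: two heavy FriendFeed users and one casual participant.
conversation = [
    {"ff_comments": 120, "ff_likes": 300, "external_posts": 10},
    {"ff_comments": 80, "ff_likes": 150, "external_posts": 5},
    {"ff_comments": 2, "ff_likes": 1, "external_posts": 40},
]
print(classify_conversation(conversation))  # -> inward-facing
```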

In this new model, advertisers will have to see that they can't simply slap a set of ads up on the popular kids' web sites. They will have to understand who leads a community, who generates buzz, and who can engage the most people on a regular basis.

In this model, the leader has far less power than the community that they create. And maintain.

Monday, September 29, 2008

Ferrari Full Service: FAIL


Full Story here.

Advertising to the Community: Is PageRank a Good Model for Social Media?

In previous posts about advertising and marketing to the new social media world [here and here], I postulated that it is very difficult to assign a value to a stream of comments, a community of followers, or a conversation.

As always, Google seems (to think) it has the answer. BusinessWeek reports the vague concept of PageRank for the People [here]. Matt Rhodes agrees, arguing that advertising will become more and more focused on the community rather than on the content.

Where the real value in this discussion lies is in targeting the advertising to be relevant to the conversation. It's not just matching the content. It's all about making the advertising relevant to the context.

Is the tone of the conversation about the brand positive or negative? I like to point out that my articles about Gutter Helmet create a content match in the AdSense logic, which then drives ads for that very product onto the page. What is lost in the logic that AdSense uses is that I am describing my extremely negative experience with Gutter Helmet.

Shouldn't the competitors of Gutter Helmet be able to take advantage of this, based on the context of the article? Shouldn't Gutter Helmet be trying to respond to these negative posts by monitoring the conversation and actively trying to turn a bad customer experience into a positive long-term relationship?
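
As a rough illustration of what tone-aware targeting could look like, here is a minimal sketch. The word lists and routing logic are placeholders of my own; this is not how AdSense actually works.

```python
# Sketch of tone-aware ad selection. The word lists and routing logic are
# placeholder assumptions, not how AdSense actually works.
import re

NEGATIVE = {"terrible", "awful", "broken", "failed", "complaint", "negative"}
POSITIVE = {"great", "love", "recommend", "excellent", "happy"}

def tone(text):
    """Crude tone score: positive minus negative keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def pick_advertiser(text, brand, brand_ad, competitor_ad):
    """Match on content, but route by tone: a negative post about a brand
    is a better buy for that brand's competitors."""
    if brand.lower() not in text.lower():
        return None
    return brand_ad if tone(text) >= 0 else competitor_ad

post = "My experience with Gutter Helmet was terrible; the installation failed."
print(pick_advertiser(post, "Gutter Helmet",
                      brand_ad="Gutter Helmet offer",
                      competitor_ad="Competing gutter-guard offer"))
# -> Competing gutter-guard offer
```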

Conversation and community marketing is a far more complex problem than a modified PageRank algorithm. It is not about the number of connections, or the level of engagement. In the end, it is about ensuring that advertisers can target their shrinking marketing dollars at the conversations that are most important.

Injecting irrelevant content into a conversation is not the way to succeed in this new approach. Being an active participant in the conversation is the key.

In effect, the old model, based on reaching the most eyeballs at the lowest cost, is failing. A BuzzLogic-style model that examines conversations and encourages firms to intelligently and actively engage in them is the one that will win.

The road to success is based on engagement, not eyeballs.

The Dog and The Toolbox: Using Web Performance Services Effectively

The Dog and The Toolbox


One day, a dog stumbled upon a toolbox left on the floor. There was a note on it, left by his master, which he couldn't read. He was only a dog, after all.

He sniffed it. It wasn't food. It wasn't a new chew toy. So, being a good dog, he walked off and lay on his mat, and had a nap.

When the master returned home that night, the dog was happy and excited to see him. He greeted his master with joy, and brought along his favorite toy to play with.

He was greeted with yelling and anger and "bad dog". He was confused. What had he done to displease his master? Why did the master keep yelling at him, and pointing at the toolbox? He had been good and left it alone. He knew that it wasn't his.

With his limited understanding of human language, he heard the words "fix", "dishwasher", and "bad dog". He knew that the dishwasher was the yummy cupboard that all of the dinner plates went in to, and came out less yummy and smelling funny.

He also knew that the cupboard had made a very loud sound that had scared the dog two nights ago, and then had spilled yucky water on the floor. He had barked to wake his master, who came down, yelling at the dog, then yelling at the machine.

But what did fix mean? And why was the master pointing at the toolbox?

The Toolbox and Web Performance


It is far too often that I encounter companies that have purchased a Web performance service that they believe will fix their problems. They then pass the day-to-day management of this information on to a team that is already overwhelmed with data.

What is this team supposed to do with this data? What does it mean? Who is going to use it? Does it make my life easier?

When it comes time to renew the Web performance services, the company feels cheated. And they end up yelling at the service company that sold them this useless thing, or at their own internal staff for not using this tool.

To an overwhelmed IT team, Web performance tools are another toolbox on the floor. They know it's there. It's interesting. It might be useful. But it makes no sense to them, and is not part of what they do.

Giving your dog the toolbox does not fix your dishwasher. Giving an IT team yet another tool does not improve the performance of a Web site.

Only in the hands of a skilled and trained team does the Web performance of a site improve, or the dishwasher get fixed. As I have said before, a tool is just a tool. The question that all organizations must face is what they want from their Web performance services.

Has your organization set a Web performance goal? How do you plan to achieve your goals? How will you measure success? Does everyone understand what the goal is?

After you know the answers to those questions, you will know that, as amazing as he is, your dog will never be able to fix your dishwasher.

But now you know who can.

Friday, September 26, 2008

Managing Web Performance: A Hammer is a Hammer

Give almost any human being a hammer, and they will know what to do with it. Modern city dwellers, ancient jungle tribes, and most primates would all look at a hammer and understand instinctively what it does. They would know it is a tool to hit other things with. They may not grasp some of the subtleties, such as that it is designed to drive nails into other things and not to beat other creatures into submission, but they would know that this is a tool that is a step up from the rock or the tree branch.

Simple tools produce simple results. This is the foundation of a substantial portion of the Software-as-a-Service (SaaS) model. SaaS is a model which allows companies to provide a simple tool in a simple way to lower the cost of the service to everyone.

Web performance data is not simple. Gathering the appropriate data can be as complex as the Web site being measured. The design and infrastructure that supports a SaaS site is usually far more complex than the service it presents to the customer. A service that measures the complexity of your site will likely not provide data that is easy to digest and turn into useful information.

As any organization that has purchased a Web performance measurement service, a monitoring tool, or a corporate dashboard expecting instant solutions will tell you, there are no easy solutions. These tools are the hammer, and just having a hammer does not mean you can build a house or craft fine furniture.

In my experience, there are very few organizations that can craft a deep understanding of their own Web performance from the tools they have at their fingertips. And the Web performance data they collect about their own site is about as useful to them as a hammer is to a snake.

Tuesday, September 23, 2008

Web Performance and Advertising: Latency Kills

One of the ongoing themes is the way that slow or degrading response times can have a negative effect on how a brand is perceived. This is especially true when you start placing third-party content on your site. Jake Swearingen, in an article at VentureBeat, discusses the buzz currently running through the advertising world that Right Media is suffering from increasing latency, a state that is being noticed by its customers.

In the end, the trials and tribulations of a single ad-delivery network are not relevant to world peace and the end of disease. However, the performance of an advertising platform has an effect on the brands that host the ads on their sites and on the brand of the ad platform itself. And in a world where there are many players fighting for second place, it is not good to have a reputation as being slow.

The key differentiators between advertising networks fighting for revenue are not always the number of impressions or the degree to which they have penetrated a particular community. An ad network is far more palatable to visitors when it can deliver advertising to a visitor without affecting or delaying the ability to see the content they originally came for.

If a page is slow, the first response is to blame the site, the brand, the company. However, if it is clear that the last things to load on the page are the ads, then the angst and anger turns toward those parts of the page. And if visitors see ads as inhibitors to their Web experience, the ad space on a page is more likely to be ignored or seen as intrusive.

Monday, September 22, 2008

Welcome Back!

If you can see this post, the DNS system has finally propagated my new host information out to the Web, and you have reached me at the new server, located at BlueHost.

After my LinkedIn request last night, I got two separate recommendations for BlueHost, both from folks I highly respect.

Let me know what you think.

Web Performance: Managing Web Performance Improvement

When starting with new clients, finding the low-hanging fruit of Web performance is often the simplest thing that can be done. By recommending a few simple configuration changes, these early stage clients can often reap substantial Web performance improvement gains.

The harder problem is helping organizations build on these early wins and create an ongoing culture of Web performance improvement. Stripping away the simple fixes often exposes deeper, more basic problems that may have nothing to do with technology. In some cases, there is no Web performance improvement process simply because of the pressure and resource constraints the organization faces.

In other cases, a deeper, more profound distrust between the IT and Business sides of the organization leads to a culture of conflict, a culture where it is almost impossible to help a company evolve and develop more advanced ways of examining the Web performance improvement process.

I have written on how Business and IT appear, on the surface, to be a mutually exclusive dichotomy in my review of Andy King's Website Optimization. But this dichotomy only exists in those organizations where conflict between business and technology goals dominates the conversation. In an organization with more advanced Web performance improvement processes, there is a shared belief that all business units share the same goal.

So how can a company without a culture of Web performance improvement develop one?

What can an organization crushed between limited resources and demanding clients do to make sure that every aspect of their Web presence performs in an optimal way?

How can an organization marked by a lack of transparency and open distrust between groups evolve to adopt an open and mutually agreed-upon performance improvement process?

Experience has shown me that a strong culture of Web performance improvement is built on three pillars: Targets, Measurements, and Involvement.

Targets


Setting a Web performance improvement target is the easiest part of the process to implement. It is almost ironic that it is also the part of the process that is most often ignored.

Any Web performance improvement process must start with a target. It is the target that defines the success of the initiative at the end of all of the effort and work.

If a Web performance improvement process does not have a target, then the process should be immediately halted. Without a target, there is no way to gauge how effective the project has been, and there is no way to measure success.

Measurements


Key to achieving any target is the ability to measure the success in achieving the target. However, before success can be measured, how to measure success must be determined. There must be clear definitions on what will be measured, how, from where, and why the measurement is important.

Defining how success will be measured ensures transparency throughout the improvement process. Allowing anyone who is involved or interested in the process to see the progress being made makes it easier to get people excited and involved in the performance improvement process.
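
As a concrete example, a shared definition of success can be as simple as a script that anyone involved can run against the agreed measurements. The target value and sample data below are hypothetical.

```python
# Hypothetical example: the agreed target is a 2.0-second median home page
# response time, measured from the locations the team agreed on.
from statistics import median

TARGET_SECONDS = 2.0

def report_progress(samples):
    """Compare measured response times against the agreed target."""
    current = median(samples)
    return {"median_seconds": round(current, 2),
            "target_seconds": TARGET_SECONDS,
            "target_met": current <= TARGET_SECONDS}

# Last week's measurements (seconds), visible to everyone involved.
print(report_progress([2.8, 2.4, 2.6, 2.1, 2.3]))
```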

Involvement


This is the component of the Web performance improvement process that companies have the greatest difficulty with. One of the great themes that defines the Web performance industry is the openly hostile relationships between IT and Business that exist within so many organizations. The desire to develop and ingrain a culture of Web performance improvement is lost in the turf battles between IT and Business.

If this energy could be channeled into proactive activity, the Web performance improvement process would be seen as beneficial to both IT and Business. But what this means is that there must be greater openness to involve the two parts of the organization in any Web performance improvement initiative.

Involving as many people as is relevant requires that all parts of the organization agree on how improvement will be measured, and what defines a successful Web performance improvement initiative.

Summary


Targets, Measurements, and Involvement are critical to Web performance initiatives. The highly technical nature of a Web site and the complexities of the business that this technology supports should push companies to find the simplest performance improvement process that they can. What most often occurs, however, is that these three simple process management ideas are quickly overwhelmed by time pressures, client demands, resource constraints, and internecine corporate warfare.

Web Performance: Outages and Reputation

In the last few months, I have talked on a couple of occasions about how an outage can affect a brand, be it personal or corporate [here and here].

Yesterday my servers experienced an 11-hour network outage due to a broken upstream BGP route.

It's sometimes scary to see how worn the cobbler's shoes are.

Sunday, September 21, 2008

GrabPERF Network Outage

Today, there was a network outage that affected the servers from September 21 2008 15:30 GMT until September 22 2008 01:45 GMT.

The data from this period has been cut and hourly averages have been re-calculated.
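
For the curious, filtering an outage window out of hourly averages is a simple pass over the raw data. The sketch below uses a hypothetical data layout, not the actual GrabPERF schema.

```python
# Hypothetical recalculation: drop samples inside the outage window and
# rebuild hourly averages. This is not the real GrabPERF schema.
from collections import defaultdict
from datetime import datetime

OUTAGE_START = datetime(2008, 9, 21, 15, 30)
OUTAGE_END = datetime(2008, 9, 22, 1, 45)

def hourly_averages(samples):
    """samples: list of (timestamp, response_seconds) tuples."""
    buckets = defaultdict(list)
    for ts, seconds in samples:
        if OUTAGE_START <= ts <= OUTAGE_END:
            continue  # cut data collected during the outage
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(seconds)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}
```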

We apologize for the inconvenience.

Saturday, September 20, 2008

Metrics in Conversational and Community Marketing

There is clear dissatisfaction with the current state of marketing among the social media mavens.

So what can be done? Jeff Jarvis points out that the problem lies with measurement. I agree, as there is only value in a system where all of the people involved agree on what the metric of record will be, and how it can be validly captured.

Currently, CPM is the agreed-upon metric. In a feed-based online world, how does a CPM model work? And, most importantly, why would I continue to place your ads on my site if all you're doing is advertising to people based on the words on the page, rather than on who is looking at the page and how often that page is looked at?

In effect, advertisers should be the ones trying to figure out how to get into the community, get into the conversation. As an advertiser, don't you want to be where the action is? But how do you find an engaged audience in an online world that makes a sand castle on the beach in a hurricane look stable?

The challenge for advertisers is to be able to find the active communities and conversations effectively. The challenge for content creators and communities is to understand the value of their conversations, the interactions that people who visit the site have with the content.

In effect, a social media advertising model turns the current model on its head. Site owners and community creators gain the benefit of being attractive to advertisers because of the community, not because of the content. And site owners who understand who visits their site, what content most engages them, how they interact with the system will be able to reap the greatest rewards by selling their community as a marketable entity.

And Steven Hodson rounds out the week's thinking on communities by throwing out the subversive idea that communities are not always free (as in 'beer', not as in 'land of'). If a community has paid for the privilege of coming together to participate in communal events and discussions, then can't that become an area for site owners to further control the cost of advertising on their site?

While reduced or no marketing content is a selling point of many for-pay communities, site owners can still offer advertisers access to the for-pay community at the cost of higher ad rates and smaller ads. The free community plays by a completely different set of rules, but there are also areas in the free community that are of higher value than others.

In summary, the current model is broken. But there is no way to measure the value of a Twitter stream, a FriendFeed conversation, a Disqus thread, or a Digg rampage. And until there is, we are stuck with an ad model that is based on the words on the page, and not the community that created the words.

Friday, September 19, 2008

Blog Advertising: Fred Wilson has Thoughts on Targeted Feed-vertising

Fred Wilson adds his thoughts to the conversation about a more intelligent way to target blog and social media advertising. His idea plays right into the ideas I discussed yesterday, ideas that emphasize that a new and successful advertising strategy can be dictated by content creators and bloggers by basing advertising rates on the level of interaction that an audience has with a post.

Where the model I proposed is one that is based on community and conversation, Fred sees an opportunity for firms that can effectively inject advertising and marketing directly into the conversation, not added on as an afterthought.

Today's conversations take place in the streams of Twitter and FriendFeed, and are solidly founded on the ideas of community and conversation. They are spontaneous, unpredictable. Marketing into the stream requires a level of conversational intelligence that doesn't exist in contextual advertising. It is not simply the words on the screen; it is how those words are being used.

For example, there is no sense trying to advertise a product on a page or in a conversation that is actively engaged in discussing the flaws and failings of that product. It makes an advertiser look cold, insensitive, and even ridiculous.

In his post, Fred presents examples of subtle, targeted advertising that appears in the streams of an existing conversation without redirecting or changing the conversation. As a VC, he recognizes the opportunity in this area.

Community and conversation focused marketing is potentially huge and likely very effective, if done in a way that does not drive people to filter their content to prevent such advertising. The advertisers will also have to adopt a clear code of behavior that prevents them from being seen as new-age spammers.

Why will it be more effective? It plays right to the marketer's sweet spot: an engaged group, with a focused interest, creating a conversation in a shared community.

If that doesn't set off the buzzword bingo alarms, nothing will.

It is, however, also true. And the interest in this new model of advertising is solely driven by one idea: attention. I have commented on the attention economy previously, and I stick to my guns that a post, a conversation, a community that holds a person's attention in today's world of media and information saturation is one that needs to be explored by marketers.

Rob Crumpler and the team at BuzzLogic announced their conversation ad service yesterday (September 18 2008). This is likely the first move in this exciting new area. And Fred and his team at Union Square recognize the potential in this area.

Thursday, September 18, 2008

Blog Advertising: Toward a Better Model

This week, I have been discussing the different approaches to blog analytics that can be used to determine what posts from a blog's archive are most popular, and whether a blog is front-loaded or long-tailed. The thesis is that the words in a blog are not always what matter most.

In a guest post this morning at ProBlogger, Skellie discusses how the value of social media visitors is different and inherently more complex than the value of visitors generated from traditional methods, such as search and feedreaders. Her eight points further support my ideas that the old advertising models are not the best suited for the new blogging world.

Stepping away from the existing advertising models that have been used since blogging popularity exploded in 2005 and 2006, it is clear that the new, interactive social web model requires an advertising approach that centers on community and conversation, rather than the older idea of context and aggregated readership.

The Current Model


Current blog advertising falls into two categories:

  1. Contextual Ads. This is the Google model, and is based on the ad network auctioning off keywords and phrases to advertisers for the privilege of seeing their ad links or images appear on pages that contain those words or phrases.

  2. Sponsored Ads. Once a blog is popular enough and can prove a well-developed audience, the blogger can offer to sell space on his blog to advertisers who wish to have their products, offerings or companies presented to the target audience.


In my opinion, these two approaches fail blog owners.

Contextual ads understand the content of the page, but do not understand the popularity of the page, or its relationship to the popularity of other pages in the archive. Contextual ads lack a sense of community, a sense of conversation. While the model has proven successful, it does not maximize the reach that a blog has with its own audience.

Sponsored ads understand the audience that the blog reaches, but do not account for posts that draw the readers' attention for the longest time, both in terms of time spent reading and thinking about the post as well as over time in an historical sense. The sponsored ad model assumes that all posts get equal attention, or drive community and conversation to the same degree.

The New Model


In the new model, more effective use of visitor analytics is vital to shaping the type and value of the ads sold. Studying the visitor statistics of a blog will allow the owners to see whether the blog is, in general, front-loaded or long-tailed.

If the blog has a front-loaded audience, the most recent posts are of higher value and could be auctioned off at higher prices. In order for this to work, both the ad-hoster and the advertiser would have to agree on the value of the most recent posts using a proven and open statistical analysis methodology. In the case of front-loaded blogs, this analysis methodology would have to demonstrate that there is a higher traffic volume for posts that are between 0-3 days old (setting a hypothetical boundary on front-loading).

For blogs that are long-tailed, those posts that continue to draw consistent traffic would be valued far more highly than those that fall out into the general ebb and flow of a bloggers traffic. These posts have proven historically that they appear highly in search results and are visited often.

In addition to the posts themselves, the comment stream has to be considered. Posts that generate an active conversation are far more valuable than those that don't. Again, showing the value of the conversation relies on the ability to track the number of people in the conversation (through Disqus or some other commenting system).
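
A rough sketch of how these pricing inputs might be computed from ordinary analytics data follows. The three-day boundary is the hypothetical cutoff mentioned above, and the weighting numbers are placeholders, not a tested formula.

```python
# Sketch of the proposed pricing inputs. The 3-day front-load boundary is the
# hypothetical cutoff mentioned above; the comment weighting is a placeholder.
from datetime import timedelta

def is_front_loaded(page_views, threshold=0.5, window_days=3):
    """page_views: list of (post_date, view_date) pairs.
    Front-loaded if most views land within window_days of publication."""
    recent = sum(1 for posted, viewed in page_views
                 if viewed - posted <= timedelta(days=window_days))
    return recent / len(page_views) >= threshold

def post_ad_value(monthly_views, comment_count, base_cpm=1.0, comment_weight=0.05):
    """Value a single post's ad space by traffic plus conversation activity."""
    return (monthly_views / 1000.0) * base_cpm * (1 + comment_weight * comment_count)
```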

This model can be further augmented by using a tool like Lookery that helps to clearly establish the demographics of the blog audience. Being able to pinpoint not only where on a blog to advertise but also who the visitors are who view those pages provides a further selling point for this new model and helps build faith in the virtues of a blog that sells space using this new, more effectively targeted advertising pricing structure.

Now, I separate the front-loaded and long-tailed blogs as if they are distinct. Obviously these categories apply to nearly every blog as there are new posts that suddenly capture the imagination of an audience, and there are older posts that continue to provide specific information that draws a steady stream of traffic to them.

Summary


This is a very early stage idea, one that has no code or methodology to support it. However, I believe that the current contextual advertising model, one based solely on the content of the post, is not allowing the content creators and blog entities to take advantage of their most valuable resource - their own posts and the conversations that they create.

I also believe that blog owners are not taking advantage of their own best resource, Web analytics, to help determine the price for advertising on their site. Not all blog posts are created or read equally. Being able to very clearly show what drives the most eyeballs to your site is a selling point that can be used in a variable-price advertising model.

By providing tools to blog owners that intimately link the analytics they already gather and the advertising space they have to sell, a new advertising model can arise, one that is uniquely suited to the new Web. This advertising model will be founded in the concepts of conversation and community, providing more discretely targeted eyeballs to advertisers, and higher ad revenues to blog owners and content creators.

UPDATES


It appears that BuzzLogic has already started down this path. VentureBeat has commentary here.

Wednesday, September 17, 2008

Web Performance: Blogs, Third Party Apps, and Your Personal Brand

The idea that blogs generate a personal brand is as old as the "blogosphere". It's one of those topics that rages through the blog world every few months. Inexorably the discussion winds its way to the idea that a blog is linked exclusively to the creators of its content. This makes a blog, no matter what side of the discussion you fall on, the online representation of a personal brand that is as strong as a brand generated by an online business.

And just as corporate brands are affected by the performance of their Web sites, a personal brand can suffer just as much when something causes the performance of a blog Web site to degrade in the eyes of the visitors. For me, although my personal brand is not a large one, this happened yesterday when Disqus upgraded to multiple databases during the middle of the day, causing my site to slow to a crawl.

I will restrain my comments on mid-day maintenance for another time.

The focus of this post is the effect that site performance has on personal branding. In my case, the fact that my blog site slowed to a near standstill in the middle of the day likely left visitors with the impression that my blog about Web performance was not practicing what it preached.

For any personal brand, this is not a good thing.

In my case, I was able to draw on my experience to quickly identify and resolve the issue. Performance returned to normal when I temporarily disabled the Disqus plugin (it has since been reactivated). However, if I hadn't been paying attention, this performance degradation could have continued, increasing the negative effect on my personal brand.

Like many bloggers, I have embedded more outside services than just Disqus in my site design. Sites today rely on AdSense, Lookery, Google Analytics, Statcounter, Omniture, Lijit, and on goes the list. These services have become as omnipresent in blogs as the content. What needs to be remembered is that these add-ons are often overlooked as performance inhibitors.

Many of these services are built using the new models of the over-hyped and misunderstood Web 2.0. These services start small and, as Shel Israel discussed yesterday, need to focus on scalability in order to grow and be seen as successful rather than cool but a bit flaky. As a result, these blog-centric services may affect performance to a far greater extent than the third-party apps used by well-established, commercial Web sites.

I am not claiming that any one of these services in and of itself causes any form of slowdown. Each has its own challenges with scaling, capacity, and success. It is the sheer number of services used by blog designers and authors that poses the greatest potential problem when attempting to debug performance slowdowns or outages. The question in these instances, in the heat of a particularly stressful moment in time, is always: Is it my site or the third party?

The advice I give is that spoken by Michael Dell: You can't manage what you can't measure. Yesterday, I initiated monitoring of my personal Disqus community page, so I could understand how this service affected my continuing Web performance. I suggest that you do the same, but not just of this third-party. You need to understand how all of the third-party apps you use affect how your personal brand performance is perceived.
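
A minimal version of that kind of monitoring, assuming the Python requests library and an arbitrary list of embedded services, might look like this.

```python
# Minimal third-party monitoring sketch (assumes the requests library).
# The URLs and the 2-second threshold are illustrative, not recommendations.
import time
import requests

THIRD_PARTIES = {
    "disqus": "https://disqus.com/",
    "analytics": "https://www.google-analytics.com/analytics.js",
}
SLOW_SECONDS = 2.0

def check_third_parties():
    for name, url in THIRD_PARTIES.items():
        start = time.time()
        try:
            requests.get(url, timeout=10)
            elapsed = time.time() - start
            status = "SLOW" if elapsed > SLOW_SECONDS else "ok"
            print(f"{name}: {elapsed:.2f}s ({status})")
        except requests.RequestException as exc:
            print(f"{name}: FAILED ({exc})")

if __name__ == "__main__":
    check_third_parties()
```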

Why is this important? In the mind of the visitor, the performance problem is always with your site. As with a corporate site that sees a sudden rise in response times or decrease in availability, it does not matter to the visitor what the underlying cause of the issue is. All they see is that your site, your brand (personal or corporate), is not as strong or reliable as they had been led to believe.

The lesson that I learned yesterday, one that I have taught to so many companies but not heeded myself, is that monitoring the performance of all aspects of your site is critical. And while you as the blog designer or writer might not directly control the third-party content you embed in your site, you must consider how it affects your personal brand when something goes wrong.

You can then make an informed decision on whether the benefit of any one third-party app is outweighed by the negative effect it has on your site performance and, by extension, your personal brand.

Tuesday, September 16, 2008

Blog Statistics Analysis: Page Views by Day of Week, or When to Post

Since I started self-hosting this blog again on August 6 2008, I have been trying to find more ways to pull traffic toward the content that I put up. Like all bloggers, I feel that I have important things to say (at least in the area of Web performance), and ideas that should be read by as many people as possible.

As well, I have realized that if I invest some time and effort into this blog, it can be a small revenue source that could get me that much closer to my dream of a MacBook Pro.

The Analysis


In a post yesterday morning, Darren Rowse had some advice on when the best time to release new posts is. Using his ideas as the framework, I pulled the data out of my own tracking database and came up with the chart below. This shows the page view data between September 1 2007 and September 15 2008 based on the day of the week visitors came to the site.
Blog Page Views by Day of Week
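
The aggregation behind a chart like this is trivial; here is a sketch of what it might look like against a generic tracking store. The sample data is made up.

```python
# Sketch of the day-of-week aggregation; any tracking store that records a
# timestamp per page view will do. The sample data is made up.
from collections import Counter
from datetime import datetime

def views_by_weekday(view_timestamps):
    """view_timestamps: iterable of datetime objects, one per page view."""
    counts = Counter(ts.strftime("%A") for ts in view_timestamps)
    order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
    return {day: counts.get(day, 0) for day in order}

sample = [datetime(2008, 9, 15, 10), datetime(2008, 9, 15, 14),
          datetime(2008, 9, 16, 9), datetime(2008, 9, 20, 22)]
print(views_by_weekday(sample))
```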

Using this data and the general framework that Darren subscribes to, I should be releasing my best and newest thoughts in a week on Monday and Tuesday (GMT).

After Wednesday, I should release only less in-depth articles, with a focus on commentary on news and events. And I must learn to breathe, as I suffer from an ailment all too common in bipolars: a lack of patience.

A new post doesn't immediately find its target audience unless you have hundreds or thousands (Tens? Ones?) of readers who are influential. If you are lucky in this regard, then these folks will leave useful comments, and through their own attention, help gently show people that a new post is something they should devote their valuable attention towards.

It takes a while for any post to percolate through the intertubes. So patience you must have.

Front-loaded v Long-tailed


Unless, of course, your traffic model is completely different than a popular blogger.

The one issue that I had with Darren's guidance is that it applies only to blogs that are front-loaded. A front-loaded blog is one that is incredibly popular, or has a devoted, active audience who help push page views toward the most recent 3-5 posts. Once the wave has crested, or the blogger has posted something new, the volume of traffic to older posts falls off exponentially, except in the few cases of profound or controversial topics.

When I analyzed my own traffic, I found that most of my traffic volume was aimed at posts from 2005 and 2006. In fact, more recent posts are nowhere near as popular as these older posts. In contrast to the front-loaded blog, mine is long-tailed.

There are a number of influential items in my blog which have proven staying power, which draw people from around the world. They have had deep penetration into search engines, and are relevant to some aspect of people's lives that keeps pulling them back.

Summary


I would highly recommend analyzing your traffic to see whether it is front-loaded or long-tailed. I know that I wish this blog were more front-loaded, with an active community of readers and commentators. However, I am also happy to see that I have created a few sparks of content that keep people returning again and again. If your blog is long-tailed, then when you post becomes far less relevant than ensuring the freshness and validity of those few popular posts. Ensure that these are maintained and current so that they remain relevant to as many people as possible.

Monday, September 15, 2008

Web Performance: A Review of Steve Souders' High Performance Web Sites

It's not often as a Web performance consultant and analyst that I find a book that is useful to so many clients. It's much more rare to discover a book that can help most Web sites improve their response times and consistency in fewer than 140 pages.

Steve Souders' High Performance Web Sites (O'Reilly, 2007 - Companion Site) captures the essence of one side of the Web performance problem succinctly and efficiently, delivering a strong message to a group he classifies as front-end engineers. It is written in a way that can be understood by marketing, line-of-business, and technical teams. It is written in a manner designed to provoke discussions within an organization with the ultimate goal of improving Web performance.

Once these discussions have started, there may be some shock within these very organizations: not only at the ease with which these rules can be implemented, but also at the realization that the fourteen rules in this book will only take you so far.

The 14 Rules


Web performance, in Souders' world, can be greatly improved by applying his fourteen Web performance rules. For the record, the rules are:
Rule 1 - Make Fewer HTTP Requests
Rule 2 - Use a Content Delivery Network
Rule 3 - Add an Expires Header
Rule 4 - Gzip Components
Rule 5 - Put Stylesheets at the Top
Rule 6 - Put Scripts at the Bottom
Rule 7 - Avoid CSS Expressions
Rule 8 - Make JavaScript and CSS External
Rule 9 - Reduce DNS Lookups
Rule 10 - Minify JavaScript
Rule 11 - Avoid Redirects
Rule 12 - Remove Duplicate Scripts
Rule 13 - Configure ETags
Rule 14 - Make AJAX Cacheable

From the Companion Site [here]
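
Several of these rules can be spot-checked directly from a page's response. The sketch below, which assumes the Python requests library, looks at the Expires/Cache-Control header, gzip, redirects, and a rough request count; it is a starting point, not a substitute for the tools Souders describes.

```python
# Rough spot-check of a few of the rules from a single response.
# Assumes the requests library; the checks are deliberate simplifications.
import re
import requests

def spot_check(url):
    resp = requests.get(url)
    headers = resp.headers
    return {
        "rule_3_expires_or_cache_control": "Expires" in headers or "Cache-Control" in headers,
        "rule_4_gzip": headers.get("Content-Encoding", "").lower() in ("gzip", "deflate"),
        "rule_11_redirects_followed": len(resp.history),
        # Rule 1 proxy: count scripts, stylesheets, and images referenced in the HTML.
        "rule_1_referenced_objects": len(re.findall(
            r'<(?:script|img|link)\b[^>]*(?:src|href)=', resp.text, re.IGNORECASE)),
    }

print(spot_check("https://www.example.com/"))
```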



These rules seem simple enough. And, in fact, most of them are easy to understand and, in an increasingly complex technical world, easy to implement. Indeed, the most fascinating thing about the lessons in this book, for the people who think about these things every day, is that they are pieces of basic knowledge, tribal wisdom, that have been passed down for as long as the Web has existed.

Conceptually, the rules can be broken down to:

  • Ask for fewer things

  • Move stuff closer

  • Make things smaller

  • Make things less confusing


These four things are simple enough to understand, as they emphasize simplicity over complexity.

For Web site designers, these fourteen rules are critical to understanding how to drive better performance not only in existing Web sites, but in all of the sites developed in the future. They provide a vocabulary to those who are lost when discussions of Web performance occur. The fourteen rules show that Web performance can be improved, and that something can be done to make things better.

Beyond the 14 Steps


There is, however, a deeper, darker world beneath the fourteen rules. A world where complexity and interrelated components make change difficult to accomplish.

In a simple world, the fourteen rules will make a Web site faster. There is no doubt about that. They advocate for the reduction of object size (for text objects), the location of content closer to the people requesting it (CDNs), and the optimization of code to accelerate the parsing and display of Web content in the browser.

Deep inside a Web site lives the presentation and application code, the guts that keep a site running. These layers, down below the waterline are responsible for the heavy lifting, the personalization of a bank account display, the retrieval of semantic search results, and the processing of complex, user-defined transactions. The data that is bounced inside a Web application flows through a myriad of network devices -- firewalls, routers, switches, application proxies, etc -- that can be as complex, if not more so, than the network complexity involved in delivering the content to the client.

It is fair to say that a modern Web site is the proverbial duck in a strong current.

The fourteen rules are lost down here beneath the Web layer. In these murky depths, far from the flash and glamor, poorly written parsing functions, database tables without indices, and poorly designed internal networks can all wreak havoc on a site that has taken all fourteen rules to heart.

When the content that is not directly controlled and managed by the Web site is added into this boiling stew, another layer of possible complexity and performance challenge appears. Third parties, CDNs, advertisers, helper applications all come from external sources that are relied on to have taken not only the fourteen rules to heart, but also to have considered how their data is created, presented, and delivered to the visitors to the Web site that appears to contain it.

Remember the Complexity


High Performance Web Sites is a volume (a pamphlet really) that delivers a simple message: there is something that can be done to improve the performance of a Web site. Souders' fourteen rules capture the items that can be changed quickly, and at low-cost.

However, if you ask Steve Souders if this is all you need to do to have a fast, efficient, and reliable Web site, he would say no. The fourteen rules are an excellent start, as they handle a great deal of the visible disease that infects so many Web sites.

However, like the triathlete with an undiagnosed brain tumor, there is a lot more under the surface that needs to be addressed in order to deliver Web performance improvements that can be seen by all, and support rapid, scalable growth.

This is a book that must be read. Then deeper questions must be asked to ensure that the performance of the 90% of a Web site design not seen by visitors matches the 10% that is.

Sunday, September 14, 2008

Blog Statistics Analysis - What do your visitors actually read?

Steven Hodson of WinExtra posted a screenshot of his personal Wordpress stats for the last three years last night. I then posted my stats for a similar period of time, and Steven shot back with some questions about traffic, and the ebbs and flows of readers.

Being the stats nut that I am, I went and pulled the data from my own tracking data, and came up with this.
Blog Posts Read Each Month, By Year Posted

I made a conscious choice to analyze what year the posts being read were posted in. I wanted to understand when people read my content, and which content kept people coming back over and over again. The chart above speaks for itself: through most of the last year it's clear that the most popular posts were made in 2005.
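
For anyone who wants to reproduce this breakdown, it is a simple two-level grouping of view date against post date. The field names and sample data below are hypothetical.

```python
# Sketch of the "month viewed vs. year posted" breakdown.
# Each record is a (view_date, post_date) pair; the sample data is made up.
from collections import defaultdict
from datetime import date

def views_by_month_and_post_year(records):
    table = defaultdict(lambda: defaultdict(int))
    for view_date, post_date in records:
        table[view_date.strftime("%Y-%m")][post_date.year] += 1
    return {month: dict(years) for month, years in sorted(table.items())}

sample = [(date(2008, 9, 1), date(2005, 3, 10)),
          (date(2008, 9, 2), date(2005, 7, 22)),
          (date(2008, 9, 2), date(2008, 8, 30))]
print(views_by_month_and_post_year(sample))  # {'2008-09': {2005: 2, 2008: 1}}
```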

What is also interesting is the decreasing interest in 2007 posts as 2008 progressed. Posts from 2006 remained steady, as there are a number of posts in that year that amount to my self-help guides to Web compression, mod_gzip, mod_deflate, and Web caching for Web administrators.

This data is no surprise to me, as I posted my rants against Gutter Helmet and their installation process in 2005. Those posts are still near the top of the Google search response for the term "Gutter Helmet". And improving the performance of a Web site is of great interest to many Apache server admins and Web site designers.

What is also clear is that self-hosting my blog and the posting renaissance it has provoked has driven traffic back to my site.

So, what lessons did I learn from this data?

  1. Always remember the long tail. Every blogger wants to be relevant, on the edge, and showing that they understand current trends. The people who follow those trends are a small minority of the people who read blogs. Google and other search engines will expose them to your writings at the time of their choosing, and you may find that a three-year-old post gets as much traffic as the one posted three hours ago.

  2. Write often. I was in a blogging funk when my blog was at Wordpress.com. As a geek, I believe that the lack of direct control over the look and feel of my content was the cause of this. In a self-hosted environment, I feel that I am truly the one in charge, and I can make this blog what I want.

  3. Be cautious of your fame. If your posts are front-loaded, i.e. if all your readers read posts from the month and year they are posted in, are you holding people's long-term attention? What have you contributed to the ongoing needs of those who are outside the technical elite? What will drive them to keep coming to your site in the long run?


So, I post a challenge to other bloggers out there. My numbers are miniscule compared to the blogging elite, but I am curious to get a rough sense of how the long tail is treating you.

Saturday, September 13, 2008

Web Performance: GrabPERF Performance Measurement System Needs YOU!

In 2004-2005, as a lark, I created my own Web performance measurement system, using Perl, PHP and MySQL. In August 2005, I managed to figure out how to include remote agents.

I dubbed it...GrabPERF. An odd name, but an amalgamation of "Grab" and "Performance" that made sense to my mind at the time. I also never thought that it would go beyond my house, a couple of basement servers, and a cable modem.

In the intervening three years, I have managed to:

  • scale the system to handle over 250 individual measurements

  • involve nine remote measurement locations

  • move the system to the Technorati datacenter

  • provide key operational measurement data to system visitors


Although the system lives in the Technorati datacenter and is owned by them, I provide the majority of the day-to-day maintenance on a volunteer basis, if only to try and keep my limited coding skills up.

But this post is not about me. It's about GrabPERF.

Thanks to the help of a number of volunteers, I have measurement locations in the San Francisco Bay Area, Washington DC, Boston, Portugal, Germany and Argentina.

While this is a good spread, I am still looking to gather volunteers who can host a GrabPERF measurement location. The areas where GrabPERF has the most need are:

  • Asia-Pacific

  • South Asia (India, Pakistan, Bangladesh)

  • UK and Continental Europe

  • Central Europe, including the ancestral homeland of Polska


It would also be great to get a funky logo for the system, so if you are a graphic designer and want to create a cool GrabPERF logo, let me know.

The current measurement system requires Linux, cURL and a few add-on Perl modules. I am sure that it could work on other operating systems; I just haven't had the opportunity to experiment.
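
To give a sense of what a measurement agent does, here is a stripped-down sketch that shells out to cURL for its timing breakdown. It is not the actual GrabPERF agent code, just an illustration of the idea.

```python
# Stripped-down measurement sketch, not the real GrabPERF agent.
# Shells out to cURL and reads its timing breakdown.
import subprocess

CURL_FORMAT = "%{http_code} %{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}"

def measure(url):
    out = subprocess.run(
        ["curl", "-s", "-o", "/dev/null", "-w", CURL_FORMAT, url],
        capture_output=True, text=True, check=True).stdout.split()
    code, dns, connect, first_byte, total = out
    return {"status": int(code), "dns": float(dns), "connect": float(connect),
            "first_byte": float(first_byte), "total": float(total)}

print(measure("https://www.example.com/"))
```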

If you or your organization can help, please contact me using the GrabPERF contact form.

Friday, September 12, 2008

Web Performance: David Cancel Discusses Lookery Performance Strategies

David Cancel and I have had sort of a passing, vague, same-space-and-thought-process, living-in-the-same-metropolitan-area kind of distant acquaintance for about a year.

About 2-3 months ago, he wrote a pair of articles discussing the efforts he has undertaken in order to try and offload some of the traffic to the servers for his new company, Lookery. While they are not current, in the sense that time moves in one direction for most technical people, and is compressed into the events of the past eight hours and the next 30 minutes, these articles provide an insight that should not be missed.

These two articles show how easily a growing company that is trying to improve performance and customer experience can achieve measurable results on a budget that consists of can-recycling money and green stamps.

Measuring your CDN


A service that relies on the request and downloading of a single file from a single location very quickly realizes the limitations that this model imposes as traffic begins to broaden and increase. Geographically diverse users begin to notice performance delays as they attempt to reach a single, geographically-specific server. And the hosting location, even one as large as Amazon S3, can begin to serve as the bottleneck to success.

David's first article examines the solution path that Lookery chose, which was moving the tag, which drives the entire opportunity for success in their business model, onto a CDN. With a somewhat enigmatic title (Using Amazon S3 as a CDN?), he describes how the Lookery team measured the distributed performance of their JS tag using a free measurement service (not GrabPERF) and compared various CDNs against the origin configuration that is based on the Amazon S3 environment.

This deceptively simple test, which is perfect for the type of system that Lookery uses, provided that team with the data they needed to realize that they had made a good choice in choosing a CDN and that their chosen CDN was able to deliver improved response times when compared to their origin servers.
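
The test itself can be as simple as fetching the same file from each candidate host and comparing download times. The sketch below assumes the Python requests library and hypothetical hostnames; it is an illustration, not the tool Lookery used.

```python
# Compare average download times for the same file served from different hosts.
# Hostnames are hypothetical; assumes the requests library.
import time
import requests

CANDIDATES = {
    "origin-s3": "https://example-origin.s3.amazonaws.com/tag.js",
    "cdn-a": "https://cdn-a.example.com/tag.js",
    "cdn-b": "https://cdn-b.example.com/tag.js",
}

def compare(candidates, runs=5):
    results = {}
    for name, url in candidates.items():
        times = []
        for _ in range(runs):
            start = time.time()
            requests.get(url, timeout=10)
            times.append(time.time() - start)
        results[name] = sum(times) / len(times)
    # Fastest first.
    return dict(sorted(results.items(), key=lambda item: item[1]))

print(compare(CANDIDATES))
```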

Check your Cacheability


Cacheability is a nasty word that my spell-checker hates. To define it simply, it refers to the ability of end-user browsers and network-level caching proxies to store and re-use downloaded content based on clear and explicit caching rules delivered in the server response header.

The second article in David's series describes how, using Mark Nottingham's Cacheability Engine, the Lookery team was able to examine the way that the CDNs and the origin site informed the visitor browser of the cacheability of the JS file that they were downloading.

Cacheability doesn't seem that important until you remember that most small firms are very conscious of their bandwidth outlay. These small startups are very aware when their bandwidth usage reaches the 250GB/month level (Lookery's bandwidth usage at the time the posts were written). Any method that can improve end-user performance while still delivering the service customers expect is a welcome addition, especially when it is low-cost or free.

In the post, David notes that there appears to be no way in their chosen CDN to modify the Cacheability settings, an issue which appears to have been remedied since the article went up [See current server response headers for the Lookery tag here].
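
Checking cacheability by hand boils down to reading a few response headers. The sketch below, which assumes the Python requests library, computes the freshness lifetime an object advertises; a dedicated cacheability checker examines far more than this.

```python
# Read how long a browser or proxy may cache an object from its headers.
# Assumes the requests library; a dedicated checker examines much more.
import requests
from email.utils import parsedate_to_datetime

def freshness_lifetime(url):
    headers = requests.head(url, allow_redirects=True).headers
    for directive in headers.get("Cache-Control", "").split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return int(directive.split("=", 1)[1])  # seconds
    if "Expires" in headers and "Date" in headers:
        delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
        return int(delta.total_seconds())
    return 0  # no explicit freshness information

print(freshness_lifetime("https://www.example.com/tag.js"))
```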

Conclusion


Startups spend a lot of time imagining what success looks like. And when it comes, sometimes they aren't ready for it, especially when it comes to the ability to handle increasing loads with their often centralized, single-location architectures.

David Cancel, in these two articles, shows how a little early planning, some clear goals, and targeted performance measurement can provide an organization with the information to get them through their initial growth spurt in style.

Wednesday, September 10, 2008

Web Performance: Your Teenage Web site

It's critical to your business. It affects revenue. It's how people who can't come to you perceive you.

It's your Web site.

It's complex. Abstract. Lots of conflicting ideas and forces are involved. Everyone says they know the best thing for it. Finger-pointing. Door slamming. Screaming.

Am I describing your Web site and the team that supports it? Or your teenager?

If you think of your Web site as a teenager, you begin to realize the problems you're facing. Like a teenager, it has grown physically and mentally and, as a result, thinks it's an experienced adult, ready to take on the world. So let's think of your site as a teenager, and think back to how we, as teenagers (yeah, I'm old), saw the world.

MOM! This doesn't fit anymore!


Your Web site has grown as all of your marketing and customer service programs bear fruit. Traffic is increasing. Revenue is up. Everyone is smiling.

Then you wake up and realize that your Web site is too small for your business. This could mean that the infrastructure is overloaded, the network is tapped out, your connectivity is maxed, and your sysadmins, designers, and network teams are spending most of their day just firefighting.

Now, how can you grow a successful business, or be the hip kid in school, when your clothes don't fit anymore?

But, you can't buy an entire wardrobe every six months, so plan, consider your goals and destinations, and shop smart.

DAD! Everyone has one! I need to have one to be cool!


Shiny.

It's a word that has been around for a long time, and was revived (with new meaning) by Firefly. It means reflective, bright, and new. It's what attracts people to gold, mirrors, and highly polished vintage cars. In the context of Web sites, it's the eye-candy that you encounter in your browsing, and go "Our site needs that".

Now step back and ask yourself what purpose this new eye-candy will serve.

And this is where Web designers and marketing people laugh, because it's all about being new and improved.

But can you be new and improved, when your site is old and broken?

Get your Web performance in order with what you have, then add the stuff that makes your site pop.

But those aren't the cool kids. I don't hang with them.


Everyone is attracted to the gleam of the cool new Web sites out there that offer to do the same old thing as your site. The promise of new approaches to old problems, lower cost, and greater efficiencies in our daily lives are what prompt many of us to switch.

As parents, we may scoff, realizing that maybe the cool kids never amounted to much outside of high school. But sometimes you have to step back and wonder what makes a cool kid cool.

You have to step back and say: why are they attracting so much attention while we're seen as the old guard? What can we learn from the cool kids? Is your way the very best way? And says who?

And once you ask these questions, maybe you agree that some of what the cool kids do is, in fact, cool.

Can I borrow the car?


Trust is a powerful thing to someone, or to a group. Your instinctive response depends on who you are, and what your experiences with others have been like in the past.

Trust is something often found lacking when it comes to a Web site. Not between your organization and your customers, but between the various factions within your organization who are trying to interfere or create or revamp or manage the site.

Not everyone has the same goals. But sometimes asking a few questions of other people and listening to their reasons for doing something will lead to a discussion that will improve the Web site in a way that improves the business in the long run.

Sometimes asking why a teenager wants to borrow the car will help you see things from their perspective for a little while. You may not agree, but at least now it's not a yes/no answer.

YOU: How was school today? - THEM: Ok.


Within growing organizations, open and clear communication tends to gradually shrivel and degenerate. Communications become more formal, with what is not said being as important as what is. Trying to find out what another department is doing becomes a lot like determining the state of the Soviet Union's leadership based on who attends parades in Red Square.

Abstract communication is one of the things that separates humans from a large portion of the rest of the animal kingdom. There is nothing more abstract than a Web site, where physical devices and programming code produce an output that can only be seen and heard.

The need for communication is critical in order to understand what is happening in another department. And sometimes that means pushing harder, making the other person or team answer hard questions that they think you're not interested in, or that they think are none of your business.

If you are in the same company, it's everyone's business. So push for an answer, because working to create an abstract deliverable that determines the success or failure of the entire firm can't be based on a grunt and a nod.

Summary


There are no easy answers to Web performance. But if you consider your Web site and your teams as a teenager, you will be able to see that the problems that we all deal with in our daily interactions with teens crop up over and over when dealing with Web design, content, infrastructure, networks and performance.

Managing all the components of a Web site and getting the best performance out of it often requires you to have the patience of Job. But it is also good to carry a small pinch of faith in these same teams, faith that everyone, whether they say it or not, wants to have the best Web site possible.

Tuesday, September 9, 2008

Thoughts on Web Performance at the Browser

Last week, lost in the preternatural shriek that emerged from the Web community around the release of Google Chrome, John Resig posted a thoughtful piece on resource usage in the browser. In it, he states that the use of the Process Manager in Chrome will change how people see Web performance. In his words:

The blame of bad performance or memory consumption no longer lies with the browser but with the site.



Coming to the discussion from the realm of Web performance measurement, I realize that the firms I have worked with and for have not done a good job of analyzing this area and, in the name of science, have tried to eliminate the variability of Web page processing from the equation.

The company I currently work for has realized that this is a gap and has released a product that measures the performance of a page in the browser.

But all of this misses the point, and goes to one of the reasons why I gave up on Chrome on my older, personal-use computer: Chrome exposes the individual load that a page places on a Web browser.

Resig highlights that browsers that make use of shared resources shift the blame for poor performance onto the browser and away from the design of the page. Technologies that modern designers lean on (Flash, AJAX, etc.) all require substantially greater resource consumption in the browser. Chrome, for good or ill, exposes this load to the user by instantiating a separate, sand-boxed process for each tab, clearly indicating which page is the culprit.

It will be interesting to see whether designers take note of this, or ignore it in pursuit of the latest shiny toy that gets released. While designers assume that all visitors run cutting-edge machines, I can show them a laptop that is still plenty useful being completely locked up when their page's load is handled in isolation.

Monday, September 8, 2008

Bipolar Lives: Living with Bipolar in an Insane World

This morning I launched Bipolar Lives, a blog that discusses the broad issues and personal challenges of living with Bipolar Disorder.

Readers of this blog will know that I was diagnosed with Bipolar I in 2006. It's a condition I am very open about, and a challenge (and an opportunity) that I live with. Medication, therapy, and a loving and very understanding family help me make it through each day.

Bipolar Lives will present research, ramblings, personal experiences, and other things of interest to people with Bipolar.

Come over if you want to learn a little about how we see the normal people.

Sunday, September 7, 2008

Browser Wars II: Why I returned to Firefox

Since the release of Google Chrome on September 2, I have been using it as my day-to-day browser. Spending up to 80% of my computer time in a browser means that this was a decision that affected a huge portion of my online experience.

I can say that I put Chrome through its paces, on a wide-variety of sites, from the simple to the extremely content-rich. From the mainstream, to the questionable.

This morning I migrated back to Firefox, albeit the latest Minefield/Firefox 3.1alpha.

The reasons listed below are mine. Switching back is a personal decision and everyone is likely to have their own reasons to do it, or to stay.

Advertising


I mentioned a few times during my initial use of Chrome that I was having to become used to the re-appearance of advertising in my browsing experience [here and here]. From their early release as extensions to Firefox, I have used AdBlock and AdBlock Plus to remove the annoyance and distraction of online ads from my browsing experience.

When I moved to Chrome, I had to accept that I would see ads. I mean, we were dealing with a browser distributed by one of the largest online advertising agencies. It could only be expected that they were not going to allow people to block ads out of the gate, if ever.

As the week progressed, I realized that I was finding the ads to be a distraction from my browsing experience. Ads impede my ability to find the information I need quickly.

Older Machines


My primary machine for online experiences at home is a Latitude D610. This is a 3-4 year-old laptop, with a single core. It is still far more computing power than most people actually need to enjoy the Web.

While cruising with Chrome, I found that Flash locked up the entire machine on a very regular basis, making it unusable. This doesn't happen on my much more powerful Latitude D630, provided by my work. However, as I have a personal laptop, I am not going to use my work computer for my personal stuff, especially at home.

I cannot have a browser that locks up a machine when I simply close a tab. It appears that the vaunted QA division at Google overlooked the fact that people don't all run the latest and greatest machines in the real world.

Auto-Complete


I am completely reliant on form auto-completes. Firefox has been doing this for me for a long time, and it is very handy to simply start typing and have Firefox say "Hey! This form element is called email. Here are some of the other things you have put into form elements called email."

If you can build something as complex as the OmniBox, surely you can add form auto-completes.

The OmniBox


I hate it. I really do. I like having my search and addresses separate. I also like an address bar that remembers complete URLs (including those pesky parameters!), rather than simply the top-level domain name.

It is a cool idea, but it needs some refining, and some customer-satisfaction focus groups.

I Don't Use Desktop-replacing Web Applications


I do almost all of my real work in desktop-installed applications. I have not made the migration to Web applications. I may in the future. But until then, I do not need a completely clean browsing experience. I mentioned that the battle between Chrome and Firefox will come down to the Container v. the Desktop - a web application container, or a desktop-replacing Web experience application.

In the last 48 hours, I have fallen back into the Web-desktop camp.

Summary


In the future, I will continue to use Chrome to see how newer builds advance, and how it evolves as more people begin dictating the features that should be available to it.

For my personal use, Chrome takes away too much from, and injects too much noise into, my daily Web experience to continue using it as my default browser. To quote more than a few skeptics of Chrome when it was released - "It's just another browser".

Friday, September 5, 2008

DNS: Without it, your site does not exist

In my presentations and consultations on Web performance, I emphasize the importance of a correctly configured DNS system with the phrase: "If people can't resolve your hostname, your site is dead in the water".

Yesterday, it appears that the large anti-virus and security firm Sophos discovered this lesson the hard way.

Of course hindsight is perfect, so I won't dwell for too long on this single incident. The lesson to be learned here is that DNS is complex and critical, yet is sometimes overlooked when considering the core issues of Web performance and end-user experience.

This complexity means that if an organization is not comfortable managing its own DNS, or wants to broaden and deepen its DNS infrastructure, there are a large number of firms who will assist with this process, firms whose entire business is based on managing large-scale DNS implementations for organizations.

DNS is critical. Never take it for granted.

Joost: A change to the program

In April 2007, I tried out the Joost desktop client.  [More on Joost here and here]

I was underwhelmed by the performance, and by the fact that the application completely maxed out my dual core CPU, my 2G of RAM, and my high-speed home broadband. I do remember thinking at the time that it seemed weird to have a Desktop Client in the first place. Well, as Om Malik reports this morning, it seems that I was not alone.

After this week's hoopla over Chrome, moving in the direction of the browser seems like a wise thing to do. But I definitely hear far more buzz over Hulu than I do for Joost on the intertubes.

Update


Michael Arrington and TechCrunch weigh in on the discussion.

Thursday, September 4, 2008

Web Performance, Part IX: Curse of the Single Metric

While this post is aimed at Web performance, the curse of the single metric affects our everyday lives in ways that we have become oblivious to.

When you listen to a business report, the stock market indices are an aggregated metric used to represent the performance of a set group of stocks.

When you read about economic indicators, these values are the aggregated representations of complex populations of data, collected from around the country, or the world.

Sport scores are the final tally of an event, but they may not always represent how well each team performed during the match.

The problem with single metrics lies in their simplicity. When a single metric is created, it usually attempts to factor in all of the possible and relevant data to produce an aggregated value that can represent a whole population of results.

These single metrics are then portrayed as a complete representation of this complex calculation. The presentation of this single metric is usually done in such a way that their compelling simplicity is accepted as the truth, rather than as a representation of a truth.

In the area of Web performance, organizations have fallen prey to this need for the compelling single metric: the need to represent a very complex process in terms that can be quickly absorbed and understood by as large a group of people as possible.

The single metrics most commonly found in the Web performance management field are performance (end-to-end response time of the tested business process) and availability (success rate of the tested business process). These numbers are then merged and transformed with data from a number of sources (external measurements, hit counts, conversions, internal server metrics, packet loss), and this information is bubbled up through the organization. By the time senior management and decision-makers receive the Web performance results, they are likely several steps removed from the raw measurement data.

An executive will tell you that information is a blessing, but only when it speeds, rather than hinders, the decision-making process. A Web performance consultant (such as myself) will tell you that basing your decisions on a single metric that has been created out of a complex population of data is madness.

So, where does the middle ground lie between the data wonks and the senior leaders? The rest of this post introduces a few metrics that, taken together as a small set, give senior leaders better information to work from when deciding what to do next.

A great place to start this process is to examine the percentile distribution of measurement results. Percentiles are known to anyone who has children. After a visit to the pediatrician, someone will likely state that "My son/daughter is in the XXth percentile of his/her age group for height/weight/tantrums/etc". This means that XX% of the population of children that age, as recorded by pediatricians, report values at or below the same value for this same metric.

Percentiles are great for a population of results like Web performance measurement data. Using only a small set of values, anyone can quickly see how many visitors to a site could be experiencing poor performance.

If at the median (50th percentile), the measured business process is 3.0 seconds, this means that 50% of all of the measurements looked at are being completed in 3.0 seconds or less.

If the executive then looks up to the 90th percentile and sees that it's at 16.0 seconds, it can be quickly determined that something very bad has happened to affect the response times collected for the 40% of the population between these two points. Immediately, everyone knows that for some reason, an unacceptable number of visitors are likely experiencing degraded and unpredictable performance when they visit the site.

A suggestion for enhancing averages with percentiles is to use the 90th percentile value as a trim ceiling for the average, and then compare the untrimmed and trimmed averages side by side. For sites with a large number of response time outliers, the average will decrease dramatically when it is trimmed, while sites with more consistent measurement results will find their average response time is similar with and without the trimmed data.
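
As a rough sketch in PHP, here is one way to compute the median, the 90th percentile, and a 90th-percentile-trimmed average. The response times below are made-up example values, and the simple nearest-rank method is used for the percentiles.

<?php
// Rough sketch only: nearest-rank percentiles and a 90th-percentile-
// trimmed average. The measurement values are made-up examples.

function percentile($values, $pct) {
    sort($values);
    $index = (int) ceil(($pct / 100) * count($values)) - 1;
    if ($index < 0) {
        $index = 0;
    }
    return $values[$index];
}

$measurements = array(2.1, 2.8, 3.0, 3.2, 3.4, 4.1, 5.9, 9.7, 16.0, 21.3);

$median = percentile($measurements, 50);
$p90    = percentile($measurements, 90);
$avg    = array_sum($measurements) / count($measurements);

// Trimmed average: drop every measurement above the 90th percentile
$trimmed = array();
foreach ($measurements as $m) {
    if ($m <= $p90) {
        $trimmed[] = $m;
    }
}
$trimmed_avg = array_sum($trimmed) / count($trimmed);

printf("Median: %.1fs   90th percentile: %.1fs\n", $median, $p90);
printf("Average: %.2fs  Trimmed average: %.2fs\n", $avg, $trimmed_avg);
?>

With outliers in the data, the plain average lands well above the trimmed one, which is exactly the gap this side-by-side comparison is meant to expose.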

It is also critical to examine the application's response times and success rates throughout defined business cycles. A single response time or success rate value eliminates

  • variations by time of day

  • variations by day of week

  • variations by month

  • variations caused by advertising and marketing


An average is just an average. If, at peak business hours, response times are 5.0 seconds slower than the average, then the average is meaningless, as business is being lost to poor performance that the focus on the single metric has hidden.
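
As a sketch of how to surface these variations, assume the raw measurements land in a MySQL table; the measurements table and its test_time and response_time columns here are hypothetical names, not part of any existing system. A simple GROUP BY per hour then shows what the single average hides.

-- Sketch only: `measurements`, `test_time`, and `response_time` are assumed names
SELECT
  HOUR(test_time) AS hour_of_day,
  COUNT(*) AS samples,
  AVG(response_time) AS avg_response,
  MAX(response_time) AS worst_response
FROM measurements
WHERE test_time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY HOUR(test_time)
ORDER BY hour_of_day;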

Even these measures can fall prey to their own curse of the single metric: all of the items discussed above aggregate the response time of the business process into a single value. The process of purchasing items online breaks down into discrete steps, and different parts of this process likely take longer than others. And one step beyond the discrete steps are the objects and data that appear to the customer during those steps.

It is critical to isolate the performance for each step of the process to find the bottlenecks to performance. Then the components in those steps that cause the greatest response time or success rate degradations must be identified and targeted for performance improvement initiatives. If there are one or two poorly performing steps in a business process, focusing performance improvement efforts on these is critical, otherwise precious resources are being wasted in trying to fix parts of the application that are working well.
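
A similar sketch, assuming the same hypothetical measurements table also records a step_name and a success flag (1 or 0) for each measurement, isolates the slow or failing steps:

-- Sketch only: `step_name` and `success` are assumed columns
SELECT
  step_name,
  AVG(response_time) AS avg_response,
  AVG(success) * 100 AS success_pct
FROM measurements
WHERE test_time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY step_name
ORDER BY avg_response DESC;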

In summary, a single metric provides a false sense of confidence: the sense that the application can be counted on to deliver response times and success rates that are nearly the same as those simple, single metrics.

The average provides a middle ground, a line that marks the approximate mid-point of the measurement population. There are measurements above and below this average, and you have to plan around the peaks and valleys, not the open plains. It is critical never to fall victim to the attractive charms that come with the curse of the single metric.

GrabPERF: State of the System

This is actually a short post to write, as the state of the GrabPERF system is currently very healthy. There was an eight-hour outage in early August 2008, but that was a fiber connectivity issue, not a system issue.

Over the history of the service, we have been steadily increasing the number of measurements we take each day.

[Chart: GrabPERF measurements per day]



The large leap occurred when a very large number of tests were added to the system on a single day. But based on this data, the system is gathering more than 900,000 measurements every day.

Thanks to all of the people who volunteer their machines and bandwidth to support this effort!

Chrome v. Firefox - The Container and The Desktop

The last two days of using Chrome have had me thinking about the purpose of the Web browser in today's world. I've talked about how Chrome and Firefox have changed how we see browsers, treating them as interactive windows into our daily life, rather than the uncontrolled end of an information firehose.

These applications, that on the surface seem to serve the same purpose, have taken very different paths to this point. Much has been made about Firefox growing out of the ashes of Netscape, while Chrome is the Web re-imagined.

It's not just that.

Firefox, through the use of extensions and helper applications, has grown to become a Desktop replacement. Back when Windows for Workgroups was the primary end-user OS (and it wasn't even an OS), Norton Desktop arrived to provide all of the tools that didn't ship with the OS. It extended and improved on what was there, and made WFW a better place.

Firefox serves that purpose in the browser world. With its massive collections of extensions, it adds the ability to customize and modify the Web workspace. These extensions even allow the incoming content to be modified and reformatted in unique ways to suit the preferences of each individual. These features allowed the person using Firefox to feel in control, empowered.

You look at the Firefox installs of the tech elite, and no two installed versions will be configured in the same way. Firefox extends the browser into an aggregator of Web data and information customization.

But it does it at the Desktop.

Chrome is a simple container. There is (currently) no way to customize the look and feel, extend the capabilities, or modify the incoming or outgoing content. It is a simple shell designed to perform two key functions: search for content and interact with Web applications.

There are, of course, the hidden geeky functions that they have built into the app. But those don't change what its core function is: request, receive, and render Web pages as quickly and efficiently as possible. Unlike Firefox's approach, which places the app at the center of the Web, Chrome places the Web itself at the center.

There is no right or wrong approach. As with all things in this complicated world we are in, it depends. It depends on what you are trying to accomplish and how you want to get there.

The conflict that I see appearing over the next few months is not between IE and Firefox and Safari and Opera and Chrome. It is a conflict over what the people want from an application that they use all the time. Do they want a Web desktop or a Web container?

Wednesday, September 3, 2008

Chrome and Advertising - Google's Plan

Since I downloaded and started using Chrome yesterday, I have had to rediscover the world of online advertising. Using Firefox and Adblock Plus for nearly three years has shielded me from ads for the most part.

Stephen Noble, in a post on the Forrester Blog for Interactive Marketing Professionals, seems to discover that Chrome will be a source for injecting greater personalization and targeting into the online advertising market.

This is the key reason Chrome exists, right now.

While there may be discussions about the online platform and hosted applications, only a small percentage of Internet users rely on hosted desktop-like applications, excluding email, in their daily work and life.

However, Google's biggest money-making ventures are advertising and search. With control of AdSense and DoubleClick, there is no doubt that Google controls a vast majority of the targeted and contextual advertising market, around the world.

One of the greatest threats to this money-making is a lack of control over the platform through which ads are delivered. There is talk of IE8 blocking ads (well, non-Microsoft ads anyway), and one of the more popular extensions for Firefox is Adblock Plus. While Safari doesn't have this ability natively built in, it can be added by any number of applications that, in the name of Internet security, filter and block online advertising using end-user proxies.

This threat to Google's core revenue source was not ignored in the development of Chrome. One of the options is DNS pre-fetching. Now, I haven't thrown up a packet sniffer, but what's to prevent part of the pre-fetching algorithm from going beyond DNS for certain content and pre-fetching the whole object, so that the ads load really fast and are seen as less intrusive?

Ok, so I am noted for having a paranoid streak.

However, using the fastest rendering engine and a rocket-ship fast Javascript VM is not only good for the new generation of online Web applications, but plays right into the hands of improved ad-delivery.

So, while Chrome is being hailed as the first Web application environment, it is very much a contextual Web advertising environment as well.

It's how it was built.

Hit Tracking with PHP and MySQL

Recently there was an outage at a hit-tracking vendor I was using to track the hits on my externally hosted blog, leaving me with a gap in my visitor data several hours long. While this was an inconvenience for me, I realized that this could be a mission-critical failure for an online business reliant on this data.

To resolve this, I used the PHP HTTP environment variables and the built-in function for converting IP addresses to IP numbers to create my own hit-tracker. It is a rudimentary tracking tool, but it provides me with the basic information I need to track visitors.

To begin, I wrote a simple PHP script to insert tracking data into a MySQL database. How do you do that? You use the gd features in PHP to draw an image, and insert the data into the database.




<?php
// Log the request details, then return a 1x1 tracking image.
header("Content-type: image/png");

// dbconnect_logger.php is expected to open the MySQL connection and set $link
include("dbconnect_logger.php");

$logtime  = date("YmdHis");
$ipquery  = sprintf("%u", ip2long($_SERVER['REMOTE_ADDR']));
$uagent   = mysql_real_escape_string($_SERVER['HTTP_USER_AGENT'], $link);
// The referrer is the page the tracking image is embedded on,
// and is stored in the visited_page column
$referrer = isset($_SERVER['HTTP_REFERER'])
    ? mysql_real_escape_string($_SERVER['HTTP_REFERER'], $link)
    : '';

$query2 = "INSERT INTO logger.blog_log
           VALUES ($logtime, $ipquery, '$uagent', '$referrer')";
mysql_query($query2, $link) or die("Log Insert Failed");

mysql_close($link);

// Build the single-pixel image that the browser actually receives
$im = @ImageCreate(1, 1)
    or die("Cannot Initialize new GD image stream");
$background_color = ImageColorAllocate($im, 224, 234, 234);
$text_color       = ImageColorAllocate($im, 233, 14, 91);

ImagePng($im);
?>



Next, I created the database table.




DROP TABLE IF EXISTS `blog_log`;
CREATE TABLE `blog_log` (
`date` timestamp NOT NULL default '0000-00-00 00:00:00',
`ip_num` double NOT NULL default '0',
`uagent` varchar(200) default NULL,
`visited_page` varchar(200) NOT NULL default '',
UNIQUE KEY `date` (`date`,`ip_num`,`visited_page`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;



It's done. I can now log any request I want using this embedded tracker.
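
To put it to work, each page just needs to reference the script as an image. A minimal embed, assuming the script above is saved as logger.php (the filename and host here are placeholders, not real paths):

<img src="http://www.example.com/logger.php" width="1" height="1" alt="" />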

Data should begin flowing to your database immediately. This sample snippet of code will allow you to pull data for a selected day and list each individual hit.




$query1 = "SELECT
bl.ip_num,
DATE_FORMAT(bl.date,'%d/%b/%Y %H:%i:%s') AS NEW_DATE,
bl.uagent,
bl.visited_page
FROM blog_log bl
WHERE
DATE_FORMAT(bl.date,'%Y%m%d') ='$YMD'
and uagent not REGEXP '(.*bot.*|.*crawl.*|.*spider.*|^-$|.*slurp.*|.*walker.*|.*lwp.*|.*teoma.*|.*aggregator.*|.*reader.*|.*libwww.*)'
ORDER BY bl.date ASC";

print "<table border=\"1\">\n";
print "<tr><td>IP</td><td>DATE</td><td>USER-AGENT</td><td>PAGE VIEWED</td></tr>";
while ($row = mysql_fetch_array($result1)) {
$visitor = long2ip($row[ip_num]);
print "<tr><td>$visitor</td><td nowrap>$row[NEW_DATE]</td><td nowrap>$row[uagent]</td><td>";

if ($row[visited_page] == ""){
print " --- </td></tr>\n";
} else {
print "<a href=\"$row[visited_page]\" target=\_blank\">$row[visited_page]</a></td></tr>\n";
}

}

mysql_close($link);



And that's it. A few lines of code and you're done. With a little tweaking, you can integrate the IP number data with a number of Geographic IP databases available for purchase to track visitors by country and ISP, and, using graphing libraries for PHP, you can add charts.

For my own purposes, this is an extension of the Geographic IP database I created a number of years ago. That application extracts IP address information from the five regional Internet registries and inserts it into a database. Using the log data collected by the tracking bug above and the lookup capabilities of the Geographic IP database, I can quickly track which countries and ISPs drive the most visitors to my site, and use this both for general interest and to isolate any malicious visitors to the site.
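
As a sketch of that lookup, assuming a hypothetical geo_ip table with ip_start, ip_end, and country columns built from the registry data:

-- Sketch only: `geo_ip` and its columns are assumed names
SELECT
  g.country,
  COUNT(*) AS hits
FROM blog_log bl
JOIN geo_ip g
  ON bl.ip_num BETWEEN g.ip_start AND g.ip_end
GROUP BY g.country
ORDER BY hits DESC;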

Tuesday, September 2, 2008

Browsers: The Window and The Firehose

Three years ago, in a post on this blog, I stated that I thought that the browser was becoming less important as more data moved into streams of data through RSS and aggregated feeds, as well as a raft of other consumer-oriented Web services.

This position was based on the assumption that the endpoint, in the form of installed applications, would continue to serve as the focus for user interactions, and that these applications would be the points where data was accumulated and processed by users. This could be best described as the firehose: the end-user desktop would sit at the end of a never-ending flood of data being pushed to it.

Firefox and Chrome have changed all of that.

The browser has, instead, become the window through which we view and manipulate our data. It's now ok, completely acceptable in fact, to use online applications as replacements for installed applications, stripping away a profit engine that has fed so many organizations over the years.

The endpoint has been shown to be the access point to our applications and our data. Data is not brought down and stored locally: it is stored remotely and manipulated, like a marionette, from afar.

While Chrome and Firefox are not perfect, they serve as powerful reminders of what the Web is, and why the browser exists. The browser is not the end of a flood of incoming data; it is the window through which we see our online world.

While some complain that there is still an endless stream of data, we control and manipulate it. It doesn't flood us.