Wednesday, August 30, 2006

Web Performance, Part V: Baseline Your Data

Up to this point, the series has focused on the mundane world of calculating statistical values in order to represent your Web performance data in some meaningful way. Now we step into the more exciting (I lead a sheltered life) world of analyzing the data to make some sense from it.

When companies sign up with a Web performance company, it has been my experience that the first thing they want to do is get in there, push all the buttons, and bounce on the seats. This usually involves setting up a million different measurements, flagging every single one of them as critically important, and emailing alerts to the pagers of the entire IT team around the clock.

While interesting, it is also a great way for people to begin to actually ignore the data, because:

  1. It's not telling them what they need to know

  2. It's telling them stuff when they don't need to know it.

When I speak to a company for the first time, I often ask what their key online business processes are. I usually get either stunned silence or "I don't know" as a response. Seriously, what has been purchased is a tool, some new gadget that will supposedly make life better; but no thought has been put into how to deploy and make use of the data coming in.

I have the luxury of being able to concentrate on one set of data all the time. In most environments, the flow of data from systems, network devices, e-mail updates, patches, and business data simply becomes noise to be ignored until someone starts complaining that something is wrong. Web performance data becomes another data flow to react to, not act on.

So how do you begin to corral the beast of Web performance data? Start with the simplest question: what do we NEED to measure?

If you talk to IT, Marketing, and Business Management, they will likely come up with three key areas that need to be measured:

  1. Search

  2. Authentication

  3. Shopping Cart

Technology folks will say, "But that doesn't cover the true complexity of our relational, P2P, AJAX-powered, social media, tagging Web 2.0 site."

Who cares! The three items listed above pay the bills and keep the lights on. If one of these isn't working, you fix it now, or you go home.

Now we have three primary targets. We're all set to start setting up alerts and stuff, right?

Nope. You don't have enough information yet.


This is your measurement after the first day. This gives you enough information to do all those bright and shiny things that you've heard your new Web performance tool can do, doesn't it?


Here's the same measurement after 4 days. Subtle but important changes have occurred. The most important of these is that the first day of data gathering happened to fall on a Friday night. Most site owners would agree that performance on a Friday night is far different from what you would find on a Monday morning, and Monday morning shows a noticeable upward shift in this site's response times.

And what do you do when your performance looks like this?


Baselining is the ability to predict the performance of your site under normal circumstances on an ongoing basis. It is based on the knowledge that comes from understanding how the site has performed in the past, as well as how it has behaved under abnormal conditions. Only when you can predict how your site should behave can you begin to understand why it behaves the way it does.

Focusing on the three key transaction paths or business processes listed above helps you and your team wrap your heads around what the site is doing right now. Once a baseline for the site's performance exists, you can begin to benchmark your site by comparing it to others performing the same business process.
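To make the idea concrete, here is a minimal sketch (my own illustration in Python, not any particular vendor's tool) of how a baseline might be built and used: group historical measurements by hour of the week, then flag any new measurement that strays too far from that hour's normal range.

```python
import statistics
from collections import defaultdict

def build_baseline(history):
    """history: (hour_of_week, response_time) pairs from past measurements.
    Returns per-hour (mean, stdev) describing 'normal' performance."""
    by_hour = defaultdict(list)
    for hour, t in history:
        by_hour[hour].append(t)
    return {h: (statistics.mean(v), statistics.pstdev(v)) for h, v in by_hour.items()}

def is_abnormal(baseline, hour, t, n_sigma=3.0):
    """Flag a measurement that strays beyond n_sigma of that hour's baseline."""
    mean, stdev = baseline[hour]
    return abs(t - mean) > n_sigma * stdev

# Invented history: Monday 9am (hour 9) is brisk, Friday night (hour 125) is slow
history = [(9, 0.8), (9, 0.9), (9, 1.0), (125, 1.8), (125, 2.0), (125, 2.2)]
baseline = build_baseline(history)
```

The three-sigma threshold is an arbitrary assumption here; real tools have to tune it against the heavy tails discussed elsewhere in this series.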

Web Performance, Part IV: Finding The Frequency

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.


The pros and cons of each of these aggregated values have been examined, but now we come to the largest single flaw: these values attempt to assign a single number to describe an entire population.

The only way to describe a population of numbers is to do one of two things: Display every single datapoint in the population against the time it occurred, producing a scatter plot; or display the population as a statistical distribution.

The most common type of statistical distribution used in Web performance data is the Frequency Distribution. This type of display breaks the population into buckets of a certain value range, then graphs the number of measurements that fall into each bucket.

So, taking the same population data used in the aggregated data above, the frequency distribution looks like this.


This gives a deeper insight into the whole population by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. Please note that a statistical heavy tail is essentially the same as Chris Anderson's long tail; in statistical analysis, however, a heavy tail indicates a non-normally distributed data set, and it skews the aggregated values you try to produce from the population.
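Mechanically, building a frequency distribution is just counting measurements into fixed-width buckets. A sketch in Python (my own illustration, with invented sample data):

```python
from collections import Counter

def frequency_distribution(times, bucket_width=0.2):
    """Count how many measurements land in each fixed-width value range."""
    counts = Counter(int(t // bucket_width) for t in times)
    # Key each bucket by its lower bound, in ascending order
    return {round(b * bucket_width, 2): counts[b] for b in sorted(counts)}

# Invented response times (seconds), with a heavy tail out to 40 seconds
times = [0.9, 0.95, 1.0, 1.0, 1.1, 1.2, 1.25, 3.0, 40.0]
dist = frequency_distribution(times)
```

Plotting bucket counts against bucket values gives exactly the kind of chart discussed here: a big concentration near the typical response time, and a thin tail stretching far to the right.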

As was noted with the aggregated values, the 'average' performance likely falls between 0.88 and 1.04 seconds. When you compare these values to the frequency distribution, they make sense, as the largest concentration of measurement values falls into this range.

However, the 85th Percentile for this population is 1.20 seconds, where there is a large secondary bulge in the frequency distribution. Beyond that, measurements trickle out into the 40-second range.

As can be seen, a single aggregated number cannot represent all of the characteristics in a population of measurements. Aggregated values are good representations, but that's all they are.

So, to wrap up this whirlwind visit through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.

In the next article, I will discuss the concept of performance baselining, and how this is the basis for Web performance evolution.

Tuesday, August 29, 2006

Web Performance, Part III: Moving Beyond Average

In the previous article in this series, I talked about the fallacy of 'average' performance. Now that this has been dismissed, what do I propose to replace it with? There are three aggregated values that can be used to better represent Web performance data:

  • Median

  • Geometric Mean

  • 85th Percentile

The links take you to articles that better explain the math behind each of these statistics. The focus here is why you would choose to use them rather than the Arithmetic Mean.

The Median is the central point in any population of data. It is equal to the calculated value of the 50th Percentile, the point with half of the population above it and half below. So, in a large population of data, it provides a good estimate of the central or average performance value, regardless of the outliers at either end of the scale.

Geometric Mean is, well, a nasty calculation that I prefer to allow programmatic functions to handle for me. The advantage it has over the Arithmetic Mean is that it is influenced less by outliers, producing a value that is always lower than or equal to the Arithmetic Mean. In the case of Web performance data, with populations of any size, the Geometric Mean is always lower than the Arithmetic Mean.

The 85th Percentile is the level below which 85% of the population of data lies. Now, some people use the 90th or the 95th, but I tend to cut Web sites more slack by granting them a pass on 15% of the measurement population.
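For the programmatically inclined, here is a hedged Python sketch of all three statistics. The response times are invented, and the nearest-rank percentile method shown is only one of several in common use:

```python
import math
import statistics

def percentile(data, pct):
    """Nearest-rank percentile: the value below which `pct` percent of
    the sorted population lies. (Real tools often interpolate instead.)"""
    ordered = sorted(data)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

# Invented response times in seconds, including a heavy tail
times = [0.8, 0.9, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 2.5, 9.0]

arithmetic_mean = statistics.mean(times)
median = statistics.median(times)
geometric_mean = statistics.geometric_mean(times)  # let the library do the nasty part
p85 = percentile(times, 85)
```

With this invented data, the two outliers drag the Arithmetic Mean well above both the Median and the Geometric Mean.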

So, what do these values look like?


These aggregated performance values are extracted from the same data population. Immediately, some things become clear. The Arithmetic Mean is higher than the Median and the Geometric Mean, by more than 0.1 seconds. The 85th Percentile is 1.19 seconds and indicates that 85% of all measurements in this data set are below this value.

Things that are bad to see:

  • An Arithmetic Mean that is substantially higher than the Geometric Mean and the Median

  • An 85th Percentile that is more than double the Geometric Mean

Either of these cases indicates that there is a high number of large values in the measurement population, and that the site is exhibiting consistency issues, a topic for a later article in this series.

In all, these three metrics provide a good quick hit, a representative single number that you can present in a meeting to say how the site is performing. But they all suffer from the same flaw -- you cannot represent an entire population with a single number.

The next article will discuss Frequency Distributions, and their value in the Web performance analysis field.

Web Performance, Part II: What are you calling average?

For a decade, the holy grail of Web performance has been a low average performance time. Every company wants to have the lowest time, in some kind of chest-thumping, testosterone-pumped battle for supremacy.

Well, I am here to tell you that the numbers you have been using for the last decade have been lying. Well, lying is perhaps too strong a term. Deeply misleading is perhaps the more accurate way to describe how an average characterizes a population of results.

Now, before you call your Web performance monitoring and measurement firms and tear a strip off them, let's look at the facts. The numbers that everyone has been holding up as the gospel truth have been averages or, more correctly, Arithmetic Means. We all learned these in elementary school: the sum of N values divided by N produces a value that approximates the average for the entire population.

Where could this go wrong in Web performance?

We wandered off course in a couple of fundamental ways. The first is based on the basic assumption of Arithmetic Mean calculations, that the population of data used is more or less Normally Distributed.

Well folks, Web performance data is not normally distributed. Some people are more stringent than I am, but my running assumption is that in a population of measurements, up to 15% are noise resulting from "stuff happens on the Internet". This outer edge of noise, or outliers, can have a profound skewing effect on the Arithmetic Mean for that population.

"So what?", most of you are saying. Here's the kicker: As a result of this skew, the Arithmetic Mean usually produces a Web performance number that is higher than the real average of performance.
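A quick invented Python example shows the effect: mix 15% outliers into an otherwise steady population, and the Arithmetic Mean inflates sharply while the Median barely moves.

```python
import statistics

# Invented population: 85% "normal" responses near one second,
# 15% network-noise outliers -- the proportions suggested above
times = [1.0] * 85 + [20.0] * 15

mean_with_noise = statistics.mean(times)      # dragged upward by the tail
median_with_noise = statistics.median(times)  # barely notices the outliers
```

Here the mean lands at 3.85 seconds for a site that delivers a one-second response 85% of the time: a number no real user ever experiences.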

So why do we use it? Simple: Relational databases are really good at producing Arithmetic Means, and lousy at producing other statistical values. Short of writing your own complex function, which on most database systems equates to higher compute times, the only way to produce more accurate statistical measures is to extract the entire population of results and produce the result in external software.

If you are building an enterprise class Web performance measurement reporting interface, and you want to calculate other statistical measures, you better have deep pockets and a lot of spare computing cycles, because these multi-million row calculations will drain resources very quickly.

So, for most people, the Arithmetic Mean is the be all and end all of Web performance metrics. In the next part of this series, I will discuss how you can break free of this madness and produce values that are truer representations of average performance.

Web Performance, Part I: Fundamentals

If you ask 15 different people what the phrase Web performance means to them, you will get 30 different answers. Like all things in this technological age, the definition is in the eye of the beholder. To the Marketing person, it is delivering content to the correct audience in a manner that converts visitors into customers. To the business leader, it is the ability of a Web site to deliver on a certain revenue goal, while managing costs and creating shareholder/investor value.

For IT audiences, the mere mention of the phrase will spark a debate that would frighten the UN Security Council. Is it the Network? The Web server? The designers? The application? What is making the Web site slow?

So, what is Web performance? It is everything mentioned above, and more. Working in this industry for nine years, I have heard all facets of the debate. And all of the above positions will appear in every organization with a Web site to varying degrees.

In this ongoing series, I will examine various facets of Web performance, from the statistical measures used to truly analyze Web performance data, to the concepts that drive the evolution of a company from "Hey, we really need to know how fast our Web page loads" to "We need to accurately correlate the performance of our site to traffic volumes and revenue generation".

Defining Web performance is much harder than it seems. Its simplest metrics are tied to the basic concepts of speed and success rate (availability). These concepts have been around a very long time, and are understood all the way up to the highest levels of any organization.

However, this very simple state is one that very few companies manage to evolve away from. It is the lowest common denominator in Web performance, and only provides a mere scraping of the data that is available within every company.

As a company evolves and matures in its view toward Web performance, the focus shifts away from the basic data, and begins to focus on the more abstract concepts of reliability and consistency. These force organizations to step away from the aggregated and simplistic approach of speed and availability, to a place where the user experience component of performance is factored into the equation.

After tackling consistency and reliability, the final step is toward performance optimization. This is a holistic approach to Web performance, a place where speed and availability data are only one component of an integrated whole. Companies at this stratum usually generate their own performance dashboards, correlating disparate data sources in a way that provides a clear and concise view not only of the performance of their Web site, but also of the health of their entire online business.

During this series, I will refer to data and information very frequently. In today's world, even after nearly a decade of using Web performance tools and services, most firms only rely on data. All that matters is that the measurements arrive.

The smartest companies move to the next level and take that data and turn it into information, ideas that can shape the way that they design their Web site, service their customers, and view themselves against the entire population of Internet businesses.

This series will not be a technical HOWTO on making your site faster. I cover a lot of that ground in another of my Web sites. It will also not be data heavy; again, I point you to another of my Web sites if you want only the numbers.

What this series will do is lead you through the minefield of Web performance ideas, so that when you are asked what you think Web performance is, you can present the person asking the question with a clear, concise answer.

The next article in this series will focus on Web performance measures: why and when you use them, and how to present them to a non-technical audience.

Monday, August 28, 2006

GrabPERF: Compression Performance Study, Early Results

I have been running the GrabPERF Compression and Performance study for less than a week, but I thought that I should share some of the initial results with everyone.

GrabPERF Compression Study -- Initial Results -- Aug 28 2006

As you can see above, the byte transmission savings gained by some sites are pretty astounding. Google News sends a page with a median weight of nearly 31,000 bytes when compressed; but when compression is disabled on the client, this jumps to over 139,000 bytes.

What is interesting is that the performance gains don't look truly significant. However, the compressed pages are faster, and have the added benefit of costing the site less, as bandwidth costs count by the byte (I know it's more complicated than that, but for now, let's assume a fantasy world).
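If you want a feel for the byte savings, here is a small Python sketch using zlib (the same DEFLATE algorithm behind gzip). The HTML payload is invented, but repetitive markup like this is exactly what compresses so well:

```python
import zlib

# An invented, repetitive HTML payload -- markup compresses very well
html = ("<div class='headline'><a href='/story'>Story headline</a></div>\n" * 500).encode()

compressed = zlib.compress(html, 6)  # level 6 is a typical server default
savings = 1 - len(compressed) / len(html)
```

On real pages the ratio is less dramatic than on a repeated string, but text-heavy HTML routinely shrinks by 70-80%, which is the kind of gap the Google News numbers above reflect.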

I will continue to monitor the results, and will close the measurements after 14 days and write up a final report.


Sunday, August 27, 2006

Never Eat Alone: The Introvert’s Review

I sat down and finally read my copy of Never Eat Alone, by Keith Ferrazzi. Well, I agonizingly got my way through 80% of the book before I threw it across the room in disgust.

What a load of crap.

There might be a message in the book somewhere. But the book is mostly about Mr. Ferrazzi's preening ego and self-importance.

He obviously thinks that all the world's ills can be solved by reaching out your hand and saying, "Hey, I'm important and you need to know me!".

Keith, get over yourself and your tale of the American dream. Focus on the facts. I don't need to listen to a celebrity gossip story every 5 pages. In fact, your approach turned me off.

Obviously, if you took the time to get to know introverts, of which I am one, you would find that what you have done is more important than who you know. We are people who don't care who you know; we want to know what you have done.

Introverts have very tight, very small networks. But if you REALLY need to get something done, you usually end up working with an introvert.

I can truly say that I lost my money on this book. Maybe if he took the time to explore how the other half of the population works, he would find that his approach comes across as vacuous and disingenuous.

It's not the number of people you know; it's how well you know them.

Spend a month focusing on those people who are truly close to you. Then you will never eat alone.


Saturday, August 26, 2006

Fisher Space Pen and Rite in the Rain: Bonus from a friendly supply sergeant

Alan at MREater hit the jackpot when he gave a supply sergeant a lift.

As a bonus he got a Fisher Space Pen. And of course...

It’s the perfect companion to the “Rite in the Rain” All-Weather Field Book the soldier also gave to me. Nothing like a friendly supply sergeant. The Field Book has paper “created to shed water and enhance the written image.” Hot damn. That book is a good piece of gear, the kind of thing you wonder how you ever lived without.

Based on his description, he got a tan Tactical Field Book.

I need to find a supply sergeant to give a lift to!


Friday, August 25, 2006

When I regained consciousness...

The title is a play on an old Royal Canadian Air Farce skit.

I have been having an adverse reaction to a new medication, which the doctor asked me to stop taking today. The reaction involves an itchy rash slowly spreading over my skin.

Tonight I took 1.5 teaspoons of Benadryl to see if that would help.

Well, that's 3 hours of my life I won't get back from the sleep goddess.

Blogger Upgrade: Dude! You forgot the compression switch!

Ok, the Blogger blogs all use GZIP compression.

BUT! The homepage does not.

Ummm...attention to detail? Anyone?


Black and Worn: Weathering a storm of the mind

I wandered around the net today, linking random connections together. Richard Thompson, John Martyn, Nick Drake.

When I visited Nick Drake's official site (sadly out of date) I found this lovely image dominating the front page.

The front page of the Nick Drake site

A lovely, weathered, black leather notebook.

Nick Drake strikes me as a person that is a lot like I could have been. Painfully shy, suffering from depression, trying to get the ideas out in a world that was not his. When he died in 1974, he was ignored and forgotten.

Now that he is all the rage again, it is important to go back and consider his life. Consider what he made in a few short years. The stories he tore out of himself, willing to share this one aspect of his life with us.

The rest, well, they are hidden in the little black book.


AOL and Movie Studios: This is your life takes on a whole new meaning…

I wonder if the AOL/Movie Studio deal will lead to a whole bunch of new genres, now that AOL has some pretty compelling information to share with the producers...


Thursday, August 24, 2006

GrabPERF: Main Page Performance Improvement

One of the performance hits that the GrabPERF system has is the dynamic generation of the main page. The nature of the SQL calls and the underlying PHP makes it scale exponentially past a certain number of measurements.

Last night, Kevin Burton made a grand suggestion: generate a static page on a regular schedule.


Today, I wrote the script that does this. The performance of the main page has adjusted accordingly.
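The script itself is simple; the one detail worth getting right is swapping the new page into place atomically, so no visitor ever sees a half-written file. A hypothetical Python sketch of the approach (not the actual script, which was written in Perl):

```python
import os
import tempfile

def publish_static_page(render, destination):
    """Write the freshly rendered page to a temp file, then atomically
    swap it into place so no visitor ever sees a half-written page."""
    html = render()
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(destination) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp_path, destination)  # atomic rename on POSIX filesystems
```

Run on a regular schedule from cron, this turns an expensive dynamic page into a static file, which is exactly where the performance improvement comes from.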

GrabPERF Main Page Performance Improvement - Aug 24 2006


UPDATE: Ian Holsman reminded me that if I use cURL, I can use the existing PHP to build the pages without a Perl script.


Now, bedtime.


"Slow walkin’ Walter, the fire-engine guy"

If you know this joke, you also know which band I have been listening to this morning....


Caching for Performance Article Posted

A few years ago, I wrote an article about how to best set up Web server cache-control headers to take advantage of this free form of content distribution. Until now, it has only existed as a PDF file.

Last night, I sent a copy to Kevin Burton of TailRank in response to some of his recent musings around making TailRank faster by sending explicit caching messages in his server responses. His response to the PDF was "make it an HTML file".

You can now find the Caching for Performance article in the Caching Library.
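For the impatient, the core of the approach can be sketched in a few lines (a hypothetical helper, not code from the article): tell downstream caches how long they may reuse an object before revalidating it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def cache_headers(max_age_seconds):
    """Build response headers inviting downstream caches to reuse an object."""
    expires = datetime.now(timezone.utc) + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Expires": format_datetime(expires, usegmt=True),  # HTTP-date ending in GMT
    }

headers = cache_headers(86400)  # e.g. static images and CSS: cache for a day
```

Both headers are sent because older HTTP/1.0 caches only understand Expires, while HTTP/1.1 caches prefer Cache-Control.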

Use it. Live it.


Wednesday, August 23, 2006

GrabPERF: GZIP Performance Experiment Revisited

A few years ago, I wrote an article on how GZIP compression improved Web performance. Don Marti at the Linux Journal was a great editor, and eventually, the article ended up in the online version of the Magazine.

At the time, I used Ian Holsman's (now renamed ITScales) to capture the data. Now that I have built my own Web performance monitoring network, I thought I would repeat the experiment.

You can see the comparative results at these locations:

After I have collected a lot more data, I will be re-visiting the article and commenting on the state of compression technology on the Internet.

If you would like to suggest a site to measure, please leave a comment.


Tuesday, August 22, 2006

Number One, Almost Done

Slurred Moleskine -- Nearly Done

After more than a year, my first Moleskine notebook is nearly done.

A year!?!

Yes, a year. All that I have been using it for is work notes, jotting down the facts that make my customers and colleagues get up every morning.

Its replacement is in my bag, still wrapped in its cellophane, calling me; tempting me.


Monday, August 21, 2006

Bridging the Gap

The last 4 weeks have been extremely traumatic for me. They have culminated in an extended period of renewal, reflection and rejuvenation, where I have looked back over the last 15 years of my life and asked, "What next?".

An interesting note on the word rejuvenation: it means to reclaim your childlike state (ok, I'm playing fast and loose with the definition).

Why now? Why 15 years?

In the Fall of 1991, I bought my first computer. Until then, I had avoided using them like the plague. I had managed to get through my undergraduate years with a pen, paper, and an electronic typewriter with rudimentary spell-checking. I felt that I had achieved something; I felt bonded to the works I created.

I was also an avid and active journal-keeper. In the months after my father died, the writing in my journal was what let me empty my naive mind, letting me vent the chaos that rushed through my head on a constant basis.

Then I went to grad school. And I realized then that I would need to step up in order to generate the massive amount of paper that is required in a graduate history program.

It turns out that I found the technology more enticing than the program. To this day, my failure to complete my Masters degree haunts me. Someday, I will return to that, and complete it. Knot the loose ends of my life together.

Ok, this really is going somewhere; thanks for hanging on this far.

After 15 years of intense immersion in technology, the Web, networking, and all that comes along with that, I have realized that something has been missing from my work, my writing, my life. I have missed the rushing sound of pen on a clean sheet of blank paper. No lines to slow you down; nothing besides the edges of the page to define what you put in the book.

Technology has lost its lustre. The rushing stream of this new laptop, that new technology, another over-inflated boom has left me feeling empty, asking "So what?". In a hundred years, we may be so far down the path to post-humanism that computers as we know them are a vague and distant antique amusement.

Or we could be living in caves, scratching by a subsistence existence.

In either case, the only thing that will remain, that will linger, that will connect us to the past will be the written word. Not the electronic bits and bytes we are now so addicted to, but the ink on paper, graphite on wood pulp.

The smooth, quiet, seductive transition of ideas from mind to physical reality.

I have been trimming back my blog-reading. Gone are the political blogs. I fear that the gadget blogs are next.

What you have left are those people who celebrate life outside the electronic realm. Those who step back, and look back on the knowledge that preceded us. Who pick up a book that was published before they were born.

A book that left the mind of the author and flowed gracefully from the pen, to the paper, to another mind.

15 years is a long time to try and live without paper. Those 15 years have seen the niceties of a bygone age evaporate, get swallowed by an endless sea, a raging torrent of information.

The cursive hand; the thoughtful response; the flowing of ideas from person to person.

To calm the storm of my mind, I have returned to my first love: ideas of the mind, of the soul. Ideas that were worthy of the preparation of the parchment, the sharpening of the quill, the grinding of the pigment to create the ink.

We have walked away from those ideas, grasping at the brass ring in front of us, to the disdain of the treasure chest we leave behind.

To focus on the ideas, that is to live again.

To heal my mind, I must write my mind. Not type it; not IM it or e-mail it or blog it.

That familiar scratch of pen on paper. The rush that comes from committing something to paper; something that you can share with others.

Something that you can set adrift, watch as it floats, the glow from its candle on the gentle rippled flow of all the ideas that have come before.

I am setting my ideas free again.

Picture: girlzone41


Mick! Turn it down! My hearing aid is feeding back!

Well, I just heard that everyone's favourite corporate rock and greed group is returning to Boston.

That's right: The Rolling Stones.

They announced that there are student discounts available. Perhaps they should start adding senior's discounts as well.


Progressive Rock: I am getting old

Ok, I am taking a break from writing yet another customer report to make the note that lately I have been cranking through the vast amounts of Progressive Rock that dominated AOR in the 1970s. So yes, my mind is filled with Pink Floyd, Yes, King Crimson, Deep Purple, Steely Dan, Fairport Convention, and EARLY Genesis (not the Phil Collins schlock).

Did I miss anyone?

Saturday, August 19, 2006

Either way, they win.

I'm with Kevin Burton on this.

The more restrictions they place on air travel the more our economy will suffer - which means the terrorists win.


Friday, August 18, 2006

Wordpress: When did it achieve world dominance?

I was considering the amazing popularity of Wordpress (the hosted service as well as the application), now the agreed-upon champ in the blogging world. I was considering this in light of the fact that when I started blogging in the dark ages of 2004, MoveableType and TypePad were the undisputed champs.

When did the shift occur? What was the watershed moment?

It hit me: it was the day Scoble announced his blog would be a Wordpress blog. [here]

Now, Scoble may not be as large a force in the blogging world anymore, but that day in October 2005 when he made that announcement sealed the fate of SixApart. The buzz momentum swung to Wordpress and all of the yummy goodness therein.

The SixApart/MoveableType/TypePad fiends out there are likely to flame me, but the latest release of MoveableType received the response usually reserved for yet another Who farewell tour. It is bloated, complex and difficult to manage.

On the other hand, I can install and/or update Wordpress in less than 5 minutes and no one would notice a thing.

I wonder what the next seminal blogging tool will be?


Thursday, August 17, 2006

Microsoft: The NSA Made us do it!

Apparently, using HTTP compression alongside HTTP/1.1 will cause certain versions of MSIE 6.0 to implode. [here]

I personally think this was because the NSA power shortage was making it too hard for the spooks to snoop on compressed Web traffic. [here]

Via: Port80 Software

PS: No, I won't turn off compression because Microsoft did something really stupid.


Bruce Schneier Fact

Most people use passwords. Some people use passphrases. Bruce Schneier uses an epic passpoem, detailing the life and works of seven mythical Norse heroes.

Random Bruce Schneier Facts.



Wednesday, August 16, 2006

Moleskine: Joe Lavin Skewers the Cult of the Black Book

I have this search set up to deliver the things that Google's Blogsearch finds out in the blogosphere containing Moleskine in it. Sometimes, it delivers some real gems, like Joe Lavin's The Condensed Guide to Looking Like a Writer (found via Professor Barnhardt's Journal).

The take-away quote from this article?
At the very least, costing $15 a pop, the Moleskine can certainly put the "starving" back into starving artist.

Read it. It's a reminder that having the tools doesn't make the owner an artist.


Sunday, August 13, 2006

Location? We don’t need no stinkin’ location! We have BROADBAND!

This post has two underlying reasons for existing: 1) to test out the new MSFT Live Writer Beta; and 2) to talk about a great story that GigaOm pointed us to today.

Om Malik pointed to a story in the Seattle Times today about "Broadband in the Boonies". Having grown up in the boonies of British Columbia, this immediately got my attention. The story discusses the explosive growth of Internet businesses in the now heavily wired interior of Washington State, focusing on the area around Twisp, Winthrop and the Methow Valley.

Until you have been in this area, and I have, you don't grasp the possibility of winter isolation. The story mentions that these places are four hours from Seattle; what it neglects to mention is that the four-hour figure only holds between April 15 and October 15, depending on snow.

The direct westerly route to Seattle from these locations passes through the Cascades. Through the extremely high and snowy Cascades.

Samantha and I took a spur of the moment detour through this little part of heaven, pausing a night in a campground in Twisp. Right on the river. When we woke up the next morning, I remembered how much I missed those early morning moments in the mountains.

Twisp is far more isolated than Golden, BC, or any of the other towns that we passed through on our trip this summer. But it is a reminder to us all that place is important. Not because we have to be there, but because it is where we are at home.

I have lived in the Valley. I have lived in Massachusetts. But neither has been home.

And to me, home is worth more than anything.

Technorati Tags: , , , , , , , , ,

Friday, August 11, 2006

GrabPERF: New Agent Deployed

The new GrabPERF Agent code, with support for plain text or regular expression content matching, is now in production on all active measurement agents.

I added one more feature before I rolled out the new code: when a content match error occurs, the server headers and HTML content are stored for 14 days.

I have not exposed this feature yet, but will be doing so in the next few days.

Again, thanks to the GrabPERF community for your continued support.

Technorati Tags: , , , ,

GrabPERF: CrunchGear Crunched

Michael Arrington's latest Crunch product, CrunchGear, is getting beat up this morning.
CrunchGear Crunched


In the past, these would not appear as errors, but the new text match feature in GrabPERF is working like a charm. The new code is up on 3 of the 5 measurement agents, and the remaining 2 should be updated by tomorrow.

Technorati Tags: , , , ,

Blog Search: Technorati, where art thou bot?

July 25, 2006 at 19:48:24 GMT.

That's the last time that the Technorati bot indexed my blog.

I am confused, because of all the sites out there, my blog should be pretty easy for Technorati to index -- this server, as well as the GrabPERF servers, is hosted in Technorati's racks. Theoretically, the bot should be able to index my blog without leaving the building.

I posted something this morning, and IceRocket, Google Blogsearch, and Blog Search all have it.

I am wondering if anyone else is noticing this.

Technorati Tags: , , , , ,

Moleskine: Made in China

It was to be expected. On Moleskinerie there was a post that highlighted that the latest Moleskines are "Made in China". The response from the Moleskine fan community has been overwhelming: we want the old books back.

China is responsible for a large number of the consumer products that we use today. However, there is an expectation that Moleskines were better than a mass-produced throwaway consumable. I imagine we all had images of a workshop filled with dedicated craftsmen, carefully hand-binding each notebook with absolute focus and attention to detail.

Sorry folks: these books have always been mass-produced. What is irksome even to me is that Modo e Modo (or their new French corporate masters) is no longer making a pretense of selling a quality journal that is unique and worth possessing. An item that sets the owner apart as someone who takes their notes, sketches and writings seriously, as thoughts worth dedicating to a medium that will last beyond them.

It's all about brand. And the Moleskine notebooks are the icon of the social networking brand growth vision held by so many companies today. The core, dedicated following evangelizes the product, drawing more people to try the product and love it. As with so many things, will popularity denude and degrade the product?

If it is true that the latest production runs of Moleskines originate in China and are of lower quality than the community has come to expect, nay, demand, of this fine piece of crafting, then they no longer have the cachet, are no longer unique, and will die the death of a million blog posts.

I am voting for the Rite in the Rain notebooks to be the next iconic notebook. The unique yellow covers and indestructible paper have made me think twice about this addiction to Moleskines. They are books designed to be noticed (try finding a black notebook in the woods after it's fallen out of your pack!), and they stand out in a coffee shop, especially one filled with darkly dressed artist types.

Moleskine, I am willing to give you a chance. The community wants to hear your answer.

Technorati Tags: , , , , , , , ,

Wednesday, August 9, 2006

Work: Start-up v. Established Firm — Thoughts on Inflection Points

Scott Berkun wrote a great post that discusses how he encounters the start-up inflection point in companies. This is the point where the company has to make that brutal transition from the fast-and-loose dynamic of the true start-up to the more established and "normal" business methods.

This week, Niall Kennedy provided an example of someone who gave an established firm a try, but decided that the start-up world is more to his liking.

The object here is not to decide which is best, the start-up or the established firm, but to discuss the transition that occurs when moving between these two phases; and the direction of travel is always one-way, to the established firm. For all their talk of "thinking like a start-up", established firms are what they are.

I have made this transition twice now. The first time was during the exuberance of the 1999 bubble world with a company that had just gone public. Here the transition was initially hidden by the exponential growth and overly optimistic predictions made by the executives. When reality stepped in early in 2001, the true effect of the transition became clear: this was no longer a start-up, and there were people who were more than willing to make the tough decisions. Whether, in the long-term, these were the correct decisions is a question that I am not willing to answer; I was merely an observer.

I was an observer when a similar change occurred at my current company. A start-up in the sense that it was still a VC-funded private firm, this company had (and still has) an excellent product developed by some top-flight technical talent. The issue now was to take that foundation and build a team that could execute. Again, I can't say whether the decisions that were made were the correct ones, but the team that was built during that time has led the company out of the wilderness and in a very solid direction.

These are simply my experiences. There are start-up people and established-company people; and there are the rare folks who can slide in and out of both worlds. Me, I fall into the start-up category. When a company starts edging toward 200 employees, I begin to feel a bit edgy. In a very quick exchange I had with Niall Kennedy on Tuesday, he said that he set the magic number at the point when a company becomes a "187" (a number he also mentioned is police slang for homicide).

Is there a magic number? Or does it depend on the company? What defines a start-up? What defines an established firm?

Technorati Tags: , , , , , , ,

Pencils: The New Trendy Scribe Tool

In the last year, I have used every trendy writing instrument that I have read about. Fisher Space Pens, G2 Gel Pens, Uni-Ball Signos, Uni-Ball Power Tanks, and even the old standby, the fountain pen.

In the last week, I have re-discovered the joy of the pencil. There is something liberating in using something so simple.

The New Space Pen

It's old-fashioned, but I love it.

Have you sharpened a pencil today?

Technorati Tags: , , , , , , , , , ,

Moleskine: Hi, my name is Lost in Scotland and I have a problem

The Flickr tag search for Moleskine is always good for a laugh or two.

I think that this fair lass from my ancestral homeland has a larger issue with Moleskines than I do.

"They are not all here believe me...just sifting through stuff to pack...or not to pack...."

She also has her own Moleskine pool.

Technorati Tags: , ,

GrabPERF: Text Matching Example

I now have a true live example of how text matching can provide information on issues where a successful page is returned.

Text Match Example -- Aug 09 2006

In this example, the TEST AGENT returned a Text Match Failed error, while 3 of the agents running the current production code said the page was a success.

How do I know that the TEST AGENT is right? Take a look at the byte count. For the successful pages, the byte count is in the 3,600-3,900 byte range; the page that had the Text Match failure returned only 1,076 bytes. Three other measurements around that time returned approximately the same small size, yet reported successful page downloads.

If this Agent code shows continued success and robust behaviour, then I will push it into production on August 14.
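The byte-count heuristic above is easy to automate. Here is a minimal sketch in Python (the function name and threshold are my own, not GrabPERF's actual code):

```python
from statistics import median

def looks_truncated(byte_count, recent_sizes, threshold=0.5):
    """Flag a response whose size is far below the recent median.

    recent_sizes: byte counts from recent successful measurements.
    A page under `threshold` times the median is suspicious even if
    the server returned an HTTP 200.
    """
    if not recent_sizes:
        return False
    return byte_count < threshold * median(recent_sizes)

# Successful pages ran 3,600-3,900 bytes; the failing one was 1,076:
looks_truncated(1076, [3600, 3700, 3900])   # suspicious -> worth a text match
```

A check like this complements text matching: the size anomaly tells you something is off, and the content match tells you what is missing.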

Technorati Tags: , , , ,

Tuesday, August 8, 2006

GrabPERF: New Feature - Text Matching

Ok folks, this is it. I have finally truly woken up from my slumber and I am starting to add features to the system.

Today's latest: content/text matching.

This is a critical step in the development, as it allows for quality checking against the returned data. Currently, I can only catch errors under 3 defined circumstances:

  1. The server returns an HTTP code >= 400

  2. The measurement times out when delivering the data (45 seconds)

  3. The connection to the server fails (only some agents and kernels)

Now, even when the server returns what its admins would consider a success -- i.e. an HTTP/200 OK message sent back to the client -- an error can be triggered if a defined text string or regular expression does not appear in the HTML.

This is currently up on a test agent, and if it proves stable, I will roll it out to the production agents later this week. If you would like to be a part of this beta, drop me a line or a comment.
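The core of a plain-text or regex content match can be sketched in a few lines of Python (a sketch only; the function and parameter names are my invention, not the actual agent code):

```python
import re

def content_matches(body, pattern, is_regex=False):
    """Return True if the expected pattern is found in the page body.

    A measurement that gets an HTTP 200 but fails this check would be
    recorded as a 'Text Match Failed' error rather than a success.
    """
    if is_regex:
        return re.search(pattern, body) is not None
    return pattern in body

# A truncated error page may still come back as HTTP 200:
page = "<html><body>Service temporarily unavailable</body></html>"
content_matches(page, "Add to Cart")                  # plain-text match fails
content_matches(page, r"unavailable", is_regex=True)  # regex match succeeds
```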

Technorati Tags: , , , ,

Moleskine: All Powerful Mind Control in every notebook!

Andreas Reinhold demonstrates one of the new features to be added to Moleskines since Modo e Modo was purchased last week.

Technorati Tags: , ,

Performancing Extension: Trying something new

Since I started using the Performancing Firefox extension, it seems that Technorati takes a while to find me. I finally figured out that I hadn't enabled pings.

So this is a test post to see if the ping function works correctly.

Technorati Tags: , ,

Monday, August 7, 2006

GrabPERF: Some System Statistics

Over the last year, GrabPERF has been something that has caught the fancy of a few in the Blogging/Social Media world. It has given some perspective on how performance can affect business and image in the connected world.

But what of GrabPERF itself? It has been on a development hiatus for the last few months due to pressures from my "real" job and various trips (business and pleasure) that I have been undertaking. Over the last two weeks, I have been trying to clear out the extra measurements and focus the features and attention on the community that appears most interested in the data.

During this process, I heard back from some folks who had been using GrabPERF in stealth mode (even I can't track all the hits!), and who asked, "Hey! Where did my data go?". Glad to hear from all of you.

Just to give everyone some idea of the growth, here is a snapshot of aggregated daily performance and number of measurements.

GrabPERF Statistics (by day)

The number of measurements shot up, until I started culling the unused measurements. Over the last 3 weeks, average performance became extremely variable, and that's when I began considering the culling. As well, the New York PubSub Agent appears to have gone permanently offline, as a part of their winding down process.

The fact that the system was taking 390,000 measurements per day still astounds me.

This was also comparable to the number of distinct sites we were measuring.


After the latest cull, we are down to 84 distinct tests, a level last seen on November 27, 2005.

I am pleased that the system has held together as well as it has.

Technorati Tags: ,

Notebook Lust: Archimedes Palimpsest

For history fiends, enthusiasts of lost treasures, and lovers of a good mystery, the discovery of the Archimedes Palimpsest has been one of those stories that must be followed.

The texts contained in the Palimpsest were lost to humanity for hundreds of years as a result of a common Medieval European tradition -- the re-use of parchment. To quote the site:
"The word Palimpsest comes from the Greek Palimpsestos, meaning "scraped again". Medieval manuscripts were made of parchment, especially prepared and scraped animal skin. Unlike paper, parchment is sufficiently durable that you can take a knife to it, and scrape off the text, and over write it with a new text. In this case, [the text of Archimedes'] five books were taken apart, the text was scraped off the leaves, which were then stacked in a pile, ready for reuse."

Using a new x-ray scanning technology, the original Greek text is exposed to the Western world for the first time since 1229.
A page from the Archimedes Palimpsest

This holds more than a passing interest for me, as one of the most influential history courses I took during my undergrad tenure was taught by a paleographer and historian at the University of Victoria, Michèle Mulchahey (she is now part of the faculty at the University of St. Andrews in my ancestral homeland, Scotland). I still often wish I had continued my Medieval European history studies, but my lack of Latin prevented me from going much further than I did.

Glad to see a truly old classic resurrected.

Technorati Tags: , , ,

Notebook Lust: Minutiae to the max!

For those who capture the minutiae of their lives in a Moleskine or Rite in the Rain, I present the example to which we should all bow.

The Domesday Book is online.

To quote the site: 
"Domesday is our most famous and earliest surviving public record. It is a highly detailed survey and valuation of all the land held by the King and his chief tenants, along with all the resources that went with the land in late 11th century England. The survey was a massive enterprise, and the record of that survey, Domesday Book, was a remarkable achievement. There is nothing like it in England until the censuses of the 19th century."

It is a truly amazing work, and a goldmine for researchers and social historians around the world.

Technorati Tags: , , , , , , , ,

GrabPERF: TailRank back on, and behaving well

Got an e-mail from Kevin Burton of TailRank this weekend saying that he had found some issue with the system's infrastructure and that he should be back to normal performance.

Looking at this graph, I agree with him.

TailRank Performance (7 Days) -- Aug 07, 2006

Technorati Tags: , ,

Wednesday, August 2, 2006

GrabPERF: Some bad data leaked in

I was trying yesterday to debug an issue that appeared to be affecting the PubSub Agent -- yes, I re-started it at the request of their sysadmin.

The issue was that it was showing data that appeared to have no relationship with the data appearing from all of the other measurement locations. I tried blocking it off using IPTables, MySQL restrictions, etc.

This afternoon, I figured out the problem.

Yesterday, I had been using this query to diagnose the problem:

    agent_id = 11
    and date between subdate(now(),interval 30 minute) and now()
order by
    date DESC

And every time I looked, there was new data. I couldn't seem to stop the agent from delivering data. Then I had a flash.

    agent_id = 11
    and date > now()

157,000 rows of data. From the future.

Thankfully I know the guy who is responsible for keeping the PubSub servers running, and he is going to adjust the time, etc. But he makes no promises about how long the agent machine will stay running.

I apologize for the bad data.
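A simple ingest-side sanity check would have caught this earlier. A hypothetical sketch (GrabPERF's real ingest code is not shown here):

```python
from datetime import datetime, timedelta

def accept_measurement(measured_at, now=None, max_skew=timedelta(minutes=5)):
    """Reject rows stamped in the future beyond a small clock-skew allowance.

    An agent with a badly set clock (like the PubSub machine here) gets
    its data discarded instead of polluting the graphs.
    """
    now = now or datetime.now()
    return measured_at <= now + max_skew

now = datetime(2006, 8, 2, 12, 0, 0)
accept_measurement(datetime(2006, 8, 2, 11, 59), now=now)  # accepted
accept_measurement(datetime(2006, 8, 3, 12, 0), now=now)   # rejected: from the future
```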

Technorati Tags: , , ,

Server Stuff: Upgraded to Apache 2.2.3

I took the big leap today and upgraded the base Web server to Apache 2.2.3. This required some finagling with the httpd.conf file, and a re-build of PHP, but once that was done, the app bounced up happy as a clam.

As usual, let me know if you see anything unusual -- with the server, not with the author.

Technorati Tags: , ,

Tuesday, August 1, 2006

Web Performance: Some posts of interest

This morning's bounty of posts brought in two that will make you think.

First was Port80 Software's comments on using the Cache-Control mechanism embedded in all browsers. This is interesting to read, as I have been trying to get companies to use this mechanism more intelligently for a number of years. I know that the Port80 team gets it, but it is always nice to have some outside validation of a position you have tried to evangelize for a long time.
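For anyone wanting to try this on their own Apache box, mod_expires and mod_headers make it easy to emit Cache-Control headers for static content. A sketch only -- the types and lifetimes here are arbitrary examples, not Port80's recommendations:

```apache
# httpd.conf -- requires mod_expires and mod_headers to be loaded
ExpiresActive On

# Static assets rarely change; let browsers cache them
ExpiresByType image/gif  "access plus 7 days"
ExpiresByType image/jpeg "access plus 7 days"
ExpiresByType text/css   "access plus 1 day"

# Keep dynamic HTML fresh
<FilesMatch "\.php$">
    Header set Cache-Control "no-cache, must-revalidate"
</FilesMatch>
```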

The second was Tim O'Reilly's post on Cal Henderson's new book on Web scalability. While I am likely to purchase the book out of professional interest, I have one problem with Flickr's current configuration: it does not use HTTP persistence, something I noted last week. This strikes me as weird.
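Whether a server keeps the connection open can be read off its response headers. A rough check of the HTTP rules in Python (header parsing only; no claim about how Flickr's servers are actually configured):

```python
def connection_persistent(http_version, headers):
    """Decide whether the connection stays open after the response.

    HTTP/1.1 defaults to persistent unless 'Connection: close' is sent;
    HTTP/1.0 is persistent only with an explicit 'Connection: keep-alive'.
    """
    conn = headers.get("connection", "").lower()
    if http_version == "HTTP/1.1":
        return conn != "close"
    return conn == "keep-alive"

# A 1.1 server sending 'Connection: close' forces a new TCP connection
# (and handshake) for every object on the page:
connection_persistent("HTTP/1.1", {"connection": "close"})  # not persistent
connection_persistent("HTTP/1.1", {})                       # persistent by default
```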

It's always good to see Web performance rear its head.

Technorati Tags: , , , , , ,