mPulse

Wednesday, September 3, 2008

Hit Tracking with PHP and MySQL

Recently there was an outage at a hit-tracking vendor I was using to track the hits on my externally hosted blog, leaving me with a gap in my visitor data several hours long. While this was an inconvenience for me, I realized that this could be mission critical failure to an online business reliant on this data.

To resolve this, I used the PHP HTTP environment variables and the built-in function for converting IP addresses to IP numbers to create my own hit-tracker. It is a rudimentary tracking tool, but it provides me with the basic information I need to track visitors.

To begin, I wrote a simple PHP script to insert tracking data into a MySQL database. How do you do that? You use the gd features in PHP to draw an image, and insert the data into the database.




header ("Content-type: image/png");

include("dbconnect_logger.php");
$logtime = date("YmdHis");
$ipquery = sprintf("%u",ip2long($_SERVER['REMOTE_ADDR']));

$query2 = "INSERT into logger.blog_log values \
($logtime,$ipquery,'$HTTP_USER_AGENT','$HTTP_REFERER')";
mysql_query($query2) or die("Log Insert Failed");

mysql_close($link);

$im = @ImageCreate (1, 1)
or die ("Cannot Initialize new GD image stream");
$background_color = ImageColorAllocate ($im, 224, 234, 234);
$text_color = ImageColorAllocate ($im, 233, 14, 91);

// imageline ($im,$x1,$y1,$x2,$y2,$text_color);
imageline ($im,0,0,1,2,$text_color);
imageline ($im,1,0,0,2,$text_color);

ImagePng ($im);
?>



Next, I created the database table.




DROP TABLE IF EXISTS `blog_log`;
CREATE TABLE `blog_log` (
`date` timestamp NOT NULL default '0000-00-00 00:00:00',
`ip_num` double NOT NULL default '0',
`uagent` varchar(200) default NULL,
`visited_page` varchar(200) NOT NULL default '',
UNIQUE KEY `date` (`date`,`ip_num`,`visited_page`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;



It's done. I can now log any request I want using this embedded tracker.

Data should begin flowing to your database immediately. This sample snippet of code will allow you to pull data for a selected day and list each individual hit.




$query1 = "SELECT
bl.ip_num,
DATE_FORMAT(bl.date,'%d/%b/%Y %H:%i:%s') AS NEW_DATE,
bl.uagent,
bl.visited_page
FROM blog_log bl
WHERE
DATE_FORMAT(bl.date,'%Y%m%d') ='$YMD'
and uagent not REGEXP '(.*bot.*|.*crawl.*|.*spider.*|^-$|.*slurp.*|.*walker.*|.*lwp.*|.*teoma.*|.*aggregator.*|.*reader.*|.*libwww.*)'
ORDER BY bl.date ASC";

print "<table border=\"1\">\n";
print "<tr><td>IP</td><td>DATE</td><td>USER-AGENT</td><td>PAGE VIEWED</td></tr>";
while ($row = mysql_fetch_array($result1)) {
$visitor = long2ip($row[ip_num]);
print "<tr><td>$visitor</td><td nowrap>$row[NEW_DATE]</td><td nowrap>$row[uagent]</td><td>";

if ($row[visited_page] == ""){
print " --- </td></tr>\n";
} else {
print "<a href=\"$row[visited_page]\" target=\_blank\">$row[visited_page]</a></td></tr>\n";
}

}

mysql_close($link);



And that's it. A few lines of code and you're done. With a little tweaking, you can integrate the IP number data with a number of Geographic IP databases available for purchase to track by country and ISP, and using graphics applications for PHP, you can add graphs.

For my own purposes, this is an extension of the Geographic IP database I created a number of years ago. This application extracts IP address information from the five IP registrars, and inserts it into a database. Using the log data collected by the tracking bug above and the lookup capabilities of the Geographic IP database, I can quickly track which countries and ISP drive the most visitors to my site, and use this for general interest purposes, as well as the ability to isolate any malicious visitors to the site.

4 comments:

  1. Why do you convert the IP using ip2long - are there any benefits to it?

    ReplyDelete
  2. [...] Hit Tracking with PHP and MySQL | | | | | | close () [...]

    ReplyDelete
  3. What about scalability issues? The database size will expand quickly with traffic and it adds another hit on the server with each page load.

    ReplyDelete
  4. What about scalability issues? The database size will expand quickly with traffic and it adds another hit on the server with each page load.

    ReplyDelete