Tutorials and FAQs: CGI: Setting up Webstats - Webalizer

In this tutorial, I will explain how to install and/or configure one of the most popular webstats applications currently available: Webalizer. Webalizer is pre-installed on the CGI server and is what VISP full name include as standard with all but the basic Home Surf and Biz Surf accounts.

In a related tutorial I also explain how to install and configure one of the other popular webstats applications: Awstats. It is possible to run both webstats packages on your cgi webspace, as I will explain later, so you can use both sets of web statistics reports and choose which one you prefer. But first, let's get some of the basics out of the way.

Note: It is not possible to use Webalizer if you have a Home Surf or Biz Surf account. This is because they are very basic accounts designed primarily for web surfing; they do not come with any CGI webspace (which is necessary to process the webstats data) and do not have the option to generate webstats logfiles. Before you can do anything, you must activate your CGI webspace (if you have not already done so) and understand how to access your CGI shell using telnet or ssh. Information on doing this can be found in CGI/Shell Server Basics.


What are webstats and where are they stored

Webstats are raw details about which pages and areas of your website people have visited. Each time a visitor's browser requests an html webpage or a graphic image from your website, the webserver stores a record of that request in an access (webstats) log. A logfile analyser can then collate all those requests and present the statistics in graphical or condensed numerical form, giving you some feedback on the popularity of your website and which parts are, or are not, attracting visitors.

Because the cgi and www servers are completely separate systems, a separate webstats log file is generated for each server. The raw access log files are generated daily and are stored in a special directory called logs in your www webspace. The www webstats log files are named www.username.plus.com.gz and the cgi webstats log files are named cgi.username.plus.com.gz (with username being your name). The gz indicates the files are stored in a compressed (gzipped) format. This saves space and speeds up file copying, as some very popular websites can produce tens of megabytes of raw data each day.

If you have any domains registered under your username, it is possible to get webstats generated for those as well. The files are named www.domainname.xxx or cgi.domainname.xxx (xxx being .co.uk, .com, .uk.ltd etc), depending on which server the domain is linked to.

Up to 8 days of webstats are available at any one time in the logs directory (today's and the previous 7 days). When a new day's worth of webstats is available, it is written as www.username.plus.com.gz, and each of the previous days' logfiles is renamed to www.username.plus.com.0.gz, www.username.plus.com.1.gz and so on up to www.username.plus.com.6.gz (0 = yesterday, 1 = the day before yesterday and so on).
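
To make the naming scheme concrete, here is a minimal shell sketch that prints the eight file names you would expect to find in the logs directory for one host. The function name list_log_names is my own invention for illustration (it is not part of Webalizer or of the scripts used later), and username is a placeholder.

```shell
#!/bin/sh
# Print the eight log file names expected in the logs directory for one host,
# newest first. list_log_names is a hypothetical helper, not part of any tool.
list_log_names() {
    host="$1"                       # e.g. www.username.plus.com
    echo "$host.gz"                 # today's log
    n=0
    while [ "$n" -le 6 ]; do        # .0 = yesterday ... .6 = seven days ago
        echo "$host.$n.gz"
        n=$((n + 1))
    done
}

list_log_names www.username.plus.com
```

Calling it with cgi.username.plus.com in place of the www name gives the equivalent list for the cgi server.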

Because the webstats logfile processing can only be done on your CGI server, it is not possible to process the webstats files directly from the logs directory, so they must be copied to your cgi webspace first. To simplify this operation I have written a couple of scripts that will do this for you. As the webstats are only created once per day, the copying must also be done daily prior to being processed so I will also explain how the scripts can be run automatically each day using crontab.

By default, webstats logfiles are not activated for your webspace. To enable them you must click on the web stats link under My Account and click the activate button. The domains that webstats are available for are shown in the View Your Webstats table. To view the default webalizer stats just click on the individual domain names listed. Note: you will need to wait at least 24 hours after activating webstats before any webstats logfiles will be available to view via the links in the table or to process yourself.


Webalizer - A logfile analyser

Webalizer is what is commonly referred to as a logfile analyser. It is able to read and process the raw webstats log files generated by the webserver and show the stats in a user-friendly and visual way using graphics and tables. Webalizer can also process other types of logfiles like mail, ftp etc but this is outside the scope of this tutorial.

The following link will show you an example of what Webalizer can produce and it represents the stats for the tutorialsteam www webspace:

http://cgi.tutorialsteam.plus.com/webalizer/www (click on April 2004 to see more details)


Webalizer configuration

As I explained earlier, Webalizer is installed by default on the cgi servers, so it is not necessary to actually install the software; it just needs to be configured to run. In the following sections I will explain how to set up your cgi webspace, use a perl script to copy the webstats logs to it, process the data and access the results webpage.

Note: I normally use a unix text file editor called vi (or vim) to edit files as this is the simplest editor available on *nix. If you are not familiar with vi and how it works, a simple guide to commands can be found at the end of CGI/Shell Server Advanced Topics. If you prefer to edit the files using notepad or some other editor on Windows you can do so. You will need to ftp the files you want to edit to your Windows PC, make the changes and ftp the files back to their original locations. There are some important differences between Windows and Unix text file formats so please make sure you read CGI: Unix / Windows text file compatability to avoid problems when running the perl scripts after copying them back to the CGI server.

First let's create some directories for www and cgi. Make sure you are in your home directory, then enter the following commands:

$ mkdir webalizer webalizer/www webalizer/cgi
$ chmod 705 webalizer webalizer/www webalizer/cgi

This creates a main directory and 2 sub-directories where the stats graphics will be created.

Next we need the webalizer config file and the 2 copy/processing scripts we will use. I have created generic copies of these, which you can get with the following commands:

$ cd webalizer
$ wget http://www.tutorialsteam.plus.com/cgi/webstats/web_files.tgz
$ tar xvzf web_files.tgz
$ chmod 600 *.conf *.cron
$ chmod 700 *.pl

The chmod 600 (rw-------) and chmod 700 (rwx------) commands stop other users from accessing the files. This is especially important for the perl (.pl) scripts as they will contain login and password information.

You should now have the following files in $HOME/webalizer directory:

getwwwstats.pl - perl script to copy www webstats logfile to cgi server and process it with Webalizer
getcgistats.pl - perl script to copy cgi webstats logfile to cgi server and process it with Webalizer
wwwstats.conf - webalizer config file for processing www webstats
cgistats.conf - webalizer config file for processing cgi webstats
webalizer.cron - example crontab file for running above 2 perl scripts
both.cron - example crontab file containing entries for Webalizer and Awstats

Now we need to modify the files to add your information like username, password and domain name etc.

$ vi wwwstats.conf

and change the following:

OutputDir - The full path to the webalizer/www directory. To find this cd to webalizer/www and enter pwd.
Hostname - Change username to your actual name

Repeat the same edit for cgistats.conf except use cgi in place of www.
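
As a rough illustration of what the two edited lines should end up looking like, the following shell sketch prints them for either server. The function name conf_lines is hypothetical and username is a placeholder; I am also assuming you run it from inside your webalizer directory so that pwd gives the right base path. The host name keyword is written as HostName here, which is the spelling used in Webalizer's own sample configuration file.

```shell
#!/bin/sh
# Print the two lines to set in a Webalizer config file.
# conf_lines is a hypothetical helper; "username" is a placeholder.
conf_lines() {
    kind="$1"       # www or cgi
    user="$2"       # your account username
    base="$3"       # full path to your webalizer directory
    printf 'OutputDir %s/%s\n' "$base" "$kind"
    printf 'HostName  %s.%s.plus.com\n' "$kind" "$user"
}

# Assumes the current directory is your webalizer directory
conf_lines www username "$(pwd)"
```

Compare the printed lines with what you have typed into wwwstats.conf and cgistats.conf; nothing is written to the files by this sketch.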

Next we need to update the two perl scripts to include the correct directory paths and to add your login and password details. These perl scripts are responsible for copying (using FTP) each of the webstats files from your www logs directory and processing them with webalizer.

$ vi getwwwstats.pl

username - enter your username you use to connect via FTP
password - enter your password for FTP (this will be the same as your portal login)
domain - enter the URL of your www domain (www.username.plus.com)
configFile - The name and full path of the webalizer config file for www
outputDir - The full path where the stats graphics will be written

Repeat the same edit for the getcgistats.pl but using cgi in place of www in paths.

We now have everything set up to copy the webstats from your www webspace, process the file and write the stats to the relevant directory (www or cgi). Before we set things up to run automatically, we must run the scripts manually to make sure everything is configured correctly. As this relies on having webstats files, you will have to wait until at least one file has been created. Once you have the webstats files you can run the perl scripts as follows, and you should see output similar to this:

$ cd webalizer
$ ./getwwwstats.pl
www.tutorialsteam.plus.com WebStats analysis started at Tue Apr 13 09:00:02 2004

Last run Mon Apr 12 07:12:13 2004 (1081750333)
Getting www.tutorialsteam.plus.com.gz (2892 bytes)
Log file timestamp: Tue Apr 13 08:03:26 2004 (1081839806)
Running Webalizer...
Webalizer V2.01-10 (FreeBSD 4.10-BETA) English
Using logfile www.tutorialsteam.plus.com (clf)
Creating output in /files/home2/tutorialsteam/webalizer/www
Hostname for reports is 'www.tutorialsteam.plus.com'
Reading history file... wwwstats.hist
Reading previous run data.. wwwstats.current
Saving current run data... [04/12/2004 23:53:03]
Generating report for April 2004
Generating summary report
Saving history information...
169 records in 0.16 seconds
Deleting log file... www.tutorialsteam.plus.com

1 files processed in this run

www.tutorialsteam.plus.com WebStats analysis finished at Tue Apr 13 09:00:03 2004

$ ./getcgistats.pl

cgi.tutorialsteam.plus.com WebStats analysis started at Tue Apr 13 10:00:02 2004

First ever run - if you keep getting this then there are no webstats files
Getting cgi.tutorialsteam.plus.com.gz (999 bytes)
Log file timestamp: Tue Apr 13 08:34:07 2004 (1081841647)
Running Webalizer...
Webalizer V2.01-10 (FreeBSD 4.10-BETA) English
Using logfile cgi.tutorialsteam.plus.com (clf)
Creating output in /files/home2/tutorialsteam/webalizer/cgi
Hostname for reports is 'cgi.tutorialsteam.plus.com'
History file not found...
Previous run data not found...
Saving current run data... [04/12/2004 20:48:26]
Generating report for April 2004
Generating summary report
Saving history information...
69 records in 0.19 seconds
Deleting log file... cgi.tutorialsteam.plus.com

1 files processed in this run

cgi.tutorialsteam.plus.com WebStats analysis finished at Tue Apr 13 10:00:03 2004

If everything worked correctly you should now have some stats graphics available to view using the following URL (replace username with your own name):

http://cgi.username.plus.com/webalizer/www for www and
http://cgi.username.plus.com/webalizer/cgi for cgi.

Finally we can set up the perl scripts to run automatically every day so the stats will be updated. This is done using crontab, Unix's equivalent of the Windows Task Scheduler.

Note: If you intend to run both Webalizer and Awstats you must use the both.cron file in place of webalizer.cron shown below. This is because running crontab filename replaces (deletes) any existing crontab entries with what is in filename.
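
If you already have crontab entries of your own that are in neither file, one way to avoid losing them is to merge them with the new entries before loading the result. The following is only a sketch of that idea: merge_cron is a hypothetical helper I have made up for illustration, and current.cron and merged.cron are example file names.

```shell
#!/bin/sh
# Combine existing crontab entries with new ones, dropping exact duplicate
# lines while keeping the original order. merge_cron is a hypothetical helper.
merge_cron() {
    existing="$1"   # file holding your current entries (e.g. from: crontab -l)
    new="$2"        # file with entries to add (e.g. webalizer.cron)
    awk '!seen[$0]++' "$existing" "$new"
}

# Typical use (not run here):
#   crontab -l > current.cron
#   merge_cron current.cron webalizer.cron > merged.cron
#   crontab merged.cron
```

Check merged.cron by eye before loading it, since crontab will replace whatever was there before.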

I have already created a crontab file for you called webalizer.cron. To add the commands to crontab just do the following:

$ crontab webalizer.cron
$ crontab -l

The first command installs the contents of webalizer.cron as your crontab and the second lists the crontab entries that are now active. The output should match the contents of the file:

$ cat webalizer.cron
00 9,18 * * * cd $HOME/webalizer; ./getwwwstats.pl >> wwwcron.output
30 9,18 * * * cd $HOME/webalizer; ./getcgistats.pl >> cgicron.output

For information on what the 30 9,18 etc mean see CGI: cron Task Scheduler. Basically it means run the command at 9:00 and 18:00 (1st line) and at 9:30 and 18:30 (2nd line). The commands are run twice a day because the webstats are sometimes not available before 9:00, so the 2nd run catches any logfiles that are generated late.
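
To show how the schedule part of a crontab line breaks down, here is a small shell sketch that labels the first five fields. The function name describe_cron is hypothetical and exists only for this illustration; it does not validate the line in any way.

```shell
#!/bin/sh
# Label the five schedule fields of a crontab line.
# describe_cron is a hypothetical helper written for this illustration.
describe_cron() {
    set -f          # stop the shell expanding the * fields into file names
    set -- $1       # split the line on whitespace into $1, $2, ...
    set +f
    echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
}

describe_cron '30 9,18 * * * cd $HOME/webalizer; ./getcgistats.pl >> cgicron.output'
# prints: minute=30 hour=9,18 day-of-month=* month=* day-of-week=*
```

A * means "every", and the comma in the hour field lists the two hours at which the job runs.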

To check that everything is working, look at the wwwcron.output and cgicron.output files to confirm the processing has occurred. The .output files grow with every run, so delete them from time to time to stop them using up a lot of disk space.
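
If you would rather keep the most recent entries than delete the files outright, something along the following lines would do it. This is just a sketch: trim_log is a hypothetical helper and the default of 200 lines is an arbitrary choice.

```shell
#!/bin/sh
# Keep only the last N lines of a cron output file.
# trim_log is a hypothetical helper; 200 lines is an arbitrary default.
trim_log() {
    file="$1"
    keep="${2:-200}"
    [ -f "$file" ] || return 0          # nothing to do if the file is missing
    tail -n "$keep" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Typical use, from $HOME/webalizer (not run here):
#   trim_log wwwcron.output
#   trim_log cgicron.output
```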


Adding additional domains to be processed

This tutorial described how to create webstats info for your default www and cgi websites. If you have additional domains associated with your account, you can process webstats data for them just as easily.

  • Create a separate directory in $HOME/webalizer (e.g. mydomain)
  • Make a copy of getwwwstats.pl and call it something like getmydomainstats.pl
  • Make a copy of wwwstats.conf file and call it wwwmydomain.conf
  • Modify the necessary parameters in the new .pl and .conf files to refer to the new directory and new domain name
  • Add an additional entry into crontab to run the new .pl script at a slightly different time to the rest
  • The url to the webstats for mydomain will be http://cgi.username.plus.com/webalizer/mydomain

As before, run the new .pl command manually to make sure it works before adding it to crontab.
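
The file-copying part of the steps above can be sketched as a short shell function. The name add_domain is hypothetical, mydomain is just an example, and the permissions simply mirror the ones used earlier in this tutorial. It only creates the directory and copies the files; you still have to edit the new .pl and .conf files by hand and add the crontab entry yourself.

```shell
#!/bin/sh
# Create the directory and file copies for one extra domain.
# add_domain and the name "mydomain" are hypothetical; the settings inside
# the copied files still need editing by hand afterwards.
add_domain() {
    base="$1"       # your webalizer directory, e.g. $HOME/webalizer
    name="$2"       # short name for the domain, e.g. mydomain
    mkdir -p "$base/$name" && chmod 705 "$base/$name"
    cp "$base/getwwwstats.pl" "$base/get${name}stats.pl"
    cp "$base/wwwstats.conf"  "$base/www${name}.conf"
    chmod 700 "$base/get${name}stats.pl"    # script will hold your password
    chmod 600 "$base/www${name}.conf"
}

# Typical use (not run here):
#   add_domain "$HOME/webalizer" mydomain
```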


Further reading / information for Webalizer

It is possible to configure Webalizer to show different information via additional .conf file options. What I have given you is the basic config and is enough to get going. You may want to tweak the information to suit your purposes but this is beyond the scope of this tutorial. For more information on Webalizer and what config options are available please visit http://www.webalizer.com

And finally, if you really can't be bothered to do any of the above, VISP full name generate a Webalizer stats information page for each of your registered domains, as well as your default ones, once you have enabled webstats. Just click on the webstats link under My Account then click on the domain you want in the View Your Webstats table.

That completes the setting up of Webalizer and also concludes this tutorial. I hope this has proved useful to you.
Original Article by: petervaughan - Edited by: csogilvie