The program collects and analyses visit statistics for several sites hosted
on one physical server. The data is collected from common Apache logs. The
results are presented as HTML reports with graphs and histograms.
The program counts:
- the number of requests,
- traffic,
- the number of unique IP addresses of visitors
for:
- each site on the server from the list defined by the administrator,
- each range of IP addresses defined in the program configuration.
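As an illustration only, the three counters can be sketched in Python (the program itself is written in Perl, and this is not its code). The record shape (key, ip, bytes_sent) and the function name count_stats are assumptions made for the example; the key can be a site name or a range name.

```python
from collections import defaultdict

def count_stats(records):
    """Count requests, traffic and unique IPs per key.

    records: iterable of (key, ip, bytes_sent) tuples (hypothetical shape).
    """
    stats = defaultdict(lambda: {"requests": 0, "traffic": 0, "ips": set()})
    for key, ip, nbytes in records:
        entry = stats[key]
        entry["requests"] += 1       # one log line = one request
        entry["traffic"] += nbytes   # bytes sent to the visitor
        entry["ips"].add(ip)         # a set keeps each address once
    return {
        key: {
            "requests": e["requests"],
            "traffic": e["traffic"],
            "unique_ips": len(e["ips"]),
        }
        for key, e in stats.items()
    }
```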
Each of these three values for the last 7 days is shown on histograms. The sum over all listed sites and each site's part of that sum are also shown. Here is an example of such a histogram.
By clicking on the colored range names (in this example, the names of providers and network groups of the Samara region, Russia) it is possible to switch the last picture to the histogram of the corresponding range. In this example this option is switched off.
Graphs of the values for the last year are also included in the report. They can be switched in the same way as above. Here is an example of a year graph:
The figure shows the number of hits on all the sites taken into account for
the specified providers over a year. Color definitions are shown on the figure to
the left.
Every January 1st the program creates a year report with year graphs and histograms of mean values for days of the week and months. Unlike the current report, the year report is not overwritten every day but is stored in its own directory.
Program version 1.4 is now available.
The program consists of 2 files: collect.pl and drawpage.pl. The first script does
the following:
- receives the list of site names from STDIN and the configuration file name as a
parameter;
- reads the configuration;
- reads the Apache configuration file, unpacks all macros (if present), then
determines log formats and paths to logs for each site, and builds regular
expressions for log parsing;
- reads the archived log files and parses the logs;
- adds the obtained information to the main storage file.
The second script does the following:
- receives the list of site names from STDIN and the configuration file name as a
parameter;
- reads configuration;
- reads the language file and sets all text strings of future reports;
- reads main storage file;
- calculates the "sum" and "part" values;
- creates pictures and html-reports.
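The "sum" and "part" values can be illustrated with a short Python sketch. It is not taken from drawpage.pl; it assumes that "part" means a site's share of the sum expressed as a fraction, and the function name sum_and_parts is hypothetical.

```python
def sum_and_parts(per_site):
    """per_site: {site_name: value} for one counter, e.g. requests per site."""
    total = sum(per_site.values())
    # Guard against an empty day, where the sum would be zero.
    parts = {
        site: (value / total if total else 0.0)
        for site, value in per_site.items()
    }
    return total, parts
```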
The program requires the GD library and the GD.pm module to be installed.
The program requires 2 files: a configuration file and a language file with all the required text strings in the chosen language.
File formats:
Configuration file.
A text file consisting of four sections:
'Configuration' - paths and names of the files the program works with;
'Filter' and 'Exclude' - log filtering;
'Providers' - table of IP address ranges, numbers and colors of networks.
Each section starts with the opening tag <Name> and ends with the closing tag
</Name>. Comments are permitted. They start with '#'
and continue to the end of the line.
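The section layout described above can be read with a few lines of Python. This is only a sketch of the format, not the program's own parser. It assumes that each tag stands alone on its line, and it treats every '#' as the start of a comment, which would also cut a '#' used inside a filter value. The function name split_sections is hypothetical.

```python
import re

def split_sections(text):
    """Return {section_name: [content lines]} for <Name> ... </Name> blocks."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop the comment, if any
        if not line:
            continue
        opening = re.fullmatch(r"<(\w+)>", line)
        closing = re.fullmatch(r"</(\w+)>", line)
        if opening:
            current = opening.group(1)
            sections[current] = []
        elif closing:
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections
```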
Section 'Configuration'
Each line gives a parameter name followed (after a space) by its
value (without any quotes). The following parameters are required:
ApacheConfigurationFile - full path and name of the Apache configuration file;
MainStorageFile - full path and name of the file with data collected by the program;
ProgramReportFile - full path and name of the file with the program's work report;
TemporaryFile - full path and name of the temporary file created by the program for
unpacking archived logs;
TargetDirectory - full path to the directory where HTML reports will be placed;
LanguageFile - full path and name of the language file;
LogFiles - list of archived log file names delimited by spaces.
Program reaction to mistakes in this section: it stops with an error message.
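A minimal Python sketch of reading this section, under the assumption that name and value are separated by the first space on the line. The parameter names come from the list above; the function name and the error text are hypothetical, and raising an exception stands in for the program's "stop with an error message" behaviour.

```python
REQUIRED = (
    "ApacheConfigurationFile", "MainStorageFile", "ProgramReportFile",
    "TemporaryFile", "TargetDirectory", "LanguageFile", "LogFiles",
)

def parse_configuration(lines):
    """lines: content lines of the 'Configuration' section, comments removed."""
    params = {}
    for line in lines:
        name, _, value = line.strip().partition(" ")
        params[name] = value.strip()
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    # LogFiles holds several names delimited by spaces.
    params["LogFiles"] = params["LogFiles"].split()
    return params
```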
Section 'Filter'
Each line gives a log field name followed (after a space) by
its value. If the value must be compared as a string, it should be enclosed in
double quotes. If it must be treated as a regular expression (with Perl syntax),
it should be enclosed in '/'. If several values for one field are needed,
they should be given on separate lines. The lines of the section are combined with
logical OR, i.e. a log line will be taken into account by the program if it
satisfies at least one of the conditions in the section. Field names:
HOST - visitor's IP address;
LOGIN - visitor's login;
USER - user name;
DATETIME - date and time of request;
REQUEST - request string;
OSTATUS - original status;
LSTATUS - last status;
BYTE - the amount of information sent (in bytes);
FILENAME - file name;
ADDR - IP address;
PORT - port;
PROC - process ID;
SEC - reply length in seconds;
URL - URL;
HOSTNAME - hostname;
REFERER - REFERER string;
UAGENT - user agent.
Example:
LSTATUS "200"
LSTATUS "206"
Program reaction to mistakes in this section: wrong lines are ignored.
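The OR logic of the 'Filter' section can be sketched in Python as follows. This is an illustration under assumptions, not the program's code: a parsed log line is assumed to be a dictionary of field name to string, only the two documented value forms (double quotes and '/') are handled, anything else is skipped as a wrong line, and Python's re module stands in for Perl regular expressions, which differ in some details. The names parse_rule and passes_filter are hypothetical.

```python
import re

def parse_rule(line):
    """Turn 'FIELD "text"' or 'FIELD /regex/' into (field, test) or None."""
    field, _, value = line.strip().partition(" ")
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        literal = value[1:-1]
        return field, lambda s: s == literal
    if len(value) >= 2 and value[0] == value[-1] == "/":
        try:
            pattern = re.compile(value[1:-1])
        except re.error:
            return None              # broken regex: treat as a wrong line
        return field, lambda s: pattern.search(s) is not None
    return None                      # unsupported form: ignored

def passes_filter(record, rule_lines):
    """True if the record satisfies at least one rule (logical OR)."""
    rules = [rule for rule in map(parse_rule, rule_lines) if rule is not None]
    return any(test(record.get(field, "")) for field, test in rules)
```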
Section 'Exclude'
The same format as the previous section. The program does not take a
log line into account if one of its fields matches one of the described
conditions. Program reaction to mistakes is the same as above.
Example:
REQUEST /^(?:OPTIONS|PUT|DELETE|TRACE|CONNECT)/
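To show what the example rule matches, here is the same pattern applied in Python. It assumes the REQUEST field holds the request line without surrounding quotes (for instance "GET /index.html HTTP/1.0"); the function name is_excluded is hypothetical.

```python
import re

# The pattern from the example above; '^' anchors it to the start of the
# request line, so only the HTTP method is tested.
EXCLUDE_REQUEST = re.compile(r"^(?:OPTIONS|PUT|DELETE|TRACE|CONNECT)")

def is_excluded(request):
    """True if the request line starts with one of the listed methods."""
    return EXCLUDE_REQUEST.search(request) is not None
```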
Section 'Providers'
Each line describes one continuous range of IP addresses. Line format:
startIP;endIP;number;R,G,B;name
where startIP - the starting address of the range,
endIP - the ending address of the range,
number - the number of group of networks,
R,G,B - decimal integer numbers from 0 to 255 describing the color in
which the graphs and ghistogram columns corresponding to this group of
networks will be painted,
name - the name of the group/ISP (preferably shorter than 20 characters).
Group numbers start from 1 and must be continuous
(no missing numbers allowed). If one group has more than one range of IP addresses,
the ranges should be placed one after another, and every line should specify the same
number, color and name as the format requires. Example:
194.135.0.0;194.135.255.255;5;204,172,32;Relcom
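A Python sketch of parsing one such line is given below. It is an illustration of the format only. Converting the addresses to integers for later comparison is a choice made for the example, and the names ProviderRange and parse_provider_line are hypothetical. The program's own reaction to malformed lines is not documented, so the exceptions raised here are an assumption.

```python
import ipaddress
from typing import NamedTuple

class ProviderRange(NamedTuple):
    start: int      # first address of the range, as an integer
    end: int        # last address of the range, as an integer
    number: int     # number of the group of networks
    color: tuple    # (R, G, B), each from 0 to 255
    name: str

def parse_provider_line(line):
    """Parse 'startIP;endIP;number;R,G,B;name'."""
    start_ip, end_ip, number, rgb, name = line.strip().split(";", 4)
    color = tuple(int(c) for c in rgb.split(","))
    if len(color) != 3 or not all(0 <= c <= 255 for c in color):
        raise ValueError("color must be three integers from 0 to 255")
    start = int(ipaddress.IPv4Address(start_ip))
    end = int(ipaddress.IPv4Address(end_ip))
    if start > end:
        raise ValueError("startIP is greater than endIP")
    return ProviderRange(start, end, int(number), color, name)
```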
Program reaction to mistakes in this section:
1) If the specified ranges overlap, the program will assign the checked address
to the last range (in file order) to which this address belongs.
2) Each group of networks is identified by its number. If one number corresponds
to different names or colors on different lines, the first ones (in file order)
will be taken and all the others will be ignored.
3) The reaction of the program to mistakes in the line format is unknown because
it has not been tested for those. Better not to risk it. :)
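Rules 1) and 2) can be sketched in Python as follows. This is one way to obtain the described behaviour, not necessarily how the program does it. The ranges are assumed to be given in file order as (startIP, endIP, number, color, name) tuples, and the function names are hypothetical.

```python
import ipaddress

def build_groups(ranges):
    """Rule 2: the first color and name seen for a group number are kept."""
    groups = {}
    for _start, _end, number, color, name in ranges:
        groups.setdefault(number, (color, name))   # later lines are ignored
    return groups

def find_group(ip, ranges):
    """Rule 1: with overlapping ranges, the last matching one wins."""
    addr = int(ipaddress.IPv4Address(ip))
    found = None
    for start, end, number, _color, _name in ranges:
        low = int(ipaddress.IPv4Address(start))
        high = int(ipaddress.IPv4Address(end))
        if low <= addr <= high:
            found = number     # keep scanning: a later range overrides
    return found
```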
The number of groups of networks is not limited. The IP ranges can be obtained here: http://www.ripe.net
Language file.
Each line of the file has the following format:
key^Tstring
where key - the unique symbolic key of the string;
^T - the TAB character;
string - text string.
A line can be commented out by placing '#' at its beginning.
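The language file format can be read with a short Python sketch. It is an illustration only: it assumes the text has already been decoded from the file's encoding, it skips lines without a TAB, and the function name parse_language_file is hypothetical.

```python
def parse_language_file(text):
    """Return {key: string} from lines of the form key<TAB>string."""
    strings = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                 # empty line or commented-out line
        key, tab, value = line.partition("\t")
        if not tab:
            continue                 # no TAB: not a valid key/string pair
        strings[key] = value
    return strings
```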
The easiest way to create a new language file is to copy an existing file under a new name and translate all its text strings into the required language.
Recommended file name - lang-##-ENCODING
where ## - 2-letter language code (en, ru, de, fr etc.)
ENCODING - the text encoding
for example, lang-ru-win1251
Program reaction to mistakes in the language file: usually harmless; a missing or wrong key simply will not be interpreted and the corresponding text will not appear. But mistakes in the designations of days of the week or months can even cause the program to hang (infinite loop).
The program must be run daily by the logrotate script after log archiving.
collect.pl is run first, drawpage.pl second.
The first line of both files is "#!/usr/bin/perl". Change it to the actual path to Perl on your system if needed.
Both files are run the same way. Command line:
<full_path_to_script>/<script> <full_path_to_config_file>/<config_file>
where <full_path_to_script> - full path to the file;
<script> - collect.pl or drawpage.pl;
<full_path_to_config_file> - full path to the program configuration file;
<config_file> - the name of the program configuration file.
The list of site names is read by the program from STDIN. Names in the list can be delimited by the end-of-line character, ',' or ';'.
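The three delimiters can be handled with one split, as in this Python sketch. Trimming surrounding whitespace and dropping empty entries are assumptions made for the example, and the function name split_site_names is hypothetical.

```python
import re

def split_site_names(text):
    """Split a site list delimited by end-of-line, ',' or ';'."""
    return [name.strip() for name in re.split(r"[\n,;]", text) if name.strip()]
```

In a script, the list would typically be read with split_site_names(sys.stdin.read()) after importing sys.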
Created at WebZavod