Seeing a real-time breakdown of web traffic by vhost

by David Glick posted Oct 01, 2009 01:54 AM
Occasionally our servers are hit by traffic spikes. Since we typically host a number of websites per server, we need a way to quickly determine which site is receiving the bulk of incoming requests. (Then we can improve caching on that site, perhaps.) To see in real time which vhosts are being requested, we use the following awk script:

histo.awk

# creates a histogram of values in the first column of piped-in data
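# returns the largest value in arr; i and big are extra parameters
# so that they act as local variables
function max(arr,    i, big) {
    big = 0;
    for (i in arr) {
        if (arr[i] > big) { big = arr[i]; }
    }
    return big
}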

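# count each non-empty line by its first column (the vhost)
NF > 0 {
    cat[$1]++;
    # field 6 is assumed to hold the request timestamp in this log format
    if (!start) { start = $6 }
    end = $6
}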
END {
    printf "from %s to %s\n", start, end
    maxm = max(cat);
    for (i in cat) {
        scaled = 60 * cat[i] / maxm;
        printf "%-25.25s  [%8d]:", i, cat[i]
        # use a separate counter so the array key held in i is not overwritten
        for (j=0; j<scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}

It can be used like this:

watch 'tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3'

This gives a histogram of the occurrence of vhosts in the last 100 lines of the Apache log, updating every 2 seconds, sorted with the most frequent vhosts at the top. (Note that this assumes you are using an Apache log format which includes the vhost as the first column.) It looks something like this:

Every 2.0s: tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3       Thu Oct  1 09:51:41 2009

www.dogwoodinitiative.org  [      49]:############################################################
www.wildliferecreation.or  [      24]:##############################
www.earthministry.org      [      14]:##################
blogs.onenw.org            [       3]:####
www.tilth.org              [       2]:###
www.oeconline.org          [       2]:###
www.audubonportland.org    [       1]:##
oraction.org               [       1]:##
oeconline.org              [       1]:##
dogwoodinitiative.org      [       1]:##
bandon.onenw.org           [       1]:##
209.40.194.148             [       1]:##
from [01/Oct/2009:09:51:21 to [01/Oct/2009:09:48:40
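
If you are wondering why a vhost with a single request still gets two # marks: the bar is 60 * count / max characters long, and the loop keeps printing while the counter is below that (possibly fractional) value, so the length is effectively rounded up. Here is a small sketch that reproduces the arithmetic; bar_len is a made-up helper name for illustration, not part of histo.awk:

```shell
# bar_len COUNT MAX: number of '#' characters histo.awk would print
# for COUNT when the largest count is MAX (hypothetical helper)
bar_len() {
    awk -v n="$1" -v max="$2" 'BEGIN {
        len = 0
        # same loop condition as the bar-drawing loop in histo.awk
        for (j = 0; j < 60 * n / max; j++) len++
        print len
    }'
}

bar_len 49 49   # prints 60
bar_len 24 49   # prints 30
bar_len 1 49    # prints 2
```

These match the bars for the 49, 24 and 1 request rows in the output above.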

(Another useful variant of this is to produce a histogram of requests by IP address, which can help determine which addresses to block in a DoS attack.)
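
One way to get that variant is to make the counted column configurable instead of hard-coding $1. The following is only a sketch, not the script we actually run: the histo function name and the col variable are made up for illustration, the sample lines are invented, and it assumes the client IP is the second column of your log format (check your own LogFormat before relying on that).

```shell
# histo [COLUMN]: histogram of the values in the given column (default 1).
# Hypothetical generalisation of histo.awk; "col" is an assumed variable name.
histo() {
    awk -v col="${1:-1}" '
        # count every line that actually has the requested column
        NF >= col { count[$col]++ }
        END {
            max = 0
            for (k in count) if (count[k] > max) max = count[k]
            for (k in count) {
                printf "%-25.25s  [%8d]:", k, count[k]
                for (j = 0; j < 60 * count[k] / max; j++) printf "#"
                printf "\n"
            }
        }'
}

# Made-up sample lines: vhost in column 1, client IP in column 2.
printf '%s\n' \
    'a.example.org 192.0.2.1 GET /' \
    'a.example.org 192.0.2.2 GET /x' \
    'b.example.org 192.0.2.1 GET /' \
    | histo 2 | sort -nrk3
```

With a real log you would replace the printf with the same tail command as before and pass whichever column number holds the client address.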

Florian Schulze says:
Oct 01, 2009 06:02 PM
Really? I mean, 'awk'? You are a python developer, aren't you :D
John-Boy says:
Aug 26, 2011 02:01 PM
You mention you can use this to see who is DOS'ing - any suggestion on where to change the field from the access_log ?