Governance by those who do the work.

Saturday, August 20, 2011

Script to Monitor Websites

As the maintainer of several websites, I find it annoying to learn that a website has been down for hours or days only by email from a frustrated member of the browsing public.

What is needed is an automated method to periodically check websites and notify me when their status changes (up or down). This method must involve a computer other than the one hosting the website, because a stopped computer can't notify me of anything.

I have created a Unix shell script "wstatus" to be run periodically by cron. Its first argument is a URL to retrieve from the website; the second is the email address to send notifications to. The wstatus script puts small files in the ${HOME}/.status/ directory (.status in your home directory).

The crontab entry:
*/10 * * * * $HOME/bin/wstatus jaffer@localhost
reads "server-status.txt" from the website every 10 minutes. If the content or error retrieved differs from the last run's, then the first line of that content or error is emailed to me (locally). "server-status.txt" contains just a single line announcing that the site is up.

I wonder how often my ISP changes my IP-address:
25 * * * * $HOME/bin/wstatus jaffer@localhost
checks once an hour (at 25 minutes past). Of course, if I move my computer to another network, I get an email notifying me of the change. If you use such a service, be sure to observe its rule that automated requests hit the site no more than once every five minutes (300 seconds).

The early version of wstatus consisted of just the black code below, but it occasionally sent two email notices for a single change of status when the cron period was small. I believe this happens because cron starts its child processes at lower priority (numerically higher nice value) than user processes have, so several wstatus processes can queue up while higher-priority jobs are running.

The code in red was added to make a wstatus process exit if another is already running. Among the basic Unix file commands, only rm (remove file) returns a status allowing it to be used as a mutex. Thus, to grab the mutex, the script tries to remove a file with the extension ".idl". If the rm succeeds, the script continues to run; otherwise it exits immediately.
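In isolation, the rm-as-mutex pattern looks like this (a minimal sketch; the lock-file path is hypothetical):

```shell
#! /bin/sh
# Minimal sketch of the rm-as-mutex pattern.
lock=/tmp/wstatus-demo.idl
touch $lock                 # the mutex starts out free

if rm $lock > /dev/null 2>&1; then
    status="got mutex"      # only the process whose rm succeeds gets here
    # ... do the periodic website check here ...
    touch $lock             # release: restore the mutex file
else
    status="busy"           # another instance holds the mutex; give up
fi
echo "$status"
```

Because rm either removes the file or fails with nonzero status, two processes reaching the rm at the same moment cannot both proceed.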

This works except when the script is interrupted and never restores the ".idl" file. To handle that case, the line in green was added to restore the ".idl" file when the script exits without grabbing the mutex. The code without the green line generates only one email per change no matter how many wstatus processes are running simultaneously; with the green line, only suppression of the second simultaneous process is guaranteed. If double processes occur more frequently than higher multiples, this approach is reasonable.

#! /bin/sh
# Copyright (C) 2011 Aubrey Jaffer
if test -z "$1"; then
    echo "$0: Missing URL (first) argument" 1>&2
    exit 1
elif test -z "$2"; then
    echo "$0: Missing email (second) argument" 1>&2
    exit 1
fi
# Status and mutex files in ${HOME}/.status/ are named after the host.
host=`echo $1 | sed 's%.*//%%' | sed 's%/.*%%'`
sfile=${HOME}/.status/${host}
mfile=${HOME}/.status/${host}.idl
mkdir -p ${HOME}/.status/
if test -f ${sfile}; then
    # red: grab the mutex by removing the ".idl" file;
    # if the rm fails, another wstatus process is running.
    if ! rm ${mfile} > /dev/null 2>&1; then
        touch ${mfile}          # green: restore the mutex before exiting
        exit 0
    fi
    cp -f ${sfile} ${sfile}~    # save the previous status
fi
if ! wget -q -T 30 -O- $1 > ${sfile} 2>&1; then
    echo could not reach \"`echo $1 | sed 's%.*//%%'`\" > ${sfile}
fi
# Mail the first line of the new status only when it differs from the old.
if ! (test -f ${sfile}~ && diff -q ${sfile}~ ${sfile} >/dev/null 2>&1); then
    mail -s "`head -1 ${sfile}`" $2 < ${sfile}
fi
touch ${mfile}                  # release the mutex

The use of rm as a mutex is rare, perhaps novel. I have not tested whether rm is actually atomic; but even if it isn't, it reduces by an order of magnitude the window of opportunity for repeated messages from wstatus.
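One quick way to exercise that claim is to race two rm processes against the same file and count the successes; since unlink(2) removes the directory entry atomically on POSIX filesystems, exactly one rm should win. (A sketch; the /tmp paths are arbitrary.)

```shell
#! /bin/sh
# Race two rm's against one file; an atomic unlink(2) means only one wins.
f=/tmp/rm-race-demo
rm -f $f $f.log
touch $f
( rm $f 2>/dev/null && echo winner >> $f.log ) &
( rm $f 2>/dev/null && echo winner >> $f.log ) &
wait
wins=`wc -l < $f.log`
echo "successful removals: $wins"
```

One run of course proves nothing about the race window; looping the experiment a few thousand times gives more confidence that only one rm ever succeeds.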