uccwiki

It would be nice if we had some external monitoring for the UCC, which could use the Twitter feed to alert Wheel members and others if systems and services go down.

For the first iteration, Smokeping seems ideal for this. It can check ICMP reachability, and through plugins it can also check the availability of a number of other service types (SSH, HTTP, POP3, IMAP, etc.).

An installation could be set up both within the UCC (which could also check LDAP and other internal-only services) and external to UWA.

Things that might suck:

Notification could be done through a script similar to the following:

#!/bin/sh
# Arguments Smokeping passes to an alert program.
ALERTNAME="$1"
TARGET="$2"
LOSS_PATTERN="$3"
RTT_PATTERN="$4"
ALERT_HOSTNAME="$5"
# Only passed when edgetrigger = yes: 1 = alert raised, 0 = alert cleared.
RAISE="$6"

if [ "$RAISE" -eq 1 ]; then
   MESSAGE="Smokeping: connectivity lost to $TARGET from $ALERT_HOSTNAME"
else
   MESSAGE="Smokeping: connectivity restored to $TARGET from $ALERT_HOSTNAME"
fi

# Credentials are read from stdin via --config so they stay out of the process list.
curl -q --config - --data-urlencode "status=$MESSAGE" "http://twitter.com/statuses/update.json" 1>/dev/null 2>&1 <<EOF
--basic
--user "ucc_status:PASSWORD"
EOF
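
Before pointing Smokeping at the script, it may help to check the message logic by hand without posting anything. The following is only a sketch: build_message is a hypothetical helper that mirrors the if/else above, and the alert name, target and hostname are made-up values standing in for what Smokeping would pass.

```shell
#!/bin/sh
# Hypothetical helper (not part of Smokeping): builds the status text from
# the same six positional arguments the alert script receives, but only
# prints it instead of sending it anywhere.
build_message() {
    target="$2"
    alert_hostname="$5"
    raise="$6"
    if [ "$raise" -eq 1 ]; then
        echo "Smokeping: connectivity lost to $target from $alert_hostname"
    else
        echo "Smokeping: connectivity restored to $target from $alert_hostname"
    fi
}

# Simulate an alert being raised and then cleared (all values are made up).
build_message lossdetection UCC.madako "loss pattern" "rtt pattern" monitor.example.org 1
build_message lossdetection UCC.madako "loss pattern" "rtt pattern" monitor.example.org 0
```

The sixth argument is what distinguishes "lost" from "restored", so the script relies on edgetrigger = yes being set for the alert.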

An example config might look something like this:

*** General ***
owner = UCC Wheel Group
contact = [email protected]
sendmail = /usr/sbin/sendmail
imgcache = /var/www/smokeping
imgurl   = ../smokeping
datadir  = /var/lib/smokeping
dyndir   = /var/lib/smokeping/__cgi
piddir   = /var/run/smokeping
smokemail = /etc/smokeping/smokemail
tmail    = /etc/smokeping/tmail
precreateperms = 2775
cgi-url = http://somedomainhere.com/cgi-bin/smokeping.cgi

*** Database ***

-- use default debian config here

*** Presentation ***

-- use default debian config here

*** Probes ***
+ FPing
binary = /usr/bin/fping
packetsize = 1000

+ DNS
binary = /usr/bin/dig
lookup = ucc.gu.uwa.edu.au
pings = 5
step = 180

+ EchoPingHttp
pings = 5
url = /

*** Targets ***
probe = FPing
alerts = lossdetection

menu = Top
title = Network Latency Grapher
remark = Welcome to this SmokePing website.

+ UCC
menu = UCC
title = UCC Machines & Services

++ madako
host = madako.ucc.gu.uwa.edu.au

FIXME MORE WORK HERE

*** Alerts ***
to = |twitterscript
from = smokeping@YOURHOST

+lossdetection
type = loss
edgetrigger = yes
# in percent
pattern = ==0%,==0%,==0%,==0%,>20%,>20%,>20%
comment = sudden packet loss
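
To make the loss pattern above concrete: it is meant to match four consecutive samples with no loss followed by three samples with more than 20% loss. Smokeping does this matching itself; the sketch below is only an illustration of the intended reading, using a hypothetical sudden_loss function that takes seven whole-number loss percentages, oldest first.

```shell
#!/bin/sh
# Illustration only: returns 0 if seven loss percentages (oldest first)
# fit ==0%,==0%,==0%,==0%,>20%,>20%,>20%, 1 if they do not,
# and 2 if the wrong number of samples was given.
sudden_loss() {
    [ "$#" -eq 7 ] || return 2
    i=1
    for loss in "$@"; do
        if [ "$i" -le 4 ]; then
            # First four samples must show no loss at all.
            [ "$loss" -eq 0 ] || return 1
        else
            # Last three samples must each exceed 20% loss.
            [ "$loss" -gt 20 ] || return 1
        fi
        i=$((i + 1))
    done
    return 0
}

sudden_loss 0 0 0 0 25 40 100 && echo "pattern matches: alert would be raised"
sudden_loss 0 0 5 0 25 40 100 || echo "no match: earlier samples were not clean"
```

Note that exactly 20% loss does not satisfy >20%, so a link that hovers right at the threshold would not trigger this alert.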

uccwiki: SmokeMonitor (last edited 2009-12-23 14:49:54 by localhost)