Revision 9 (2009-12-23, converted to 1.6 markup); last content edit in revision 8 by BobAdamson, 2009-12-22.

It would be nice to have some external monitoring for the UCC that could use the Twitter feed to alert Wheel members and others when systems or services go down.

For a first iteration, Smokeping seems ideal. It can check ICMP reachability as well as the availability of a number of other service types (SSH, HTTP, POP3, IMAP, etc.) through its probe plugins.

One installation could run inside the UCC (where it could also check LDAP and other internal-only services) and another outside UWA.
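Since both installations would carry much the same target list, and the Targets section gets long quickly, a small generator could keep the two in sync. The sketch below is hypothetical (not part of Smokeping): it prints one "++" node per host and one "+++" leaf per check, with each leaf naming its probe explicitly. Probe names must match the modules declared in the *** Probes *** section exactly (e.g. FPing, EchoPingHttp); which services madako actually runs is an assumption here.

```shell
#!/bin/sh
# Hypothetical generator for Smokeping "Targets" entries (not part of
# Smokeping). Prints one "++" node per host and one "+++" leaf per check,
# so the same output can be pasted into both installations.
#
# Usage: gen_host_targets NODE FQDN PROBE:LEAF [PROBE:LEAF ...]
# PROBE must be spelled exactly as declared under *** Probes ***.
gen_host_targets() {
    node=$1
    fqdn=$2
    shift 2

    # Parent node: a menu entry only, with no host of its own.
    printf '++ %s\n' "$node"
    printf 'menu = %s\n' "$node"
    printf 'title = %s\n' "$fqdn"

    # One leaf per PROBE:LEAF argument, each probing the same host.
    for spec in "$@"; do
        probe=${spec%%:*}   # text before the first colon
        leaf=${spec#*:}     # text after the first colon
        printf '\n+++ %s\n' "$leaf"
        printf 'menu = %s\n' "$leaf"
        printf 'title = %s %s\n' "$node" "$leaf"
        printf 'probe = %s\n' "$probe"
        printf 'host = %s\n' "$fqdn"
    done
}

# Example (assumed services): ICMP and HTTP checks for madako.
gen_host_targets madako madako.ucc.gu.uwa.edu.au FPing:ping EchoPingHttp:http
```

Keeping the parent node host-less and putting the host on each leaf means every service gets its own graph and its own alert, which is the usual Smokeping layout for per-service checks.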

Things that might suck:

  • Smokeping configuration is kinda verbose.
  • Smokeping assigns alert priority per alert type, not per host: if several alert patterns match for a single target, only one alert is sent. Each service is a separate target, though, so if madako goes down we will get an alert for -all- of its unavailable services, which could get a bit noisy.
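One way to cut the noise would be to have the notification script itself suppress repeat "lost" messages for the same host within a time window. The sketch below is hypothetical and assumes Smokeping passes the target as a dot-joined node path such as UCC.madako.http, so dropping the last component gives the host node; check what your Smokeping version actually passes before relying on it. The state directory is also an assumption.

```shell
#!/bin/sh
# Hypothetical noise filter for the notification script: send at most one
# "connectivity lost" message per host within WINDOW seconds.
# ASSUMPTION: targets arrive as dot-joined node paths like UCC.madako.http.

# Print the host part of a target path, e.g. UCC.madako.http -> UCC.madako
host_key() {
    printf '%s\n' "${1%.*}"
}

# should_notify KEY STATEDIR WINDOW NOW
# Returns 0 (send) if KEY has not been notified in the last WINDOW seconds
# and records NOW as its latest notification time; returns 1 (suppress)
# otherwise. NOW is seconds since the epoch.
should_notify() {
    key=$1 statedir=$2 window=$3 now=$4
    stamp="$statedir/$key"
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
        if [ $((now - last)) -lt "$window" ]; then
            return 1
        fi
    fi
    mkdir -p "$statedir" && printf '%s\n' "$now" > "$stamp"
}
```

In the notification script this would wrap the curl call only when RAISE is 1, passing the output of host_key for $TARGET, a state directory such as one under /var/lib/smokeping (hypothetical), a window like 900 seconds, and the current time from date +%s (available on GNU and BSD date, though not required by POSIX). Clear messages are left unfiltered so "restored" notices still get through.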

Notification could be done through a script similar to the following:

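#!/bin/sh
# Smokeping pipes alerts to this script with these arguments:
ALERTNAME="$1"      # name of the alert, e.g. lossdetection
TARGET="$2"         # target that triggered the alert
LOSS_PATTERN="$3"   # recent loss values
RTT_PATTERN="$4"    # recent round-trip times
ALERT_HOSTNAME="$5" # hostname argument passed by Smokeping
RAISE="$6"          # 1 when the alert is raised, 0 when it clears (edgetrigger)

# Quote $RAISE so the test does not break if the argument is missing.
if [ "$RAISE" = "1" ]; then
   MESSAGE="Smokeping: connectivity lost to $TARGET from $ALERT_HOSTNAME"
else
   MESSAGE="Smokeping: connectivity restored to $TARGET from $ALERT_HOSTNAME"
fi

# Credentials are read from stdin (--config -) so the password does not
# show up in the process list; --data-urlencode encodes the whole message.
curl -q --config - --data-urlencode "status=$MESSAGE" \
   "http://twitter.com/statuses/update.json" >/dev/null 2>&1 <<EOF
--basic
--user "ucc_status:PASSWORD"
EOF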
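The message-building part of the script can be pulled out into a function and tested on its own. This sketch also caps the text at 140 characters, Twitter's status limit at the time, so a long target path cannot push the update over it. The function name and the example hostname are hypothetical; the arguments mirror the script's $6 (raise), $2 (target) and $5 (hostname).

```shell
#!/bin/sh
# Sketch: build the status text the notification script posts and cap it
# at 140 characters. Arguments: RAISE TARGET HOSTNAME.
build_status() {
    raise=$1 target=$2 from_host=$3
    if [ "$raise" = "1" ]; then
        state="lost"
    else
        state="restored"
    fi
    msg="Smokeping: connectivity $state to $target from $from_host"
    # cut -c may count bytes rather than characters on some systems;
    # that is fine for ASCII target and host names.
    printf '%s\n' "$msg" | cut -c 1-140
}

# Example with a hypothetical monitoring host name.
build_status 1 UCC.madako monitor.example.org
```

The script could then set MESSAGE from build_status "$RAISE" "$TARGET" "$ALERT_HOSTNAME", which also makes it easy to try out by hand without posting anything.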

An example config might look something like this:

*** General ***
owner = UCC Wheel Group
contact = wheel@ucc.gu.uwa.edu.au
sendmail = /usr/sbin/sendmail
imgcache = /var/www/smokeping
imgurl   = ../smokeping
datadir  = /var/lib/smokeping
dyndir   = /var/lib/smokeping/__cgi
piddir   = /var/run/smokeping
smokemail = /etc/smokeping/smokemail
tmail    = /etc/smokeping/tmail
precreateperms = 2775
cgiurl = http://somedomainhere.com/cgi-bin/smokeping.cgi

*** Database ***

# use the default Debian config here

*** Presentation ***

# use the default Debian config here

*** Probes ***
+ FPing
binary = /usr/bin/fping
packetsize = 1000

+ DNS
binary = /usr/bin/dig
lookup = ucc.gu.uwa.edu.au
pings = 5
step = 180

+ EchoPingHttp
pings = 5
url = /

*** Targets ***
probe = FPing
alerts = lossdetection

menu = Top
title = Network Latency Grapher
remark = Welcome to this SmokePing website.

+ UCC
menu = UCC
title = UCC Machines & Services

++ madako
host = madako.ucc.gu.uwa.edu.au

FIXME MORE WORK HERE

*** Alerts ***
to = |twitterscript
from = smokeping@YOURHOST

+lossdetection
type = loss
edgetrigger = yes
# in percent
pattern = ==0%,==0%,==0%,==0%,>20%,>20%,>20%
comment = sudden packet loss
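To see what the lossdetection pattern asks for, here is a simplified model of it in shell. It is our reading of the Smokeping documentation, not Smokeping's actual code: the seven pattern entries are compared against the seven most recent loss readings, oldest first, and with edgetrigger = yes the script is called with raise=1 when the pattern starts matching and raise=0 when it stops. Loss values here are plain integer percentages, which is an assumption of the model.

```shell
#!/bin/sh
# Simplified model (NOT Smokeping's code) of the "lossdetection" alert:
# ==0%,==0%,==0%,==0%,>20%,>20%,>20% against the last seven loss readings.

# lossdetection_matches L1 ... L7   (integer loss %, oldest first)
lossdetection_matches() {
    [ "$#" -eq 7 ] || return 1
    i=1
    for loss in "$@"; do
        if [ "$i" -le 4 ]; then
            [ "$loss" -eq 0 ] || return 1    # first four: no loss at all
        else
            [ "$loss" -gt 20 ] || return 1   # last three: more than 20% loss
        fi
        i=$((i + 1))
    done
    return 0
}

# edge_events READING ...
# Feeds readings in one round at a time over a sliding 7-value window and
# prints "raise" when the pattern starts matching, "clear" when it stops.
edge_events() {
    window=""
    count=0
    was=0
    for r in "$@"; do
        if [ -z "$window" ]; then window=$r; else window="$window $r"; fi
        count=$((count + 1))
        if [ "$count" -gt 7 ]; then
            window=${window#* }   # drop the oldest reading
            count=7
        fi
        # $window is deliberately unquoted so it splits into seven arguments.
        if [ "$count" -eq 7 ] && lossdetection_matches $window; then
            now=1
        else
            now=0
        fi
        if [ "$now" -ne "$was" ]; then
            if [ "$now" -eq 1 ]; then echo raise; else echo clear; fi
        fi
        was=$now
    done
}

# Four clean rounds, then sustained 50% loss.
edge_events 0 0 0 0 50 50 50 50 50
```

If this model is right, the pattern stops matching one round after it fires (the fourth slot no longer reads 0%), so the script would post "connectivity restored" about one step after "connectivity lost" even while the host is still down. That is worth confirming on a test install before trusting the clear messages.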