It would be nice to have some external monitoring for the UCC that could use the Twitter feed to alert Wheel members and others when systems and services go down.
As a first iteration, Smokeping seems ideal for this. It can check ICMP reachability as well as the availability of a number of other service types (SSH, HTTP, POP3, IMAP, etc.) through its probe plugins.
One installation could be set up within the UCC (which could also check LDAP and other internal-only services) and another outside UWA.
Things that might suck:
- Smokeping's configuration is kinda verbose
- Smokeping prioritises alerts by alert type rather than by host (when several alert conditions match, only the highest-priority one is sent), so if madako goes down it will alert separately on -all- of its unavailable services... which could get a bit noisy.
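One possible way to cut down that noise is to have the notification script remember which hosts are already down and keep quiet about their individual services until the host comes back. This is a minimal sketch only: the target naming scheme (`UCC.madako` for the host, `UCC.madako.ssh` for a service on it), the `hostdown` alert name, `STATE_DIR` and the helpers `host_of` and `should_notify` are all hypothetical and would need to be matched to whatever Smokeping actually passes to the script.

```shell
#!/bin/sh
# Hypothetical de-duplication for the alert script: while a host is marked
# down, stay quiet about the individual services on it.
# Assumptions (not Smokeping behaviour): host targets are named like
# "UCC.madako", service targets like "UCC.madako.ssh", and the host-level
# ICMP alert is called "hostdown".
STATE_DIR=${STATE_DIR:-/var/lib/smokeping/alert-state}

# host_of TARGET: drop the last dotted component ("UCC.madako.ssh" -> "UCC.madako").
host_of() {
    printf '%s\n' "${1%.*}"
}

# should_notify ALERT TARGET RAISE: exit status 0 means "post this alert".
should_notify() {
    alert=$1
    target=$2
    raise=$3
    # If we cannot keep state, err on the side of notifying.
    mkdir -p "$STATE_DIR" || return 0
    if [ "$alert" = "hostdown" ]; then
        if [ "$raise" = "1" ]; then
            : > "$STATE_DIR/$target"    # mark the host as down
        else
            rm -f "$STATE_DIR/$target"  # host is back; service alerts resume
        fi
        return 0                        # host-level alerts are always posted
    fi
    # Service alert: suppress it while its host is marked down.
    [ ! -e "$STATE_DIR/$(host_of "$target")" ]
}
```

The alert script would then run something like `should_notify "$1" "$2" "$6" || exit 0` before posting. Keeping the state in plain files keeps it shell-only, at the cost of assuming a single Smokeping instance writes to `STATE_DIR`.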
Notification could be done through a script similar to the following:
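    #!/bin/sh
    # Smokeping alert hook, invoked via "to = |twitterscript" in *** Alerts ***.
    # Arguments: alert name, target, loss pattern, RTT pattern, hostname of the
    # Smokeping instance, and the raise flag used with edgetrigger = yes
    # (1 = alert raised, 0 = alert cleared).
    ALERTNAME="$1"
    TARGET="$2"
    LOSS_PATTERN="$3"
    RTT_PATTERN="$4"
    ALERT_HOSTNAME="$5"
    RAISE="$6"

    if [ "$RAISE" = "1" ]; then
        MESSAGE="Smokeping: connectivity lost to $TARGET from $ALERT_HOSTNAME"
    else
        MESSAGE="Smokeping: connectivity restored to $TARGET from $ALERT_HOSTNAME"
    fi

    # The credentials are fed to curl on stdin (--config -) rather than on the
    # command line, so they don't show up in ps; --data-urlencode escapes the
    # whole message instead of only replacing spaces with '+'.
    printf '%s\n' '--basic' '--user "ucc_status:PASSWORD"' |
        curl -q --config - --data-urlencode "status=$MESSAGE" \
            "http://twitter.com/statuses/update.json" >/dev/null 2>&1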
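Because the script posts straight to Twitter, it is awkward to try out. A minimal sketch of the message-building part pulled out into a function (`build_message` and `MAX_LEN` are hypothetical names) makes it easy to run by hand. It also trims the text to 140 characters, on the assumption that Twitter rejects longer status updates.

```shell
#!/bin/sh
# Sketch: the message-building part of the alert script as a function, so it
# can be run by hand without posting anything. Arguments are in the order
# Smokeping passes them: alert name, target, loss pattern, RTT pattern,
# hostname, raise flag.
MAX_LEN=140   # assumption: Twitter's 140-character limit for a status

build_message() {
    target=$2
    alert_host=$5
    raise=$6
    if [ "$raise" = "1" ]; then
        msg="Smokeping: connectivity lost to $target from $alert_host"
    else
        msg="Smokeping: connectivity restored to $target from $alert_host"
    fi
    # Cut over-long messages down rather than have the update rejected
    # (cut -c counts bytes in some implementations; fine for ASCII names).
    printf '%s\n' "$msg" | cut -c "1-$MAX_LEN"
}
```

For example, `build_message lossdetection UCC.madako '' '' ucc-probe 1` prints `Smokeping: connectivity lost to UCC.madako from ucc-probe`.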
An example config might look something like this:
    *** General ***
    owner = UCC Wheel Group
    contact = [email protected]
    sendmail = /usr/sbin/sendmail
    imgcache = /var/www/smokeping
    imgurl = ../smokeping
    datadir = /var/lib/smokeping
    dyndir = /var/lib/smokeping/__cgi
    piddir = /var/run/smokeping
    smokemail = /etc/smokeping/smokemail
    tmail = /etc/smokeping/tmail
    precreateperms = 2775
    cgi-url = http://somedomainhere.com/cgi-bin/smokeping.cgi

    *** Database ***
    # use the default Debian config here

    *** Presentation ***
    # use the default Debian config here

    *** Probes ***
    + FPing
    binary = /usr/bin/fping
    packetsize = 1000

    + DNS
    binary = /usr/bin/dig
    lookup = ucc.gu.uwa.edu.au
    pings = 5
    step = 180

    + EchoPingHttp
    pings = 5
    url = /

    *** Targets ***
    # probe names are case-sensitive and must match the "+ FPing" section above
    probe = FPing
    alerts = lossdetection
    menu = Top
    title = Network Latency Grapher
    remark = Welcome to this SmokePing website.

    + UCC
    menu = UCC
    title = UCC Machines & Services

    ++ madako
    host = madako.ucc.gu.uwa.edu.au
    # FIXME MORE WORK HERE

    *** Alerts ***
    to = |twitterscript
    from = [email protected]

    +lossdetection
    type = loss
    edgetrigger = yes
    # in percent
    pattern = ==0%,==0%,==0%,==0%,>20%,>20%,>20%
    comment = sudden packet loss
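To get a feel for when `lossdetection` would fire, the following sketch models the pattern match and the `edgetrigger` behaviour in plain sh. It is a simplified model, not Smokeping's implementation: it assumes the pattern is compared with the loss percentages of the last seven rounds (oldest first) and that whole-number percentages are enough. The helpers `matches_lossdetection` and `edge_events` are hypothetical.

```shell
#!/bin/sh
# Simplified model of the "lossdetection" alert (NOT Smokeping's own code):
# the pattern ==0%,==0%,==0%,==0%,>20%,>20%,>20% is compared with the seven
# most recent per-round loss percentages, oldest first.

# matches_lossdetection P1 ... P7: succeed if the seven values match.
matches_lossdetection() {
    [ "$#" -eq 7 ] || return 1
    idx=1
    for pct in "$@"; do
        if [ "$idx" -le 4 ]; then
            [ "$pct" -eq 0 ] || return 1    # ==0%
        else
            [ "$pct" -gt 20 ] || return 1   # >20%
        fi
        idx=$((idx + 1))
    done
    return 0
}

# edge_events P...: feed loss values in one round at a time and print
# "raise N" / "clear N" only when the match state flips (edgetrigger = yes).
edge_events() {
    state=0
    round=0
    window=""
    for loss in "$@"; do
        round=$((round + 1))
        window="${window:+$window }$loss"
        # keep only the newest seven values
        if [ "$round" -gt 7 ]; then
            window=${window#* }
        fi
        # $window is left unquoted on purpose so it splits into separate arguments
        if matches_lossdetection $window; then now=1; else now=0; fi
        if [ "$now" -ne "$state" ]; then
            if [ "$now" -eq 1 ]; then echo "raise $round"; else echo "clear $round"; fi
            state=$now
        fi
    done
}
```

Under this model, `edge_events 0 0 0 0 50 60 70 80 90` prints `raise 7` then `clear 8`: once the fourth-newest round is no longer at 0% loss the pattern stops matching, so with `edgetrigger = yes` the "restored" message would go out one round after the "lost" one even though the loss continues. That is worth checking against a real Smokeping before relying on the "restored" tweets.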