
One day, MissionControl will be augmented or replaced with a real monitoring page, driven by Munin or Spong or something, which will make life happy and wonderful. One day, dispense rewrites will be finished. One day, ([DAA], [TRS], [GMB], [AHC], [The latest wheel member]) will graduate.

A dispense rewrite has been finished; [DAA] and [AHC] have graduated; let's make a start on the first item.

Thinking about monitoring

Monitoring covers both alerting (letting people know that stuff is broken) and trending (graphing utilisation or capacity). Both are important: an alert that we have run out of DHCP leases is useful, but so is knowing that we are running close to 90% utilisation all the time.
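To make the alerting/trending split concrete, here is a minimal Python sketch using the DHCP lease example. Everything in it is an assumption for illustration: the `PoolSample` type, the function names, and the 90% warning threshold are hypothetical, and it does not describe how any particular monitoring tool works.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One observation of a DHCP pool (hypothetical structure)."""
    leases_in_use: int
    pool_size: int

    @property
    def utilisation(self) -> float:
        # Fraction of the pool currently leased out.
        if self.pool_size <= 0:
            raise ValueError("pool_size must be positive")
        return self.leases_in_use / self.pool_size


def alert_level(sample: PoolSample, warn_at: float = 0.9) -> str:
    """Alerting: turn a single sample into a state someone should act on."""
    if sample.leases_in_use >= sample.pool_size:
        return "CRITICAL"  # out of leases: stuff is broken
    if sample.utilisation >= warn_at:
        return "WARNING"   # close to the limit (threshold is an assumption)
    return "OK"


def trend_average(samples: list[PoolSample]) -> float:
    """Trending: summarise many samples, e.g. as one point on a graph."""
    return sum(s.utilisation for s in samples) / len(samples)
```

The same samples feed both functions: `alert_level` looks at one sample at a time, while `trend_average` only makes sense over a history of them.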

So we want to monitor:

Basically what we actually care about is services, but most monitoring software is set up to think about machines. Sigh.
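One way to think in services rather than machines is to key every check by service name and treat the host as a detail of the check. The sketch below assumes that a successful TCP connection is a good enough sign that a service is up, which is a simplification; the service names, hostnames, and ports in the usage note are placeholders, not real configuration.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services, probe=service_is_up):
    """Map each service name to an up/down result.

    services: dict of name -> (host, port).
    probe: the check to run, replaceable for testing or for smarter checks.
    """
    return {name: probe(host, port) for name, (host, port) in services.items()}
```

For example, `check_services({"mail": ("mail.example.org", 25), "web": ("www.example.org", 80)})` returns a dict keyed by service name, so whatever reports the results never needs to know which machine a service lives on.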

The original intention behind @ucc_status was to receive automatic alerts and disseminate them over SMS, but it has always been updated manually. So we could either hook the alerting software up to a new Twitter account or, in the interest of transparency, just push everything to @ucc_status. We could also insert information (e.g. disk utilisation) into the Phonehome database or email it to hostmaster.
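Whichever destinations we pick, the alerting side could treat them as interchangeable sinks. The following is only a sketch under stated assumptions: each sink is a plain callable that we would have to write ourselves (nothing here talks to Twitter, Phonehome, or a mail server), and the 140-character limit is the tweet length at the time of writing, passed in as a parameter.

```python
def format_status(service: str, message: str, limit: int = 140) -> str:
    """Build a short status line, truncating with '...' if it exceeds limit."""
    text = f"{service}: {message}"
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def dispatch(alert_text: str, sinks: dict) -> list:
    """Send alert_text to every sink; return (name, exception) for failures.

    sinks: dict of name -> callable taking the alert text (hypothetical
    wrappers around a Twitter account, a database insert, an email, etc.).
    """
    failures = []
    for name, send in sinks.items():
        try:
            send(alert_text)
        except Exception as exc:
            # One broken sink should not stop the others from being notified.
            failures.append((name, exc))
    return failures
```

Collecting failures instead of raising means a dead Twitter hookup would not stop the email to hostmaster from going out.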

uccwiki: MissionControl/OneDay (last edited 2012-03-09 16:55:32 by motsugo)