Traditionally, the UCC's backup system has left much to be desired in basic areas such as "existing" and "running on reliable hardware", let alone living up to the Rule of Three (at least three copies of the data, on two different media, with one copy off-site).

Backups at present run on Mollitz, which is housed in another location (off-site backups!). Contact <[email protected]> if you need to know where it is living.

Backups are run using rdiff-backup, a disk-based incremental backup system, and are managed by rdiff-manager, a Python wrapper written by DavidAdam.
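
Conceptually, each nightly run boils down to rdiff-backup pulling the remote filesystem over SSH into a per-host directory. This is illustrative only; the exact invocation and flags come from rdiff-manager and the per-host config:

    # run on the backup server; <HOSTNAME> stands in for the target machine
    rdiff-backup --print-statistics root@<HOSTNAME>::/ /backups/<HOSTNAME>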

Adding new machines

On the target system (the machine you want to back up):

  • Make sure the UCC backup key is installed (e.g. with authroot)
  • Install the rdiff-backup package.
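
On a Debian-based target, installing the package is typically just:

    apt install rdiff-backup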

On the backup server (mollitz):

  • Copy /backups/conf/example-copy-me to /backups/conf/<HOSTNAME>.conf

  • Edit /backups/conf/<HOSTNAME>.conf as required - the syntax is documented in rdiff-backup(1) under FILE SELECTION (see the sketch after this list)

  • Add the SSH host key using su -c 'ssh-keyscan HOSTNAME >> ~/.ssh/known_hosts' backups
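
Judging from the pointer to FILE SELECTION, the conf file is a file-selection list. A hypothetical <HOSTNAME>.conf might look like the following - the paths are illustrative only, so start from example-copy-me:

    # Globbing-filelist syntax from rdiff-backup(1) FILE SELECTION:
    # "- " excludes a path, "+ " (or no prefix) includes it.
    - /proc
    - /sys
    - /dev
    - /tmp
    - /var/cache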

Now wait for the nightly backup run:

  • the output is sent by email to hostmasters

  • successful backups leave a log in /backups/<HOSTNAME>/rdiff-backup-data/backup.log and statistics in /backups/<HOSTNAME>/rdiff-backup-data/session_statistics.<TIMESTAMP>.data

  • partially successful backups leave a log in /backups/<HOSTNAME>/rdiff-backup-data/error_log.<TIMESTAMP>.data.gz

    • /!\ In some cases, e.g. if a particular file changes during every backup run, a fully successful backup or update may never be possible

    • TODO: this could be mitigated by backing up a stable snapshot instead of the live filesystem - see the sketch below
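
As a sketch of that mitigation, assuming the target keeps its data on LVM (the volume names here are made up):

    # Hypothetical: freeze a point-in-time image and back that up
    # instead of the live filesystem.
    lvcreate --snapshot --size 5G --name backup-snap /dev/vg0/root
    mount -o ro /dev/vg0/backup-snap /mnt/backup-snap
    # ... point the backup at /mnt/backup-snap instead of / ...
    umount /mnt/backup-snap
    lvremove -f /dev/vg0/backup-snap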

Checking backup status

To list all the backups available for a particular host, or to see when it was last successful, on the backup server (Mollitz) run:

  • rdiff-backup --list-increments /backups/<HOSTNAME>
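
The output looks roughly like this (dates and counts are illustrative):

    Found 2 increments:
        increments.2022-02-20T02:00:01+08:00.dir   Sun Feb 20 02:00:01 2022
        increments.2022-02-21T02:00:02+08:00.dir   Mon Feb 21 02:00:02 2022
    Current mirror: Tue Feb 22 02:00:03 2022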

To list how much data each incremental backup takes (which is much slower), on the backup server (Mollitz) run:

  • rdiff-backup --list-increment-sizes /backups/<HOSTNAME>

Restoring a backup

To restore files from backup, on the backup server (Mollitz):

  • Run rdiff-backup --list-increments /backups/<HOSTNAME> and choose a backup to restore from

  • Copy the timestamp from the increment list, e.g. 2022-02-22T02:00:03+08:00

  • Decide where you are going to restore the files - locally (i.e. to Mollitz), where you can inspect them, or back to the remote host

Restoring files locally

  • Run mkdir /backups/tmp/<HOSTNAME>

  • Run rdiff-backup --restore-as-of <TIMESTAMP> /backups/<HOSTNAME>/path/to/file-or-directory/you/want/to/restore /backups/tmp/<HOSTNAME>

For example, rdiff-backup -r 2022-02-22T02:00:03+08:00 /backups/merlo/etc /backups/tmp/merlo will restore the contents of merlo's /etc directory as of 22 February to /backups/tmp/merlo. (-r is the short form of --restore-as-of.)

Restoring files remotely

/!\ Be careful - this is easy to mess up, particularly if you are trying to restore to the original path. Note the double colon in the remote destination!

  • Run rdiff-backup --restore-as-of <TIMESTAMP> /backups/<HOSTNAME>/path/to/file-or-directory/you/want/to/restore root@<HOSTNAME>::/path/to/file-or-directory/you/want/to/restore

For example, rdiff-backup -r 2022-02-22T02:00:03+08:00 /backups/merlo/etc root@merlo::/restored/etc will restore the contents of merlo's /etc directory as of 22 February to /restored/etc on merlo.

Improvements to rdiff-manager

rdiff-manager is pretty simple but there is plenty of room for improvement. Check the TODO file in the distribution for ideas.


CategorySystemAdministration

Latest update on the backup system

[ROY] 20240930, 20241111

This section describes a temporary workaround for rdiff-backup's version-discrepancy issue, which prevents some machines from being backed up correctly.

molmol Space dataset

A cronjob on molmol calls /root/zfs-send-script.sh every day at 02:00; it first takes an incremental snapshot of the Space dataset, then zfs sends it to dell-ph1 over SSH. A similar cronjob replicates dell-ph1 -> dell-ph2 at 05:00 every Saturday.

A sample script can be found under wheel/docs.
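
A minimal sketch of the same idea - the dataset, pool, and snapshot names here are made up, so defer to the real script under wheel/docs:

    #!/bin/sh
    # Find the most recent existing snapshot to use as the incremental
    # base, take today's snapshot, and send the difference to dell-ph1.
    prev=$(zfs list -H -t snapshot -o name -s creation tank/space | tail -n 1 | cut -d@ -f2)
    today=$(date +%Y%m%d)
    zfs snapshot tank/space@"$today"
    zfs send -i "@$prev" tank/space@"$today" | ssh dell-ph1 zfs receive -F dpool/space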

good ol' rdiff-backup-style backup

Another cronjob, on dell-ph1, runs all .sh scripts under /etc/rsync-conf/ every day at 03:00, then takes a ZFS snapshot (via /etc/rsync-conf/zfs-snapshot, of the whole dpool ZFS pool) every day at 04:00 to create differential entries.
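
A hypothetical per-host script under /etc/rsync-conf/ might look like the following (host and destination path are placeholders):

    #!/bin/sh
    # Pull the host's filesystem into a dpool-backed directory; the
    # 04:00 snapshot of dpool then freezes this state as the day's
    # differential entry.
    rsync -aHAX --delete \
        --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp \
        root@merlo:/ /dpool/backups/merlo/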

data retention

Snapshots older than six months are destroyed on all three machines by cronjobs.
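
A sketch of such a retention cronjob, assuming GNU date (the actual scripts may differ):

    #!/bin/sh
    # With -p, zfs list prints the creation property as a Unix
    # timestamp, which makes the age comparison trivial.
    cutoff=$(date -d '6 months ago' +%s)
    zfs list -H -p -t snapshot -o name,creation |
    while read -r name created; do
        [ "$created" -lt "$cutoff" ] && zfs destroy "$name"
    done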

TODO

  • ZFS-send all datasets: dell-ph1 => dell-ph2

PVE VMs

All VMs are backed up to wobbegong via PBS (Proxmox Backup Server), scheduled every Saturday at 00:00.