UCC uses Proxmox VE as its virtual machine management infrastructure; it is a lot like VMware. It is based on KVM/QEMU and LXC and is built on Debian, which UCC likes because it's FREE.

Info for Users

Getting a VM

To get a VM, you will need to wrangle a wheel member into creating one for you through the Proxmox interface. Be aware that the wheel member may refuse to give you a VM for any reason. Setting up a VM takes some time, so don't expect them to drop what they're doing and create it on the spot. The best way to ask is to jump on IRC or Discord, or to email [email protected]. Assuming the wheel member gives you a full VM and not just a container, you will end up with what is essentially an empty computer: you will need to install an OS on it (unless you use the default template) and SECURE IT yourself.

VM deletion

Like most member services, any VMs you create will be stopped once you're no longer a club member. In addition, due to limited resources, VMs may be deleted after a year of inactivity.

Setting Up Your VM

Logging Into The Interface

First, you must be on the UCC network - the web interface is fully firewalled off from the outside internet for security reasons. To get on the UCC network, use a clubroom machine, connect to the UCC wireless, or connect to the UCC VPN from anywhere else.

If you know ssh, or don't want to connect using the other methods, you can instead forward a local port to Proxmox over ssh, for example:

ssh -fN -L 8006:medico:8006 [your-username]@ssh.ucc.asn.au

Once you are on the UCC network, browse to https://medico.ucc.asn.au:8006 (or https://localhost:8006 if using ssh-forward) from any modern browser and log in using your UCC username and password.

Installing an OS

To install the OS on your machine, you can either boot an installer from an ISO, or use UCC's netboot setup. By default, all machines on the VM network without an OS will netboot; however, installing from an ISO tends to be more reliable.

All gumby users (that's you if you're not on wheel) have upload privileges to a store of installer ISOs via the Proxmox web interface. You can mount an ISO from the ISOs storage location via your VM's Hardware tab by selecting the CD drive and clicking Edit. If you have problems uploading to the ISOs storage area through the web interface, contact your friendly neighbourhood wheel member and they can put it directly into /services/iso.

Once You Have Installed Your OS

Once the OS is installed, tell the person who set up your machine three things: its hostname, its MAC address and its IP address. They can then set up DNS and a static DHCP entry. If you wish to run any externally accessible services on your machine, you will also need to let them know which services you are running and which firewall ports you need unblocked - we generally don't let non-wheel members do their own firewalling. Gumby users must also nominate a wheel member who can have root ssh or console access to the machine for auditing purposes: for ssh, their key needs to be copied into /root/.ssh/authorized_keys; for console access, they must have a fully privileged local account.
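Copying the nominated wheel member's key into place can be sketched as below. This is a minimal sketch rather than an official UCC procedure: the function name add_wheel_key and the example key are made up for illustration, and the optional second argument only exists so the function can be tried against a scratch directory first. Run it as root inside your VM.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file, creating the directory
# with the permissions sshd expects. Keys that are already present are skipped.
# Usage: add_wheel_key "<public key line>" [ssh_dir]   (ssh_dir defaults to /root/.ssh)
add_wheel_key() {
    key=$1
    ssh_dir=${2:-/root/.ssh}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"
    # -x: whole-line match, -F: fixed string, so the key is never treated as a regex
    if ! grep -qxF -- "$key" "$ssh_dir/authorized_keys"; then
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
    fi
}

# Example (placeholder key; paste the real line your wheel member gives you):
# add_wheel_key "ssh-ed25519 AAAA...example wheelmember@ucc"
```

Because the function checks for an existing identical line, running it twice with the same key leaves a single entry.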

To find your MAC address, go to the Hardware tab of your VM in Proxmox, and double click on Network Device.

You can decide the hostname for yourself, but the default is $USERNAME-vm.ucc.asn.au.

  • PLEASE NOTE: It cannot be the same as your UCC username as this would clash with your member website. If you want to change that, talk to Wheel but it is an involved process.

It will likely be assigned an IP address automatically, but ask the wheel member what it is if you need it.

Securing Your Machine

TODO - ask a Wheel member what to do for now.

Info for Administrators

Authentication

Out of the box, the web interface uses the username root and the root password of the host. The LDAP implementation in Proxmox isn't "true" LDAP in that Proxmox only looks at LDAP for authentication and cannot consult LDAP for a list of users or group permissions. Other users can be added by creating their username in the web interface and setting the authentication realm to UCC's LDAP. The username must correspond to a UCC LDAP username.

To add yourself to the administrator's group, SSH to medico or any of the VM hosts, and run something like:

pveum useradd accmurphy@UCCDOMAYNE -group Administrator -firstname ACC -lastname Murphy -email [email protected]

Alternatively, get another administrator to create your user through the web interface. If you fail to log in twice in a row, contact wheel rather than trying again, or you will be locked out.
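On newer Proxmox releases the same subcommand is spelled "pveum user add", and the group option is documented as --groups. The following is a hedged sketch of that form, assuming the realm name UCCDOMAYNE from the example above. The run helper and the add_admin function are hypothetical names for illustration; by default the command is only printed, and it is executed only when DRY_RUN=0 is set on a VM host.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_admin USERNAME FIRSTNAME LASTNAME
# USERNAME must match an existing UCC LDAP username.
add_admin() {
    user=$1 first=$2 last=$3
    run pveum user add "$user@UCCDOMAYNE" --groups Administrator \
        --firstname "$first" --lastname "$last"
}

add_admin accmurphy ACC Murphy
```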

Storage

Virtual machines should be stored as .raw or .qcow2 images in the appropriate vmstore area. Storage is managed at the cluster level, so every storage device is available to (or created on) every node, unless it is explicitly limited to a particular node or group of nodes. This is necessary in order to migrate machines to different nodes without having to back up and restore the VM's disk to a different path.

On the Atlantic cluster, there are several vmstores per node:

  • /vmstore-ssd (also known as /vmstore-ssd_vm) is a Ceph volume made up of many different SSDs across the cluster. This allows for quick live-migration between nodes and has very fast IO access. This (and vmstore-bigssd) are the preferred options for VMs these days.
  • /vmstore-bigssd is an extremely similar Ceph volume to /vmstore-ssd. I honestly do not know the difference, but they are two separate volumes.
  • /nas-vmstore is a part of the /space ZFS volume on Molmol mounted over NFS. It can be used for VMs and containers as it has plenty of space, but will have slower performance. If a VM or container needs a lot of space but not fast IO, place it here (or use it for additional disks attached to VMs).

  • localstorage is local on every node and is part of the default install and it cannot be removed. This permits storing VMs and containers locally on a node, but does not allow live migrations. Beware that each node may have its local disks set up with different levels of reliability and amounts of space. Avoid using local storage if at all possible.
  • loveday additionally has its own single spinning disk as storage for VMs while the machine is at camp. The only way to get VMs onto it is to stop them and migrate them to this storage. It is not in RAID - do NOT use it for day-to-day use.

Creating basic member VMs

We now have a template for basic member VMs that allows you to set up a VM very quickly and get it going. It comes with a very basic Debian 12 install with some creature comforts, and also has cloud-init set up so that SSH keys and logins are easy to configure.

  1. Search for VM105, memberVM-nogui-template. Click the More tab along the top of the VM page, and click Clone.

  2. Set Name as the hostname of the VM, set node to the desired host, and set the resource pool to Member-VMs.
  3. Set the Mode as Full Clone, and the target storage as the appropriate volume (if you don't know, use vmstore-ssd).
  4. This will then create a VM on the selected host with the set name. It will have the same description as the template; please update this according to the description outline in the non-standard VM setup guide below.
  5. Under the permissions tab:
    • Add a user permission for the owner of the VM and give them the role "UCC_VM_User". This allows the user to do anything you could do to a physical machine without taking the cover off. If the user doesn't exist yet, see the Authentication section on how to create it.

  6. In addition to updating the description, also create a DHCP and DNS record for the VM, using the MAC address found in the Network Device under the Hardware tab.

Cloud-init is a way of quickly changing many different settings and variables of the VM upon startup/creation. Under the Cloudinit tab, you can set the default username and password (the user will be a member of sudo), add an SSH key for the main user, and enable auto-upgrading of packages on boot. For more info, see the Proxmox Wiki.
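The clone, permission and cloud-init steps above also have command-line equivalents. The following is a rough sketch, not the procedure UCC actually uses: the new VM ID 999, the node name, the owner and the key path are placeholders, the realm UCCDOMAYNE is taken from the Authentication section, and the run helper and clone_member_vm function are hypothetical names. Commands are only printed unless DRY_RUN=0 is set on a cluster node.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_member_vm NEWID HOSTNAME NODE OWNER
clone_member_vm() {
    newid=$1 name=$2 node=$3 owner=$4
    # Full clone of template 105 onto vmstore-ssd, in the Member-VMs pool.
    # (--target assumes the template lives on shared storage.)
    run qm clone 105 "$newid" --name "$name" --full 1 \
        --storage vmstore-ssd --pool Member-VMs --target "$node"
    # Give the owner the UCC_VM_User role on just this VM.
    run pveum acl modify "/vms/$newid" --users "$owner@UCCDOMAYNE" --roles UCC_VM_User
    # Cloud-init: default user, SSH public key file (assumed path), upgrade on first boot.
    run qm set "$newid" --ciuser "$owner" --sshkeys "/root/$owner.pub" --ciupgrade 1
}

clone_member_vm 999 accmurphy-vm medico accmurphy
```

The description, DHCP and DNS steps still need doing by hand afterwards.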

Adding non-standard member VMs

These instructions are for creating a VM for a general UCC member - do not blindly follow them for UCC servers.

To create the actual machine:

  1. Log in as an administrator
  2. Click on "Create VM" in the top right corner.
  3. Under General, set Name as the hostname of the VM, set node to the desired host, set the resource pool to Member-VMs and click Next.
  4. Under OS, select the desired OS for the VM and click Next.
  5. Under System, check the Qemu Agent box and click Next.
  6. Under Disks, set the following options for the hard disk:
    • Bus device: SCSI 0
    • Storage: vmstore-ssd
    • Disk size: 50
    • Format: raw
    • Cache: default (no cache)
    • Then click Next.
  7. Under CPU, set the number of cores to 2, and click Next.
  8. Under Memory, set memory to 4GB (4096 MiB) and click Next.
  9. Under Network, set the VLAN tag to 4 (the member VM VLAN). NB: for VLAN 2, set "No VLAN" or things will break.
  10. Under Confirm, check the details are correct, and click Finish.

The Qemu Agent should be set to Enabled, either in the setup options, before boot in the Options tab of the VM, or with: qm set NEW_VM_ID --agent 1.

Later, within the running machine: apt install qemu-guest-agent.
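For reference, the wizard settings above map fairly directly onto a single qm create call. This is a sketch under stated assumptions, not a tested UCC recipe: the VM ID 999 and hostname are placeholders, the Linux OS type (l26), the virtio network model and the bridge name vmbr0 are guesses that the wizard would normally fill in for you, and run and create_member_vm are hypothetical helper names. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_member_vm VMID HOSTNAME
create_member_vm() {
    vmid=$1 name=$2
    # 50 GiB raw disk on vmstore-ssd as SCSI 0, 2 cores, 4096 MiB RAM,
    # guest agent enabled, NIC tagged for VLAN 4 (the member VM VLAN).
    run qm create "$vmid" --name "$name" --pool Member-VMs \
        --ostype l26 --agent 1 \
        --scsi0 vmstore-ssd:50,format=raw \
        --cores 2 --memory 4096 \
        --net0 virtio,bridge=vmbr0,tag=4
}

create_member_vm 999 accmurphy-vm
```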

Under the summary tab of the newly created VM, go to the notes section and add a comment with the following information:

  • the member TLA

    • and/or (name and username of the owner, plus any additional contact email addresses)

  • date of creation
  • date that the VM can be deleted if it's just for testing
  • the VM's purpose
  • the hostname of the VM (if it's different to what it was named in proxmox)
  • the IP address of the VM (if the qemu-guest-agent is not installed and enabled)

  • any other pertinent information that may be helpful to management, such as extra SSH authorized_keys access

To edit the notes, triple-click the notes section.

Under the options tab:

  • Change "Start at boot" to yes

Under the permissions tab:

  • Add a user permission for the owner of the VM and give them the role "UCC_VM_User". This allows the user to do anything you could do to a physical machine without taking the cover off. If the user doesn't exist yet, see the Authentication section on how to create it.

The VM should now boot; however, it is essentially a blank machine and will netboot.

The VM will get an IP address from DHCP; however, to avoid conflicts, this should be set to a static entry in Murasoi as soon as the MAC address of the VM is known (which is as soon as the machine has been created, before you start it for the first time). Also add a DNS entry on Monnik as you would for a physical UCC machine.
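The MAC address needed for the static DHCP entry can also be read on the command line from the output of qm config. A small sketch follows; net0_mac is a made-up helper name, and it assumes the net0 line has the usual shape with the model=MAC pair first, e.g. "net0: virtio=BC:24:11:AA:BB:CC,bridge=vmbr0,tag=4".

```shell
#!/bin/sh
# Read `qm config` output on stdin and print the MAC address of net0.
# Takes the value after the first "=" on the net0 line, up to the first comma.
net0_mac() {
    sed -n 's/^net0: [^=]*=\([^,]*\).*/\1/p'
}

# On a cluster node (999 is a placeholder VM ID):
# qm config 999 | net0_mac
```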

Adding Containers

The information below has not been updated for Proxmox 4 (and we are now running Proxmox 8!!!), which notably uses LXC containers instead of OpenVZ. One major drawback of LXC containers is that they cannot currently be live migrated. Use containers at your own risk - you WILL have more outages and you WON'T be warned before they are turned off.

An OpenVZ Container, or CT, is a paravirtualised environment. It is more like a chroot on steroids than a full virtual machine, and it uses the host kernel but a separate userland environment. Container technology allows you to set a quota on disk, memory and CPU usage, but unused resources can be shared. If you just need a clean environment to run a few daemons or test something out, a container makes better use of our resources.

  1. Log in as an administrator.
  2. Click on "Create CT" in the top right corner
  3. Set the following general options:
    • Name: the hostname of the VM
    • Resource pool: Member-VMs (or as appropriate)
    • Storage: nas-vmstore (or as appropriate - see Storage above)
    • Password/confirm password: the root password for your new container
  4. Click Next.
  5. Select a template (base image for the operating system). debian-7.0-standard or similar is probably the way to go.

  6. Click Next.
  7. Set appropriate resource limits. Remember that these are maximums, not guaranteed minimums, so you can set them quite high.
    • Memory: 2048 MB
    • Swap: 512 MB
    • Disk size: 50 GB
    • CPUs: 2
  8. Click Next.
  9. Network: unfortunately the UI for setting network options in containers is not as full-featured as VMs; in particular, there is no way to set a VLAN tag through the web UI.

    • If a machine room IP is appropriate (probably not), you can add that straight in as a 'Routed mode' IP, or do static configuration with 'Bridged mode' to vmbr0.

    • To use the more appropriate clubroom or VM networks, we will have to come back later. Choose 'Bridged mode', and continue once the container is created to edit the configuration on the command line.
  10. Click Next.
  11. Leave the DNS settings alone; click Next.
  12. Click Finish once you are happy with the settings. The container will be created, the template unpacked and the appropriate settings applied.

Under the summary tab of the newly created CT (container), go to the notes section and add a comment with the following information:

  • the member TLA

    • and/or (name and username of the owner, plus any additional contact email addresses)

  • date of creation
  • date that the CT can be deleted if it's just for testing
  • the CT's purpose and reason for choosing a CT over a VM
  • the hostname of the CT (if it's different to what it was named in proxmox)
  • the IP address of the CT
  • any other pertinent information that may be helpful to management, such as the intended users and administrators

To edit the notes, triple-click the notes section.

Under the options tab, set 'Start at boot'.

To manually manage the network configuration, keep following these directions:

  1. Take note of the container ID - the number next to the hostname in the description at the top of the screen or in the left-hand server list. The number 999 is used below; replace this with the appropriate ID.
  2. Log on to medico as root via SSH.
  3. Run the following command (with the correct container ID) to wipe out the existing network configuration:

vzctl set 999 --netif_del all --ipdel all --save

  4. Choose the correct bridge interface. For VLAN 3 (clubroom), use vmbr0v3 and for the VM network use vmbr0v4.

  5. Run the following command to add a new network interface attached to that bridge, with the appropriate bridge device and container ID:

vzctl set 999 --netif_add eth0,,,,vmbr0v4 --save

You can now start the container and log in using the console.

You will probably need to set up the interfaces as you normally would; add something like this to /etc/network/interfaces on Debian:

auto eth0
iface eth0 inet dhcp
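The vzctl commands above are OpenVZ-specific. On the LXC-based Proxmox versions mentioned at the top of this section, the rough equivalent is a single pct set call that attaches eth0 to a bridge with a VLAN tag. This is only a sketch: the bridge name vmbr0, the container ID 999 and the use of DHCP are assumptions, and run and set_ct_network are hypothetical helper names. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_ct_network CTID VLAN
# VLAN 3 is the clubroom network and VLAN 4 the VM network, as described above.
set_ct_network() {
    ctid=$1 vlan=$2
    run pct set "$ctid" --net0 "name=eth0,bridge=vmbr0,tag=$vlan,ip=dhcp"
}

set_ct_network 999 4
```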

Permissions

If a user has more than one VM, it is worth creating a dedicated resource pool for that user. A resource pool is just a way of grouping several VMs together and allows permissions to be applied to the pool, which then propagates to all VMs in that pool. Create a pool by going to Datacenter->Pools->Create. After the pool appears in the menu tree, click on the pool and add any existing VMs for the user to that pool. Don't forget to then add PVEVMUser permissions to the pool.

Resizing VMs

Disks can be resized through the web interface by going to the VM's Hardware tab, selecting the hard disk, and then selecting Resize. Note that this only allows disks to grow - there is no way to shrink a volume once it has been grown, aside from copying the data to a new image. Only some OSes will recognise a size change online, so the VM may need to be rebooted. Also note that resizing the disk will not resize the partitions or file systems; that is an extra step and out of scope for this page. See http://pve.proxmox.com/wiki/Resizing_disks and https://pve.proxmox.com/wiki/Resize_disks for more info.
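The same resize can be done from a node with qm resize. A minimal sketch, assuming the disk is attached as scsi0 and using a placeholder VM ID; run and grow_disk are hypothetical helper names. The wrapper only accepts a whole number of GiB and always passes it as a relative "+" size, so it cannot be used to request a smaller disk by accident. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grow_disk VMID DISK GIB   e.g. grow_disk 999 scsi0 10 adds 10 GiB to scsi0
grow_disk() {
    vmid=$1 disk=$2 gib=$3
    case $gib in
        ''|*[!0-9]*)
            echo "size must be a whole number of GiB" >&2
            return 1
            ;;
    esac
    run qm resize "$vmid" "$disk" "+${gib}G"
}

grow_disk 999 scsi0 10
```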

Command Line Management

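Virtual machines are managed using the qm tool. Containers are managed using the pvectl tool (though you can use vzctl as well); note that both of these date from the OpenVZ era, and on LXC-based Proxmox versions (4 and later) the container tool is pct.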

There is more information on command-line tools on the Proxmox wiki.
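As a quick illustration, qm and pct (the container tool on LXC-based Proxmox versions) share most of their everyday subcommands, such as status, start, shutdown and config. The sketch below is illustrative only: run and guest_cmd are made-up helper names, 999 is a placeholder ID, and commands are only printed unless DRY_RUN=0 is set on a node.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them on a node.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# guest_cmd KIND ACTION ID: KIND is "vm" (uses qm) or "ct" (uses pct)
guest_cmd() {
    kind=$1 action=$2 id=$3
    case $kind in
        vm) tool=qm ;;
        ct) tool=pct ;;
        *)  echo "kind must be vm or ct" >&2; return 1 ;;
    esac
    run "$tool" "$action" "$id"
}

run qm list                  # all VMs on this node
run pct list                 # all containers on this node
guest_cmd vm status 999      # is the VM running?
guest_cmd vm shutdown 999    # clean shutdown
guest_cmd ct start 999       # start a container
```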

Troubleshooting

Corosync not working

Seemingly, if the network restarts abruptly, corosync can get really confused and start flooding the network (and, importantly, stop working altogether). The solution is to stop the corosync service on all hosts, then bring each one up in sequence.
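The stop-everything-then-start-in-sequence procedure can be sketched as a small loop over ssh. This is a sketch, not a tested recovery script: the host names in the example call are just examples taken from elsewhere on this page, the 10 second pause between starts is an arbitrary guess, and run and restart_corosync are hypothetical helper names. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_corosync HOST...
restart_corosync() {
    # First pass: stop corosync everywhere so the flooding ends.
    for h in "$@"; do
        run ssh "root@$h" systemctl stop corosync
    done
    # Second pass: start the nodes one at a time, pausing between them.
    for h in "$@"; do
        run ssh "root@$h" systemctl start corosync
        [ "${DRY_RUN:-1}" = 1 ] || sleep 10
    done
}

restart_corosync medico loveday
```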

Info for Installers

Upgrades

For the upgrades from Proxmox 6 to Proxmox 8 (and the corresponding Ceph versions), we simply followed the Proxmox wiki's upgrade guides, e.g. https://pve.proxmox.com/wiki/Upgrade_from_7_to_8. A guide is usually published a few days after each release and is edited with pitfalls and advice over the following few months. The guide is simple and comprehensive, and is all you should need to upgrade the cluster (as long as you actually read and follow the steps).

Installation

Proxmox can be installed using either the bare-metal installer ISO or on top of an existing Debian installation (check kernel versions, as Proxmox replaces the existing kernel). The problem with the bare-metal installer is that it does not allow you to set up your own logical volumes and doesn't give you the option of software RAID. IT WILL ALSO EAT ANY OTHER DISKS ATTACHED TO THE MACHINE, so disconnect any disks you don't want to lose before using it. For these reasons, machines such as Medico and Maltair had Proxmox installed on top of a pre-installed Debian. Ensure the version of Debian you're installing is compatible with the version of Proxmox you want.

Installation is incredibly easy if you follow the instructions in the Proxmox VE Installation Page. Ensure that the Debian install follows the UCC almost-standard layout, with separate root, usr, var, boot, and home logical volumes. Put /var/lib/vz in its own logical volume, as this is where local VMs are stored by default.

Things missed by the manual installer

  • The notable instruction that is missing in the wiki page is to enable Kernel Samepage Merging (KSM) on the host, which is a memory de-duplication feature - google how to enable it and enable it with a line in /etc/rc.local (check Motsugo's for an example)

  • The proxmox installer fails to change the network configuration file to be suitable for virtual machines; check out the default configuration in Proxmox Network Model and modify /etc/network/interfaces to suit.

  • All the other items on the SOE page, with the exclusion of LDAP, NFS, dispense and most of the other user programs.

  • IPv6 configuration. Look at Motsugo's or Medico's config for an example.

  • Set up fail2ban on the web and ssh interfaces
  • Add entries for all other nodes and the storage server to the hosts file. We don't want the cluster dependent on DNS.
  • Uninstall rdnssd if you want DNS to work. It clobbers the resolv.conf file with an IPv6 address and is generally a pain in the butt. Then set DNS servers and search domains in /etc/network/interfaces.
  • If using mirrored RAID, install grub on both disks so things will still boot in the event of a disk failure.
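Two of the items above are easy to sketch in shell: switching KSM on, and adding cluster nodes to the hosts file. KSM is enabled by writing 1 to /sys/kernel/mm/ksm/run, which is the line that would go in /etc/rc.local. This is a sketch only: enable_ksm and add_host_entry are made-up helper names, their optional file arguments exist so they can be tried on scratch files, and 192.0.2.10 is a documentation placeholder address, not a real UCC one.

```shell
#!/bin/sh
# enable_ksm [RUN_FILE]
# Turn on Kernel Samepage Merging. RUN_FILE defaults to the kernel's KSM switch.
enable_ksm() {
    echo 1 > "${1:-/sys/kernel/mm/ksm/run}"
}

# add_host_entry IP NAME [HOSTS_FILE]
# Append "IP<TAB>NAME" to the hosts file unless NAME is already mentioned there.
add_host_entry() {
    ip=$1 name=$2 file=${3:-/etc/hosts}
    # -w: match NAME as a whole word, -F: fixed string rather than a regex
    if ! grep -qwF -- "$name" "$file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Examples (run as root on the new node; address is a placeholder):
# enable_ksm
# add_host_entry 192.0.2.10 medico
```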

Security

THIS IS CRITICAL FOR THE NODE TO FUNCTION IN THE CLUSTER, DO NOT IGNORE THIS STEP. Do not push wheel keys to the node - Corosync (the tool that replicates the cluster configuration across all nodes) will look after syncing wheel keys to the node once it's added to the cluster, and the push script is configured to only push to a single node in order to maintain wheel keys. Part of adding the machine to the cluster creates keys that allow every node root access to every other node, and these are appended to the authorized keys file, so you need to add the root key for the new node to the extra-maltair file before push.sh is run again, else the root key will get overwritten.

Security is paramount on a VM host because of the high potential for damage if the machine is compromised. Central fail2ban is set up to monitor the webpage and the ssh interface (see https://pve.proxmox.com/wiki/Fail2ban and http://blog.extremeshok.com/archives/810), however it is imperative that central logging is configured and TESTED for this to work. The web interface must not be unfirewalled to outside the UCC network under any circumstances.

NFS mounts are initially forbidden for containers by AppArmor; see https://unix.stackexchange.com/questions/396678/access-denied-when-trying-to-mount-nfs-share and https://forum.proxmox.com/threads/nfs-file-system-mount-problem-apparmor.31706/. The denial shows up in dmesg, and adding the mount rules shown below to the LXC profile allows the mounts:

$ dmesg -T
...
[Mon Feb 19 13:46:44 2018] audit: type=1400 audit(1519047865.509:122): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-container-default-cgns" name="/away/" pid=19745 comm="mount.nfs" 
# vi /etc/apparmor.d/lxc/lxc-default-cgns
...
  mount fstype=nfs*,
  mount options=(rw, bind, ro),
# systemctl reload apparmor

Post-install Configuration

With a single VM host, you would have to configure the storage locations and authentication methods - this is now controlled by the cluster and will be automatically taken care of when you add the node to the cluster. So...not a lot to do except:

  • add the node to the cluster

  • check whether the ~/.ssh/id...pub key from the new host needs to be added to the shared authorized_keys (this appeared to be needed with v6.4)

See Also

The SOE says how to do some of the things this page tells you to do.