
Installing Gentoo with KVM Support and a Template VM

Content:

1.  Overview
2.  Getting Started
3.  Installing The Host System
4.  Making the Host Kernel
5.  Making The Initrd
6.  Installing a VM

1.  Overview

Synopsis

My motivation was to replace four separate real systems that I run 24/7 as firewall (P3), web server (U10), mail server (XP 2500+) and test system (XP 3200+) with a single low power box that can do it all, without compromising the security provided by four physically separate boxes. My power meter shows this will save about £40/month in running costs.

As this is the install on the host, the bare metal install will be minimal: a Gentoo hardened system, pared to the bone. There is no point in not using hardened.

There is no reason at all to run any extra services on the bare metal. Its sole purpose is to support virtual machines. Should you need another service, make another virtual machine.

This document will use host provided logical volumes and the virtio hard disk and network drivers. At the time of writing, these drivers provide near native performance without any known security issues.

As there are no live CDs that provide the virtio drivers, using them in guests is a two step process. Get the guest running on its own kernel in the conventional way, then swap drivers.
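
As a preview of what the swap involves: if the guest is managed by libvirt, it amounts to editing the disk and network interface definitions in the guest's domain XML so they use the virtio bus and model. The logical volume path and bridge name below are examples only:

<disk type='block' device='disk'>
  <source dev='/dev/vm/guest-root'/>
  <target dev='vda' bus='virtio'/>
</disk>
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>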

This document was written around an install on an AMD Athlon(tm) II Neo N36L Dual-Core Processor, with odds and ends tested on AMD Phenom(tm) II X6 1090T Processor.

Installs on Intel CPUs are similar.

As we shall use LVM for the VM storage, we may as well use it for the host too.

To move with the times we shall abandon fdisk for parted and MSDOS Partition Tables for GPT. That's in anticipation of hard drives bigger than 2TiB.

Attributions

In setting up my own system, I drew heavily on The Fedora Virtualization Guide ... Fedora Project ... http://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/index.html and Setting Up Virtual Machines with KVM http://pacita.org/books/server-setup/output/pdf/doc.pdf

System Requirements

A modern 64 bit Intel or AMD processor with hardware support for virtualisation. Hardware support is not strictly necessary, as qemu can emulate a CPU entirely in software, but that is far slower. Exactly what you need depends on what your load will be.
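
To see whether your processor has hardware support, look for the vmx (Intel) or svm (AMD) flag; no output means KVM's hardware acceleration is not available:

grep -E '(vmx|svm)' /proc/cpuinfo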

This document assumes you are installing KVM on a purpose built remote box. Remote may only be a few feet away, but once the bare metal install can boot by itself, everything else is intended to be done over ssh, or using Virtual Machine Manager.

As it is normal to set up VMs on a server, the use of kernel raid, with root on lvm over the raid, will be described. The raid and lvm steps are optional for the host install.

lvm will be required for the VM storage pool even on single drive installs. It's perfectly possible to have VM storage in a file on the host, but this is suboptimal and will not be described in this document.

2.  Getting Started

Partitioning and Making Filesystems

Boot the live CD/DVD of your choice and use parted to partition all of your drives identically. The following partitions are required.

  • boot - 32M
  • host LVM space - 30G
  • VM LVM space - rest of the drive

This allows some space for expansion in the host. LVM supports on-line resizing, so it's possible to grow a partition without a reboot.
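
As an illustration of on-line resizing, a logical volume and the ext4 filesystem on it can be grown while still mounted. The volume name and the 2G figure here are examples only:

lvextend --size +2G /dev/mapper/host-var
resize2fs /dev/mapper/host-var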

Before diving into parted to make disk labels (partition tables) and partitions, think about what is needed. This document uses an example of three drives. /boot is easy: it will be a three way raid1 set, so 32M from each drive is required. The 30G raid5 for the host install is not quite so straightforward. For a three drive raid5, one drive's worth of space goes to parity, so each partition needs to be 15G. Keep that in mind as you use parted.

Code Listing 2.1: Partitioning with parted

parted /dev/sda
mklabel gpt
mkpart primary 0% 32M
mkpart primary 32M 15G
mkpart primary 15G 100%
name 1 boot
name 2 host
name 3 virtual
set 1 boot on
quit

Repeat for /dev/sdb and /dev/sdc

Warning: Modern large hard drives use a 4k byte physical sector size and fake 512 byte sectors by carrying out read/modify/write operations internally. This is very slow. In case of doubt, to minimise the effects of this, partition boundaries must be aligned on integer multiples of 4k bytes.
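
Recent versions of parted can check the alignment for you; repeat for each partition number:

parted /dev/sda align-check optimal 1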

boot will be raid1, the other two will be raid5. This gives us root on raid5 and lvm, which compels the use of an initrd. Swap will also be on a logical volume.

Warning: Grub 1 will not boot from raid other than raid1 with version 0.90 raid superblocks

Code Listing 2.2: Use mdadm to create the raid sets

mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc3
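
Raid5 sets resync when first created. You can confirm the arrays assembled correctly and watch the resync progress before moving on:

cat /proc/mdstat
mdadm --detail /dev/md1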

Donate the two raid5 sets to two lvm physical volumes, one per volume group. It is not essential to have separate volume groups for the host and VMs but it avoids accidentally deleting a part of the host file system when you intended to delete a VM.

Code Listing 2.3: Creating LVM Physical Volumes and Volume Groups

pvcreate /dev/md1 /dev/md2
vgcreate host /dev/md1
vgcreate vm /dev/md2
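
A quick sanity check that both physical volumes and volume groups exist with the expected sizes:

pvs
vgs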

With the volume groups created, they can be subdivided into logical volumes, which we can finally format like any other block device and use for our install.

Code Listing 2.4: Creating Logical Volumes for the host install

lvcreate --size 512M --name root host
lvcreate --size 4G --name var host
lvcreate --size 4G --name usr host
lvcreate --size 1G --name tmp host
lvcreate --size 512M --name portage host
lvcreate --size 2G --name distfiles host
lvcreate --size 2G --name packages host
lvcreate --size 8G --name swap host

This leaves about 8G of unallocated space in the host volume group for expansion at a later date.

Check your /dev/mapper. It should contain eight logical volumes of the format host-...
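
For example (the control node belongs to device-mapper itself, so it does not count towards the eight):

ls /dev/mapper
# control        host-distfiles  host-packages  host-portage
# host-root      host-swap       host-tmp       host-usr      host-var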

Readers of a nervous disposition may wonder about the use of so many partitions for the host. It allows for the efficient use of disk space. Only root and swap are really needed. (/boot is our /dev/md0)

We could use ext2 on tmp, portage, distfiles and packages as they contain things that are easily replaced. However, ext4 has an option to not create a journal, so we will use ext4 everywhere. The use of the dir_index option is really some gentle ricing on the host install but may come into its own later.

Code Listing 2.5: Formatting the filesystems

mkswap /dev/mapper/host-swap
mkfs.ext4 -O ^has_journal,dir_index /dev/md0
mkfs.ext4 -O dir_index /dev/mapper/host-root
mkfs.ext4 -O dir_index /dev/mapper/host-var
mkfs.ext4 -O dir_index /dev/mapper/host-usr
mkfs.ext4 -O ^has_journal,dir_index /dev/mapper/host-tmp
mkfs.ext4 -O ^has_journal,dir_index -b 1024 -i 1024 /dev/mapper/host-portage
mkfs.ext4 -O ^has_journal,dir_index /dev/mapper/host-distfiles
mkfs.ext4 -O ^has_journal,dir_index /dev/mapper/host-packages

Important: Without the -b 1024 -i 1024, portage will not fit in the 512M space allocated. The portage tree is several hundred thousand small files, so it needs far more inodes than the mkfs defaults provide; -i 1024 allocates one inode per 1024 bytes.

Mount the partitions, making the required directories as we go. It's not quite as simple as the three partition layout used by the Gentoo handbook.

Code Listing 2.6: Mounting the partitions

swapon /dev/mapper/host-swap
mount /dev/mapper/host-root /mnt/gentoo
mkdir /mnt/gentoo/boot
mkdir /mnt/gentoo/tmp
mkdir /mnt/gentoo/usr
mkdir /mnt/gentoo/var
mount /dev/md0 /mnt/gentoo/boot
mount /dev/mapper/host-tmp /mnt/gentoo/tmp
mount /dev/mapper/host-usr /mnt/gentoo/usr
mount /dev/mapper/host-var /mnt/gentoo/var
mkdir /mnt/gentoo/usr/portage
mount /dev/mapper/host-portage /mnt/gentoo/usr/portage
mkdir /mnt/gentoo/usr/portage/distfiles
mkdir /mnt/gentoo/usr/portage/packages
mount /dev/mapper/host-distfiles /mnt/gentoo/usr/portage/distfiles
mount /dev/mapper/host-packages /mnt/gentoo/usr/portage/packages

Other mount points, like /dev and /proc, will be created by the stage3. Fetch and install the hardened stage3 in the normal manner.

Kernel Raid and Logical Volume Manager are both extra layers of software between the hardware and the applications. Raid provides redundancy: if a disk fails, your system will keep running. Logical Volume Manager provides flexibility: logical volumes can be grown and shrunk to move free space around as needed, provided you choose a filesystem that supports resizing.

Swap should be the same size as your RAM, as this allows RAM in the VMs to be overcommitted. VMs are just processes to the host, so when you run out of RAM, parts of VMs can be swapped out. With too little swap, the kernel Out Of Memory manager will kick in and maybe kill a VM, which to the guest is just like a power failure, only faster.

3.  Installing The Host System

Installing the host system

With the drives partitioned, filesystems made and mounted, it's time to do a normal Gentoo install by following the handbook, with a few minor exceptions.

  • Tidying up make.conf - after the stage3 and portage snapshot are unpacked
  • Making package.use - after the stage3 and portage snapshot are unpacked
  • Making the kernel - additions to the Gentoo handbook

USE Flags and other Settings in make.conf

If you run emerge --info now, you will see that it is full of things you will never need on a virtual machine host system. Most of the USE_EXPAND variables can be set to the null string to get rid of the clutter. This has no effect on the install, but it makes emerge --info easier to interpret. Add the following to /mnt/gentoo/etc/make.conf

Code Listing 3.1: Cleaning up emerge --info

# Unset the following USE_EXPAND variables
ALSA_CARDS=""
ALSA_PCM_PLUGINS=""
APACHE2_MODULES=""
CALLIGRA_FEATURES=""
CAMERAS=""
COLLECTD_PLUGINS=""
GPSD_PROTOCOLS=""
INPUT_DEVICES=""
LCD_DEVICES=""
SANE_BACKENDS=""
VIDEO_CARDS=""

Remove some use flags we do not want by adding the following to the USE= in /mnt/gentoo/etc/make.conf

Code Listing 3.2: Unsetting some USE flags

-X -cups -dri -gnome -kde

Add buildpkg to FEATURES in make.conf. This saves a tarball of every package that is built to /usr/portage/packages in a format that can be used by emerge. We shall use most of these packages later.

Code Listing 3.3: Setting FEATURES in make.conf

FEATURES="buildpkg"

If you want to use distcc, you must not use -march=native in CFLAGS unless the helper(s) have identical CPUs. If you don't know what distcc is, you won't be using it.
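
As a preview of how these saved packages are reused later: emerge can be told to prefer binary packages with the --usepkg option. The package name here is just an example:

emerge --usepkg app-misc/screen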

Creating package.use

Make the directory /mnt/gentoo/etc/portage and add the file package.use with the following contents

Code Listing 3.4: Making package.use

# for initrd use, these packages must be statically linked
sys-fs/lvm2 static
sys-fs/mdadm static
sys-apps/busybox static

# for virtual machine support
app-emulation/qemu-kvm sdl threads vde vhost-net

# for libvirt with parted support so we can use lvm storage pools for VM
sys-block/parted device-mapper

# to get consoles in an X window but we don't want an X server
media-libs/libsdl X

app-emulation/libvirt qemu virt-network numa lvm parted pcap phyp udev
# unset libvirt USE flags
# -avahi -caps -debug -iscsi -macvtap -nfs -numa -openvz -sasl (-selinux) -uml -virtualbox -xen

Important: libvirt can support User Mode Linux guests, Virtual Box guests and Xen guests. The -uml -virtualbox -xen USE flag settings disable this support.

4.  Making the Host Kernel

Hardened and LVM Host settings

The settings given here are in addition to your normal hardware support. Should you need option by option support to build a kernel, kernel-seeds.org is recommended.
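
As an outline only (menu labels vary a little between kernel versions), the extra options a host with root on raid and lvm plus KVM needs look like this in menuconfig; select the AMD or Intel KVM line to suit your CPU. The vhost-net entry matches the vhost-net USE flag set earlier:

Device Drivers  --->
    [*] Multiple devices driver support (RAID and LVM)  --->
        <*>   RAID support
        <*>     RAID-1 (mirroring) mode
        <*>     RAID-4/RAID-5/RAID-6 mode
        <*>   Device mapper support
[*] Virtualization  --->
    <*>   Kernel-based Virtual Machine (KVM) support
    <*>     KVM for AMD processors support
    < >     KVM for Intel processors support
    <*>   Host kernel accelerator for virtio net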

5.  Making The Initrd

Introduction to Initial RAM Drives

An initrd is just a root file system in a file which is loaded by the boot loader and left where the kernel can find it at /dev/ram0. It needs some /dev nodes so it can operate on devices, some programs to run, and a script to tell the kernel what to do. There are several tools to make initrd files but it's easy to do it manually too.

Initrds can do anything the system can do, but ours will do the bare minimum to get us booted. It will not load kernel modules or anything fancy. All the drivers you need to boot will still need to be compiled into the kernel.

When the initrd is actually in use, it's just the kernel and the initrd in memory; there are no libraries and no dynamic linkers, so all the programs must be statically linked. The size of the initrd file does not really matter, as the RAM it occupies is freed as soon as it has done its job.

Creating the Initrd

Start by making some space to assemble the initrd. The entire content of this directory will be made into a file as the last step in the process.

Code Listing 5.1: Making the initrd directory

mkdir /root/initrd

Make directories in /root/initrd. These directories have exactly the same uses as their counterparts on the real root file system, except that /sbin and /bin have been combined into bin.

Code Listing 5.2: Making the initrd directories

cd /root/initrd/
mkdir bin dev etc newroot proc sys

Now to populate these directories. bin needs three statically linked programs, busybox, lvm and mdadm. busybox is our shell, lvm manipulates logical volumes and mdadm manipulates raid devices.

Code Listing 5.3: Populating the bin directory

cd /root/initrd/bin
cp /bin/busybox /sbin/mdadm /sbin/lvm.static ./
mv lvm.static lvm
ln -s busybox cat
ln -s busybox mount
ln -s busybox sh
ln -s busybox sleep
ln -s busybox switch_root
ln -s busybox umount
ln -s lvm vgchange
ln -s lvm vgscan

dev needs to contain console, null and all the partitions donated to raid sets, plus some directories.

Code Listing 5.4: Populating the dev directory

cd /root/initrd/dev
cp -a /dev/null /dev/console /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdc1 /dev/sdc2 /dev/sdc3 ./
mkdir mapper vc

Important: If your raid sets have more than three members add the other dev entries above

/root/initrd/dev/mapper is empty; /root/initrd/dev/vc contains a relative symlink called 0 (zero) to /dev/console

Code Listing 5.5: the relative symlink

cd /root/initrd/dev/vc
ln -s ../console 0

Important: That is a numeral 0 not an uppercase letter O in the symlink

etc, newroot, proc and sys are intentionally empty. We still need the init script. Use nano to copy in the script below.

Code Listing 5.6: nano -w /root/initrd/init

#!/bin/sh

rescue_shell() {
    echo "Something went wrong. Dropping you to a shell."
    busybox --install -s
    exec /bin/sh
}

mount -t proc none /proc
CMDLINE=`cat /proc/cmdline`
mount -t sysfs none /sys

#wait a little to avoid trailing kernel output
sleep 3

#If you don't have a qwerty keyboard, uncomment the next line 
#loadkmap < /etc/kmap-fr

# raid - we don't really need to assemble boot but we are no longer using autodetect
/bin/mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 || rescue_shell

# must assemble md1 as root is on lvm there and it uses ver 1.2 metadata
/bin/mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2 || rescue_shell

# may as well assemble the virtual machine space too
/bin/mdadm --assemble /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 || rescue_shell

#If you have a msg, show it: 
#cat /etc/msg

#lvm
#/bin/vgscan
# start the host lvm
/bin/vgchange -ay host || rescue_shell

# start the VM lvm
/bin/vgchange -ay vm || rescue_shell

#root filesystem
mount -r /dev/mapper/host-root /newroot || rescue_shell

#unmount pseudo FS
umount /sys
umount /proc

#root switch
exec /bin/busybox switch_root /newroot /sbin/init ${CMDLINE}

Important: If your raid sets have more than three members, add to the mdadm commands above

As the init script will be run, it must be executable

Code Listing 5.7: Setting the +x permission

chmod +x /root/initrd/init

That's all the prep work done; now to assemble everything into a file in /boot

Code Listing 5.8: Assemble the initrd in /boot

cd /root/initrd
find . | cpio --quiet -o -H newc | gzip -9 > /boot/initramfs

Important: The initrd is called initramfs in /boot. The identical name must be used in grub.conf
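
A minimal grub.conf entry then looks something like this. The kernel file name is an example; use whatever name you gave your own kernel when you copied it to /boot:

default 0
timeout 10

title Gentoo Hardened Host
root (hd0,0)
kernel /kernel-hardened
initrd /initramfs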

Rebooting and Rebuilding

Reboot into your new hardened host install.

Update portage

Code Listing 5.9: Updating the portage tree

emerge --sync

Rebuild the toolchain

Important: The next step must be completed with no breaks

Code Listing 5.10: Rebuilding the toolchain

cd /usr/portage
scripts/bootstrap.sh

Select the hardened compiler. Check the output of gcc-config -l first; the 2 below is the number of the plain hardened compiler on this install and may differ on yours.

Code Listing 5.11: Choosing the compiler

gcc-config -l

gcc-config 2
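
After switching compilers, reload the environment so the new selection takes effect:

env-update && source /etc/profile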

Update and rebuild system

Update and rebuild world

Add the packages needed to manage VMs
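
For readers who want concrete commands, something like the following covers these three steps; the package list is this document's package.use selection and can be adjusted to taste:

emerge --update --deep --newuse system
emerge --update --deep --newuse world
emerge app-emulation/qemu-kvm app-emulation/libvirt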

Clear out any rubbish:

Code Listing 5.12: Cleaning up

emerge --depclean -p
eclean -d distfiles
eclean -d packages

6.  Installing a VM

Copy the Host Install




Page updated July 10, 2011

Summary: This document describes how to set up Kernel Virtual Machine (KVM) support on a new host then create a master Virtual Machine that can be cloned as the basis of more virtual machines.

Roy Bamford


Copyright 2001-2014 Gentoo Foundation, Inc.