HowTo Single Disk to RAID

From EigenWiki.

We found ourselves needing to switch from a single-disk root to a root backed by a RAID. Here's how we did it.

Context

We set up a new/old server, BitArno. It has a nice /home of ~6TB backed by a RAID and a separate disk for the / partition. Because the drive for / had been tested before going into production and showed no errors, we thought we didn't need a RAID for it. We installed Debian stable 8.1, set up every service, and went to bed, only to wake up to a hung server refusing to boot.

An on-site check was required. We got there and found that the / drive was failing to reallocate sectors; we fsck'ed it and it booted. "Great!" That same evening it was failing again, so we decided to switch from a single-disk root to a RAID1-backed root, especially because we use commodity hardware taken from old computers.

The general idea

We wanted a way to switch easily from a single drive to the RAID without having to back everything up and copy it over again, and with as little downtime as possible, so we decided to:

  • Power off the server and attach a new disk of equal size
  • Power on the server
  • Create a RAID1 array made of 2 drives, with one drive marked as missing (the new drive is the one present; the old one will be added to the array later)
  • Set LVM up on top of it, as the old drive had LVM (it's a 120GB drive just for the `/` partition, and we wanted to keep some space available in case we ever need it)
  • Boot a live distro and copy /
    • Install the bootloader on the new drive adding the necessary modules for RAID and LVM
  • Reboot the server from the RAID-member disk
  • Check if everything is OK
  • Add the old disk to the RAID and let mdadm resync the contents.

Step by step guide

Here I'll try to cover, step by step, the commands we used. It has been a while since we did it, so I'm not 100% sure I remember everything.

First a layout of the partitions when it was single-disk

sda                            8:0    0 114.5G  0 disk  
 ├─sda1                        8:1    0   243M  0 part /boot 
 ├─sda2                        8:2    0     1K  0 part  
 └─sda5                        8:5    0 114.3G  0 part  
   ├─bitarno--system-root    253:0    0    30G  0 lvm   /
   └─bitarno--system-swap    253:3    0     4G  0 lvm   [SWAP]

And after the switch (the newly created LVM has a different volume group name but the same size and structure)

 sda                        8:0    0 114.5G  0 disk  
 ├─sda1                     8:1    0   243M  0 part  
 │ └─md1                    9:1    0 242.8M  0 raid1 /boot
 ├─sda2                     8:2    0     1K  0 part  
 └─sda5                     8:5    0 114.3G  0 part  
   └─md2                    9:2    0 114.2G  0 raid1 
     ├─ba--system-root    253:0    0    30G  0 lvm   /
     └─ba--system-swap    253:3    0     4G  0 lvm   [SWAP]
 sde                        8:64   0 114.5G  0 disk  
 ├─sde1                     8:65   0   243M  0 part  
 │ └─md1                    9:1    0 242.8M  0 raid1 /boot
 ├─sde2                     8:66   0     1K  0 part  
 └─sde5                     8:69   0 114.3G  0 part  
   └─md2                    9:2    0 114.2G  0 raid1 
     ├─ba--system-root    253:0    0    30G  0 lvm   /
     └─ba--system-swap    253:3    0     4G  0 lvm   [SWAP]

The sdX2 partition is an extended partition; it contains the logical partition sdX5, which is the RAID member

This way we have two arrays: one for /boot and one for the LVM, which contains the root and the swap

First of all we needed to wipe every possible header from the new disk, as it was going to become a RAID member

dd if=/dev/zero of=/dev/sde bs=100M count=1

Then we created the partitions with sfdisk, at about the same size as those on the first disk. They don't need to be identical; in our case they couldn't be, because the slightly different capacity of the disks made it impossible to clone the partition layout with dd

We need:

  • one partition for /boot, marked as Linux raid autodetect
  • another for LVM, marked as Linux raid autodetect, that takes the free space left

We then proceeded to create the 2 RAID arrays; /dev/sde is the new disk

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sde1 missing
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde5 missing
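Because each array was created with a missing slot, it starts out degraded, which is expected at this stage. A minimal sketch for listing degraded arrays from /proc/mdstat follows. The function name degraded_arrays is ours, not part of mdadm, and it assumes the usual mdstat layout in which each array's status line carries a member map such as [U_].

```shell
# List md arrays whose member map contains a missing slot ("_").
# Reads /proc/mdstat by default; pass another file to try the parsing offline.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }     # remember the array being described
        /\[[U_]*_[U_]*\]/ { print name }    # member map such as [U_] or [_U]
    ' "${1:-/proc/mdstat}"
}
```

Right after the two mdadm --create commands, both md1 and md2 should be listed. mdadm --detail /dev/md1 gives the same information in long form.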

We made the filesystem on the newly created /dev/md1

mkfs.ext2 /dev/md1

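And copied the files (the trailing slash on /boot/ makes rsync copy the directory's contents rather than the directory itself, so the kernel and initrd end up at the top level of the new filesystem)

mount /dev/md1 /mnt
rsync -avPh /boot/ /mnt/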
sync && umount /mnt

We were done with /boot, and we needed to create the LVM volumes:

First we marked /dev/md2 as an LVM member with fdisk, then we told LVM that /dev/md2 was a device it could use

pvcreate /dev/md2

We then created the volume group that was going to host the logical volumes

vgcreate ba-system /dev/md2

And created the logical volumes over it

lvcreate -L30G -n root ba-system
lvcreate -L4G -n swap ba-system

We created the filesystem for root

mkfs.ext4 /dev/mapper/ba--system-root

And for swap

mkswap /dev/mapper/ba--system-swap
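The /dev/mapper paths used above do not match the volume group name exactly: device-mapper joins the volume group and logical volume names with a hyphen and doubles any hyphen that is part of either name. A small sketch of that rule, with a helper name (dm_name) of our own choosing:

```shell
# Print the /dev/mapper path for a volume group and a logical volume.
# device-mapper joins the two names with "-" and doubles hyphens inside them.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

dm_name ba-system root    # prints /dev/mapper/ba--system-root
```

LVM tools such as vgchange and lvcreate take the plain name (ba-system). The same volume is also reachable as /dev/ba-system/root.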

Then it was time to power off the server and boot from a live distro. We chose Arch Linux, as it's lightweight and provides all the packages we needed without forcing us through an installation procedure

We mounted the root partitions

mkdir /mnt/old /mnt/new
mount /dev/mapper/ba--system-root /mnt/new
mount /dev/mapper/bitarno--system-root /mnt/old

And copied the data over:

rsync -avPh /mnt/old/ /mnt/new
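These notes do not record it, but the root filesystem copied above still carries the old /etc/fstab, which presumably referred to the old volume group. A minimal sketch of the adjustment, assuming fstab names the volumes by /dev/mapper path. The helper name retarget_fstab is ours and was not part of the original procedure.

```shell
# Rewrite references to an old volume group in an fstab-style file.
# Usage: retarget_fstab FILE OLD_DM_PREFIX NEW_DM_PREFIX
# The prefixes are device-mapper names, i.e. with hyphens already doubled.
# A copy of the original file is kept as FILE.bak.
retarget_fstab() {
    sed -i.bak "s|/dev/mapper/$2-|/dev/mapper/$3-|g" "$1"
}
```

In this setup it would be called as retarget_fstab /mnt/new/etc/fstab bitarno--system ba--system. If /boot is listed by UUID, that entry would need the UUID of /dev/md1 instead, which blkid /dev/md1 reports.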

We then proceeded to mount the boot partition

mount /dev/md1 /mnt/new/boot

And arch-chroot'ed into it. arch-chroot is a wonderful piece of software that saves you from bind mounting /dev, /proc and /sys in the chroot

arch-chroot /mnt/new

It was time to install grub on the new disk and regenerate the grub configuration and the initramfs image so that they contained the LVM and RAID modules. This was the hardest part, as we had wrongly assumed all the modules were already present.

We added "domdadm" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub; the line should look something like

cat /etc/default/grub| grep -i cmdli
GRUB_CMDLINE_LINUX_DEFAULT="quiet domdadm"

And we added the needed modules to GRUB_PRELOAD_MODULES in the same file; if the line doesn't exist, create it

GRUB_PRELOAD_MODULES="lvm diskfilter mdraid1x"
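Since forgetting one of these settings cost us several reboots, a quick check before running grub-install can help. The following is only a sketch: the function name check_grub_defaults is made up, and it looks for exactly the parameter and modules named above.

```shell
# Report which of the expected settings are missing from a GRUB defaults file.
# Prints nothing and returns 0 when everything is present.
check_grub_defaults() {
    file=${1:-/etc/default/grub}
    status=0
    # The kernel command line should carry the domdadm parameter.
    grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT=.*domdadm' "$file" ||
        { echo "missing: domdadm"; status=1; }
    # Each module should be named on the GRUB_PRELOAD_MODULES line.
    for mod in lvm diskfilter mdraid1x; do
        grep -Eq "^GRUB_PRELOAD_MODULES=.*[\" ]$mod[\" ]" "$file" ||
            { echo "missing module: $mod"; status=1; }
    done
    return $status
}
```

Run inside the chroot as check_grub_defaults, or with an explicit path such as check_grub_defaults /mnt/new/etc/default/grub from the live system.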

Then it was time to install grub in the MBR of both disks. Note that we didn't need to specify the boot partition, as we were in a chroot

grub-install /dev/sda
grub-install /dev/sde

And just to be sure

update-grub

We also had to update the initramfs so that it actually contained the raid module, so we edited

/etc/initramfs-tools/modules 

appending

raid1

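to the list and executed (update-initramfs needs a mode flag: -u updates an existing image, and -k all covers every installed kernel, which matters in a chroot where the running kernel is the live system's)

update-initramfs -u -k all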

to re-generate the initramfs
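Before rebooting, it is possible to confirm that the module really made it into the image. A minimal sketch, assuming lsinitramfs from initramfs-tools is available in the chroot; the helper name initramfs_has_module is ours.

```shell
# Succeed if a kernel module named $1 appears in an initramfs file listing
# read from stdin (for example the output of lsinitramfs).
initramfs_has_module() {
    grep -q "/$1\.ko"
}

# Example, using the kernel version that appears in our grub entry:
#   lsinitramfs /boot/initrd.img-3.16.0-4-amd64 | initramfs_has_module raid1 \
#       && echo "raid1 is in the initramfs"
```

The pattern anchors on the file name, so raid10.ko is not mistaken for raid1.ko.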

We exited the chroot, unmounted everything, rebooted and kept our fingers crossed until the password prompt of the root-on-RAID system appeared.

I simplified this last step: we made several errors and had to boot into the live system several times, as we kept forgetting to add the needed modules and command-line arguments

To be sure you boot from the RAID root, you can edit the grub entry before booting. Make sure that

insmod mdraid1x

is present inside the menuentry section of your OS and that the linux entry looks similar to

linux   /vmlinuz-3.16.0-4-amd64 root=/dev/mapper/ba--system-root ro  quiet domdadm

As we failed to get everything right on the first try, we edited the grub command line by hand and then ran grub-install from the booted system.

Once we managed to boot we had to finalize the process and add the old drive to the RAID array

We backed up the partition table of the new disk

sfdisk -d /dev/sde > /tmp/savetablesde

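Unmounted whatever had been automatically mounted from /dev/sda, and deactivated the old swap and LVM volume group (the /dev/mapper name doubles the hyphen of the volume group name, while vgchange takes the plain name)

swapoff /dev/mapper/bitarno--system-swap
vgchange -an bitarno-system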

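Wiped the first 100M of the old drive (this also erases the GRUB boot code installed on /dev/sda earlier, so grub-install /dev/sda has to be run again once the partitions are back)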

dd if=/dev/zero of=/dev/sda bs=100M count=1
sync

And finally we copied the partition table onto the old disk

sfdisk /dev/sda < /tmp/savetablesde

We added the partitions to the array

mdadm /dev/md1 --add /dev/sda1
mdadm /dev/md2 --add /dev/sda5

DONE!

We then checked the status of the resync of mdadm with

cat /proc/mdstat
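While the arrays rebuild, /proc/mdstat shows a progress line per array. The sketch below pulls out just the percentage. The function name resync_progress is ours, and the parsing assumes the usual "recovery = N%" or "resync = N%" wording of that line.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads /proc/mdstat by default; pass another file to try the parsing offline.
resync_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }             # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i   # the field that ends in "%"
        }
    ' "${1:-/proc/mdstat}"
}
```

To simply block until the rebuild has finished, mdadm --wait /dev/md1 /dev/md2 does the job, and watch cat /proc/mdstat gives a live view.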

Issues

We (of course) had some issues in the process:

  • Though the drive was the same model, its capacity was slightly different, so we couldn't copy the partition scheme as we thought we could
  • It's very important to generate the grub image with the modules and the boot parameter needed for LVM and RAID
  • The initramfs must be rebuilt with the right raid module present in order for the kernel to find the array
  • It wasn't possible to simply back up and restore the LVM setup either: for a while we had both LVM disks attached together, and the names were conflicting.