HowTo Single Disk to RAID

From EigenWiki.

Version of 18:49, 20 Nov 2016

We found ourselves needing to switch from a single-disk root to a root backed by a RAID; here's how we did it.

Context

We set up a new/old server, BitArno. It has a nice /home of ~6TB backed by a RAID and a separate disk for the / partition. As the drive for / had been tested before going into production and showed no errors, we thought we didn't need a RAID for it. We installed Debian stable 8.1, set up every service, and went to bed, only to wake up to a hung server refusing to boot.

An on-site check was required. We got there and found out that the / drive was failing to reallocate sectors; we fsck'ed it and it booted. "Great!" That same evening it was failing again, so we decided to switch from a single-disk root to a RAID1-backed root, especially because we use commodity hardware taken from old computers.

The general idea

We wanted a way to switch easily from a single drive to the RAID, without having to back everything up and copy it over again and with as little downtime as possible, so we decided to:

  • Power off the server and attach a new disk of equal size
  • Power on the server
  • Create a RAID1 array composed of 2 drives but with one drive marked as missing (the new drive is the present one, the old one will be added to the array later)
  • Set LVM up on top of it, as the old drive had LVM (it's a 120GB drive just for the `/` partition, and we wanted to keep some space available in case we ever need it)
  • Boot a live distro and copy /
    • Install the bootloader on the new drive, adding the necessary modules for RAID and LVM
  • Reboot the server from the RAID-member disk
  • Check if everything is OK
  • Add the old disk to the RAID and let mdadm resync the contents.

Step by step guide

Here I'll try to cover, step by step, the commands we used. It has been a while since we did it, so I'm not 100% sure I'll remember everything.

First, the layout of the partitions when it was single-disk:

sda                            8:0    0 114.5G  0 disk  
 ├─sda1                        8:1    0   243M  0 part /boot 
 ├─sda2                        8:2    0     1K  0 part  
 └─sda5                        8:5    0 114.3G  0 part  
   ├─bitarno--system-root    253:0    0    30G  0 lvm   /
   └─bitarno--system-swap    253:3    0     4G  0 lvm   [SWAP]

And after the switch (the newly created LVM has a different volume group name but the same size and structure):

 sda                        8:0    0 114.5G  0 disk  
 ├─sda1                     8:1    0   243M  0 part  
 │ └─md1                    9:1    0 242.8M  0 raid1 /boot
 ├─sda2                     8:2    0     1K  0 part  
 └─sda5                     8:5    0 114.3G  0 part  
   └─md2                    9:2    0 114.2G  0 raid1 
     ├─ba--system-root    253:0    0    30G  0 lvm   /
     └─ba--system-swap    253:3    0     4G  0 lvm   [SWAP]
 sde                        8:64   0 114.5G  0 disk  
 ├─sde1                     8:65   0   243M  0 part  
 │ └─md1                    9:1    0 242.8M  0 raid1 /boot
 ├─sde2                     8:66   0     1K  0 part  
 └─sde5                     8:69   0 114.3G  0 part  
   └─md2                    9:2    0 114.2G  0 raid1 
     ├─ba--system-root    253:0    0    30G  0 lvm   /
     └─ba--system-swap    253:3    0     4G  0 lvm   [SWAP]

The sdX2 partition is an extended partition, which contains the sdX5 RAID partition.

In this way we have two arrays: one for /boot and one for the LVM, which contains the root and the swap.

First of all we needed to wipe any existing headers from the new disk, as it was going to become a RAID member

dd if=/dev/zero of=/dev/sde bs=100M count=1

Then we created the partitions with sfdisk, roughly the same size as those on the first disk. They don't need to be identical, and in our case they couldn't be, as it wasn't possible to copy the partitions with dd.

We need:

  • one partition for the boot, marked as Linux raid autodetect
  • another for LVM, marked as Linux raid autodetect, that takes the free space left

We then proceeded to create the 2 RAID arrays (/dev/sde is the new disk)

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sde1 missing
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde5 missing
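
At this point both arrays run with one member missing, which /proc/mdstat reports with an underscore in the status map (for example [U_]). The following is a minimal sketch for listing such arrays; the function name degraded_arrays is ours, not part of mdadm, and it only parses mdstat-formatted text.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status map
# (e.g. [U_]) shows at least one missing member.
# Usage: degraded_arrays [file]   (defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }           # start of an array stanza
        /\[[U_]*_[U_]*\]/ && name != "" {     # status map containing "_"
            print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Right after the two mdadm --create commands above we would expect it to print both md1 and md2; once the old disk has been added and resynced it should print nothing.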

We made the filesystem on the newly created /dev/md1

mkfs.ext2 /dev/md1

And copied the files

mount /dev/md1 /mnt
rsync -avPh /boot/ /mnt/
sync && umount /mnt

We were done with /boot, and we needed to create the LVM volumes:

First we marked /dev/md2 as an LVM member with fdisk, then we told LVM that /dev/md2 was a device it could use

pvcreate /dev/md2

We then created the volume group that was going to host the logical volumes

vgcreate ba-system /dev/md2

And created the logical volumes over it

lvcreate -L30G -n root ba-system
lvcreate -L4G -n swap ba-system

We created the filesystem for root

mkfs.ext4 /dev/mapper/ba--system-root

And for swap

mkswap /dev/mapper/ba--system-swap
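
Note how the volume group ba-system shows up as ba--system under /dev/mapper: device-mapper doubles every hyphen inside a volume group or logical volume name and joins the two with a single hyphen. A small sketch of that mapping follows; dm_path is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the /dev/mapper path for a logical volume.
# Hyphens inside the VG and LV names are doubled, then the two names are
# joined with a single hyphen.
# Usage: dm_path <volume-group> <logical-volume>
dm_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

dm_path ba-system root   # prints /dev/mapper/ba--system-root
```

Commands such as vgchange and lvcreate take the plain volume group name (ba-system), while mkfs, mount and swapon are given the device path.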

Then it was time to power off the server and boot from a live distro. We chose Arch Linux as it's lightweight and provides all the packages we needed without forcing an installation procedure on us.

We mounted the old and the new root volumes

mkdir /mnt/old /mnt/new
mount /dev/mapper/ba--system-root /mnt/new
mount /dev/mapper/bitarno--system-root /mnt/old

And copied the data over:

rsync -avPh /mnt/old/ /mnt/new

We then proceeded to mount the boot partition

mount /dev/md1 /mnt/new/boot

And arch-chroot'ed into it. arch-chroot is a wonderful piece of software that saves you from bind-mounting /dev, /proc and /sys in the chroot

arch-chroot /mnt/new

It was time to install grub on the new disk and regenerate the grub configuration and the initramfs image so that they contain the LVM and RAID modules. This was the hardest part, as we wrongly assumed all the modules were already present.

We added "domdadm" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub; the line should look something like

cat /etc/default/grub| grep -i cmdli
GRUB_CMDLINE_LINUX_DEFAULT="quiet domdadm"

And we added the needed modules to GRUB_PRELOAD_MODULES in the same file; if the line doesn't exist, create it

GRUB_PRELOAD_MODULES="lvm diskfilter mdraid1x"
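
Since we kept forgetting one of these settings, a quick check before leaving the chroot can save a trip back to the live system. The sketch below is only an illustration: grub_defaults_missing is a name we made up, and it assumes each variable is set on a single uncommented line as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: report which of the settings used in this guide are
# missing from a grub defaults file. Prints nothing when all are present.
# Usage: grub_defaults_missing [file]   (defaults to /etc/default/grub)
grub_defaults_missing() {
    file=${1:-/etc/default/grub}
    # Extract the values of the two variables and strip the quotes.
    cmdline=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT=//p' "$file" | tr -d '"')
    preload=$(sed -n 's/^GRUB_PRELOAD_MODULES=//p' "$file" | tr -d '"')
    # Pad with spaces so that only whole words match.
    case " $cmdline " in
        *" domdadm "*) ;;
        *) echo "cmdline: domdadm" ;;
    esac
    for m in lvm diskfilter mdraid1x; do
        case " $preload " in
            *" $m "*) ;;
            *) echo "preload: $m" ;;
        esac
    done
}
```

Running grub_defaults_missing inside the chroot should print nothing once both lines look like the ones shown above.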

Then it was time to install grub in the MBR of both HDDs. Note that there was no need to specify the boot partition, as we were in a chroot

grub-install /dev/sda
grub-install /dev/sde

And just to be sure

update-grub
  • TODO: tell about adding 'raid1' to the kernel modules available in initramfs and how to rebuild it
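
Until that TODO is written up properly, here is a rough sketch of what it would probably look like on Debian with initramfs-tools. We did not record the exact commands we ran, so treat this as an assumption: modules listed in /etc/initramfs-tools/modules are included in the image, and update-initramfs rebuilds it. The function name ensure_initramfs_module is ours.

```shell
#!/bin/sh
# Hypothetical helper: list a kernel module in an initramfs-tools modules
# file unless it is already there, so it is safe to run more than once.
# Usage: ensure_initramfs_module <module> [file]
ensure_initramfs_module() {
    module=$1
    file=${2:-/etc/initramfs-tools/modules}
    # -x matches the whole line, -F treats the module name literally
    grep -qxF "$module" "$file" 2>/dev/null || printf '%s\n' "$module" >> "$file"
}

# Inside the chroot, as root, the rebuild would then be:
#   ensure_initramfs_module raid1
#   update-initramfs -u -k all
```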

We exited from the chroot, unmounted everything, rebooted and kept our fingers crossed until the password prompt of the root-on-RAID system appeared.

I have simplified this last step: we made several errors and had to boot into the live system several times, as we kept forgetting to add the needed modules and command-line arguments.

To be sure to boot from the RAID root you can edit the grub entry before booting. Make sure that the line

insmod mdraid1x

is present inside the menuentry section of your OS and that the linux entry looks similar to

linux   /vmlinuz-3.16.0-4-amd64 root=/dev/mapper/ba--system-root ro  quiet domdadm

As we failed to get everything right on the first try, we edited the grub command line at boot and then performed the grub install from the booted system.

Once we managed to boot we had to finalize the process and add the old drive to the RAID array

We backed up the partition table of the new disk

sfdisk -d /dev/sde > /tmp/savetablesde

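We unmounted the automatically mounted /dev/sda partitions and deactivated the old swap and the old volume group (the /dev/mapper path doubles the hyphen of the volume group name, while vgchange takes the plain name)

swapoff /dev/mapper/bitarno--system-swap
vgchange -an bitarno-system

Wiped the first 100M of the old drive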

dd if=/dev/zero of=/dev/sda bs=100M count=1
sync

And finally we copied the partition table onto the old disk

sfdisk /dev/sda < /tmp/savetablesde

We added the partitions to the array

mdadm /dev/md1 --add /dev/sda1
mdadm /dev/md2 --add /dev/sda5

DONE!

We then checked the status of the resync of mdadm with

cat /proc/mdstat
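
While the resync is running, /proc/mdstat carries a progress line for each rebuilding array. As a sketch, the percentage can be pulled out with a little awk; resync_progress is a hypothetical name, and the parsing assumes the usual "recovery = NN.N%" (or "resync = NN.N%") wording.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent>" for every md array that is
# currently recovering or resyncing. Prints nothing when no rebuild runs.
# Usage: resync_progress [file]   (defaults to /proc/mdstat)
resync_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the current array
        /(recovery|resync) *= *[0-9.]+%/ {       # progress line of a rebuild
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9.]+%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Something like watch -n 5 cat /proc/mdstat works just as well for keeping an eye on it by hand; the function is mostly useful in scripts.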

Issues

We (of course) had some issues in the process:

  • Though the new drive was the same model, its capacity was a little different, so we couldn't copy the partition scheme as we thought we could
  • It's very important to generate the grub image with the modules and the boot parameter needed for LVM and RAID
  • The initramfs must be rebuilt with the right raid module present in order for the kernel to find the array
  • It wasn't possible to simply back up and restore the LVM configuration either, as for a while we had both LVM disks attached together and the names were conflicting.