Distributed Replicated Block Device (DRBD) mirrors block devices between multiple hosts. You can think of it loosely as network RAID 1.
DRBD is meant to run in an Active/Passive setup, meaning you can only mount the disk on one node at a time. This is not a DRBD limitation, but rather a limitation of the common file systems (ext3, ext4, xfs, etc.), since they cannot safely handle two or more servers writing to a single disk at the same time.
As with any form of data replication, always ensure you have good backups before you begin, and keep them current throughout the life cycle of the setup. There is always a chance of data corruption or complete data loss due to some unforeseen situation, so make sure you have backups, and that you have tested restoring from those backups!
Requirements
There are a few requirements that need to be met for DRBD to function properly and securely:
1. 2x servers with similar block devices
2. DRBD kernel module and userspace utilities
3. Private network between the servers
4. iptables port 7788 open between servers on the Private network
5. /etc/hosts configured
6. NTP synchronized
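Once the preparation below is done, each of these requirements is easy to verify. A minimal sketch of the checks, using the hostnames and addresses from this article (adjust for your own environment; ntpstat comes with the ntp package installed later):

# From drbd01 (swap names/addresses when checking from drbd02)
ping -c 3 192.168.5.3      # private network is reachable
getent hosts drbd02        # /etc/hosts resolves the peer name
ntpstat                    # clock is synchronized with NTP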
Preparation
For the purposes of this article, my two servers running CentOS 6 will be:
drbd01 192.168.5.2 | Cloud Block Storage 50G SSD
drbd02 192.168.5.3 | Cloud Block Storage 50G SSD
First, ensure that /etc/hosts is set up properly on both servers:
cat /etc/hosts
192.168.5.2 drbd01
192.168.5.3 drbd02
Next, open up iptables on both servers to allow communications across the private network:
cat /etc/sysconfig/iptables
-A INPUT -i eth2 -s 192.168.5.0/24 -p tcp --dport 7788 -m comment --comment "Allow DRBD on private interface" -j ACCEPT
...

service iptables restart
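You can confirm the rule is loaded with iptables -L. Note that a connection test against port 7788 (for example with nc, if it is installed) will only succeed once the drbd service is actually listening, so treat it as a rough sanity check rather than proof the firewall rule works:

iptables -L INPUT -n -v | grep 7788
# Later, once DRBD is running on the peer:
nc -zv 192.168.5.3 7788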
Finally, prep your block devices, but do not format them with a filesystem! For this guide, I am going to assume you are using a separate disk on each server for this, presented as /dev/xvdb:
fdisk /dev/xvdb
  n (new partition)
  p (primary)
  1 (partition number)
  <enter> (accept the default first cylinder)
  <enter> (accept the default last cylinder)
  t (set the partition type; choose 83, Linux)
  w (write the changes)

fdisk -l /dev/xvdb1 (confirm all looks well)
Install DRBD
CentOS requires the RPM packages from the ELRepo repository (http://www.elrepo.org). This will provide the DKMS-based kernel module and userspace tools.
On both nodes:
rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
yum repolist
yum install drbd83-utils kmod-drbd83 dkms ntp ntpdate
service ntpd restart && chkconfig ntpd on
reboot
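After the reboot, it is worth confirming that DKMS built the module against the running kernel and that it loads cleanly (the drbd init script will also load it automatically when it starts):

modprobe drbd
lsmod | grep drbd
cat /proc/drbd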
Configure DRBD
First, configure global_common.conf:
vi /etc/drbd.d/global_common.conf

# Change
usage-count no;
# To
usage-count yes;
Then search for syncer {, and add rate 10M;. An example is posted below:
syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        rate 10M;
}
Some important notes:
1. usage-count: The DRBD project keeps statistics about the usage of various DRBD versions. This is done by contacting an HTTP server every time a new DRBD version is installed on a system. This can be disabled by setting usage-count no;. The default is usage-count ask;, which will prompt you every time you upgrade DRBD.
2. rate 10M: This throttles the total bandwidth that DRBD will use to perform its tasks between the 2 nodes. A good rule of thumb for this value is to use about 30% of the available replication bandwidth. Thus, if you had an I/O subsystem capable of sustaining write throughput of 180MB/s, and a Gigabit Ethernet network capable of sustaining 110MB/s of network throughput (the network being the bottleneck), you would calculate: 110 x 0.3 = 33MB/s. I opted to go with 10M for this article. 10M is a bit on the low side, so read the following guide and increase your limit as needed depending on your available bandwidth: https://drbd.linbit.com/users-guide/s-configure-sync-rate.html
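If you later find the initial sync crawling along, DRBD 8.3 also lets you override the rate temporarily at runtime rather than editing the config; a hedged example based on the sync-rate guide linked above (adjust the device and rate to your own setup):

drbdsetup /dev/drbd0 syncer -r 50M   # temporary override for this sync
drbdadm adjust cent00                # revert to the rate in the config file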
Resource Settings
Configure the two nodes so they can communicate with each other. On both servers, set up:
vi /etc/drbd.d/cent00.res

resource cent00 {
        protocol C;
        startup { wfc-timeout 0; degr-wfc-timeout 120; }
        disk { on-io-error detach; }
        net { cram-hmac-alg "sha1"; shared-secret "4ftl421dg987d33gR"; }

        on drbd01 {
                device /dev/drbd0;
                disk /dev/xvdb1;
                meta-disk internal;
                address 192.168.5.2:7788;
        }

        on drbd02 {
                device /dev/drbd0;
                disk /dev/xvdb1;
                meta-disk internal;
                address 192.168.5.3:7788;
        }
}
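Before creating any metadata, it doesn't hurt to have drbdadm parse the file; drbdadm dump prints the resource back out and complains if there is a syntax problem:

[root@drbd01 ~]# drbdadm dump cent00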
Now initialize the resource DRBD will be using, and set drbd01 to be the primary node. This is done by:
[root@drbd01 ~]# drbdadm create-md cent00
[root@drbd02 ~]# drbdadm create-md cent00
[root@drbd01 ~]# service drbd start; chkconfig drbd on
[root@drbd02 ~]# service drbd start; chkconfig drbd on
[root@drbd01 ~]# drbdadm -- --overwrite-data-of-peer primary cent00
Once this is done, the disks will begin to sync up. This could take several hours. You can check the status by:
[root@drbd01 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:1124352 nr:0 dw:0 dr:1125016 al:0 bm:68 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:19842524
        [>...................] sync'ed:  5.4% (19376/20472)M
        finish: 0:31:21 speed: 10,536 (10,312) K/sec
Setup Filesystem
It is recommended to wait until the initial synchronization is complete; how long that takes depends on the size of the block storage and the speed of the private network connecting the two servers. You can check the status by running:
[root@drbd01 ~]# cat /proc/drbd
Before continuing, make sure you are on the primary node:
[root@drbd01 ~]# drbdadm -- status cent00
<drbd-status version="8.3.16" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="0" name="cent00" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
</resources>
</drbd-status>
We’ll use the standard ext4 file system for this:
[root@drbd01 ~]# mkfs.ext4 /dev/drbd0
[root@drbd01 ~]# mkdir /data
[root@drbd01 ~]# mount -t ext4 /dev/drbd0 /data
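A quick sanity check that the filesystem really is sitting on the DRBD device; also note that /dev/drbd0 should not go into /etc/fstab, since it can only be mounted on whichever node is currently primary:

[root@drbd01 ~]# df -h /data
[root@drbd01 ~]# mount | grep drbd0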
Testing Scenarios
Below are some basic test scenarios you can simulate pretty easily. This goes without saying, but do not experiment with these scenarios in your production environment! Know what they do before you run them in production, since they can cause problems if you're not ready for them!
These are broken down into the following tests:
Test 1: Promote drbd02 to become primary
Test 2: Testing secondary node failure
Test 3: Testing primary node failure
Test 4: Recovering from split-brain
Test 1: Promote drbd02 to become primary
Unmount the partition and demote the current primary (drbd01) to secondary:
[root@drbd01 ~]# umount /data
[root@drbd01 ~]# drbdadm secondary cent00
On the other server, drbd02, promote it to primary and mount the DRBD device:

[root@drbd02 ~]# drbdadm primary cent00
[root@drbd02 ~]# mkdir /data
[root@drbd02 ~]# mount -t ext4 /dev/drbd0 /data
[root@drbd02 ~]# ls -d /data/*
At this time, drbd02 is the primary node, and drbd01 is the secondary node.
Test 2: Testing secondary node failure
To see what happens when the secondary server goes offline:
Shutdown your secondary node, which in this case, is drbd02:
[root@drbd02 ~]# shutdown -h now
Now, back on the primary node drbd01, add a few files to the volume:
[root@drbd01 ~]# mkdir -p /data/test/
[root@drbd01 ~]# cp /etc/hosts /data/test/
Power back on the secondary node drbd02, and watch the system sync back up. Note, depending on how much data was written, it may take a bit of time for the volumes to become consistent again. You can check the status with:
[root@drbd01 ~]# cat /proc/drbd
Test 3: Testing primary node failure
This tests what happens when the primary node goes offline and someone promotes the secondary node before the primary comes back online and can be demoted (split-brain).
If you want to simulate this worst case scenario, and you don’t care about your data, then perform the following:
[root@drbd01 ~]# echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger
[root@drbd01 ~]# reboot -f -n
Or just shut down drbd01 (the primary), then log into drbd02 (the secondary) and promote it to primary:
[root@drbd02 ~]# drbdadm primary cent00
[root@drbd02 ~]# mkdir /data
[root@drbd02 ~]# mount -t ext4 /dev/drbd0 /data
Then boot drbd01 again and enjoy the split-brain scenario! For obvious reasons, do NOT do this on drives containing any data you need for anything! If the primary node loses the replication link, and you made the other node primary BEFORE connectivity is restored, you WILL have split-brain. Avoid that at all costs.
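When split-brain does happen, DRBD drops the replication link and logs it. A hedged way to spot it (the exact log text varies between versions) is to check the kernel log and the connection state on each node:

[root@drbd01 ~]# grep -i split /var/log/messages
[root@drbd01 ~]# cat /proc/drbd   # look for cs:StandAlone or cs:WFConnection instead of cs:Connected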
Test 4: Recovering from split-brain
In the event of split-brain, you may be able to correct it by performing the following, but do not do this blindly! Make sure you understand what it is doing before you run it on your production data, otherwise you may lose data you wanted to keep! More information can be found at http://drbd.linbit.com/docs/working/
For reference:
– drbd01 : Primary node
– drbd02 : Secondary node
On the secondary node:
[root@drbd02 ~]# drbdadm secondary cent00
[root@drbd02 ~]# drbdadm -- --discard-my-data connect cent00
And back on the primary node:
[root@drbd01 ~]# drbdadm connect cent00
[root@drbd01 ~]# cat /proc/drbd