Any day you learn something new is a good day


Creating redundant, clustered & scalable storage - a DIY guide

Posted: 11-May-2018 11:20

I thought it was about time to dust off this blog and post about something that’s been in the back of my mind for a few years now: clustered, redundant, scalable storage.

 

A bit of history:

Here at Burnside High School, where I am a member of the IT Dept, we’ve accumulated over time a bunch of older-but-still-working desktop PCs as they were replaced in the labs with newer machines.  We’ve also quietly built up a large number of 500+ GB SATA hard drives.



The goal:

I’d really like to utilise those spare machines and hard drives to extend our historical Veeam backups past the three to six months we currently keep. As the older backups would be a tertiary onsite backup, I don’t want to spend up on a larger NAS to store them on.

A couple of years ago I tripped over this chap’s blog post (https://www.virtualtothecore.com/en/adventures-ceph-storage-part-1-introduction/) on creating a CEPH clustered filesystem and using it to store Veeam backups.  Needless to say I followed the instructions and had a test cluster up & running fairly quickly.

While everything worked as expected, building Linux machines by hand and using ceph-deploy to manage them one-by-one is time consuming and doesn’t scale all that well.  If other members of the IT Dept are to manage this, it needs to be much easier to deploy.

Since then I’ve had a MAAS/OpenStack cluster up and running, and dabbled with Mirantis’ FUEL to deploy an OpenStack cluster - both solutions worked but are a bit unwieldy (and in FUEL’s case slightly fragile). Accessing the underlying CEPH storage was also not that easy, as it’s designed for OpenStack’s use, not mine.



Finding a solution:

Fast forward to a few months ago when I discovered Croit (http://croit.io) as a CEPH deployment and management tool. This pretty cool product ticks all the boxes for us - simple deployment of nodes (PXE boot) and a nice, easy-to-understand GUI (for IT staff to learn). Building a trial cluster was very straightforward, so we moved on to a proof-of-concept build, documented below. I’ve been able to back up Veeam at ~120MB/s into the CEPH storage, and to increase the amount of available storage we just add another machine full of hard disks. The available storage space expands on the fly.



Building your own:

Here are the instructions I’ve written for our deployment, based on Ubuntu 16.04 server, for anyone to follow to build their own:

 

Croit installation from scratch @ BHS

 

MASTER LAN: enp5s1: 172.16.0.0/16, IP 172.16.0.183, GW 172.16.0.1, DNS1 172.16.0.1, NTP 172.16.0.1

MASTER STORAGE: enp0s25: 10.99.0.0/24, IP 10.99.0.1  (needs to be isolated as PXE and DHCP are served onto it from Croit)

HARDWARE: 1x desktop, 8GB RAM, 2x NIC (MASTER); 1x desktop, 4GB RAM, 2x NIC (BRIDGE)

3x laptop, single NIC (MONITOR, MDS)

3x desktop, single NIC, 4x HDD (STORAGE/OSD)

MASTER & BRIDGE set to boot from internal hard drives

MONITORS & STORAGE/OSD set to PXE boot only

Gigabit switch for STORAGE cluster LAN/PXE boot for nodes

 

    • Install Ubuntu 16.04 server accepting appropriate defaults (just ssh server required):
      • bhsadmin for user account
      • Static IP as above
    • Log in as bhsadmin
    • Update box

 

  • sudo apt-get update
  • sudo apt-get dist-upgrade

 

    • Disable systemd’s timesync and install ntp & assorted utilities

 

  • sudo systemctl disable systemd-timesyncd
  • sudo systemctl stop systemd-timesyncd
  • sudo apt-get install htop screen ntp

 

    • Configure ntp.conf for local time server 172.16.0.1

 

  • sudo nano /etc/ntp.conf

 

        • Rem out pools & add our server in this bit;
        • #pool 0.ubuntu.pool.ntp.org iburst
        • #pool 1.ubuntu.pool.ntp.org iburst
        • #pool 2.ubuntu.pool.ntp.org iburst
        • #pool 3.ubuntu.pool.ntp.org iburst
        • server 172.16.0.1 iburst
        •  
        • # Use Ubuntu's ntp server as a fallback.
        • #pool ntp.ubuntu.com

 

  • sudo service ntp restart

 

      • Check that ntp is working correctly

 

  • ntpq -p

 

        •     remote           refid      st t when poll reach   delay   offset  jitter
        • ==============================================================================
        • *firewalker4.bur 122.252.184.186  3 u    3  256  377    0.580   -0.581  13.987
    • Configure the second NIC for the storage LAN

 

  • sudo nano /etc/network/interfaces

 

        • auto enp0s25
        • iface enp0s25 inet static
        •        address 10.99.0.1
        •        netmask 255.255.255.0
        •        network 10.99.0.0
        •        broadcast 10.99.0.255

 

  • sudo ifup enp0s25

 

    • Check that ntp is listening on all interfaces

 

  • netstat -antu | grep 123

 

        • udp        0 0 10.99.0.1:123           0.0.0.0:*
        • udp        0 0 172.16.0.183:123        0.0.0.0:*
        • udp        0 0 127.0.0.1:123           0.0.0.0:*
        • udp        0 0 0.0.0.0:123             0.0.0.0:*
        • udp6       0 0 :::123                  :::*
    • Install docker

 

  • sudo apt-get install docker.io

 

 

    • Install Croit container (documentation from https://croit.io/production/)

 

  • sudo docker create --name croit-data croit/croit:latest

 

 

    • Start Croit container

 

  • sudo docker run --net=host --restart=always --volumes-from croit-data --name croit -d croit/croit:latest
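
Not in the original write-up, but before moving on it’s worth a quick sanity check that the Croit container actually started - these are standard docker commands:

  • sudo docker ps
  • sudo docker logs croit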

 

 

  • Configure Croit
    • Browse to IP:8080 (172.16.0.183:8080)
    • Login admin/admin
    • Accept EULA
    • Change admin password
    • Set up cluster network
      • Pick 10.99.0.1 @ enp0s25 off list & click save
      • Click +Create and add network, DHCP start 10.99.0.10, end 10.99.0.200, type = generic network
      • Click Create to save
      • Click Save
    • Wait for image to download (Step 4) and then PXE boot first monitor node (single disk, laptop in this case)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Click the ‘>’ (next) icon to proceed to step 5
    • With the first monitor (10.99.0.10) selected, click ‘Create Cluster’
    • Boot the second monitor node, go to ‘SERVERS’ on the menu when it has booted
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Boot the first storage node, go to ‘SERVERS’ on the menu when it has booted (showing ‘running’ in the server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag first storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Boot the second storage node, go to ‘SERVERS’ on the menu when it has booted (showing ‘running’ in the server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag second storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Boot the third monitor node, go to ‘SERVERS’ on the menu when it has booted
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Pick a monitor from the ‘SERVERS’ menu
      • Click ‘Services’, click ‘+ MDS’
    • Pick another monitor from the ‘SERVERS’ menu
      • Click ‘Services’, click ‘+ MDS’
    • At this point a shared filesystem has been created. Click ‘Pools’ to see cephfs_data and cephfs_metadata present. Clicking ‘Status’ will show that the redundancy is degraded, as the default is three copies of the data. Either add another storage node, or change the size/min size for these pools to 2/2 to make it dual-copy (see the CLI sketch after this list).
    • Boot the third storage node, go to ‘SERVERS’ on the menu when it has booted (showing ‘running’ in the server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag third storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Click ‘Status’ and it will shortly say that Health is ‘OK’ as Ceph balances the small number of PGs around the three storage nodes.
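
For reference, the 2/2 pool change mentioned a few steps back can also be made from the ceph CLI once you have a host with the admin keyring (for example the bridge machine built below). These are generic Ceph commands rather than part of the Croit workflow:

  • ceph osd pool set cephfs_data size 2
  • ceph osd pool set cephfs_data min_size 2
  • ceph osd pool set cephfs_metadata size 2
  • ceph osd pool set cephfs_metadata min_size 2

Note that with size and min_size both at 2, writes will block if one of the two copies becomes unavailable - that’s the trade-off for running dual-copy.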

 

  • Build the BRIDGE to get data in/out of the Ceph cluster without messing with the MASTER node

 

    • Install Ubuntu 16.04 server accepting appropriate defaults (just ssh server required):
      • bhsadmin for user account
    • Static IP enp4s0: 172.16.0.0/16, IP 172.16.0.184, GW 172.16.0.1, DNS1 172.16.0.1, NTP 172.16.0.1
    • Log in as bhsadmin
    • Update box

 

  • sudo apt-get update
  • sudo apt-get dist-upgrade

 

    • Disable systemd’s timesync and install ntp & assorted utilities

 

  • sudo systemctl disable systemd-timesyncd
  • sudo systemctl stop systemd-timesyncd
  • sudo apt-get install htop screen ntp curl

 

    • Configure ntp.conf for local time server 172.16.0.1

 

  • sudo nano /etc/ntp.conf

 

        • Rem out pools & add our server in this bit;
        • #pool 0.ubuntu.pool.ntp.org iburst
        • #pool 1.ubuntu.pool.ntp.org iburst
        • #pool 2.ubuntu.pool.ntp.org iburst
        • #pool 3.ubuntu.pool.ntp.org iburst
        • server 172.16.0.1 iburst
        •  
        • # Use Ubuntu's ntp server as a fallback.
        • #pool ntp.ubuntu.com

 

  • sudo service ntp restart

 

      • Check that ntp is working correctly

 

  • ntpq -p

 

        •     remote           refid      st t when poll reach   delay   offset  jitter
        • ==============================================================================
        • *firewalker4.bur 122.252.184.186  3 u    3  256  377    0.580   -0.581  13.987
    • Configure the second NIC for the storage LAN

 

  • sudo nano /etc/network/interfaces

 

        • auto enp0s25
        • iface enp0s25 inet static
        •        address 10.99.0.2
        •        netmask 255.255.255.0
        •        network 10.99.0.0
        •        broadcast 10.99.0.255

 

  • sudo ifup enp0s25

 

    • Install ceph utilities

 

  • sudo apt-get install ceph-common ceph-fs-common

 

    • Allow root to SSH in directly (needed because Ubuntu server disables root SSH logins by default)

 

  • sudo nano /etc/ssh/sshd_config

 

        • change this line
          • PermitRootLogin prohibit-password
        • To
          • PermitRootLogin yes

 

  • sudo service sshd restart

 

    • Change root password

 

  • sudo su
  • passwd

 

    • Log back into the bridge as root
    • Grab the ceph configuration files from Croit
      • Log into the Croit management portal (172.16.0.183:8080)
        • Go to Keys
        • Select client.admin
          • Click ‘client ceph.conf’
          • Copy the URL at the top of the popup
          • From root’s ssh session, download it (paste the copied URL as the last argument)

  • curl -k -o /etc/ceph/ceph.conf {URL copied above}

          • Click ‘Get Key’
          • Copy the URL at the top of the popup
          • From root’s ssh session, download it (paste the copied URL as the last argument)

  • curl -k -o /etc/ceph/ceph.client.admin.keyring http://172.16.0.183:8080/api/download/KEY/ceph.client.admin.keyring

      • Test the connection to ceph from the bridge

 

  • ceph health

    • If the cluster is accessible, you should get back HEALTH_OK (or a warning describing the cluster’s current state)
    • Set ceph settings to allow Ubuntu 16.04 to connect using the kernel driver (YMMV, I’ve found 16.04 needs this)

  • ceph osd crush tunables hammer

    • Prepare for and mount the ceph filesystem
      • Create the mountpoint

  • mkdir /var/storage/ceph

      • Check the mon IPs

  • cat /etc/ceph/ceph.conf

        • mon host = 10.99.0.10, 10.99.0.11, 10.99.0.14

      • Check the secret key

  • cat /etc/ceph/ceph.client.admin.keyring

        • key = {KEY==}

      • Mount the filesystem

  • mount -t ceph 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph -o name=admin,secret={KEY==}
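
A small variation, not from the original instructions: if you’d rather not have the key visible in shell history and ps output, the ceph mount helper (installed above with ceph-common) also accepts a secretfile option. Assuming you put just the key string into /etc/ceph/admin.secret, the equivalent mount would be:

  • echo "{KEY==}" > /etc/ceph/admin.secret
  • chmod 600 /etc/ceph/admin.secret
  • mount -t ceph 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph -o name=admin,secretfile=/etc/ceph/admin.secret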

 

    • Check the mountpoint

 

  • df -h

 

    • Filesystem                          Size  Used Avail Use% Mounted on
    • udev                                1.9G     0  1.9G   0% /dev
    • tmpfs                               388M  5.6M  383M   2% /run
    • /dev/sda1                           146G  1.5G  137G   2% /
    • tmpfs                               1.9G     0  1.9G   0% /dev/shm
    • tmpfs                               5.0M     0  5.0M   0% /run/lock
    • tmpfs                               1.9G     0  1.9G   0% /sys/fs/cgroup
    • tmpfs                               388M     0  388M   0% /run/user/0
    • 10.99.0.10,10.99.0.11,10.99.0.14:/  5.3T  2.4G  5.3T   1% /var/storage/ceph



    • Add it to fstab so it mounts on boot (note: this makes reboots very slow, because ceph needs the network up before it can mount but Ubuntu doesn’t bring the network up until after fstab is processed; at least mounting the filesystem won’t be forgotten on reboot - see the note after the remount commands below)
      • nano /etc/fstab
      • Add this line
        • 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph ceph name=admin,secret={KEY==}
      • Unmount existing mountpoint & remount from fstab

 

  • umount /var/storage/ceph
  • mount -a
  • df -h
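
One tweak worth trying for the slow-reboot behaviour noted above (I haven’t verified it on this build): marking the fstab entry as a network filesystem with _netdev should make the system defer the mount until networking is up, e.g.

        • 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph ceph name=admin,secret={KEY==},_netdev 0 0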

 

  • At this point the bridge into Ceph is working. It can be used by anything that can talk to a mounted folder on Linux (a quick throughput check is sketched below).
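
Not part of the original steps, but a rough way to sanity-check write throughput from the bridge into the cluster before pointing Veeam at it is a straight dd (the file name is arbitrary; numbers will vary with your disks and network):

  • dd if=/dev/zero of=/var/storage/ceph/ddtest bs=1M count=2048 conv=fdatasync
  • rm /var/storage/ceph/ddtest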

 

Now to hook Veeam up to ceph as a backup location

From Veeam B&R Console

  • Go to Backup Infrastructure / Linux
    • Right-click – Add server
      • 172.16.0.184
      • Add credentials, or use existing saved ones
  • Go to Backup Repositories
    • Right click – Add backup repository
      • Name – CEPH
      • Linux Server
      • Pick new server, click populate, select /var/storage/ceph
      • Accept the rest of the defaults
  • Now go nuts backing up



Where to for the future:

Now that the proof of concept is in place and working, the next step is to virtualise the master PC so that it can be backed up via Veeam (to another repository). I’ll virtualise a MONITOR node too, just because I can.

 

Points of note: timesyncd is removed and replaced with ntp because the PXE-booted nodes need really accurate time sync to the master, and I was having trouble getting them to pick the time up from timesyncd.  I also found that not all Intel motherboard NICs are equal - we have a bunch of C2D and early i5 motherboards whose inbuilt NIC refuses to complete the Linux PXE boot process for Croit. They work under our normal PXE boot for Windows, but not Croit’s. This means we’ve had to use storage nodes with four SATA ports instead of the ones I wanted to use, which have six. Not a big deal in the end.

Theoretically the master could also be the bridge between the main LAN and the storage LAN, but as we have plenty of spare machines and I like to silo tasks off to spread the risk, using a separate machine works well.  I may even virtualise the bridge PC too.

 




