Any day you learn something new is a good day


Creating redundant, clustered & scalable storage - a DIY guide

Posted: 11-May-2018 11:20

I thought it’s about time to dust off this blog and post about something that’s been in the back of my mind for a few years now: clustered, redundant, scalable storage.

 

A bit of history:

Here at Burnside High School where I am a member of the IT Dept we’ve accumulated over time a bunch of older-but-still-working desktop PC’s as they got replaced in the labs with newer machines.  We’ve also quietly built up a large number of 500+ GB SATA hard drives.



The goal:

I’d really like to utilise those spare machines and hard drives to extend our historical Veeam backups past the three to six months that they currently are. As the older backups would be a tertiary onsite backup, I don’t want to spend up on a larger NAS to store them on.

A couple of years ago I tripped over this chap’s blog post (https://www.virtualtothecore.com/en/adventures-ceph-storage-part-1-introduction/) on creating a CEPH clustered filesystem and using it to store Veeam backups on.  Needless to say I followed the instructions and had a test cluster up & running fairly quickly.

While everything worked as expected, building Linux machines by hand and using ceph-deploy to manage them one-by-one is time-consuming and doesn’t scale all that well.  If other members of the IT Dept are to manage this, it needs to be much easier to deploy.

Since then I’ve had a MAAS/OpenStack cluster up and running, and dabbled with Mirantis’ FUEL to deploy an OpenStack cluster - both solutions worked but are a bit unwieldy (and in FUEL’s hands slightly fragile). Accessing the underlying CEPH storage was also not that easy as it’s designed for OpenStack’s use, not mine.



Finding a solution:

Fast forward to a few months ago when I discovered Croit (http://croit.io) as a CEPH deployment and management tool. This pretty cool product ticks all the boxes for us - simple deployment of nodes (PXEboot) and a nice easy to understand GUI (for IT staff to learn). Building a trial cluster was very straightforward, so we moved on to a proof-of-concept build, documented below. I’ve been able to back Veeam up at ~120MB/s into the CEPH storage, and to increase the amount of available storage we just add another machine full of hard disks. The available storage space expands on the fly.



Building your own:

Here are the instructions I’ve written for our deployment, based on Ubuntu 16.04 server, for anyone to follow to build their own:

 

Croit installation from scratch @ BHS

 

MASTER LAN: enp5s1: 172.16.0.0/16, IP 172.16.0.183, GW 172.16.0.1, DNS1 172.16.0.1, NTP 172.16.0.1

MASTER STORAGE: enp0s25: 10.99.0.0/24, IP 10.99.0.1  (needs to be isolated as PXE and DHCP are served onto it from Croit)

HARDWARE: 1x desktop 8GB, 2x NIC (MASTER), 1x desktop, 4GB, 2x NIC (BRIDGE)

3x laptop, single NIC (MONITOR, MDS)

3x desktop, single NIC, 4x HDD (STORAGE/OSD)

MASTER & BRIDGE set to boot from internal hard drives

MONITORS & STORAGE/OSD set to PXE boot only

Gigabit switch for STORAGE cluster LAN/PXE boot for nodes

 

    • Install Ubuntu 16.04 server accepting appropriate defaults (just ssh server required);
      • bhsadmin for user account
      • Static IP as above
    • Log in as bhsadmin
    • Update box

 

  • sudo apt-get update
  • sudo apt-get dist-upgrade

 

    • Disable systemd’s timesync and install ntp & assorted utilities

 

  • sudo systemctl disable systemd-timesyncd
  • sudo systemctl stop systemd-timesyncd
  • sudo apt-get install htop screen ntp

 

    • Configure ntp.conf for local time server 172.16.0.1

 

  • sudo nano /etc/ntp.conf

 

        • Rem out pools & add our server in this bit;
        • #pool 0.ubuntu.pool.ntp.org iburst
        • #pool 1.ubuntu.pool.ntp.org iburst
        • #pool 2.ubuntu.pool.ntp.org iburst
        • #pool 3.ubuntu.pool.ntp.org iburst
        • server 172.16.0.1 iburst
        •  
        • # Use Ubuntu's ntp server as a fallback.
        • #pool ntp.ubuntu.com

 

  • sudo service ntp restart
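If you'd rather script the ntp.conf change than do it by hand in nano, something like this Python sketch would work. It's an illustration only: the function name localise_ntp_conf is made up for this example, it appends the server line at the end of the text rather than beside the pool lines, and you'd still have to write the result back to /etc/ntp.conf as root and restart ntp yourself.

```python
def localise_ntp_conf(text, server="172.16.0.1"):
    """Comment out every active `pool` line and make sure one local server line exists."""
    wanted = f"server {server} iburst"
    out = []
    have_server = False
    for line in text.splitlines():
        # Only lines whose first word is exactly `pool` are active pool entries;
        # lines that are already commented out ("#pool ...") are left alone.
        if line.split()[:1] == ["pool"]:
            out.append("#" + line)
            continue
        if line.strip() == wanted:
            have_server = True
        out.append(line)
    if not have_server:
        out.append(wanted)
    return "\n".join(out) + "\n"
```

Running it a second time over its own output changes nothing, so it's safe to re-run on a box that's already been done.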

 

      • Check that ntp is working correctly

 

  • ntpq -p

 

        •     remote           refid st t when poll reach   delay offset jitter
        • ==============================================================================
        • *firewalker4.bur 122.252.184.186  3 u 3 256 377 0.580 -0.581  13.987
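The line starting with * is the peer ntpd has selected, and a reach of 377 (it's octal) means the last eight polls were all answered. Here's a rough Python sketch of checking that automatically, assuming you feed it the text printed by ntpq -p. The function name selected_peer is just for illustration.

```python
def selected_peer(ntpq_output):
    """Return (remote, reach) for the peer marked '*', or None if no peer is selected yet."""
    for line in ntpq_output.splitlines():
        line = line.strip()
        if line.startswith("*"):
            # Columns: remote refid st t when poll reach delay offset jitter
            fields = line[1:].split()
            return fields[0], fields[6]
    return None
```

If it returns None, ntp hasn't settled on a time source yet - give it a few minutes and try again before blaming the config.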
    • Configure second nic for storage lan

 

  • sudo nano /etc/network/interfaces

 

        • auto enp0s25
        • iface enp0s25 inet static
        •        address 10.99.0.1
        •        netmask 255.255.255.0
        •        network 10.99.0.0
        •        broadcast 10.99.0.255

 

  • sudo ifup enp0s25
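The netmask, network and broadcast values in that stanza all follow from the address and the /24 prefix, so they can be generated rather than typed. A small Python sketch using the standard ipaddress module (the function name static_stanza is my own invention for this example):

```python
import ipaddress


def static_stanza(ifname, address, prefix_len):
    """Build an /etc/network/interfaces stanza for a static IPv4 address."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    net = iface.network
    return "\n".join([
        f"auto {ifname}",
        f"iface {ifname} inet static",
        f"       address {iface.ip}",
        f"       netmask {net.netmask}",
        f"       network {net.network_address}",
        f"       broadcast {net.broadcast_address}",
    ])
```

Calling static_stanza("enp0s25", "10.99.0.1", 24) gives the MASTER's stanza, and swapping in 10.99.0.2 gives the one used on the BRIDGE further down.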

 

    • Check ntp listening on all interfaces

 

  • netstat -antu | grep 123

 

        • udp        0 0 10.99.0.1:123           0.0.0.0:*
        • udp        0 0 172.16.0.183:123        0.0.0.0:*
        • udp        0 0 127.0.0.1:123           0.0.0.0:*
        • udp        0 0 0.0.0.0:123             0.0.0.0:*
        • udp6       0 0 :::123                  :::*
    • Install docker

 

  • sudo apt-get install docker.io

 

 

    • Install Croit container (documentation from https://croit.io/production/)

 

  • sudo docker create --name croit-data croit/croit:latest

 

 

    • Start Croit container

 

  • sudo docker run --net=host --restart=always --volumes-from croit-data --name croit -d croit/croit:latest
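The web interface lives on port 8080 of the master. I'm assuming here that it can take a little while to start answering after the container comes up, so this Python sketch simply keeps trying to open a TCP connection until it succeeds or a timeout passes. The function name wait_for_port and the timeout values are made up for the example.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing listening yet (or host unreachable): pause and retry.
            time.sleep(interval)
    return False
```

For this build you'd call wait_for_port("172.16.0.183", 8080) and only open the browser once it returns True. It only proves the port is open, not that Croit has finished initialising.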

 

 

  • Configure Croit
    • Browse to IP:8080 (172.16.0.183:8080)
    • Login admin/admin
    • Accept EULA
    • Change admin password
    • Set up cluster network
      • Pick 10.99.0.1 @ enp0s25 off list & click save
      • Click +Create and add network, DHCP start 10.99.0.10, end 10.99.0.200, type = generic network
      • Click Create to save
      • Click Save
    • Wait for image to download (Step 4) and then PXE boot first monitor node (single disk, laptop in this case)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Click ‘>’ next icon to proceed to step 5
    • With the first monitor (10.99.0.10) selected, click ‘Create Cluster’
    • Boot second monitor node, go to ‘SERVERS’ on menu, when booted
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Boot first storage node, go to ‘SERVERS’ on menu, when booted (showing ‘running’ in server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag first storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Boot second storage node, go to ‘SERVERS’ on menu, when booted (showing ‘running’ in server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag second storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Boot third monitor node, go to ‘SERVERS’ on menu, when booted
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, click on drive, click on wipe disk, click delete on popup
      • Once disk Role is ‘unassigned’, click on ‘Set to MON’, and ‘Set to MON’ on popup to confirm.
    • Pick a monitor from the ‘SERVERS’ menu
      • Click ‘Services’, click ‘+ MDS’
    • Pick another monitor from the ‘SERVERS’ menu
      • Click ‘Services’, click ‘+ MDS’
    • At this point a shared filesystem has been created. Click ‘Pools’ to see cephfs_data and cephfs_metadata present. Clicking ‘Status’ will show that the redundancy is degraded as the default is three copies of the data. Either add another storage node, or change the size/min size for these pools to 2/2 to make it a dual-copy.
    • Boot third storage node, go to ‘SERVERS’ on menu, when booted (showing ‘running’ in server list)
      • Click on node, click edit and adjust name for easy identification
      • Click Disks, shift-select all drives, click wipe disk, click delete on popup
      • Pick a disk, click ‘Set to Journal’, number of partitions = number of remaining disks in machine, click ‘Set to Journal’ to confirm
      • For remaining disks, select disk, click ‘Set to OSD’, Store backend = Filestore, Journal Disk = the journal disk prepared earlier, click ‘Set to OSD’
    • Select Crushmap on menu
      • Drag third storage node under ‘Root’ to add it to the crushmap
      • Click Save & Execute on popup to confirm
    • Click ‘Status’ and it will shortly say that Health is ‘OK’ as Ceph balances the small number of PG’s around the three storage nodes.
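As a back-of-envelope check on what the steps above give you: each storage node gives up one disk as the journal, the rest become OSDs, and usable space is roughly the raw OSD space divided by the number of copies. Here's a Python sketch of that arithmetic. The 500 GB disk size is only an example figure (our drives are "500+ GB"), and the sum ignores filesystem overhead and the headroom Ceph wants to keep free, so treat the answer as a ceiling.

```python
def osd_count(nodes, disks_per_node, journal_disks_per_node=1):
    """OSDs in the cluster: every disk that is not used as a journal."""
    return nodes * (disks_per_node - journal_disks_per_node)


def usable_gb(nodes, disks_per_node, disk_gb, copies, journal_disks_per_node=1):
    """Rough usable space: raw OSD capacity divided by the number of copies kept."""
    raw_gb = osd_count(nodes, disks_per_node, journal_disks_per_node) * disk_gb
    return raw_gb / copies
```

For three nodes with four disks each, osd_count(3, 4) is 9 OSDs. With hypothetical 500 GB disks that's 4500 GB raw: about 1500 GB usable at three copies, or 2250 GB if the pools are dropped to two.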

 

  • Build the BRIDGE to get data in/out of the Ceph cluster without messing with the MASTER node

 

    • Install Ubuntu 16.04 server accepting appropriate defaults (just ssh server required);
      • bhsadmin for user account
    • Static IP enp4s0: 172.16.0.0/16, IP 172.16.0.184, GW 172.16.0.1, DNS1 172.16.0.1, NTP 172.16.0.1
    • Log in as bhsadmin
    • Update box

 

  • sudo apt-get update
  • sudo apt-get dist-upgrade

 

    • Disable systemd’s timesync and install ntp & assorted utilities

 

  • sudo systemctl disable systemd-timesyncd
  • sudo systemctl stop systemd-timesyncd
  • sudo apt-get install htop screen ntp curl

 

    • Configure ntp.conf for local time server 172.16.0.1

 

  • sudo nano /etc/ntp.conf

 

        • Rem out pools & add our server in this bit;
        • #pool 0.ubuntu.pool.ntp.org iburst
        • #pool 1.ubuntu.pool.ntp.org iburst
        • #pool 2.ubuntu.pool.ntp.org iburst
        • #pool 3.ubuntu.pool.ntp.org iburst
        • server 172.16.0.1 iburst
        •  
        • # Use Ubuntu's ntp server as a fallback.
        • #pool ntp.ubuntu.com

 

  • sudo service ntp restart

 

      • Check that ntp is working correctly

 

  • ntpq -p

 

        •     remote           refid st t when poll reach   delay offset jitter
        • ==============================================================================
        • *firewalker4.bur 122.252.184.186  3 u 3 256 377 0.580 -0.581  13.987
    • Configure second nic for storage lan

 

  • sudo nano /etc/network/interfaces

 

        • auto enp0s25
        • iface enp0s25 inet static
        •        address 10.99.0.2
        •        netmask 255.255.255.0
        •        network 10.99.0.0
        •        broadcast 10.99.0.255

 

  • sudo ifup enp0s25

 

    • Install ceph utilities

 

  • sudo apt-get install ceph-common ceph-fs-common

 

    • Allow root to ssh in directly (required because this is Ubuntu server)

 

  • sudo nano /etc/ssh/sshd_config

 

        • change this line
          • PermitRootLogin prohibit-password
        • To
          • PermitRootLogin yes

 

  • sudo service sshd restart

 

    • Change root password

 

  • sudo su
  • passwd

 

    • Relogin to bridge as root
    • Grab the ceph configuration files from Croit
      • Log into Croit management portal (172.16.0.183:8080)
        • Go to Keys
        • Select client.admin
          • Click ‘client ceph.conf’
          • Copy the URL at the top of the popup
          • From root’s ssh session (paste the URL after ceph.conf)
          • curl -k -o /etc/ceph/ceph.conf {URL}

 

  • Click ‘Get Key’
  • Copy the url at the top of the popup
  • From root’s ssh session (paste the url after .keyring)
  • curl -k -o /etc/ceph/ceph.client.admin.keyring http://172.16.0.183:8080/api/download/KEY/ceph.client.admin.keyring

 

      • Test connection to ceph from bridge

 

  • ceph health
  • If the cluster is accessible, should get back HEALTH_OK or something like that
  • Set ceph settings to allow Ubuntu 16.04 to connect using kernel driver (YMMV, I’ve found 16.04 needs this)
  • ceph osd crush tunables hammer
  • Prepare for and mount ceph filesystem
  • Create mountpoint
  • mkdir -p /var/storage/ceph
  • Check mon IP’s
  • cat /etc/ceph/ceph.conf
  • mon host = 10.99.0.10, 10.99.0.11, 10.99.0.14
  • Check secret password
  • cat /etc/ceph/ceph.client.admin.keyring

 

        • key = {KEY==}

 

  • Mount the filesystem
  • mount -t ceph 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph -o name=admin,secret={KEY==}
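The mount line is just the mon addresses from ceph.conf and the key from the keyring glued together, so it can be assembled rather than copy-pasted. A Python sketch, assuming ceph.conf has a plain "mon host = a, b, c" line as shown above and the keyring has a single "key = ..." line. The function names are mine and the key in the usage note is a dummy.

```python
def mon_hosts(ceph_conf_text):
    """Return the monitor addresses from the `mon host` line of ceph.conf."""
    for line in ceph_conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Ceph accepts both "mon host" and "mon_host" spellings.
        if sep and key.strip().replace("_", " ") == "mon host":
            return [host.strip() for host in value.split(",") if host.strip()]
    return []


def keyring_secret(keyring_text):
    """Return the value of the first `key = ...` line in a keyring file."""
    for line in keyring_text.splitlines():
        key, sep, value = line.partition("=")
        # Split on the first '=' only, so the base64 padding at the end survives.
        if sep and key.strip() == "key":
            return value.strip()
    return None


def mount_command(ceph_conf_text, keyring_text,
                  mountpoint="/var/storage/ceph", name="admin"):
    """Build the argument list for the kernel-driver CephFS mount."""
    source = ",".join(mon_hosts(ceph_conf_text)) + ":/"
    options = f"name={name},secret={keyring_secret(keyring_text)}"
    return ["mount", "-t", "ceph", source, mountpoint, "-o", options]
```

Feed it the contents of /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring and pass the resulting list to subprocess.run as root. With a dummy key of QUJDREVGRw== the options come out as name=admin,secret=QUJDREVGRw==.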

 

    • Check the mountpoint

 

  • df -h

 

    • Filesystem                          Size Used Avail Use% Mounted on
    • udev                                1.9G 0 1.9G 0% /dev
    • tmpfs                               388M 5.6M 383M 2% /run
    • /dev/sda1                           146G 1.5G 137G 2% /
    • tmpfs                               1.9G 0 1.9G 0% /dev/shm
    • tmpfs                               5.0M 0 5.0M 0% /run/lock
    • tmpfs                               1.9G 0 1.9G 0% /sys/fs/cgroup
    • tmpfs                               388M 0 388M 0% /run/user/0
    • 10.99.0.10,10.99.0.11,10.99.0.14:/  5.3T 2.4G 5.3T 1% /var/storage/ceph



    • Add to fstab so it mounts on boot (note: reboots will be very slow, because ceph needs Ethernet up to mount but Ubuntu doesn’t bring it up until after fstab is processed; at least mounting the filesystem won’t be forgotten on reboot)
      • nano /etc/fstab
      • Add this line
        • 10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph ceph name=admin,secret={KEY==}
      • Unmount existing mountpoint & remount from fstab

 

  • umount /var/storage/ceph
  • mount -a
  • df -h
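Eyeballing df works, but if you want a script (or a Veeam pre-job check) to confirm the Ceph mount really is there, parsing /proc/mounts is one way. A Python sketch: the function name ceph_mounts is invented for this example, and the sample line in the usage note is illustrative rather than captured from our bridge.

```python
def ceph_mounts(proc_mounts_text):
    """Return {mountpoint: source} for every mounted filesystem of type `ceph`."""
    found = {}
    for line in proc_mounts_text.splitlines():
        # /proc/mounts fields: source, mountpoint, fstype, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ceph":
            found[fields[1]] = fields[0]
    return found
```

On the bridge you'd read /proc/mounts, pass the text in, and check that "/var/storage/ceph" is one of the keys. For a line like "10.99.0.10,10.99.0.11,10.99.0.14:/ /var/storage/ceph ceph rw 0 0" it returns that mountpoint mapped to the mon list.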

 

  • At this point the bridge into Ceph is working. Can be used by anything that can talk to a mounted folder on Linux.

 

Now to hook Veeam up to ceph as a backup location

From Veeam B&R Console

  • Go to Backup Infrastructure / Linux
    • Right-click – Add server
      • 172.16.0.184
      • Add credentials, or use existing saved ones
  • Go to Backup Repositories
    • Right click – Add backup repository
      • Name – CEPH
      • Linux Server
      • Pick new server, click populate, select /var/storage/ceph
      • Accept the rest of the defaults
  • Now go nuts backing up



Where to for the future:

Now that the proof of concept is in place and working, the next step is to virtualise the master PC so that it can be backed up via Veeam (to another repository). I’ll also virtualise a MONITOR node too, just because I can.

 

Points of note: I replaced timesyncd with ntp because the PXEbooted nodes need really accurate timesync to the master, and I was having trouble getting them to pick the time up from timesyncd.  I also found that not all Intel motherboard NIC’s are equal - we have a bunch of C2D and early i5 motherboards whose inbuilt NIC refuses to complete the Linux PXEboot process for Croit. They work under our normal PXEboot for Windows, but not Croit’s. This means that we’ve had to use storage nodes with four SATA ports instead of the six-port ones I wanted to use. Not a big deal in the end.

Theoretically the master could also be the bridge between the main LAN and the storage LAN, but as we have plenty of spare machines and I like to silo tasks off to spread the risk, using a separate machine works well.  I may even virtualise the bridge PC too.

 



Building A Win8.1 based Chromebook - A How To

Posted: 1-May-2014 11:32

Background: We've got quite a few Netbooks of varying horsepower here, and for the slower ones I've been converting them to a fast booting locked down EduBuntu build - works fine.

Except for one model of HP Netbook where the WiFi driver wants authentication every time it roams between AP's - not cool in a student environment.  Win7 on them is far too slooooow to use...

So, I decided to have a quick look at Win8.1 and see how it copes.  8.1 has a new feature - assigned application to user that pretty much takes care of fiddling, and after a bit of thought I figured out how to staticise Chrome and turn the Win8.1 build Netbook into a Chromebook.  Here's the process for those who might want to try this themselves.

This process is how to build a Windows 8.1 based Chromebook.

Install Win8.1 onto the machine.  (Use the demo/eval Win8.1 ISO)
Language: English (US)
Time & Currency: English (NZ)

Custom Install, wipe the drive (delete any partitions the installer finds) and let Win8.1 use the whole thing.

Give it a name, pick the ethernet network, choose Customize for settings;
‘No’ for networks in public places
Windows Update - Automatic
Auto Drivers - off
Autoupdate Apps - off
SmartScreen - on
Do Not Track - on
Windows Error Reporting - off
Compatibility Lists - on
Location data - off
Customer Experience - off
Help Experience - off
Use Bing - off
IE Page Prediction - off
Use my name - off
Use my advertising ID - off
Request Location - off
Active Protection - off

Create a local admin account (Toshiba or BHS Admin) and give it the usual password.
(Scroll the page down, create new account, scroll down again, sign in without a microsoft account)

Fully Windows Update + additions to pick up any required drivers. (PC settings, Update & Recovery, go into ‘Choose how updates get installed’ & turn on Microsoft Update)

Download & Install the ‘Chrome Browser for Education’ from here; http://www.google.co.nz/intl/en/chrome/education/browser/admin/

(This is a Chrome install for all users).

Create the student account;
Press the windows key, type ‘users’ and select ‘Add, delete, and manage other user accounts’
Add account called ‘BHS Students’, select child account. No password required.

Log out as Toshiba, then log in as BHS Students. (To log out, press the Windows key, click account name and log out under that)
Fire up Chrome, set it as the default browser, go into Chrome settings;
‘On startup’ - ‘Open a specific page’ set to learn.burnside.school.nz
Expand out advanced settings
Passwords & forms - disable both
HTTPS/SSL - tick ‘Check for server certificate revocation’

Close Chrome and re-open to double-check it goes to Moodle

Log out of BHS Students and back in as Toshiba

Grab Chrome Defaults & make them static

Make c:\users\static folder
Copy c:\users\BHS Students\appdata\local\google\chrome\user data\default to c:\users\static\default
Make c:\users\static\Netlogon folder

Set up the share (from an Administrative command prompt)

net share Netlogon=c:\users\static\netlogon

Use notepad to create in c:\users\static\netlogon a file called reset_chrome.cmd

Put the following line in it;

robocopy "c:\users\static\default" "c:\users\bhs students\appdata\local\google\chrome\user data\default" /mir
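For anyone wondering what the mirror actually achieves: at every logon the student's Chrome profile is forced back to match the static copy, so changed files are overwritten and anything the student added is deleted. The same end result can be sketched in Python like this. It isn't what runs on the netbooks (that's the one-line robocopy above), and unlike robocopy it wipes and recopies everything rather than only touching what changed.

```python
import shutil
from pathlib import Path


def mirror(source, target):
    """Make `target` an exact copy of `source`, discarding anything extra in `target`."""
    source, target = Path(source), Path(target)
    if target.exists():
        # Throw away the student's modified profile, including files they added.
        shutil.rmtree(target)
    shutil.copytree(source, target)
```

That's why the bookmark test further down works: the bookmark lives in a file under the profile folder, and the folder is put back to its saved state before Chrome starts.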


Bring up explorer (Windows-E) and right-click ‘This PC’ and select manage
Expand out Local Users and Groups, Users, double-click BHS Students
Tick ‘User cannot change password’, then go to the profile tab and enter reset_chrome.cmd into the login script field.


Log out as Toshiba and back in as BHS Students. If you’re vigilant you might see the script running briefly.  To test it, bookmark something, then log out and back in and see if the bookmark exists.  As long as it’s gone the script is doing its job.

Use the power options to set lid-close-power off and power-button-power off.  Also hook the device up to WiFi as well. Disable sleep. Set up time sync.

Set up the kiosk mode

Log back in as Toshiba, run the users management app as you did at first to create the student account
Click on the ‘Set up an account for assigned access’.
Pick ‘BHS Students’ as the user, and then ‘Google Chrome’ as the application. (Chrome will only show up if you’ve been in as the user and set it as the default browser.)

Get out of account management and reboot the device. Windows 8.1 will remember the last user you logged in as so it’ll prompt you to sign back in as Toshiba - go in as BHS Students and voilà, you should have a Chromebook-like interface.

Once you’re happy, image the machine with your imager-of-choice (Clonezilla in our case) and deploy to subsequent machines as normal.  Just need to change the machine name & activate Windows by entering your own key.  


Note: on older devices where you might have a sub 1024x768 resolution (hint: netbooks) Win8.x apps may not launch as they need a minimum of 1024x768 resolution. To work around this; search for Display1_DownScalingSupported in the registry and set it to 1.  In the same place you find that you’ll also find a DynamicScaling entry - set that to 1 as well. Look for all instances of these two and change accordingly.

Reboot, go back in as Toshiba and set the resolution to 1024x768 and Chrome will now launch as the BHS Students account.



OKI B411n & how to reset the NIC

Posted: 9-Sep-2013 15:51

We've just had a real problem attempting to change the IP address on an OKI B411dn printer - spent half a day on the damn thing.

For those of you unfamiliar with this model it doesn't have a nice menu, and when (somehow, we still don't know how) someone changes the IP address & sets TCP/IP to disabled it's very, very hard to resolve.  All the configuration utilities depend on TCP/IP being active.  Once disabled it packs up its toys and goes home.

On any other model of printer you'd just reset the NIC to factory and be on your way in 5 min.  The B411 not having anything other than an Online button is not so easy.  Googling didn't find anything, and calling Oki support in NZ wasn't any good either as they'd not struck this either.

Without further ado, here's how we eventually found out how to reset the NIC;

1) Turn printer off
2) Open the lid
3) Hold down the 'Online' button
4) Turn printer on while still holding the button
5) Eventually the printer will tell you that the lid is open. Keep holding the button for another 5 sec
6) Let go of button & close lid.
7) If everything goes well you should see the display say that it's resetting the NIC

If it doesn't, rinse and repeat and hold step 5 a bit longer.

:)



Imaging Edubuntu

Posted: 21-Aug-2013 15:50

Mental Notes: when imaging Edubuntu with pre-configured wireless configurations (ie ones done on install so that they become system configurations, not user created ones)

Ubuntu stores the MAC address of the original machine's wireless adapter in /etc/NetworkManager/system-connections/{wireless lan name}

Removing the line
mac-address=xx:xx:xx:xx:xx:xx

will allow it to bind to any wireless adapter found on a future machine.
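Stripping that line is easy to script as part of building the reference machine. A Python sketch, assuming the connection file is the usual key=value text; the function name strip_mac_binding is made up, and you'd need to run it as root over each file in /etc/NetworkManager/system-connections/.

```python
def strip_mac_binding(connection_text):
    """Drop every `mac-address=` line so the connection can bind to any adapter."""
    kept = [
        line for line in connection_text.splitlines()
        # Match the exact key only; other keys that merely start with
        # "mac-address" (e.g. mac-address-blacklist) are kept.
        if not line.strip().lower().startswith("mac-address=")
    ]
    return "\n".join(kept) + "\n"
```

Everything else in the file (SSID, security settings and so on) is passed through untouched.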

Also, when installing your reference machine, do not encrypt the first user's home directory as this will enable an encrypted swap volume which will play hell with your imaged clients later.

Lastly, the utility ofris (called gofris) will allow you to lock down a particular user account so that when the machine reboots all changes go away.

I've created a script that I download via wget to a reference machine when I build it. It wgets down some files, updates Edubuntu, installs Chromium & a few other things we want, removes features we don't want, creates our student (limited access) user and sets up the auto-logon for the student.

The only things I have to do by hand for the machine is lock the student down, disable notification of errors & fix the logged-out wallpaper.

Then I can clone the reference machine with clonezilla and roll it to any machines we want - with one caveat - the reference machine's HD has to be smaller than the HD in any of the target machines if you want this done easy & quick.

We're hopefully going to get rid of Win7 netbooks and make them all Edubuntu ones. :) Yee ha.



Have people never heard of ad blockers?

Posted: 16-Aug-2013 15:08

While perusing Stuff.co.nz at lunch today I came across this article http://www.stuff.co.nz/technology/digital-living/9046428/The-most-annoying-ads-in-your-Facebook-feed complaining about ads in Facebook.

It isn't really rocket science to install AdBlock Plus into {insert browser of choice, but probably not Internet Explorer} and once done, those ads are a thing of the past.  Continued whining about ads in the media annoys the hell out of me when the solution is half a dozen clicks away.

Just remember to put in an exclusion for geekzone.co.nz as @freitasm can use the revenue. :)



Reason #703 of why I love VMware

Posted: 27-Jul-2013 15:36

A tale of two dead servers - or as I prefer to call it 'holy cr*p why won't these b*stards boot?'

Yesterday we had a scheduled power outage for our whole site that was planned to last longer than our data center UPS could stay up for. Being the paranoid engineers that we are we carefully went through all our physical & virtual boxes dotted around the campus and shut them all down.

Several of our ESXi hosts had uptime over 500 days and this was their first power off in a very long time. Needless to say I was a tad nervous about bringing them back online - even though our disaster recovery offsite backups are all good, having to recover from them would ruin my weekend.

Well, after a 50min power cut we're ready for powering everything back on.  Everything bar two ESXi hosts (of the six on site) booted fine - one host booting ESXi 4.1 off a USB stick failed on vga.z, and the other, our newest IBM x3650, wouldn't go past a blinking cursor on the screen. We eventually worked out that the fault for both of them was that the USB sticks they were booting off had decided that they'd had their last boot. It was the previous one.

Thankfully VMware is prepared for this kind of problem, especially if you're booting ESX off a USB stick - the hypervisor & the VM's themselves are independent of each other. (We can't afford a SAN so have local storage on each host).

For the older ESX4.1 server all I had to do was find another USB stick we had lying around already prepared and boot off it. The x3650 was ESX5 so I had to find my ESX5 cd in my drawer and install it to a USB stick, then boot off it.

Then it's just a case of recreating the appropriate virtual switches, find the datastores (ESX5 found them for us, 4.1 had to be told to go looking), find & add the VM's to the inventories and start them.  We did have a quick look at the .vmx files to confirm that we'd named the vSwitches correctly, but other than that it was find, add & start all the VM's across those two hosts.

The last thing to do was remove and re-add the hosts to vCenter, which was one of the VM's on the x3650, but that went painlessly as well.

Weekend saved.  We spent more time confirming that it was the USB sticks at fault than we did recovering from them.

Next week's jobs - purchase some 'certified for VMware' usb sticks to boot from that will last the distance, and placate Veeam backup & recovery which failed to back up last night due to the fact that while everything's the same name, they're not the same VM's anymore and it won't find them.



Novopay - my thoughts as to why this has been a debacle

Posted: 23-Jan-2013 11:15

Disclaimer: I now work for a high school, but am not involved with payroll.  My thoughts are my own and are based on 20 years of experience as a systems Engineer for various IT companies.

From what I’ve been able to glean from the different payroll people I interact with, the way payroll staff interact with Novopay is to grab a PDF off their site, fill it in (either in Adobe Reader or suchlike) or print it out & fill it in, then scan it and send it back to Novopay for processing.

This is where the human processing errors are introduced - I believe that Novopay is getting their data entry processed in countries where the daily wage is low and English is not their first language. Peanuts = monkeys kind of thing.

Xero has been able to develop a fully functional world leading Web based accounting system - you would think that in this day and age a first world designed Payroll program could be easily web based so that the only time the data is entered into it is by the people whom it originates from.

From then on it would be totally programmatic business process rules that manipulate the data without humans to screw it up.

Just my $.02 worth.



Playing with Python

Posted: 24-Oct-2012 22:07

For the last few years I've had a hankering to learn another programming language, and since I've had a little bit of time over the last couple of weeks free I've been playing with Python.

First impression - wow, what a productive tool.

During my life I've learned Basic (on the Vic20 & C64, continuing into GW Basic and the like on Dos/Windows), then Pascal, then C, then Visual Basic.  I ended up settling on KIX (a Microsoft written scripting language) that did pretty much everything I needed to write little tools that did stuff quick & dirty.

Well, now that I've dived into Python I'm sorry KIX, but your days are over as my general purpose go-to-language.

The kind of tools I usually need to write these days are ones that either a) take some config from an .INI file and do stuff at a regular time, or b) parse output file(s) from something else and do stuff with them so that other things can be done. KIX was OK at both of these - native .INI support, Windows registry support & AD aware made it a great general purpose language to know (yes, I know PowerShell can do all this and more, but PowerShell's not the easiest thing to find good & easy to follow documentation for learning on).

Python however has all the goodness of KIX + a huge library of well documented & easy to use functions that are far more productive than anything I've used previously.  The string handling functions alone are amazing.  And since python.org has a great tutorial to follow through for learning I got running immediately.

So far it's been a week and a half since I first started in Python and I've got two file processing tools & one fully fledged GUI app up and running in production.

One tool takes a *huge* .csv that's spat out of one of our systems that contains information that I need to separate out into a big bunch of individual files - but the .csv isn't continuous.  It's got lots of different sections separated by line feeds.

Parsing this file in KIX took about 100 lines of code to do what I needed, and took me a week to get the logic just right. Doing it in Python takes 11 lines and took me about 30min.  And I didn't even use the same logic I used for KIX - I decided to do it differently.
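To give a flavour of why this kind of job is so short in Python, here's a sketch of splitting a sectioned file into one file per section. It's not the script I run in production: it assumes the sections are separated by one or more blank lines, and the function names and the section_001.csv naming are made up for the example.

```python
from pathlib import Path


def split_sections(text):
    """Group consecutive non-blank lines into sections; blank lines are the separators."""
    sections, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the section collected so far.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


def write_sections(text, out_dir):
    """Write each section to its own numbered .csv file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, lines in enumerate(split_sections(text), start=1):
        path = out_dir / f"section_{number:03d}.csv"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Usage is a one-liner along the lines of write_sections(Path("export.csv").read_text(), "out"), where export.csv stands in for whatever the source system produces.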

I recommend Python to anyone who's wondering what to learn next - PHP/Ruby will be next on my list of cross platform languages I think.



iPhone apps I've recently found and love.

Posted: 6-Aug-2011 23:45

How did I survive before finding these?

Viber  (text & cell calling via data. Think Skype but easier to use)
HeyTell (treat the iPhone like an RT)
Songify (make yourself sing like a complete loser, but it's a real giggle)
MythRemote & RRgh (turns iPhone into MythTV remote controls over wifi)
yxplayer2 lite (stream recordings from MythTV backend to the iPhone)



Life in the fast lane, not!

Posted: 6-Aug-2011 23:25

It's been a pretty strange year this year.  The ground's been shaking a bit here in Canterbury, and I made the decision to leave the reseller game and jump the fence.

I spent 18 years as a systems engineer for a couple of resellers, nine at Axon and nine at ShapeIT.

Now I've joined Burnside High School as the 5th member of their IT Department.  And I couldn't be happier!

What? I hear you say.  Won't you get bored?

Never.  BHS is the 3rd or 4th largest high school in NZ and has a network that's bigger and more complex than all my old customers added together.  Around 1000 student machines & 200 odd staff machines on the network in one site. (And that's just BHS.  Avonside Girls is here too....)

All the chunky goodness of my past customers without the travel! :)

I'll blog now and then on the projects that we're undertaking - I don't have time now to go into any detail, but here's a few that we've done in the last couple of months;

* Extend the wifi network so that all students can bring their home computers to school and 'do stuff' with them on the internet. All logged, proxied & authenticated without the students having to do anything but add a new wifi config to their computer.
* Set up driverless printing so that students can upload their document to a webpage, pick their appropriate printer and have whatever it is come out regardless of whatever device they're printing from yet still record & bill their printing.
* Replaced all 70 odd switches around the school with brand new ones doing gig to the desktop + link aggregation to the core.  We can sustain 500mbps from one side of the campus to the other with the new gear in place.

Coming up;

* Win7 hardware agnostic rollout with automated application installation.
* Replacement of the student computer management & monitoring software to a different vendor's products
* Offsite backup & replication
* VoIP phone system implementation



Busy busy busy.  Makes the weeks go very fast I must say.



nzsouthernman's profile

Dael 
Christchurch
New Zealand


This blog is mainly going to be for writing down things when I work them out so when I have to try and do it again I don't have to think too hard.  And also to comment on stuff.  Hopefully not too much rant /rant involved.

My latest finished and successful home project;

QNAS NAS/SAN Appliance
8x 750GB 2.5" SATA in R6 array, running PLEX and providing additional storage for MythTV


Toys in the attic;
PS3
PSP
iPhone 7+ (2D)
MythTV separated backend with 2 DVB-S encoders & 2TB disk space & two frontends

Follow me on twitter; http://twitter.com/nzsouthernman