I was excited to put this new server to the test, and consolidate various aging servers onto it. I wanted to have a great first blog post about it, but sadly I cannot rave about this server. It has brought me pain.
Processors / Memory
The new Nehalem processors are fast, very very fast. But there were reports of serious bugs with the timers/interrupts; Microsoft was about to publish a KB article recommending against using these processors for Hyper-V before Intel stepped in and stopped them. The initial workaround was to disable the power-saving features (which turn off idle cores), but there is now a hotfix out which works around the processor bugs.
Slashdot | Microsoft Advice Against Nehalem Xeons Snuffed Out
The server is stacked with 16 RAM slots (8 per processor; if you don't have a second processor, only 8 slots are usable.)
If you have ever installed RAM in one of these and are puzzled by the odd RAM installation numbering, it's to do with cooling.
IBM has had some serious data-loss bugs with the ServeRAID 7k/8k RAID cards, so IBM changed course: it dumped Adaptec, which made the old ServeRAID controllers, and brought out new RAID controllers which use LSI chips.
There are three downsides. The first is that the famous, lovely, easy-to-use ServeRAID software has been replaced by an uglier, slower, buggier cousin, "MegaRAID". The second is that the price of the RAID card has increased by 50% (100% if you count the cache battery.)
The other gotcha which caught me out is that the old RAID cards came with cache batteries (without which the RAID runs 10x slower than desktop hard drives in some operations), and the cache batteries now cost half the price of the RAID card again. HP got this right: it makes its own RAID chipsets (from what I can work out), and the RAID controllers on even its cheap servers come with RAID 5 built in. With IBM you throw away the bundled RAID controller, which only does RAID 1, and buy a new controller plus the cache battery. I hear the newer high-end RAID controllers will ship with the batteries again. This was a silly mistake on IBM's part. The reasoning IBM gives is that the batteries are now classified as consumables.
UEFI - The Replacement for the 20-Year-Old BIOS
The x3650 M2 is one of the first servers from IBM with UEFI (Unified Extensible Firmware Interface), the replacement for the aging BIOS. I love the concept, and it needed to be done: you can configure ALL your adapters in the main 'BIOS' screen, including RAID cards, network adapter firmware settings, etc., instead of waiting for the right point in the boot sequence and choosing the specific adapter firmware to configure.
The problem is that IBM's implementation is painfully slow. It takes around 4 minutes between plugging power into the server and being able to turn it on (the power light flashes rapidly during this period), then approximately another 4 minutes to go through the firmware initialisation before you can boot the OS. The 'BIOS' RAID configuration screens are also so painfully slow that I was stunned it was possible to take such a fast server and make it display graphics so slowly! The shipping version of the "WebBIOS" for the RAID also had bugs where, if you didn't click on a button, the whole BIOS locked up hard. Time for another 4-minute power cycle. It was like playing Minesweeper!
I have a feeling there will be many firmware updates to improve this server's UEFI performance over its lifecycle.
There is also a discussion going on about the finer points and other problems found, such as the IMM Windows interface (which presents as a USB device) dropping domain controllers' firewalls to the Public profile, causing all kinds of grief.
First real world experiences with IBM’s x3650 M2
Integrated Management Module (IMM)
The IMM is much improved, and now has a web interface as standard instead of requiring an upgrade to the RSA II adapter. It has some cool features, like taking a 'picture' of the server as it blue-screens (it keeps a few of these in the IMM so you can diagnose those hard-to-find problems), and it exposes a plethora of data such as temperatures, fan speeds, etc.
Note that you still have to purchase a key to unlock remote screen control through the IMM when deploying servers remotely. This is still the same price as the RSA II adapters were.
Note also that we did have a problem with the cooling in our server room a few weeks ago, and the x3650 M2 was the only server which locked up hard. Older IBM boxes and Apple Xserves kept ticking over fine. The x3650 M2 likes it cool and freaks out when things get a little bit warm.
Microsoft Hyper-V VHD Bugs
Another issue we have struck is a bug in Microsoft's vhdmp.sys, the driver which handles VHD files, which blue-screens during backups using Volume Shadow Copy.
We have one big data drive on our Hyper-V box holding all the VHDs, and we ALSO boot the physical Hyper-V machine itself from a VHD so that software updates and Windows upgrades are easier. This is something Microsoft supports, but it has a critical bug. There is a discussion over here
vhdmp.sys BSOD 0x000000ca - StorageCraft Support Center
at the ShadowProtect forums, and Microsoft said it was in the top 5 list for that product group, which usually means a hotfix is out within 7 days, but we are 3 weeks in and counting.
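For reference, booting the physical machine from a VHD as described above is set up with bcdedit. This is a minimal sketch of the standard native-VHD-boot steps; the VHD path, description, and the {guid} placeholder are illustrative, not the values from our actual box:

```shell
rem Copy the current boot entry so the original stays intact.
rem bcdedit prints the GUID of the new entry; use it in the next commands.
bcdedit /copy {current} /d "Hyper-V host (VHD boot)"

rem Point both the boot device and OS device at the VHD file.
rem [C:] means the volume holding the VHD; the path is an example only.
bcdedit /set {guid} device vhd=[C:]\VHDs\hyperv-host.vhd
bcdedit /set {guid} osdevice vhd=[C:]\VHDs\hyperv-host.vhd
```

On the next reboot the new entry appears in the boot menu, and Windows runs from inside the VHD file, which is what makes rolling back a bad update as simple as swapping the file.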
While I wouldn't claim this is IBM's finest hour, the x3650 M2 is still a solid box. I think the x3650 M2 will get better over time with firmware updates, but right now there are a lot of rough edges.
When this server is deployed with the correct RAID cards, cache batteries and enough RAM, it makes a great Hyper-V virtualisation platform. Ours is in production running half a dozen production servers and about as many test machines again, and it screams along.
My next post will cover some of the Hyper-V and Virtualization goodness.
Other related posts:
All Xserve'ed out. Apple exits the enterprise server market.
The case of the missing Cache…
Comment by Cameron Rangeley, on 8-Dec-2009 19:55
I run three of these servers, 2 running Windows and 1 running VMware ESX, and they do run great!
Comment by nzsouthernman, on 9-Dec-2009 08:43
Why run Hyper-V when you can run ESX and have a more stable hypervisor underneath the VM's? Aside from being able to use ShadowProtect to image your VHD's that is... :)
Comment by nathan, on 10-Dec-2009 00:52
nzsouthernman Dael, did you have some stability problems with Hyper-V?
Comment by nathan mercer, on 10-Dec-2009 00:56
Tyler, I don't have your email address, but can you please flick me an email to email@example.com and I'll make sure the vhdmp.sys is getting sorted through support in the best way