Geekzone: technology news, blogs, forums




1547 posts

Uber Geek


#114451 20-Feb-2013 11:50

Anyone in the know as to what caused IBM's outage yesterday?

[Moderator edit (MF): Moved to correct forum, changed subject to something meaningful]

Awesome
4880 posts

Uber Geek

Trusted
Subscriber

  #766332 20-Feb-2013 11:53

Is it fixed yet? I haven't seen an updated article.

I'd like to know this myself.




Twitter: ajobbins


2078 posts

Uber Geek

Subscriber

  #766334 20-Feb-2013 11:56

According to ComputerWorld it looks like even IBM don't know :)

 
 
 
 


58 posts

Master Geek


  #766341 20-Feb-2013 12:09

Watch out for the Big Blue Curtain of Silence.

Took a while for the reason behind the Newton Data Centre falling over to come out as well.

Awesome
4880 posts

Uber Geek

Trusted
Subscriber

  #766345 20-Feb-2013 12:15

PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes it into the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.






58 posts

Master Geek


  #766356 20-Feb-2013 12:23

ajobbins:
PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes it into the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.


Agreed, but the industry is so small that it won't take long for it to leak amongst those who know people.

Of course, whether that information gets posted to a public forum is a different story. I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that data centre).



1547 posts

Uber Geek


  #766378 20-Feb-2013 12:57

PIERCD:

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).


Create a disposable account and post it up ;-)





1547 posts

Uber Geek


  #766432 20-Feb-2013 13:55

Can't wait to see what the root cause turns out to be :-)

gzt

11541 posts

Uber Geek

Lifetime subscriber

  #766469 20-Feb-2013 14:52

Reading the press, my feeling is that it was multiple failures, not just one cause.

I will hazard a blind guess that logistics also played a part in the long delay.

2416 posts

Uber Geek

Trusted
Subscriber

  #768422 24-Feb-2013 10:29

Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site and, guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.


gjm

757 posts

Ultimate Geek


  #768426 24-Feb-2013 10:43

insane makes some reasonable assumptions...




[Amstrad CPC 6128: 128k Memory: 3 inch floppy drive: Colour Screen]

3607 posts

Uber Geek

Trusted

  #768436 24-Feb-2013 11:10

insane: Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site and, guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.



Agree completely. People should always have put in a completely separate DR.




Speedtest 2019-10-14


1245 posts

Uber Geek


  #768463 24-Feb-2013 12:33

You'd be surprised how many clients have little regard for DR, or how many only want to pay for an SLA and then add load beyond what they are willing to pay for.

1200 posts

Uber Geek

Trusted

  #768492 24-Feb-2013 13:38


This was posted over in the comments at Computerworld.

Ok, so here are the real facts...
Their IBM XIV storage was due an upgrade; a new cache card was installed that was faulty, and it took the entire array down. It then took 3 days to restore the array from backups. This has come from a customer affected by the outage.

So much for XIV being Tier 1 storage... In my opinion, there are only 3 suppliers that produce enterprise-grade storage in the IT market: HDS (VSP), EMC (VMAX) and IBM (DS8000 range). Everything else is midrange with single points of failure.


If this is indeed true (and it fits with my experience: only storage can take everything offline for nearly 3 days, with complex VLAN networking problems second), then it looks like this was not a well-engineered solution, or alternatively it hit bugs in the XIV storage array which needed programmers to resolve.

If the comment from Computerworld is true, then I would have expected IBM to have a second XIV storage array which they had been replicating snapshots to and could have failed over to before doing the hardware upgrade on the primary array. After testing, they could have failed back to the primary and then upgraded the secondary array.

It appears IBM only ran one XIV SAN storage array in that case.

I know of clients in the graphics industry who buy two of these for exactly these reasons, so if IBM didn't have a second one, you wonder how cheap they really are.

This is of course all speculation, as I have no insider information, just experience working on a range of SANs.




Tyler - Parnell Geek - iPhone 3G - Lenovo X301 - Kaseya - Great Western Steak House, these are some of my favourite things.

gzt

11541 posts

Uber Geek

Lifetime subscriber

  #768502 24-Feb-2013 14:06

Remember that IBM indicated two faults: one which caused the original issue, and another which was discovered or occurred when they planned to resume service. It is possible there are several sets of correct information floating around relating to each fault. Your quote and insane's quote may be consistent. Both were anonymous.

One thing is for sure: the continuing speculation is not healthy for IBM's reputation.

A conspiracy theorist might suspect this calculation has already been made, and that details of the fault and of the issue found when resuming service may be more damaging than continuing speculation ;).

The other perspective is that this issue is hardly anyone's business and we should just do the proverbial.
