Geekzone: technology news, blogs, forums

Topic # 114451 20-Feb-2013 11:50

Anyone in the know as to what caused IBM's outage yesterday?

[Moderator edit (MF): Moved to correct forum, changed subject to something meaningful]




Reply # 766332 20-Feb-2013 11:53

Is it fixed yet? I haven't seen an updated article.

I'd like to know this myself.




Twitter: ajobbins


Reply # 766334 20-Feb-2013 11:56

According to Computerworld, it looks like even IBM don't know :)


Reply # 766341 20-Feb-2013 12:09

Watch out for the Big Blue Curtain of Silence.

Took a while for the reason behind the Newton Data Centre falling over to come out as well.


Reply # 766345 20-Feb-2013 12:15

PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes it into the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.




Twitter: ajobbins


Reply # 766356 20-Feb-2013 12:23

ajobbins:
PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes it into the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.


Agreed, but the industry is so small that it won't take long for it to leak amongst those who know people.

Of course, whether that information gets posted to a public forum is a different story. I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that data centre).




Reply # 766378 20-Feb-2013 12:57

PIERCD:

Of course, whether that information gets posted to a public forum is a different story. I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that data centre).


Create a disposable account and post it up ;-)





Reply # 766395 20-Feb-2013 13:30





Twitter: ajobbins




Reply # 766432 20-Feb-2013 13:55

Can't wait to see what the root cause turns out to be :-)




gzt


Reply # 766469 20-Feb-2013 14:52

Reading the press, my feeling is that there were multiple failures, not just one cause.

I will hazard a blind guess that logistics also played a part in the long delay.


Reply # 768422 24-Feb-2013 10:29

Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site, and guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.


gjm


Reply # 768426 24-Feb-2013 10:43

insane makes some reasonable assumptions ...




[Amstrad CPC 6128: 128k Memory: 3 inch floppy drive: Colour Screen]


Reply # 768436 24-Feb-2013 11:10

insane: Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site, and guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.



Agree completely. People should always have put in a completely separate DR.






Reply # 768463 24-Feb-2013 12:33

You'd be surprised how many clients have little regard for DR, or clients who only want to pay for an SLA and then subsequently add load beyond what they are willing to pay for.

Reply # 768492 24-Feb-2013 13:38


This was posted over in the comments at Computerworld.

OK, so here are the real facts...
Their IBM XIV storage was due an upgrade. A new cache card was installed that was faulty, and it took the entire array down. It then took 3 days to restore the array from backups. This has come from a customer affected by the outage.

So much for XIV being Tier 1 storage... In my opinion, there are only 3 suppliers that produce enterprise-grade storage in the IT market: HDS (VSP), EMC (VMAX) and IBM (the DS8000 range). Everything else is midrange with single points of failure.


If this is indeed true, and it fits my experience (only storage can take everything offline for nearly 3 days, with complex VLAN networking problems second), then it looks like this was not a well-engineered solution, or alternatively they hit bugs in the XIV storage array which they needed programmers to resolve.

If the comment from Computerworld is true, then I would have expected IBM to have a second XIV storage array which they had been taking snapshots to, and could have failed over to before doing the hardware upgrade on the primary XIV array. Then, after testing, they could have failed back to the primary and upgraded the secondary array.

In that case, it appears IBM only ran one XIV SAN array.

I know of clients in the graphics industry which buy two of these for exactly these reasons, so if IBM didn't have a second one you wonder how cheap they really are.

This is of course all speculation, as I have no insider information, just experience working on a range of SANs.




Tyler - Parnell Geek - iPhone 3G - Lenovo X301 - Kaseya - Great Western Steak House, these are some of my favourite things.

gzt


Reply # 768502 24-Feb-2013 14:06

Remember that IBM indicated two faults. One which caused the original issue and another which was discovered or which occurred when they planned to resume service. It is possible there are several sets of correct information floating around relating to each fault. Your quote and insane's quote may be consistent. Both were anonymous.

One thing is for sure: the continuing speculation is not healthy for IBM's reputation.

A conspiracy theorist might suspect this calculation has been made already and details of the fault and the issue found in resuming service may be more damaging than continuing speculation. ; ).

The other perspective is this issue is hardly anyone's business and we should just do the proverbial.
