Geekzone: technology news, blogs, forums




803 posts

Ultimate Geek
+1 received by user: 115


Topic # 114451 20-Feb-2013 11:50

Anyone in the know as to what caused IBM's outage yesterday?

[Moderator edit (MF): Moved to correct forum, changed subject to something meaningful]




Awesome
3841 posts

Uber Geek
+1 received by user: 364

Trusted
Subscriber

  Reply # 766332 20-Feb-2013 11:53

Is it fixed yet? I haven't seen an updated article.

I'd like to know this myself.




Twitter: ajobbins

1417 posts

Uber Geek
+1 received by user: 54

Subscriber

  Reply # 766334 20-Feb-2013 11:56

According to Computerworld, it looks like even IBM don't know :)

58 posts

Master Geek

Subscriber

  Reply # 766341 20-Feb-2013 12:09

Watch out for the Big Blue Curtain of Silence.

It took a while for the reason behind the Newton Data Centre falling over to come out as well.

Awesome
3841 posts

Uber Geek
+1 received by user: 364

Trusted
Subscriber

  Reply # 766345 20-Feb-2013 12:15

PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes it into the public domain. I'm quite sure there is a good reason Air NZ went quiet very quickly after Fyfe's initial outburst.




Twitter: ajobbins

58 posts

Master Geek

Subscriber

  Reply # 766356 20-Feb-2013 12:23

ajobbins:
PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kind of things never makes the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.


Agreed, but the industry is so small that it won't take long for it to leak amongst those who know people.

Of course, whether that information gets posted to a public forum is a different story. I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that data centre).



803 posts

Ultimate Geek
+1 received by user: 115


  Reply # 766378 20-Feb-2013 12:57

PIERCD:

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).


Create a disposable account and post it up ;-)




Awesome
3841 posts

Uber Geek
+1 received by user: 364

Trusted
Subscriber

  Reply # 766395 20-Feb-2013 13:30





Twitter: ajobbins



803 posts

Ultimate Geek
+1 received by user: 115


  Reply # 766432 20-Feb-2013 13:55

Can't wait to see what the root cause turns out to be :-)




gzt

4135 posts

Uber Geek
+1 received by user: 156

Subscriber

  Reply # 766469 20-Feb-2013 14:52

Reading the press, my feeling is that there were multiple failures, not just one cause.

I will hazard a blind guess that logistics also played a part in the long delay.

1877 posts

Uber Geek
+1 received by user: 188

Trusted
Subscriber

  Reply # 768422 24-Feb-2013 10:29

Given it was just the virtual platform that fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site, and guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.


gjm

646 posts

Ultimate Geek
+1 received by user: 54

Subscriber

  Reply # 768426 24-Feb-2013 10:43

insane makes some reasonable assumptions ...




[Amstrad CPC 6128: 128k Memory: 3 inch floppy drive: Colour Screen]

2938 posts

Uber Geek
+1 received by user: 153

Trusted
Subscriber

  Reply # 768436 24-Feb-2013 11:10

insane: Given it was just the virtual platform that fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay, they would have bought a DR service and been covered. But they didn't; they bought services from a single site, and guess what, sometimes stuff goes wrong.

Agree completely. People should always have put in a completely separate DR.





787 posts

Ultimate Geek
+1 received by user: 21


  Reply # 768463 24-Feb-2013 12:33

You'd be surprised how many clients have little regard for DR, or only want to pay for an SLA and then add load beyond what they are willing to pay for.

1200 posts

Uber Geek
+1 received by user: 3

Trusted

  Reply # 768492 24-Feb-2013 13:38


This was posted over in the comments at Computerworld.

OK, so here are the real facts...
Their IBM XIV storage was due an upgrade. A new cache card that was installed was faulty, and it took the entire array down. It then took three days to restore the array from backups. This has come from a customer affected by the outage.

So much for XIV being Tier 1 storage... In my opinion, there are only three suppliers that produce enterprise-grade storage in the IT market: HDS (VSP), EMC (VMAX) and IBM (DS8000 range). Everything else is midrange with single points of failure.


If this is indeed true, and it fits my experience (only storage can take everything offline for nearly three days, with complex VLAN networking problems second), then it looks like this was not a well-engineered solution, or alternatively they hit bugs in the XIV storage array that needed programmers to resolve.

If the comment from Computerworld is true, then I would have expected IBM to have a second XIV storage array that they had been replicating snapshots to and could have failed over to before doing the hardware upgrade on the primary array. After testing, they could have failed back to the primary and then upgraded the secondary array.

In that case, it appears IBM only runs a single XIV SAN storage array.

I know of clients in the graphics industry who buy two of these for exactly this reason, so if IBM didn't have a second one, you wonder how cheap they really are.

This is of course all speculation, as I have no insider information, just experience working on a range of SANs.




Tyler - Parnell Geek - iPhone 3G - Lenovo X301 - Kaseya - Great Western Steak House, these are some of my favourite things.

gzt

4135 posts

Uber Geek
+1 received by user: 156

Subscriber

  Reply # 768502 24-Feb-2013 14:06

Remember that IBM indicated two faults: one that caused the original issue, and another that was discovered or occurred when they planned to resume service. It is possible there are several sets of correct information floating around, each relating to a different fault. Your quote and insane's quote may be consistent. Both were anonymous.

One thing is for sure: the continuing speculation is not healthy for IBM's reputation.

A conspiracy theorist might suspect this calculation has already been made, and that the details of the fault and of the issue found when resuming service may be more damaging than continuing speculation ;).

The other perspective is this issue is hardly anyone's business and we should just do the proverbial.
