Geekzone: technology news, blogs, forums
1546 posts

Uber Geek


#114451 20-Feb-2013 11:50

Anyone in the know as to what caused IBM's outage yesterday?

[Moderator edit (MF): Moved to correct forum, changed subject to something meaningful]

Awesome
4878 posts

Uber Geek

Trusted
Subscriber

  #766332 20-Feb-2013 11:53

Is it fixed yet? I haven't seen an updated article.

I'd like to know this myself.




Twitter: ajobbins


2078 posts

Uber Geek

Subscriber

  #766334 20-Feb-2013 11:56

According to ComputerWorld it looks like even IBM don't know :)

 
 
 
 


58 posts

Master Geek


  #766341 20-Feb-2013 12:09

Watch out for the Big Blue Curtain of Silence.

Took a while for the reason behind the Newton Data Centre falling over to come out as well.

Awesome
4878 posts

Uber Geek

Trusted
Subscriber

  #766345 20-Feb-2013 12:15

PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.




Twitter: ajobbins


58 posts

Master Geek


  #766356 20-Feb-2013 12:23

ajobbins:
PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.


Agreed, but the industry is so small, it won't take long for it to leak amongst those who know people.

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).



1546 posts

Uber Geek


  #766378 20-Feb-2013 12:57

PIERCD:

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).


Create a disposable account and post it up ;-)

1546 posts

Uber Geek


  #766432 20-Feb-2013 13:55

Can't wait to see what the root cause turns out to be :-)

gzt

11454 posts

Uber Geek

Lifetime subscriber

  #766469 20-Feb-2013 14:52

Reading the press, my feeling is that there were multiple failures, not just one cause.

I will hazard a blind guess that logistics also played a part in the long delay.

2416 posts

Uber Geek

Trusted
Subscriber

  #768422 24-Feb-2013 10:29

Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay they would have bought a DR service and been covered, but they didn't; they bought services from a single site and guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.


gjm

757 posts

Ultimate Geek


  #768426 24-Feb-2013 10:43

insane makes some reasonable assumptions ...




[Amstrad CPC 6128: 128k Memory: 3 inch floppy drive: Colour Screen]

3595 posts

Uber Geek

Trusted

  #768436 24-Feb-2013 11:10

insane: Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay they would have bought a DR service and been covered, but they didn't; they bought services from a single site and guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.



Agree completely. People should always have put in a completely separate DR.




Speedtest 2019-10-14


1245 posts

Uber Geek


  #768463 24-Feb-2013 12:33

You'd be surprised how many clients have little regard for DR, or who only want to pay for an SLA and then subsequently add load beyond what they are willing to pay for.

1200 posts

Uber Geek

Trusted

  #768492 24-Feb-2013 13:38


This was posted over in the comments at Computer World.

Ok, So here are the real facts....
Their IBM XIV storage was due an upgrade, a new cache card was installed that was faulty and it took the entire array down. It then took 3 days to restore the array from backups. This has come from a customer affected by the outage.

So much for XIV being Tier 1 Storage.... In my opinion, there are only 3 suppliers that produce Enterprise Grade Storage in the IT market: HDS - VSP; EMC - VMAX & IBM - DS8000 range... Everything else is midrange with single points of failure.


If this is indeed true, it fits my experience: only storage can take everything offline for nearly 3 days, with complex VLAN networking problems a distant second. In that case it looks like either this was not a well-engineered solution, or they hit bugs in the XIV storage array which they needed programmers to resolve.

If the comment from Computer World is true, then I would have expected IBM to have a second XIV storage array receiving snapshots from the primary. They could have failed over to it before doing the hardware upgrade on the primary array, then, after testing, fallen back to the primary and upgraded the secondary array.

In that case, it appears IBM only run one XIV SAN storage array.

I know of clients in the graphics industry who buy two of these for exactly this reason, so if IBM didn't have a second one, you have to wonder how cheap they really are.

This is of course all speculation, as I have no insider information, just experience working on a range of SANs.




Tyler - Parnell Geek - iPhone 3G - Lenovo X301 - Kaseya - Great Western Steak House, these are some of my favourite things.

gzt

11454 posts

Uber Geek

Lifetime subscriber

  #768502 24-Feb-2013 14:06

Remember that IBM indicated two faults. One which caused the original issue and another which was discovered or which occurred when they planned to resume service. It is possible there are several sets of correct information floating around relating to each fault. Your quote and insane's quote may be consistent. Both were anonymous.

One thing is for sure: the continuing speculation is not healthy for IBM's reputation.

A conspiracy theorist might suspect this calculation has been made already and details of the fault and the issue found in resuming service may be more damaging than continuing speculation. ; ).

The other perspective is this issue is hardly anyone's business and we should just do the proverbial.
