Geekzone: technology news, blogs, forums


Mark

1653 posts

Uber Geek


#114451 20-Feb-2013 11:50

Anyone in the know as to what caused IBM's outage yesterday?

[Moderator edit (MF): Moved to correct forum, changed subject to something meaningful]

ajobbins
5052 posts

Uber Geek

Trusted

  #766332 20-Feb-2013 11:53

Is it fixed yet? I haven't seen an updated article.

I'd like to know this myself.




Twitter: ajobbins




graemeh
2078 posts

Uber Geek


  #766334 20-Feb-2013 11:56

According to Computerworld it looks like even IBM don't know :)

PIERCD
58 posts

Master Geek


  #766341 20-Feb-2013 12:09

Watch out for the Big Blue Curtain of Silence.

Took a while for the reason behind the Newton Data Centre falling over to come out as well.



ajobbins
5052 posts

Uber Geek

Trusted

  #766345 20-Feb-2013 12:15

PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.




Twitter: ajobbins


PIERCD
58 posts

Master Geek


  #766356 20-Feb-2013 12:23

ajobbins:
PIERCD: Took a while for the reason behind the Newton Data Centre falling over to come out as well.


The full story about these kinds of things never makes the public domain. I'm quite sure there is a good reason Air NZ shut up very quickly after Fyfe's initial outburst.


Agreed, but the industry is so small, it won't take long for it to leak amongst those who know people.

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).

Mark

1653 posts

Uber Geek


  #766378 20-Feb-2013 12:57

PIERCD:

Of course whether that information gets posted to a public forum is a different story.  I know my employer would not be happy if I posted about it here (and we didn't even supply any product to that Datacentre).


Create a disposable account and post it up ;-)

ajobbins
5052 posts

Uber Geek

Trusted

 
 
 

Mark

1653 posts

Uber Geek


  #766432 20-Feb-2013 13:55

Can't wait to see what the root cause turns out to be :-)

gzt
17110 posts

Uber Geek

Lifetime subscriber

  #766469 20-Feb-2013 14:52

Reading the press, my feeling is that there were multiple failures, not just one cause.

I will hazard a blind guess that logistics also played a part in the long delay.

insane
3239 posts

Uber Geek

ID Verified
Trusted

  #768422 24-Feb-2013 10:29

Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay they would have bought a DR service and been covered, but they didn't; they bought services from a single site and, guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.


gjm
808 posts

Ultimate Geek


  #768426 24-Feb-2013 10:43

insane makes some reasonable assumptions ...




Do surveys for Beer money (referral link) - Octopus Group 

 

Link for buying beer (not affiliated, just like beer) - Good George


Zeon
3916 posts

Uber Geek

Trusted

  #768436 24-Feb-2013 11:10

insane: Given it was just the virtual platform which fell over and not the whole DC, and that it happened at 3am, I'd imagine a storage firmware update or failure. From experience, if power is not an issue, then only a storage issue will take you completely out like that.

Reading some of the comments by others in the media about why IBM didn't have another DC etc. is hilarious. If customers were willing to pay they would have bought a DR service and been covered, but they didn't; they bought services from a single site and, guess what, sometimes stuff goes wrong.






*EDIT*

Someone over at Computerworld wrote this:

Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.



Agree completely. People should always have put in a completely separate DR.






khull
1245 posts

Uber Geek


  #768463 24-Feb-2013 12:33

You'd be surprised how many clients have little regard for DR, or who only want to pay for an SLA and then subsequently add load beyond what they are willing to pay for.

exportgoldman
1202 posts

Uber Geek

Trusted

  #768492 24-Feb-2013 13:38


This was posted over in the comments at Computerworld.

Ok, So here are the real facts....
Their IBM XIV storage was due an upgrade, a new cache card was installed that was faulty and it took the entire array down. It then took 3 days to restore the array from backups. This has come from a customer affected by the outage.

So much for XIV being Tier 1 storage... In my opinion, there are only 3 suppliers that produce enterprise-grade storage in the IT market: HDS - VSP; EMC - VMAX & IBM - DS8000 range. Everything else is midrange with single points of failure.


If this is indeed true, and it fits with my experience (only storage can take everything offline for nearly 3 days, with complex VLAN networking problems second), then it looks like this was not a well-engineered solution, or alternatively it hit bugs in the XIV storage array which they needed to get programmers to resolve.

If the comment from Computerworld is true, then I would have expected IBM to have a second XIV storage array to which they had been taking snapshots, and to which they could have failed over before doing the hardware upgrade on the primary array. After testing, they could have fallen back to the primary and upgraded the secondary array.

In that case, it appears IBM only runs one XIV SAN storage array.

I know of clients in the graphics industry which buy two of these for exactly these reasons, so if IBM didn't have a second one you wonder how cheap they really are.

This is of course all speculation, as I have no insider information, just experience working on a range of SANs.




Tyler - Parnell Geek - iPhone 3G - Lenovo X301 - Kaseya - Great Western Steak House, these are some of my favourite things.

gzt
17110 posts

Uber Geek

Lifetime subscriber

  #768502 24-Feb-2013 14:06

Remember that IBM indicated two faults. One which caused the original issue and another which was discovered or which occurred when they planned to resume service. It is possible there are several sets of correct information floating around relating to each fault. Your quote and insane's quote may be consistent. Both were anonymous.

One thing is for sure, the continuing speculation is not healthy for IBM's reputation.

A conspiracy theorist might suspect this calculation has been made already and details of the fault and the issue found in resuming service may be more damaging than continuing speculation. ; ).

The other perspective is this issue is hardly anyone's business and we should just do the proverbial.
