Geekzone: technology news, blogs, forums
185 posts

Master Geek
+1 received by user: 17


  Reply # 731097 12-Dec-2012 08:55

surfisup1000: 

My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy.




From the PR release above it seems they do have redundancy, but that is what is causing the problem :)

When the AKL server crashed due to an upgrade (yesterday's AM outage) it became out of sync with the CHC server. Then when they tried to restore the redundancy (so everyone was not trying to use the same server) overload occurred and the CHC server had to be taken offline.

Without the redundancy I would say that there would have been no internet for Telecom users at all yesterday AM :)

edit: fixed grammar and spelling, need coffee :)




19805 posts

Uber Geek
+1 received by user: 1525

Moderator
Trusted
Biddle Corp
Subscriber

  Reply # 731102 12-Dec-2012 08:58

surfisup1000:
My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy. 
 
 


While you're entitled to an opinion I can assure you that your viewpoint is entirely incorrect.

1569 posts

Uber Geek
+1 received by user: 218


  Reply # 731107 12-Dec-2012 09:05

clinty:
surfisup1000: 

My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy.




From the PR release above it seems they do have redundancy, but that is what is causing the problem :)

When the AKL server crashed due to an upgrade (yesterday's AM outage) it became out of sync with the CHC server. Then when they tried to restore the redundancy (so everyone was not trying to use the same server) overload occurred and the CHC server had to be taken offline.

Without the redundancy I would say that there would have been no internet for Telecom users at all yesterday AM :)





From what I understand, Telecom have 2 PPPOA authenticating servers and their redundancy plan is that if one goes down the other can cope.

But, it appears, they need 2 servers for normal operations. Any fewer, and the system breaks down.

So, Telecom have made a planning error in thinking one server can cope.

Does that sound about right? 
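
To make the "can one server cope?" question concrete, here is a minimal sketch of an N+1 capacity check. The capacities and load figures are made-up numbers for illustration only (nothing here comes from Telecom), and `survives_single_failure` is a hypothetical helper name.

```python
def survives_single_failure(server_capacities, peak_load):
    """Return True if the remaining servers can carry peak_load
    after the largest single server is lost (worst-case N+1 check)."""
    if len(server_capacities) < 2:
        # With fewer than two servers there is no redundancy at all.
        return False
    remaining = sum(server_capacities) - max(server_capacities)
    return remaining >= peak_load


# Hypothetical figures: sessions each server can authenticate at peak.
two_servers = [100_000, 100_000]

# Peak load fits on one server, so losing either one is survivable.
print(survives_single_failure(two_servers, 80_000))   # True

# Peak load needs both servers, so the "redundant" pair is not redundant.
print(survives_single_failure(two_servers, 150_000))  # False
```

Note that this only covers steady-state load. It says nothing about the extra load caused by resynchronising a restored server, which may be a separate problem.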







1569 posts

Uber Geek
+1 received by user: 218


  Reply # 731109 12-Dec-2012 09:06

sbiddle:
surfisup1000:
My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy. 
 
 


While you're entitled to an opinion I can assure you that your viewpoint is entirely incorrect.


Well, all I know is that my broadband is down again, and that some Telecom migration is at fault.

Are you saying Telecom have not made any error here?

185 posts

Master Geek
+1 received by user: 17


  Reply # 731111 12-Dec-2012 09:11

surfisup1000:
clinty:
surfisup1000: 

My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy.




From the PR release above it seems they do have redundancy, but that is what is causing the problem :)

When the AKL server crashed due to an upgrade (yesterday's AM outage) it became out of sync with the CHC server. Then when they tried to restore the redundancy (so everyone was not trying to use the same server) overload occurred and the CHC server had to be taken offline.

Without the redundancy I would say that there would have been no internet for Telecom users at all yesterday AM :)





From what I understand, Telecom have 2 PPPOA authenticating servers and their redundancy plan is that if one goes down the other can cope.

But, it appears, they need 2 servers for normal operations. Any fewer, and the system breaks down.

So, Telecom have made a planning error in thinking one server can cope.

Does that sound about right? 





I would say that one server would cope fine, as it did for around 6-7 hours yesterday according to the release. But the usual plan in these scenarios is to restore redundancy as soon as possible because lightning can strike twice :)

The issue is that restoring the AKL server appears to have caused a problem with the CHC server, crashing the system again.

I would expect that Telecom will now be reviewing how they restore the redundancy between these servers when something happens.



54 posts

Master Geek


  Reply # 731116 12-Dec-2012 09:14

Yay, I think we are finally going on broadband at last (9.05am) but won't hold my breath....
Harks back to the XT debacles of a few Christmases ago - 3 major outages in 24 hours - we know Telecom considers it a "best efforts" service, but this hardly qualifies as even an "attempt" does it?

185 posts

Master Geek
+1 received by user: 17


  Reply # 731119 12-Dec-2012 09:16

Please note this is all from the PR release posted above, I have no actual knowledge of what is happening at Telecom :)

172 posts

Master Geek
+1 received by user: 3


  Reply # 731121 12-Dec-2012 09:17

clinty:
surfisup1000:
clinty:
surfisup1000: 

My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy.




From the PR release above it seems they do have redundancy, but that is what is causing the problem :)

When the AKL server crashed due to an upgrade (yesterday's AM outage) it became out of sync with the CHC server. Then when they tried to restore the redundancy (so everyone was not trying to use the same server) overload occurred and the CHC server had to be taken offline.

Without the redundancy I would say that there would have been no internet for Telecom users at all yesterday AM :)





From what I understand, Telecom have 2 PPPOA authenticating servers and their redundancy plan is that if one goes down the other can cope.

But, it appears, they need 2 servers for normal operations. Any fewer, and the system breaks down.

So, Telecom have made a planning error in thinking one server can cope.

Does that sound about right? 





I would say that one server would cope fine, as it did for around 6-7 hours yesterday according to the release. But the usual plan in these scenarios is to restore redundancy as soon as possible because lightning can strike twice :)

The issue is that restoring the AKL server appears to have caused a problem with the CHC server, crashing the system again.

I would expect that Telecom will now be reviewing how they restore the redundancy between these servers when something happens.




I have some experience in enterprise network infrastructure and to only have two PPPOA authentication servers for an operation the size of Telecom seems a bit light to me. There are all manner of DR scenarios which can easily be predicted where you can run into problems. That is, if this really is how the infrastructure is architected.

172 posts

Master Geek
+1 received by user: 3


  Reply # 731122 12-Dec-2012 09:18

Anyway, thank goodness XT isn't affected and we can still tether via the iPad and iPhone!

1912 posts

Uber Geek
+1 received by user: 456

Trusted
Spark NZ

  Reply # 731124 12-Dec-2012 09:18

GJB21: Yay, I think we are finally going on broadband at last (9.05am) but won't hold my breath....
Harks back to the XT debacles of a few Christmases ago - 3 major outages in 24 hours - we know Telecom considers it a "best efforts" service, but this hardly qualifies as even an "attempt" does it?


I can assure you that the people working on resolving this fault probably haven't worked this hard or under this much pressure in their lives. While consumer broadband doesn't carry any SLAs with it, it is important, and when an outage affects this many people at once, EVERYONE involved downs tools on everything else and gives this their complete attention.

It is a complex fault, and people are working very very hard to identify the best way to bring the affected services back up and keep them stable.

To suggest this isn't even an "attempt" is disrespectful to those working on the issue.

Regards
N


185 posts

Master Geek
+1 received by user: 17


  Reply # 731125 12-Dec-2012 09:21

JonoNZ:
clinty:
surfisup1000:
clinty:
surfisup1000: 

My opinion is that this is a mistake on their behalf for poor planning -- it is not a routine hardware failure which we all accept can occur from time to time but still should be minimised through redundancy.




From the PR release above it seems they do have redundancy, but that is what is causing the problem :)

When the AKL server crashed due to an upgrade (yesterday's AM outage) it became out of sync with the CHC server. Then when they tried to restore the redundancy (so everyone was not trying to use the same server) overload occurred and the CHC server had to be taken offline.

Without the redundancy I would say that there would have been no internet for Telecom users at all yesterday AM :)





From what I understand, Telecom have 2 PPPOA authenticating servers and their redundancy plan is that if one goes down the other can cope.

But, it appears, they need 2 servers for normal operations. Any fewer, and the system breaks down.

So, Telecom have made a planning error in thinking one server can cope.

Does that sound about right? 





I would say that one server would cope fine, as it did for around 6-7 hours yesterday according to the release. But the usual plan in these scenarios is to restore redundancy as soon as possible because lightning can strike twice :)

The issue is that restoring the AKL server appears to have caused a problem with the CHC server, crashing the system again.

I would expect that Telecom will now be reviewing how they restore the redundancy between these servers when something happens.




I have some experience in enterprise network infrastructure and to only have two PPPOA authentication servers for an operation the size of Telecom seems a bit light to me. There are all manner of DR scenarios which can easily be predicted where you can run into problems. That is, if this really is how the infrastructure is architected.


I agree. I assume the PR release has been dumbed down a bit and the underlying architecture is actually more complex than one server in each city, but for discussion purposes I kept using the same terms :)

1569 posts

Uber Geek
+1 received by user: 218


  Reply # 731127 12-Dec-2012 09:24

JonoNZ: 
I have some experience in enterprise network infrastructure and to only have two PPPOA authentication servers for an operation the size of Telecom seems a bit light to me. There are all manner of DR scenarios which can easily be predicted where you can run into problems. That is, if this really is how the infrastructure is architected.


I have no idea :) Telecom have published a little bit of info about this.

User 'biddle' seems to know what has gone wrong.

Maybe it is just within the normal range of occasional problems.



172 posts

Master Geek
+1 received by user: 3


  Reply # 731128 12-Dec-2012 09:26

Talkiet:
GJB21: Yay, I think we are finally going on broadband at last (9.05am) but won't hold my breath....
Harks back to the XT debacles of a few Christmases ago - 3 major outages in 24 hours - we know Telecom considers it a "best efforts" service, but this hardly qualifies as even an "attempt" does it?


I can assure you that the people working on resolving this fault probably haven't worked this hard or under this much pressure in their lives. While consumer broadband doesn't carry any SLAs with it, it is important, and when an outage affects this many people at once, EVERYONE involved downs tools on everything else and gives this their complete attention.

It is a complex fault, and people are working very very hard to identify the best way to bring the affected services back up and keep them stable.

To suggest this isn't even an "attempt" is disrespectful to those working on the issue.

Regards
N



Consumer broadband "important"? It almost sounds like it isn't all that important with a statement like that. Most of us don't care if it has an SLA or not, we just want it to work most of the time.

Some network monitoring is lacking, because during this whole incident the status page and Twitter account were some of the last to know that the service was out. This morning we were assured via the status page that it was OK, but it clearly was not.

We understand the pressure; many of us are under pressure in IT every day. Let's hope you get it resolved quickly or you'll be facing an unenviable PR disaster.

1912 posts

Uber Geek
+1 received by user: 456

Trusted
Spark NZ

  Reply # 731131 12-Dec-2012 09:34

Just to clarify for all - the status updates on our website bear no relation to the operational staff that picked up and started working on this issue minutes after it (re)occurred. The service was under CONSTANT monitoring last night after it was initially restored.

I agree that the status updates were slack and have already asked the product manager to include this issue in the Post Issue Review process.

Cheers - N

233 posts

Master Geek


  Reply # 731132 12-Dec-2012 09:36

Talkiet, I'm not at all questioning the dedication the staff have to getting this fixed. A huge thanks to them for it :)




Zeb A.
Personal site: http://ixari.net
Twitter: @asgard
