Geekzone: technology news, blogs, forums
140 posts

Master Geek
+1 received by user: 20


Topic # 240932 3-Oct-2018 10:46

We have a client who has reported an internet issue in Greymouth - I've logged in remotely (I'm in Christchurch) and confirm they can get to some IPs and not others. Stuff works, NZ Herald does not; Bing works, Google does not. Pinging 8.8.8.8 gets no response.

They CAN RDP to my server, but they cannot RDP to the server of their head office, which is also on Vodafone Fibre, so it's very strange. I was initially blaming some sort of network issue within Vodafone. I spoke to the help desk, who were very helpful up until the point where it emerged the client wasn't using a Vodafone router. I had to get the Vodafone router plugged back in and confirm the fault still existed to get any further.

This is a common issue and very frustrating. On the one hand I can understand it, but at the same time we know the routers we are using and trust them, and it would be an odd router issue (beyond stupid misconfiguration).

We rang a local tech in Greymouth to get him to go onsite and plug in the new-in-the-box Vodafone HG659, and he reported he has a few businesses in Greymouth experiencing the same issue - one client can get to Trademe but their phones can't connect to the VOIP provider. Apparently the Spark shop in Greymouth has the same issue.

So now the problem is murkier. It turns out Chorus had a fault affecting fibre in Greymouth that was resolved at 1:50am this morning. It's looking like it's not resolved, but we can't contact Chorus as we have to go through our retailer. Vodafone apparently have nothing that lets them look for faults in their call centre geographically. Why do providers not have any AI in their systems to alert them that they have 10 customers on the phone to the call centre faults department who are all in the same localised geographic area?? The third Vodafone tech agreed, on the information I provided, to raise a ticket at least, so hopefully that'll go up and then the next level tech team might notice a pattern. Likewise, having the Spark store affected might get them special notice from Spark's support (assuming it's a real Spark store, not a reseller). But it's pretty frustrating when it's like this, and it's very hard to actually get the fault escalated. For all I know the right people are looking at it, but Spark list no faults, Vodafone list no faults and neither do Chorus. (That's another bugbear: you check the status sites first, ring and sit on hold, only to be told they are aware of the fault. On the plus side, all three of my calls to Vodafone today were answered pretty promptly.)


27137 posts

Uber Geek
+1 received by user: 6578

Moderator
Trusted
Biddle Corp
Lifetime subscriber

  Reply # 2100574 3-Oct-2018 11:04
3 people support this post

I'm not really sure how any of those events could really be related.

 

A Chorus connection is full layer 2 back to the RSP handover - so any such issues which seem like routing can't be localised.

 

 




140 posts

Master Geek
+1 received by user: 20


  Reply # 2100576 3-Oct-2018 11:05

Just checked again and Chorus now have it back as a fault - Apparently it was reported at 9:34am even though it wasn't back on the map until after 10:30am - Would have made my life easier to be able to refer my client to the fault and for me to know it was heading toward a resolution.




140 posts

Master Geek
+1 received by user: 20


  Reply # 2100581 3-Oct-2018 11:10

mobiusnz:

 

Just checked again and Chorus now have it back as a fault - Apparently it was reported at 9:34am even though it wasn't back on the map until after 10:30am - Would have made my life easier to be able to refer my client to the fault and for me to know it was heading toward a resolution.

 

 

I'm with you there. I couldn't see how a Chorus fault could affect them in the way it is. I thought that once they were getting to their provider's network, all traffic from then on was controlled by the retailer, but the pattern is there - multiple people on multiple providers all reporting similar things, some sites contactable, some not. Now Chorus has the same fault (generic "You may be experiencing issues with your fibre broadband") back on the fault map affecting the Greymouth Chorus Fibre network.

I don't understand enough about the relationship between Chorus (or in my case here, Enable) and the provider, or how things are routed, but it does appear there is somehow an issue. It actually reeked of MTU size issues to me. I've had that before, where a misconfigured MTU meant a client could get to most sites but couldn't get to the company's office and a handful of other NZ-hosted sites. It was a horrible cheap Edimax ADSL modem many years back that set the wrong MTU by default.
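
For anyone wanting to test an MTU theory like this, here is a minimal sketch of the usual approach: binary-search the largest ping payload that gets through with the don't-fragment bit set. The `probe` callback is hypothetical (it stands in for whatever actually sends the ping), and `fake_probe` just simulates a path with a 1492-byte MTU, a common value on PPPoE links. It also assumes that if one size gets through, every smaller size does too.

```python
from typing import Callable

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def largest_payload(probe: Callable[[int], bool],
                    low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload for which probe() succeeds.

    probe(size) is a hypothetical callback: it should send one
    don't-fragment ping with `size` payload bytes and return True
    if a reply came back. Returns -1 if even `low` bytes fail.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu(payload: int) -> int:
    """Convert a ping payload size into the MTU it implies."""
    return payload + IP_HEADER + ICMP_HEADER


def fake_probe(size: int) -> bool:
    """Simulated path with a 1492-byte MTU (an assumption for the demo)."""
    return size + IP_HEADER + ICMP_HEADER <= 1492


best = largest_payload(fake_probe)
print(best, path_mtu(best))
```

On a real link the probe would wrap something like `ping -M do -s <size>` on Linux or `ping -f -l <size>` on Windows. If the largest working payload is well under 1472 to the broken sites but fine to the working ones, MTU is a reasonable suspect.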


'That VDSL Cat'
8892 posts

Uber Geek
+1 received by user: 1949

Trusted
Spark
Subscriber

  Reply # 2100625 3-Oct-2018 11:50

mobiusnz:

 

Why do providers not have any AI in their systems to alert them that they have 10 customers on the phone to the call centre faults department who are all in the same localised geographic area?? The third Vodafone tech agreed, on the information I provided, to raise a ticket at least, so hopefully that'll go up and then the next level tech team might notice a pattern. Likewise, having the Spark store affected might get them special notice from Spark's support (assuming it's a real Spark store, not a reseller)

 

 

There are some providers with such "AI"

 

the most obvious way spark use it, is assuming you have connection promise setup in myspark, you will automatically get data loaded on mobiles per your preferences.

 

 

 

If a spark store is affected by an outage, they too go through the same process as you, call into a service desk and report the fault. It's then logged through with the relevant access provider and investigated..

 

If this is what it sounds like and a backhaul related issue, maybe a carrier fibre has been cut? - i have not looked into the details.





#include <std_disclaimer>

 

Any comments made are personal opinion and do not reflect directly on the position my current or past employers may have.




140 posts

Master Geek
+1 received by user: 20


  Reply # 2100756 3-Oct-2018 13:02

hio77:

 

mobiusnz:

 

Why do providers not have any AI in their systems to alert them that they have 10 customers on the phone to the call centre faults department who are all in the same localised geographic area?? The third Vodafone tech agreed, on the information I provided, to raise a ticket at least, so hopefully that'll go up and then the next level tech team might notice a pattern. Likewise, having the Spark store affected might get them special notice from Spark's support (assuming it's a real Spark store, not a reseller)

 

 

There are some providers with such "AI"

 

the most obvious way spark use it, is assuming you have connection promise setup in myspark, you will automatically get data loaded on mobiles per your preferences.

 

 

 

If a spark store is affected by an outage, they too go through the same process as you, call into a service desk and report the fault. It's then logged through with the relevant access provider and investigated..

 

If this is what it sounds like and a backhaul related issue, maybe a carrier fibre has been cut? - i have not looked into the details.

 

 

It's interesting you mention Spark - I'm going back to the Telecom days, but my phone and broadband went down (around 6 years ago). I had no dial tone and no DSL sync, just horrible hiss. I rang and a fault was logged. Eventually I heard back, and it turned out that an underground cable had been damaged by road vibrations and let water in. The cable was good old-fashioned paper-wrapped copper, so all the copper was wet and shorting. It affected 300 customers but took a bunch of them ringing to raise the fault and get someone out to dig up the road, dry it out with a hot air gun and wrap it all with tape to "fix" it. You would think a pattern of:

We have a small geographic region where none of the customers have DSL sync and none of them have made or received a single phone call.

would be enough to raise an alert for investigation automatically. Maybe today it would? Something like DSL sync you'd think would be a real easy one: the odd person powering off their router isn't at all unusual, but 200 people who are all neighbours doing it at the same time should trigger an alert.
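
To make the idea concrete, here is a rough sketch of that kind of alert: count distinct lines losing sync per area inside a sliding time window and flag the area once a threshold is crossed. Everything here is an assumption for illustration (the event format, the `area-A` style identifiers, the 10 minute window and the threshold of 50); I have no idea how any provider actually structures this.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def sync_loss_alerts(events, window=timedelta(minutes=10), threshold=50):
    """Yield (time, area, count) once per area when `threshold` distinct
    lines have lost sync within `window`.

    `events` is an iterable of (time, area_id, line_id) tuples sorted by
    time. The tuple layout is hypothetical, not any provider's real feed.
    """
    recent = defaultdict(deque)   # area -> deque of (time, line)
    alerted = set()
    for when, area, line in events:
        q = recent[area]
        q.append((when, line))
        # Drop events that have fallen out of the sliding window.
        while when - q[0][0] > window:
            q.popleft()
        distinct = {l for _, l in q}
        if len(distinct) >= threshold and area not in alerted:
            alerted.add(area)
            yield when, area, len(distinct)


# Demo: 200 neighbours drop one second apart, plus one unrelated line elsewhere.
start = datetime(2018, 10, 3, 9, 0)
events = [(start + timedelta(seconds=i), "area-A", f"line-{i}") for i in range(200)]
events.append((start + timedelta(seconds=30), "area-B", "line-x"))
events.sort(key=lambda e: e[0])

alerts = list(sync_loss_alerts(events))
print(alerts)
```

The single router powered off in `area-B` never gets near the threshold, while `area-A` raises exactly one alert as soon as the 50th line drops.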


'That VDSL Cat'
8892 posts

Uber Geek
+1 received by user: 1949

Trusted
Spark
Subscriber

  Reply # 2100775 3-Oct-2018 13:37

mobiusnz:

 

It's interesting you mention Spark - I'm going back to the Telecom days, but my phone and broadband went down (around 6 years ago). I had no dial tone and no DSL sync, just horrible hiss. I rang and a fault was logged. Eventually I heard back, and it turned out that an underground cable had been damaged by road vibrations and let water in. The cable was good old-fashioned paper-wrapped copper, so all the copper was wet and shorting. It affected 300 customers but took a bunch of them ringing to raise the fault and get someone out to dig up the road, dry it out with a hot air gun and wrap it all with tape to "fix" it. You would think a pattern of:

 

To be clear, 6 years ago we had not built that system, nor was it there when I started at Spark. This bothered me immensely, as I was able to quite aggressively use tool sets to prove out outages.

I'm glad to say it's a heck of a lot smarter now. Not to say it's perfect: you need volume to be accurate, and there will always be exceptions and improvements that can be made.

Trend analysis is a big thing that's done, but ultimately it is Chorus that need to raise it as an area fault. Until then we as an RSP can only suspect it's a Chorus issue and raise it as such (while also checking our end: has a BNG gone down? A single card? A handover? Look around for a few examples of these lately elsewhere).

Network teams need time to confirm things, and unfortunately that can also take time.

I do feel Chorus have also gotten faster; I suspect they have put some more smarts in place there.

It's worth noting though, I haven't been "front line" for a good year or so now, so I'm not in the muck of it, seeing what's coming through first hand. I do pay close attention to the resi/small business faults though.

 

 









140 posts

Master Geek
+1 received by user: 20


  Reply # 2100822 3-Oct-2018 14:41

hio77:

 

 

 

To be clear, 6 years ago we had not built that system, nor was it there when I started at Spark. This bothered me immensely, as I was able to quite aggressively use tool sets to prove out outages.

I'm glad to say it's a heck of a lot smarter now. Not to say it's perfect: you need volume to be accurate, and there will always be exceptions and improvements that can be made.

Trend analysis is a big thing that's done, but ultimately it is Chorus that need to raise it as an area fault. Until then we as an RSP can only suspect it's a Chorus issue and raise it as such (while also checking our end: has a BNG gone down? A single card? A handover? Look around for a few examples of these lately elsewhere).

Network teams need time to confirm things, and unfortunately that can also take time.

I do feel Chorus have also gotten faster; I suspect they have put some more smarts in place there.

It's worth noting though, I haven't been "front line" for a good year or so now, so I'm not in the muck of it, seeing what's coming through first hand. I do pay close attention to the resi/small business faults though.

 

 

 

 

Yeah - hopefully Chorus are smarter. And due to the weirdness of this fault, where users were able to get through PPP auth and even pass traffic with mixed results, it would have been a hard one for Chorus (or anyone) to detect. That's where I think the next step is for RSPs to have the call centre staff enter a customer with a location etc. (hopefully pulled through from the account) into a fault database immediately, regardless of the issue. Even if the database only shows that 15 users from one location are currently talking to a help desk operator after selecting faults/tech support, that might be enough to start a deeper dig, rather than each of those 15 users battling their way through the first tier of helpdesk to demonstrate they actually have an issue and yes, they have turned their router off and on. I still think investment in that kind of tech would ultimately benefit both sides of the equation, as it would lower call centre costs AND make for happier customers.
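
As a very rough sketch of what I mean by that fault database (the `FaultBoard` class, its method names and the threshold are all made up for illustration, not anything an RSP actually runs): log every faults call against the caller's locality the moment it starts, and flag any locality with too many calls open at once.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FaultBoard:
    """Hypothetical tracker of open faults calls per locality."""
    threshold: int = 10
    open_calls: dict = field(default_factory=dict)   # call id -> locality

    def call_started(self, call_id: str, locality: str) -> bool:
        """Record a new call; return True if its locality is now a hotspot."""
        self.open_calls[call_id] = locality
        return locality in self.hotspots()

    def call_ended(self, call_id: str) -> None:
        """Forget a call once the customer hangs up."""
        self.open_calls.pop(call_id, None)

    def hotspots(self) -> dict:
        """Localities with at least `threshold` calls open right now."""
        counts = Counter(self.open_calls.values())
        return {loc: n for loc, n in counts.items() if n >= self.threshold}


board = FaultBoard(threshold=3)
board.call_started("c1", "Greymouth")
board.call_started("c2", "Christchurch")
board.call_started("c3", "Greymouth")
flagged = board.call_started("c4", "Greymouth")
print(flagged, board.hotspots())
```

The point of returning the flag from `call_started` is that the operator taking the third Greymouth call could be told straight away that two colleagues are already on calls from the same town.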

The problem with Chorus is they take too much of an attitude of distancing themselves from the end user. Fair enough, as most end users would otherwise ring them when the problem is with their computer etc., but it also means people like me can't ring them. I rang their number to try and discuss the fact that it looked like the fault they had fixed wasn't fixed, but the first message on their phone system is "If you are experiencing problems with your phone or internet connection hang up and call your provider", so I knew I would get nowhere talking to the receptionist. Sure enough, an hour later they worked out (or at least publicly admitted) that they did in fact still have a fault in Greymouth. I now have to bill my client for fluffing around chasing a problem which they shouldn't really have to pay for.

I do think that the forced split of Chorus and Telecom has caused more issues than it solved. Before, if you were a Telecom customer you rang them and they resolved it. Now they have to work out where the fault lies and escalate up the chain to their provider, and are at the whim of their SLAs etc.

Oh well - gripe over. I'm not really looking to point fingers at any organisation in particular, just trying to demonstrate some of the weaknesses in the state things are in here, where an end user fault has to go to the retailer, who then has to lodge it with the wholesale provider, who has to log it for their contractor to look at. That often means a client whose only issue is that the DC brick on their ONT has died has to wait 5 days for an "engineer" to come out and swap it so they have phone and internet back. (This happened to a friend. I saw their Facebook gripe and went out and loaned them a brick. That then caused Spark/Enable to see their PPP was up, and they cancelled the job.)

 

 


'That VDSL Cat'
8892 posts

Uber Geek
+1 received by user: 1949

Trusted
Spark
Subscriber

  Reply # 2100862 3-Oct-2018 15:39

Yep, it's all about awareness.

I've had my hand in many projects around Spark where we've been able to push things on to get that gold; everyone wants to do it smarter (externally too, I'm sure).

I'd like to see other providers do smarter things too. It's like a challenge, let's one-up them ;) The overall outcome comes down to... better experiences for everyone.

As for power bricks failing, there is a process with Chorus for this, and it can easily be done by any provider.

Chorus ship out the power brick and that's that. Sounds to me like the suspicion was an ONT failure, thus the tech.

I can't speak for cancellation of any faults.









140 posts

Master Geek
+1 received by user: 20


  Reply # 2100893 3-Oct-2018 17:01

hio77:

 

Yep, it's all about awareness.

I've had my hand in many projects around Spark where we've been able to push things on to get that gold; everyone wants to do it smarter (externally too, I'm sure).

I'd like to see other providers do smarter things too. It's like a challenge, let's one-up them ;) The overall outcome comes down to... better experiences for everyone.

As for power bricks failing, there is a process with Chorus for this, and it can easily be done by any provider.

Chorus ship out the power brick and that's that. Sounds to me like the suspicion was an ONT failure, thus the tech.

I can't speak for cancellation of any faults.

 

 

Actually, my other thought on my role in supporting customer internet connections was to have Spark, Vodafone etc. run a certification program, even if I had to pay to sit it. Once I'm certified I have a helpdesk number I ring, enter my certification ID number and it gets me right up to the level 2 support team, so I don't have to spend 20 minutes arguing with the customer-facing initial help desk to get to someone with a brain. The worst case ever (and this was back in the Jetstream branding days) was me ringing the helpdesk and explaining I'd tried another router and another login (in those days you logged in with your own username and password, and you could use them at other sites). After my long description of what I'd already tried before ringing, her first response was "Look at the Nokia modem, is the power light on". Facepalm. In the end, while I was arguing with her, she spoke the address out loud and someone in the same room realised they were talking to another client in the same building. Turned out one of the DSLAMs had died.


'That VDSL Cat'
8892 posts

Uber Geek
+1 received by user: 1949

Trusted
Spark
Subscriber

  Reply # 2100898 3-Oct-2018 17:23

mobiusnz:

 

Actually, my other thought on my role in supporting customer internet connections was to have Spark, Vodafone etc. run a certification program, even if I had to pay to sit it. Once I'm certified I have a helpdesk number I ring, enter my certification ID number and it gets me right up to the level 2 support team, so I don't have to spend 20 minutes arguing with the customer-facing initial help desk to get to someone with a brain. The worst case ever (and this was back in the Jetstream branding days) was me ringing the helpdesk and explaining I'd tried another router and another login (in those days you logged in with your own username and password, and you could use them at other sites). After my long description of what I'd already tried before ringing, her first response was "Look at the Nokia modem, is the power light on". Facepalm. In the end, while I was arguing with her, she spoke the address out loud and someone in the same room realised they were talking to another client in the same building. Turned out one of the DSLAMs had died.

 

 

Love the idea. Doubt it would happen outside of the SME space though.

 

 

 

It's very much like if someone calls up and says I'm IT support.

 

That line always makes me laugh quietly... 9/10 they aren't "IT" they simply know how to do a few things and that's about it.

 

 

 

 

 

How do you create a test to avoid that?

 

Can't be like CCNA, that can be cheated on easily (and often is...)







3167 posts

Uber Geek
+1 received by user: 1220

Subscriber

  Reply # 2100913 3-Oct-2018 18:14
One person supports this post

Did anyone find out exactly what the fault was? As I'm still trying to figure out how a layer 1 or 2 fault in the Chorus network could cause the above symptoms.






5246 posts

Uber Geek
+1 received by user: 1132

Trusted
Subscriber

  Reply # 2100983 3-Oct-2018 19:00
One person supports this post

mobiusnz:

 

We have a client who has reported an internet issue in Greymouth - I've logged in remotely (I'm in Christchurch) and confirm they can get to some IPs and not others. Stuff works, NZ Herald does not; Bing works, Google does not. Pinging 8.8.8.8 gets no response.

They CAN RDP to my server, but they cannot RDP to the server of their head office, which is also on Vodafone Fibre, so it's very strange. I was initially blaming some sort of network issue within Vodafone. I spoke to the help desk, who were very helpful up until the point where it emerged the client wasn't using a Vodafone router. I had to get the Vodafone router plugged back in and confirm the fault still existed to get any further.

This is a common issue and very frustrating. On the one hand I can understand it, but at the same time we know the routers we are using and trust them, and it would be an odd router issue (beyond stupid misconfiguration).

We rang a local tech in Greymouth to get him to go onsite and plug in the new-in-the-box Vodafone HG659, and he reported he has a few businesses in Greymouth experiencing the same issue - one client can get to Trademe but their phones can't connect to the VOIP provider. Apparently the Spark shop in Greymouth has the same issue.

So now the problem is murkier. It turns out Chorus had a fault affecting fibre in Greymouth that was resolved at 1:50am this morning. It's looking like it's not resolved, but we can't contact Chorus as we have to go through our retailer. Vodafone apparently have nothing that lets them look for faults in their call centre geographically. Why do providers not have any AI in their systems to alert them that they have 10 customers on the phone to the call centre faults department who are all in the same localised geographic area?? The third Vodafone tech agreed, on the information I provided, to raise a ticket at least, so hopefully that'll go up and then the next level tech team might notice a pattern. Likewise, having the Spark store affected might get them special notice from Spark's support (assuming it's a real Spark store, not a reseller). But it's pretty frustrating when it's like this, and it's very hard to actually get the fault escalated. For all I know the right people are looking at it, but Spark list no faults, Vodafone list no faults and neither do Chorus. (That's another bugbear: you check the status sites first, ring and sit on hold, only to be told they are aware of the fault. On the plus side, all three of my calls to Vodafone today were answered pretty promptly.)

 

 

It's the usual sub-contractor model.

A recipe for failure.

This is why I went to Spark for everything. They are the only company with anything close to an end-to-end view of the infrastructure.

The market model in many segments is great for everyone.....except the customer.

One day we'll shed this failed ideology and its rubbish business models.....but we're not quite there yet. In the meantime, I've done all I can to reduce the useless bureaucracy imposed by un-integrated problem resolution processes and foreign help desks working off scripts they don't understand.....and whose job it is - really - to frustrate users and discourage calls from all but the biggest customers (who get a special number to ring anyway).





____________________________________________________
I'm on a high fibre diet. 

 



90 posts

Master Geek
+1 received by user: 1


  Reply # 2101022 3-Oct-2018 20:20

 

he reported he has a few businesses in Greymouth experiencing the same issue - One client can get to Trademe but their phones can't connect to the VOIP provider.

 

I encountered this with one of our clients in Greymouth today.
Their ISP is a local wireless operator who intercepts traffic destined for 2talk and NATs it out a separate 2talk UFB connection - the 2talk connection was out.

 

I did see UFB connections for other Greymouth clients bounce overnight but no reported issues this morning.




140 posts

Master Geek
+1 received by user: 20


  Reply # 2101040 3-Oct-2018 21:01

The fault my client is experiencing isn't resolved yet and the listed fault on Chorus still says they have a fault and the expected resolution is 9:34am tomorrow (24 hours after the fault was apparently reported so a default 24 hour period I'm picking).

 

There is NOTHING to state what the issue actually is.

 

For all I know at this point the "fault" has nothing to do with what my client is still experiencing. Before someone in Greymouth reported other sites with related issues, and before discovering there was a prior fault registered with Chorus that was later re-loaded as existing, I was definitely suspecting a routing issue within Vodafone.

I've had issues before with Vodafone clients on a static IP being unable to reach certain isolated online services (weird stuff: a client with a smart treadmill that failed to download online programs from the provider until they were taken off that IP pool within Vodafone).

I actually requested to have the static IP address removed, as there is little I can't work around with a dynamic IP. The tech reported they had done that for me. 11 hours and multiple router reboots later, they still have the same IP address. Typical experience for me with Vodafone: request a static IP, get told it's done, it doesn't happen. Ask for a DNS PTR record, get told it's done, it doesn't happen. Told to wait 24 hours for caching even though I've explained I've used a service that ignores TTLs. Waited, still nothing. Told "Oh wait, they can see why it didn't work, they've fixed it now, wait 24 hours". Still doesn't work.
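
For anyone wanting to check a PTR record themselves, here is a minimal Python sketch. `ptr_name` builds the reverse-lookup name that the record has to exist under, and `lookup_ptr` asks the system resolver for it. The address 192.0.2.10 is a documentation address used purely as an example, and both function names are my own.

```python
import ipaddress
import socket
from typing import Optional


def ptr_name(ip: str) -> str:
    """Return the DNS name a PTR record for `ip` must live under."""
    return ipaddress.ip_address(ip).reverse_pointer


def lookup_ptr(ip: str) -> Optional[str]:
    """Ask the system resolver for the PTR hostname, or None if there is none.

    This goes through whatever resolver the machine is configured with,
    so the answer may be cached for up to the record's TTL.
    """
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


print(ptr_name("192.0.2.10"))   # 10.2.0.192.in-addr.arpa
```

Because `lookup_ptr` goes through the local resolver, it is subject to exactly the caching excuse above. To rule caching out you would query the authoritative nameserver directly, for example with `dig -x <ip> @<nameserver>`.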

For me it's a role reversal. In the Telecom days I always joked their motto was "Right third time". Now I've found Spark tend to do it right first time (on the run-of-the-mill request stuff) and Vodafone can be like pulling teeth.

A while back I spent a couple of hours calling and going onsite to test things for a Vodafone tech so he could raise a ticket on a fibre outage in Christchurch. When he finally accepted I'd tried everything I could, he escalated it to Enable. He rang me back less than 20 minutes later to say Enable had a fault affecting that neighbourhood. FML.

 

 




140 posts

Master Geek
+1 received by user: 20


  Reply # 2101381 4-Oct-2018 10:45

Well - the problem was resolved by Chorus overnight. My client is working perfectly again this morning. The word I have from the local tech in Greymouth, who knew a few of the clients affected, is that it only affected clients on a static IP address, so I'm not sure how the relationship for IP allocation works between Chorus and the RSP. Frustrating, considering I suspected the static IP might be related (more as a Vodafone issue) and so requested removing it, which I was told was done but never happened. It might have had them functional a little sooner.

I'd love to know what the resolution was that took Chorus around 18 hours to fix (from the initial fault overnight that they thought they'd fixed through to eventual resolution). It's also incredibly frustrating that Vodafone weren't aware of the issue, so ringing them was just an exercise in futility, banging my head on the wall. If Chorus had notified RSPs of the issue with technical information, then at least Vodafone could have offered to remove the static IP for the duration of the fault.

