Geekzone: technology news, blogs, forums
PPPoE disconnects across 4 LB WAN connections to VDSL lines (2Degrees) - stumped (long)
hugovg

3 posts

Wannabe Geek


#280542 22-Dec-2020 06:27

For about a year we have been accessing our internet via PPPoE load balancing across 4 2Degrees VDSL connections using an UBNT ER-Pro.

 

Recently, we have been experiencing short (5-10 minute) outages where the PPPoE sessions on all connections are dropped, simultaneously, then resumed. Their timing is mostly (but not always) at popular times for general internet consumption (eg. after 6PM, 8-10PM etc).

 

2D (and Chorus) state the VDSL lines are all fine and the config is 100% valid. The only technical snippet of info we have is that a 2nd MAC address appears on a VDSL connection, which results in the disconnection, with the stated cause being another modem is being connected across the same connection/line.

 

This statement has resulted in many questions from us, all unanswered. Currently we are told our configuration is too complex for Business Support; they will not support us or allow us to liaise with anyone more technical, despite repeated requests (now pleas). We are on our own & in the dark. Our configuration is a bit different for sure, and it's possible we are introducing "complications" that result in unforeseen issues.

 

Our config is as follows:

 

  • 4 VDSL physical connections & 5 associated ISP accounts (long story!), each with a static IP assigned. Fiber not yet available at our location.
  • there are 4 Fritz!Box 7940 modems/routers configured to allow PPPoE pass-through, but otherwise default (Connection to a DSL line, NAT etc.)
  • each Fritz!Box has its own Lan subnet defined, and connects to its own WAN eth port on the ER-Pro
  • 4 PPPoE interfaces are defined on the 4 WAN eth interfaces, and have their own login account credentials
  • load balancing (LB) is configured on the 4 PPPoE WAN interfaces
  • Note: 3 of the 4 FB7940 have intentionally incorrect login details (from memory the FB7940 wouldn't allow empty Uname/PW fields) so in effect just pass through PPPoE from the ER-Pro, but 1 FB7940 does have a 5th 2D account login to support Fritz!Fone VOIP phone services. 
  • Each VDSL connection supports approx 66Mb/s download, and with speedtest.net running in multi-session mode, we normally can achieve approx 220Mb/s down.
  • Not to say it's been perfect, but on the whole, this has allowed us to share VDSL-only speeds with hundreds of clients (mostly on a Unifi guest network)

Facts & thoughts:

 

  • We have made no wiring changes, so a 2nd modem appearing across a VDSL line at our end seems impossible. A cable fault in the ground is possible, but why would all our connections drop if (say) 2 copper pairs short? Besides, there are never any DSL entries in the logs.
  • Perhaps it is the 2nd PPPoE connection through 1 of the Fritz!Boxes that is the root cause? 2D has not managed to state whether this is or isn't a permissible config, but we are willing to accept that it might not be. So we disabled PPPoE pass-through on the Fritz!Box that has its own login credentials (to support voice services), and although things were fine for 1 day, the issue has come back. Our rapid conclusion is that the 2nd PPPoE connection through 1 Fritz!Box is not the cause.
  • There are no PPPoE or other errors in the ER messages log file, simply that LB reports reachability failures.
  • Ditto for the 1 Fritz!Box that does login, its PPPoE connection is simply dropped at exactly the same time.
  • We have no reason to suspect power, all networking gear is on a UPS & there are no other issues at the drop out times.
  • FRITZ!OS: the device that has its own 2D login is on 07.21, while the others are on 07.11
  • No IPv6 addressing is used anywhere in our network just IPv4, but we note the Fritz!Box system event log does mention IPv6 when the connection drops (is it transporting IPv4 over IPv6?)

Possible causes?

 

  • The ER-Pro (v1.10.11 firmware) is doing something to cause the ISP's end to drop all our connections at the same time, eg. a Martian packet?
  • The ER-Pro is faulty, we have a spare ER8 to test this, but we think unlikely
  • Our LB config across the 4 PPPoE connections is wrong, but why that would cause all 5 PPPoE sessions (including the one in the FB7940 itself) to drop is beyond me.
  • We can drop the ER's PPPoE config & revert to a simple Fritz!Box config of NAT, & LB the WAN eth interfaces instead, which will result in double-NAT, but at least we can tell 2D Support the Fritz!Boxes are in a default config.
  • We have mixed up the physical VDSL lines and the 2D logins that are used across them - this is possible, but we are not aware of what the restrictions are or whether TR-069 could be affecting things (eg. is it tied to 2D login/account, or Fritz!Box MAC address, or physical VDSL circuit number)?
  • That there is a config error on the 2D/Chorus side, which I think is still possible, as something is happening at times of the day when their network is busy.
  • Other?
  • We can do more diagnostics with effort (eg. packet capture), but really need guidance to avoid diving down rabbit holes trying to understand how PPPoE works at a low level.

Thank you for getting this far - any thoughts greatly appreciated - we have our peak time upon us (Christmas & New Years) and depend 100.00000% on reliable internet services to support us & our guests, all options considered!

 

Also posted to UBNT/ui.com community as it may be an ER issue.

 

----

 

The following are the system event log entries (on the Fritz!Box that does login) around the latest outage:

 

  • 22.12.20 05:12:53 MyFRITZ! error: Error during DNS resolution of the MyFRITZ! name
  • 22.12.20 05:12:50 IPv6 prefix obtained successfully. New prefix: y:y:y:y:y::/56
  • 22.12.20 05:12:50 IPv6 internet connection established successfully. IP address: y:y:y:y:y
  • 22.12.20 05:12:45 Internet connection established successfully. IP address: x.x.x.x, DNS server: x.x.x.x and x.x.x.x, gateway: x.x.x.x, broadband PoP: SNAP-62
  • 22.12.20 05:03:19 PPPoE error: Timeout. [4 messages since 22.12.20 05:02:38]
  • 22.12.20 05:02:24 Timeout during PPP negotiation. 22.12.20 05:02:24 Internet connection cleared.
  • 22.12.20 05:02:24 IPv6 internet connection was cleared; prefix no longer valid.
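The length of this outage can be read straight off the log. A minimal sketch, assuming the entries keep the `DD.MM.YY HH:MM:SS message` layout shown above; the function names `parse` and `outage_seconds` are made up for illustration and the IP details are trimmed from the copied lines:

```python
from datetime import datetime

# Entries copied from the log above (newest first, as the Fritz!Box lists them).
LOG = [
    "22.12.20 05:12:45 Internet connection established successfully.",
    "22.12.20 05:03:19 PPPoE error: Timeout.",
    "22.12.20 05:02:24 Timeout during PPP negotiation.",
    "22.12.20 05:02:24 Internet connection cleared.",
]

def parse(line):
    """Split 'DD.MM.YY HH:MM:SS message' into (datetime, message)."""
    stamp, message = line[:17], line[18:]
    return datetime.strptime(stamp, "%d.%m.%y %H:%M:%S"), message

def outage_seconds(lines):
    """Seconds from the first 'connection cleared' entry to the next 'established' entry."""
    events = sorted(parse(line) for line in lines)  # oldest first
    down = next(t for t, msg in events if "connection cleared" in msg)
    up = next(t for t, msg in events
              if t > down and "established successfully" in msg)
    return (up - down).total_seconds()

print(outage_seconds(LOG))  # 621.0, i.e. 10 min 21 s
```

Running the same function over the logs of the other Fritz!Boxes would show whether every line is down for the same window.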

c0ld
206 posts

Master Geek


  #2625216 22-Dec-2020 08:27

If you want to quickly rule out IPv6 causing any issues, you can easily disable it in the Fritzbox by going to Internet > Account Information > IPv6 > IPv6 Support > Uncheck 'IPv6 support enabled'.

michaelmurfy
/dev/ttys0
10983 posts

Uber Geek

Moderator
ID Verified
Trusted
Lifetime subscriber

  #2625282 22-Dec-2020 09:25

1) You should be using Draytek DV130's for this job.
2) You're running the old legacy firmware on the Edgerouter - any particular reason for not using the latest?
3) Any reason also for doing double-NAT?

 

Your setup has multiple problems and you'll be hard-pressed to find any support here.




Michael Murphy | https://murfy.nz | https://keybase.io/michaelmurfy - Referral Links: Sharesies | Electric Kiwi

danfaulknor
789 posts

Ultimate Geek

Trusted
Prodigi

  #2625363 22-Dec-2020 10:19

You are probably pushing it with this configuration and a "normal" ISP such as 2degrees.

 

If you had the budget available, you may want to find a provider who can rule out the load balancing for you. It would be fairly trivial to provision 4 VDSL connections and bond them (with equipment at both ends), rather than using connection-based load balancing which does have issues. Then you could also get single connection speeds higher than one VDSL connection can provide. We have done it in the past, though just with 2 but 4 is no more complex. You would have to move your VoIP as well though, as I assume that is with 2degrees and that 5th ISP connection. You can use the FritzFone via the LAN port or an ethernet WAN if you are tied to that system for some reason.

 

As Michael mentions as well, DV130s would be much better suited and can do raw PPPoE pass through without all the fluff that you're getting stuck with, with the Fritzboxes.




they/them

 

Prodigi - Optimised IT Solutions
WebOps/DevOps, Managed IT, Hosting and Internet/WAN.



ArcticSilver
714 posts

Ultimate Geek


  #2625408 22-Dec-2020 11:36

Start simple. Isolate the issue first.

 

Take one VDSL connection out of the group that is load balanced and set it up like a standard VDSL2 connection with just the Fritzbox and monitor the connection to see if it drops at the same time.

 

If it drops then you should follow up with 2 Degrees regarding this issue specifically (leave out the complexity of the load balanced connections).

 

If it doesn't drop, put it back to how you originally had it and check that the drops return. If they do, it's likely your setup.

 

 

 

Without these steps you'll be chasing your tail. 

 

 

 

If it's only dropping as part of your load balanced setup, a visual diagram of the setup would be extremely helpful.
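To compare the isolated line against the load-balanced group, it helps to line up the drop times from both. A rough sketch, assuming you have already collected drop timestamps from each side; the timestamps below are invented placeholders and `matching_drops` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

def matching_drops(isolated, balanced, tolerance=timedelta(seconds=60)):
    """Return drops on the isolated line that fall within `tolerance` of a drop in the balanced group."""
    return [t for t in isolated
            if any(abs(t - u) <= tolerance for u in balanced)]

# Placeholder timestamps, not taken from any real log.
isolated_line = [datetime(2020, 12, 22, 5, 2, 24), datetime(2020, 12, 22, 11, 40, 0)]
balanced_group = [datetime(2020, 12, 22, 5, 2, 30), datetime(2020, 12, 22, 18, 15, 0)]

print(matching_drops(isolated_line, balanced_group))
```

If most drops on the isolated line have a match, the cause is probably upstream of your own gear; if none do, the load-balanced setup is the more likely suspect.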

 

 

BMarquis
327 posts

Ultimate Geek

Trusted
Chorus
Lifetime subscriber

  #2625464 22-Dec-2020 13:22

 

  • 22.12.20 05:03:19 PPPoE error: Timeout. [4 messages since 22.12.20 05:02:38]
  • 22.12.20 05:02:24 Timeout during PPP negotiation. 22.12.20 05:02:24 Internet connection cleared.

 

To me, this would tend to indicate lost LCP messages, causing PPP to drop.

 

The most likely cause is something in your network. Hopefully it's not load balancing the LCP messages, although that's not something I would expect to suddenly change behaviour :)

 

But.... it's going to be quite difficult to track down. You might need to start packet capturing between the EdgeRouter and the Fritz!Boxes to see what's going on with PPP and LCP.
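When reading such a capture, the frames of interest are the PPPoE discovery frames (EtherType 0x8863, where a PADT marks a session being torn down) and the LCP frames inside PPPoE sessions (EtherType 0x8864, PPP protocol 0xC021), in particular Echo-Request and Echo-Reply. A minimal sketch of a classifier for raw Ethernet frames, assuming you already have the frame bytes from whatever capture tool you use; `classify` is a hypothetical helper and the sample frame is hand-built, not from a real capture:

```python
import struct

ETH_VLAN = 0x8100             # 802.1Q tag, skipped if present
ETH_PPPOE_DISCOVERY = 0x8863
ETH_PPPOE_SESSION = 0x8864
PPP_LCP = 0xC021

DISCOVERY_CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR", 0x65: "PADS", 0xA7: "PADT"}
LCP_CODES = {1: "Configure-Request", 2: "Configure-Ack", 5: "Terminate-Request",
             6: "Terminate-Ack", 9: "Echo-Request", 10: "Echo-Reply"}

def classify(frame: bytes) -> str:
    """Name the PPPoE/LCP message carried by one raw Ethernet frame."""
    offset = 12                                   # skip destination + source MAC
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == ETH_VLAN:                     # step over a single VLAN tag
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2                                   # start of the PPPoE header
    if ethertype == ETH_PPPOE_DISCOVERY:
        code = frame[offset + 1]
        return DISCOVERY_CODES.get(code, f"discovery code 0x{code:02x}")
    if ethertype == ETH_PPPOE_SESSION:
        (ppp_proto,) = struct.unpack_from("!H", frame, offset + 6)
        if ppp_proto == PPP_LCP:
            code = frame[offset + 8]
            return "LCP " + LCP_CODES.get(code, f"code {code}")
        return "PPP session data"
    return "other"

# Hand-built sample: zeroed MACs, session 0x0001, LCP Echo-Request with a 4-byte magic number.
sample = (bytes(12) + struct.pack("!H", ETH_PPPOE_SESSION)
          + struct.pack("!BBHH", 0x11, 0x00, 0x0001, 10)
          + struct.pack("!H", PPP_LCP)
          + struct.pack("!BBH", 9, 1, 8) + bytes(4))
print(classify(sample))  # LCP Echo-Request
```

Echo-Requests going out with no Echo-Reply coming back just before a drop would fit the lost-LCP theory; a PADT arriving from the ISP side would instead point to the session being closed deliberately.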

hio77
'That VDSL Cat'
12970 posts

Uber Geek

ID Verified
Trusted
Voyager
Subscriber

  #2625467 22-Dec-2020 13:25

BMarquis:

 

 

  • 22.12.20 05:03:19 PPPoE error: Timeout. [4 messages since 22.12.20 05:02:38]
  • 22.12.20 05:02:24 Timeout during PPP negotiation. 22.12.20 05:02:24 Internet connection cleared.

 

To me, this would tend to indicate lost LCP messages, causing PPP to drop.

 

The most likely cause is something in your network. Hopefully it's not load balancing the LCP messages, although that's not something I would expect to suddenly change behaviour :)

 

But.... it's going to be quite difficult to track down. You might need to start packet capturing between the EdgeRouter and the Fritz!Boxes to see what's going on with PPP and LCP.

 

 

Loadbalance/Failover connections are only as reliable as the connections between them and the wide world...

 

Would not surprise me if there is an identical piece of equipment handling all 4 connections in the path. Would also explain LCP loss rather than physical DSL.




#include <std_disclaimer>

 

Any comments made are personal opinion and do not reflect directly on the position my current or past employers may have.

 

 

sbiddle
30853 posts

Uber Geek

Retired Mod
Trusted
Biddle Corp
Lifetime subscriber

  #2625470 22-Dec-2020 13:29

I would assume there is also a chance these  connections are all on the same ISAM linecard. If you're somehow creating a loop and making a MAC address visible on a 2nd interface it can do all sorts of strange things.



cyril7
8736 posts

Uber Geek

ID Verified
Trusted
Subscriber

  #2625477 22-Dec-2020 13:41

sbiddle:

 

I would assume there is also a chance these  connections are all on the same ISAM linecard. If you're somehow creating a loop and making a MAC address visible on a 2nd interface it can do all sorts of strange things.

 

 

Exactly. I look after a set of private DSLAMs at work (not ALU ISAMs), and if a MAC jumps from VLAN to VLAN, which can happen, the DSLAM will see it as a rogue device and temporarily close the ports. The same goes if the MAC is seen to move from port to port, which often happens in our environment as clients move around the camp, so to counter that I have rogue MAC flapping disabled. It's totally possible the ISAM has a similar filter.

 

Cyril
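The check described here boils down to spotting one MAC address on more than one port. A small sketch of that idea, assuming you can gather (port, source MAC) pairs from captures on each WAN interface; the port names and MAC addresses below are invented, and `find_flapping_macs` is a hypothetical name:

```python
from collections import defaultdict

def find_flapping_macs(observations):
    """Return {mac: sorted ports} for every MAC seen on more than one port."""
    ports_by_mac = defaultdict(set)
    for port, mac in observations:
        ports_by_mac[mac.lower()].add(port)
    return {mac: sorted(ports)
            for mac, ports in ports_by_mac.items() if len(ports) > 1}

# Invented observations: (interface the frame was seen on, source MAC).
seen = [
    ("eth0", "02:00:00:00:00:01"),
    ("eth1", "02:00:00:00:00:02"),
    ("eth2", "02:00:00:00:00:01"),  # same MAC as eth0: would look like a moving device
    ("eth3", "02:00:00:00:00:04"),
]

print(find_flapping_macs(seen))  # {'02:00:00:00:00:01': ['eth0', 'eth2']}
```

An empty result across all four WAN ports would suggest the duplicate MAC is not coming from the customer side.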

BMarquis
327 posts

Ultimate Geek

Trusted
Chorus
Lifetime subscriber

  #2625478 22-Dec-2020 13:44

cyril7:

 

sbiddle:

 

I would assume there is also a chance these  connections are all on the same ISAM linecard. If you're somehow creating a loop and making a MAC address visible on a 2nd interface it can do all sorts of strange things.

 

 

Exactly. I look after a set of private DSLAMs at work (not ALU ISAMs), and if a MAC jumps from VLAN to VLAN, which can happen, the DSLAM will see it as a rogue device and temporarily close the ports. The same goes if the MAC is seen to move from port to port, which often happens in our environment, so to counter that I have rogue MAC flapping disabled. It's totally possible the ISAM has a similar filter.

 

Cyril

 

 

 

 

Our DSLAMs will do that, but they only block the offending MAC.  That also relates to my comment about a behaviour that wouldn't suddenly change.
If this (MAC moving) is happening, there was probably some kind of change to the EdgeRouter which triggered it.

hugovg

3 posts

Wannabe Geek


  #2625513 22-Dec-2020 15:19

Thanks to all responders for their advice:

 

c0ld: thanks re IPv6

 

michaelmurfy - thanks: 

 

  • Re. firmware on the ER, we stayed at the most mature version of 1.x as it was stable and didn't have the issues reported for the early v2.x versions that could have potentially affected us. But sufficient time has elapsed for those to have been resolved, and an upgrade was planned. I couldn't find any reference to bugs remaining in the v1.x firmware that related to our issue.
  • There is no double NAT in our configuration currently. I only mentioned it as a potential consequence of returning the Fritz!Boxes to a very simple config for easier fault finding with 2D Support.
  • Thanks re Draytek DV130 recommendation, purchasing 4 of those for their simple PPPoE pass-through config is an option, just annoying as fibre is not far away.

danielfaulknor: Re. 4 VDSL connections and bond them - can you elaborate on the equipment you have used for this?

 

ArcticSilver: Re. "Isolate the issue first" - agreed, and we have done that (1 of the FB's is totally separated now with just a laptop connected to it), and the problem remains just the same, twice today. Plus have swapped in a spare ER-8 router this morning, no change.

 

BMarquis, hio77, sbiddle, cyril7:

 

  • thanks for this new line of enquiry guys, I am ok with general networking/routing but will need to get more up to speed on this side of things.
  • I think we can assume the 4 lines will likely be on the same card/DSLAM, as they are all associated with the same physical address and I believe connected at the same time.
  • It's also possible additional subscriber/s have been added recently, perhaps exposing & then exacerbating the issue?
  • Re. changing MAC address on our side, I don't really understand how, as each WAN eth port (with own MAC) on the ER could not change (TBC?), though not sure about the PPPoE sub-interface off them? Does it have its own MAC addr & could they change? Would need to read up on that sorry & do some testing, no idea why it would change once the PPPoE interfaces were up.
  • Considering possible solutions at this very late stage in the year, is it more likely we could work around it, or request 2Degrees to ask Chorus to disable "rogue MAC flapping" on the card we are attached to or split our connections across different cards? (both sound very unlikely to me).
  • If we can't work around it, we are in a spot of poo, so anything we can do to work within the limitations, if they can be quantified, must be the best option. Would the Draytek devices be of any advantage vs the Fritz!Boxes in this regard?
  • Or change ISP?!

Thank you all for your help.

ArcticSilver
714 posts

Ultimate Geek


  #2625636 22-Dec-2020 20:44

It sounds like it’s not your problem. I would go back to 2Degrees about the issues with your isolated connection (Just plugged into your laptop) and see if they can track it down from there.


Unless you have the DSL lines bridged somewhere? Which seems unlikely (since all of the connections connect separately)

ArcticSilver
714 posts

Ultimate Geek


  #2625640 22-Dec-2020 20:48

I assume you have no MAC address spoofing going on anywhere on the other connections that could have the MAC of the Fritzbox you’ve isolated?

danfaulknor
789 posts

Ultimate Geek

Trusted
Prodigi

  #2625651 22-Dec-2020 21:41

hugovg:

 

danielfaulknor: Re. 4 VDSL connections and bond them - can you elaborate on the equipment you have used for this?

 

 

It's not really an immediate solution, but possibly something to explore in January if you still get nowhere with 2degrees who may be a little biased against you now they know you're doing something quite complex.

 

You can do it with the EdgeOS router(s) you have, but you need another router at the "other end", so bonding is probably not something that an ISP like 2degrees will do. It's something we can do, happy to chat about it in more detail if you like.

 

 

 

Another unrelated thought that has just occurred to me, to rule out the EdgeRouter having all the PPPoE connections being the cause and still have no double NAT, you could move PPPoE back to the individual Fritzboxes and then add some static routes from the Fritzes to the EdgeRouter, and turn off NAT on the EdgeRouter. You can still load balance out normal interfaces as well.





sbiddle
30853 posts

Uber Geek

Retired Mod
Trusted
Biddle Corp
Lifetime subscriber

  #2625684 23-Dec-2020 07:12

The "proper" way to do what you're wanting to do is MLPPP. This is true aggregation of multiple PPP links but requires the BNG at the RSP end to support this, and for your hardware to also support this.

 

A number of smaller RSPs that use Mikrotik hardware could also support basic Mikrotik bonding, which requires a Mikrotik at both ends and works relatively well.

 

 

BMarquis
327 posts

Ultimate Geek

Trusted
Chorus
Lifetime subscriber

  #2625691 23-Dec-2020 07:39

It would be remiss of me to not at least take a look to see if I can spot anything obvious or a more widespread issue.

 

Can you please PM me the address where the DSL lines are?

 

 
