For about a year we have been accessing the internet via PPPoE load balancing across four 2Degrees VDSL connections using a UBNT ER-Pro.

Recently, we have been experiencing short (5-10 minute) outages where the PPPoE sessions on all connections are dropped simultaneously, then resumed. They mostly (but not always) occur at popular times for general internet use (e.g. after 6PM, 8-10PM).

2D (and Chorus) state that the VDSL lines are all fine and the config is 100% valid. The only technical snippet of info we have is that a 2nd MAC address appears on a VDSL connection, which results in the disconnection; the stated cause is that another modem is being connected across the same connection/line.

This statement has resulted in many questions from us, all unanswered. Currently we are told our configuration is too complex for Business Support, and they will neither support us nor allow us to liaise with anyone more technical despite repeated requests (now pleas). We are on our own & in the dark. Our configuration is a bit different for sure, and it's possible we are introducing "complications" that result in unforeseen issues.

Our config is as follows:

4 VDSL physical connections & 5 associated ISP accounts (long story!), each with a static IP assigned. Fiber not yet available at our location.

There are 4 Fritz!Box 7940 modems/routers configured to allow PPPoE pass-through, but otherwise default (connection to a DSL line, NAT etc.)

Each Fritz!Box has its own LAN subnet defined, and connects to its own WAN eth port on the ER-Pro

4 PPPoE interfaces are defined on the 4 WAN eth interfaces, each with its own login account credentials

Load balancing (LB) is configured on the 4 PPPoE WAN interfaces

Note: 3 of the 4 FB7940 have intentionally incorrect login details (from memory the FB7940 wouldn't allow empty Uname/PW fields), so in effect they just pass through PPPoE from the ER-Pro, but 1 FB7940 does have a 5th 2D account login to support Fritz!Fone VOIP phone services.

Each VDSL connection supports approx 66Mb/s download, and with speedtest.net running in multi-session mode, we normally can achieve approx 220Mb/s down.

It hasn't been perfect, but on the whole this has allowed us to share better-than-single-VDSL speeds with hundreds of clients (mostly on a Unifi guest network).
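For anyone checking our numbers, the speed figures above work out as follows. This is only a back-of-envelope sketch using the approximate values quoted in this post; the variable names are ours and nothing here is measured by the script itself.

```python
# Approximate figures quoted in this post (not measured by this script).
LINES = 4                # VDSL lines being load balanced
PER_LINE_MBPS = 66       # approx download speed of each line
OBSERVED_MBPS = 220      # approx multi-session speedtest result

# Best case if load balancing spread traffic perfectly across all lines.
theoretical_mbps = LINES * PER_LINE_MBPS

# Fraction of that best case we actually see.
efficiency = OBSERVED_MBPS / theoretical_mbps

print(f"theoretical aggregate: {theoretical_mbps} Mb/s")
print(f"observed / theoretical: {efficiency:.0%}")
```

So in normal operation we see roughly five sixths of the combined line rate, which is why we consider the LB setup to be basically working outside of the outages.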

Facts & thoughts:

We have made no wiring changes, so a 2nd modem appearing across a VDSL line at our end seems impossible. A cable fault in the ground is possible, but why would all our connections drop if (say) 2 copper pairs short? Besides, there are never any DSL entries in the logs.

Perhaps the 2nd PPPoE connection through 1 of the Fritz!Boxes is the root cause? 2D has not managed to state whether this is or isn't a permissible config, but we are willing to accept that it might not be. So we disabled PPPoE pass-through on the Fritz!Box that has its own login credentials (to support voice services), and although things were fine for 1 day, the issue has come back. Our quick conclusion is that the 2nd PPPoE connection through 1 Fritz!Box is not the cause.

There are no PPPoE or other errors in the ER messages log file, only LB reporting reachability failures.

Ditto for the 1 Fritz!Box that does log in: its PPPoE connection is simply dropped at exactly the same time.

We have no reason to suspect power: all networking gear is on a UPS & there are no other issues at the drop-out times.

FRITZ!OS: the device that has its own 2D login is on 07.21, while the others are on 07.11

No IPv6 addressing is used anywhere in our network, just IPv4, but we note the Fritz!Box system event log does mention IPv6 when the connection drops (is it transporting IPv4 over IPv6?)
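To back up the "exactly the same time" observation with something firmer than eyeballing logs, drop times from the ER and Fritz!Box logs could be clustered along these lines. This is a rough sketch only: the device labels, the 30 second window and the timestamps in the usage example are all placeholders we made up for illustration, not values from our logs, and extracting the timestamps from each log format is left out.

```python
from datetime import datetime, timedelta

def group_drops(events, window=timedelta(seconds=30)):
    """Cluster (device, timestamp) drop events.

    An event joins the current cluster if it occurs within `window`
    of that cluster's first event; otherwise it starts a new cluster.
    """
    clusters = []
    for device, ts in sorted(events, key=lambda e: e[1]):
        if clusters and ts - clusters[-1][0][1] <= window:
            clusters[-1].append((device, ts))
        else:
            clusters.append([(device, ts)])
    return clusters

# Placeholder events only (hypothetical labels and times, not real log data).
example_events = [
    ("er-pppoe0", datetime(2000, 1, 1, 18, 5, 2)),
    ("er-pppoe1", datetime(2000, 1, 1, 18, 5, 3)),
    ("fritzbox-voip", datetime(2000, 1, 1, 18, 5, 4)),
    ("er-pppoe2", datetime(2000, 1, 1, 21, 40, 10)),
]

for cluster in group_drops(example_events):
    devices = sorted(device for device, _ in cluster)
    print(cluster[0][1].isoformat(), devices)
```

If every outage shows up as a single cluster containing all five sessions, that would point to something common to all lines rather than one misbehaving line or modem.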

Possible causes?

The ER-Pro (v1.10.11 firmware) is doing something to cause the ISP's end to drop all our connections at the same time, eg. a Martian packet?

The ER-Pro is faulty. We have a spare ER8 to test this, but we think it unlikely.

Our LB config across the 4 PPPoE connections is wrong, but why that would cause all 5 PPPoE sessions (including the one in the FB7940 itself) to drop is beyond me.

We can drop the ER's PPPoE config & revert to a simple Fritz!Box config of NAT, & LB the WAN eth interfaces instead, which will result in double-NAT, but at least we can tell 2D Support the Fritz!Boxes are in a default config.

We have mixed up the physical VDSL lines and the 2D logins that are used across them - this is possible, but we are not aware of what the restrictions are or whether TR-069 could be affecting things (e.g. is it tied to the 2D login/account, the Fritz!Box MAC address, or the physical VDSL circuit number)?

There is a config error on the 2D/Chorus side, which I think is still possible, as something is happening at times of the day when their network is busy.

Other?

We can do more diagnostics with effort (e.g. packet capture), but really need guidance to avoid diving down rabbit holes trying to understand how PPPoE works at a low level.
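If we do go down the capture route, one starting point might be to pull out just the PPPoE discovery frames and see which source MACs appear and which side sends the terminate (PADT). Below is a minimal sketch of a parser for a single raw Ethernet frame, based on the header layout and code values in RFC 2516. The function and type names are our own, it handles at most one 802.1Q VLAN tag, it ignores the PPPoE tags that follow the header, and it assumes the frames have already been captured by some other tool.

```python
import struct
from typing import NamedTuple, Optional

ETHERTYPE_VLAN = 0x8100             # 802.1Q tag
ETHERTYPE_PPPOE_DISCOVERY = 0x8863  # PPPoE discovery stage (RFC 2516)

# Discovery-stage code values from RFC 2516.
PPPOE_CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR",
               0x65: "PADS", 0xA7: "PADT"}

class PPPoEDiscovery(NamedTuple):
    src_mac: str
    dst_mac: str
    code: str
    session_id: int

def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def parse_pppoe_discovery(frame: bytes) -> Optional[PPPoEDiscovery]:
    """Return the discovery header fields, or None if not a discovery frame."""
    if len(frame) < 14:
        return None
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    offset = 14
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 2-byte tag control field; the real ethertype follows it.
        if len(frame) < 18:
            return None
        (ethertype,) = struct.unpack("!H", frame[16:18])
        offset = 18
    if ethertype != ETHERTYPE_PPPOE_DISCOVERY or len(frame) < offset + 6:
        return None
    ver_type, code, session_id, _length = struct.unpack(
        "!BBHH", frame[offset:offset + 6])
    if ver_type != 0x11:  # version 1, type 1
        return None
    name = PPPOE_CODES.get(code, f"0x{code:02x}")
    return PPPoEDiscovery(format_mac(src), format_mac(dst), name, session_id)

# Hand-built PADT frame with made-up MACs and session ID, for illustration.
sample = bytes.fromhex("020000000001" "020000000002" "8863" "11a7" "1234" "0000")
print(parse_pppoe_discovery(sample))
```

Running this over frames captured on each WAN eth port should at least tell us whether the PADT comes from our side or the ISP's, and whether more than one of our MACs is ever seen starting a session on the same line.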

Thank you for getting this far - any thoughts greatly appreciated. We have our peak time upon us (Christmas & New Year) and depend 100.00000% on reliable internet services to support us & our guests. All options considered!

Also posted to UBNT/ui.com community as it may be an ER issue.

----

The following are the system event log entries (on the Fritz!Box that does log in) around the latest outage: