Hello, all!
I'm having a serious problem with my 10 Gb fiber service. I have been unable to resolve this problem since the service was installed.
Every few days, my IPv4 connection drops for several hours. The symptom I can see is that, at some point, my router's ARP requests to the Sonic edge router start going unanswered, so my router can no longer reach it. This is only corrected when the DHCP lease is about to expire and my DHCP client sends a DHCPREQUEST to the broadcast address. The Sonic DHCP server then responds with a DHCPACK, and ARP requests start being answered again at the same moment. Then it all happens again a few days later.
I have called Sonic support to investigate, and the support tech told me that he could not see the ARP requests from my router, even as I watched the packets leave my router on the way to the ONT. I'm not exactly sure where he was watching the traffic, but he did tell me that he couldn't directly inspect traffic at the ONT.
This appears to be a bug or configuration problem in either the ONT or the Sonic edge router (the one that my CPE router talks to through the ONT). Either the ONT is failing to bridge the packets, or the Sonic edge router is ignoring them. I haven't been able to find any pattern in when the ARP responses stop.
What would cause the ONT to stop forwarding ARP requests at some point? Or what would cause the Sonic edge router to stop receiving and acting on them? Why does a DHCPREQUEST to the broadcast address trigger things to start working again?
Things we have tried:
- Replacing the CPE router with a different type of hardware
- Rebuilding the CPE router config from scratch
- Replacing the ONT
- Manually sending other broadcast traffic during the outages
Some additional details:
- Sonic's DHCP server does not respond at all to unicast DHCPREQUESTs to refresh the lease. Ever. This seems odd.
- Because the lease time is 6 hours and the outage ends at the next broadcast DHCPREQUEST, the outages always last less than 6 hours. But they're often 4 or 5 hours, which is of course unacceptable.
- IPv6 traffic is unaffected; NDP packets continue to work as expected. It's just ARP that fails.
- IPv4 inbound traffic continues, but since our router's ARP cache entry has expired and ARP requests receive no replies, our router cannot send responses. We also can't initiate any outbound connections. The IPv4 link is effectively down.
- There are no interface errors on the CPE router <-> ONT Ethernet link.
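For reference, the outage lengths in the points above line up with the standard DHCP lease timers. This is a minimal sketch, assuming the client uses the RFC 2131 default fractions (T1 = 50% of the lease for unicast renewal, T2 = 87.5% for broadcast rebinding) and that Sonic does not override them with DHCP options 58 and 59; neither assumption is verified here. The function name `dhcp_timers` is just for illustration.

```python
from datetime import timedelta


def dhcp_timers(lease, t1_fraction=0.5, t2_fraction=0.875):
    """Return (T1, T2) for a lease, using the RFC 2131 default fractions.

    T1 is when the client starts unicast renewal; T2 is when it falls back
    to broadcasting DHCPREQUEST (rebinding). A server may override both
    with options 58 and 59, so treat the defaults as an assumption.
    """
    return lease * t1_fraction, lease * t2_fraction


# The 6-hour lease is from the post; the fractions are assumed defaults.
t1, t2 = dhcp_timers(timedelta(hours=6))
print("T1 (unicast renew starts):   ", t1)
print("T2 (broadcast rebind starts):", t2)
```

Under those assumptions, with unicast renewals never answered, the first broadcast DHCPREQUEST would go out about 5 hours 15 minutes after the last DHCPACK, which would cap an outage at roughly that length.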
...
09:06:22.245883 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:23.276791 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:24.293870 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:25.318874 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
-- typed `renew dhcp int eth9` here --
09:06:26.357816 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:27.337164 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 20:7c:14:f5:90:59 (oui Unknown), length 300
09:06:27.529024 IP bng1.snfcca05.sonic.net.bootps > 192-184-177-68.fiber.dynamic.sonic.net.bootpc: BOOTP/DHCP, Reply, length 548
09:06:27.529205 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 20:7c:14:f5:90:59 (oui Unknown), length 300
09:06:27.553513 IP bng1.snfcca05.sonic.net.bootps > 192-184-177-68.fiber.dynamic.sonic.net.bootpc: BOOTP/DHCP, Reply, length 548
09:06:27.566775 ARP, Reply 192-184-176-1.fiber.dynamic.sonic.net is-at b4:f9:5d:35:2e:3c (oui Unknown), length 50
09:06:28.328917 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:28.333337 ARP, Reply 192-184-176-1.fiber.dynamic.sonic.net is-at b4:f9:5d:35:2e:3c (oui Unknown), length 50
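To put numbers on the capture above, here is a small sketch that counts the ARP requests seen before the first reply and measures how long after the first DHCP reply that ARP reply arrived. It assumes tcpdump's default text output exactly as pasted; the `CAPTURE` string and the `summarize` helper are illustrative names, not part of any tool. The count only covers the excerpt, since the capture is truncated at the top.

```python
from datetime import datetime

# Packet lines copied from the capture (annotation and "..." lines left out).
CAPTURE = """\
09:06:22.245883 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:23.276791 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:24.293870 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:25.318874 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:26.357816 ARP, Request who-has 192-184-176-1.fiber.dynamic.sonic.net tell 192-184-177-68.fiber.dynamic.sonic.net, length 28
09:06:27.337164 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 20:7c:14:f5:90:59 (oui Unknown), length 300
09:06:27.529024 IP bng1.snfcca05.sonic.net.bootps > 192-184-177-68.fiber.dynamic.sonic.net.bootpc: BOOTP/DHCP, Reply, length 548
09:06:27.529205 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 20:7c:14:f5:90:59 (oui Unknown), length 300
09:06:27.553513 IP bng1.snfcca05.sonic.net.bootps > 192-184-177-68.fiber.dynamic.sonic.net.bootpc: BOOTP/DHCP, Reply, length 548
09:06:27.566775 ARP, Reply 192-184-176-1.fiber.dynamic.sonic.net is-at b4:f9:5d:35:2e:3c (oui Unknown), length 50
"""


def parse_ts(line):
    """Parse the leading HH:MM:SS.microseconds timestamp of a tcpdump line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def summarize(capture):
    """Return (ARP requests before the first ARP reply, gap after first DHCP reply)."""
    unanswered = 0
    first_dhcp_reply = None
    for line in capture.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if fields[1] == "ARP," and fields[2] == "Request":
            unanswered += 1
        elif "BOOTP/DHCP, Reply" in line and first_dhcp_reply is None:
            first_dhcp_reply = parse_ts(line)
        elif fields[1] == "ARP," and fields[2] == "Reply":
            gap = parse_ts(line) - first_dhcp_reply if first_dhcp_reply else None
            return unanswered, gap
    return unanswered, None


unanswered, gap = summarize(CAPTURE)
print("ARP requests before first reply:", unanswered)
print("First ARP reply after first DHCP reply: %.3f ms" % (gap.total_seconds() * 1000))
```

On this excerpt it reports five unanswered requests, and the first ARP reply shows up about 38 ms after the first DHCP reply.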
Also, why is this not causing massive problems for all Sonic customers? I'm not the only one seeing this, though: viewtopic.php?t=18064
Our CPE router is running a Linux-based routing OS, VyOS. This same hardware is currently working just fine with an AT&T connection and has no problems with Comcast, either. We've also had no issues running in a data center connected to Hurricane Electric. I've never seen a problem like this in any other installation.
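Since a broadcast DHCPREQUEST reliably brings ARP back, one possible stopgap (not a fix) would be to have the router notice when the gateway's neighbor entry goes bad and then trigger the same renew that was typed by hand in the capture. Below is a rough sketch of just the detection half. It assumes the usual iproute2 `ip neigh` text format (address first, state last). The gateway address 192.184.176.1 is a guess from the reverse DNS name in the capture, and `gateway_unresolved` and `check` are hypothetical helper names.

```python
import subprocess

# Neighbor states that mean ARP resolution is not succeeding.
UNRESOLVED_STATES = {"FAILED", "INCOMPLETE"}


def gateway_unresolved(neigh_output, gateway_ip):
    """True if `ip neigh` output lists the gateway as FAILED or INCOMPLETE."""
    for line in neigh_output.splitlines():
        fields = line.split()
        if fields and fields[0] == gateway_ip:
            return fields[-1] in UNRESOLVED_STATES
    # No entry at all: nothing has tried to reach the gateway recently.
    return False


def check(interface, gateway_ip):
    """Run `ip -4 neigh show dev <interface>` and test the gateway entry."""
    out = subprocess.run(
        ["ip", "-4", "neigh", "show", "dev", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    return gateway_unresolved(out, gateway_ip)


# Hypothetical sample lines, only to show what the parser expects.
healthy = "192.184.176.1 lladdr b4:f9:5d:35:2e:3c REACHABLE\n"
broken = "192.184.176.1  FAILED\n"
print(gateway_unresolved(healthy, "192.184.176.1"))
print(gateway_unresolved(broken, "192.184.176.1"))
```

On the router this would be called as something like `check("eth9", "192.184.176.1")` from a periodic job. How to trigger the renew from a script is left out here, since that depends on the VyOS version.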