WAN DHCP Issues/Requirements

by iklier1221 » Mon Sep 19, 2022 8:58 am
I'm in the process of swapping out my UDM Pro for a pfSense box. After setting up the proper interfaces, the pfSense box is not getting a WAN IP via DHCP.
• I get link lights on the SFP module and the host light on the ONT with the pfSense box connected.
• Tried power cycling both the pfSense box and ONT.
• Swapping the WAN cable back to the UDM Pro continues to work.

My next steps are:
• Contact support to flush the MAC table in the ONT. I don't think this is the issue, since I've only ever connected 3 devices to the ONT (UDM Pro, Test Host, pfSense box)
• Allow WAN ICMP on the pfSense box. (Saw some reports of this potentially being an issue as it is blocked by default)

Are there any other requirements on the WAN side to get vended a WAN DHCP address?
by js9erfan » Tue Sep 20, 2022 6:16 am
Is this a Netgate box or 3rd party? What pfSense version are you running? What SFP model? I've set up several pfSense boxes with Sonic without much issue, including one as recently as last weekend. With older pfSense versions there were some issues with WAN DHCP renewals, but I resolved those by either putting a dumb switch between the WAN interface and the modem or manually forcing the speed/duplex on the WAN interface. That said, I haven't seen this issue resurface since upgrading to pfSense+ v22.05.

For IPv4 allowing ICMP on WAN is not required (see WAN rules below from a box on Sonic's 10G fiber).

I would suggest looking at the following if clearing the MAC table doesn't resolve it:

- Upgrade pfSense if applicable
- Check System logs for DHCP errors
- Force speed/duplex on WAN interface
- As a test, assign an RJ45 port, if available, as WAN to rule out the SFP


Good luck!

[Attachment: Screenshot 2022-09-20 060509.png (WAN rules)]
by iklier1221 » Wed Sep 21, 2022 8:36 am
This is a Netgate box running 22.05-RELEASE.

After some trial and error it came down to the fiber SFP+ modules not being happy; with the UDM Pro I had a working setup of
ONT [RJ45] -> FS.com passive media converter (RJ45 to SFP+) [FS.com FS SFP+] -> OM3 -> UDM Pro WAN [FS.com GEN SFP+]
I tried a few variants (Intel, Generic) of the FS.com SFP+ fiber modules in the pfSense WAN port and they all got link lights but no IP.
Swapping the fiber SFP+ for a 10GbE RJ45 SFP+ module that I had previously used in the UDM Pro worked perfectly.

I did notice the web UI listed a single "10Gbase-SR" option in the Speed and Duplex dropdown, but setting it with the fiber or 10GbE SFP module resulted in the link going down and an error for that interface in the CLI.

I guess I'll stick with the toasty RJ45 SFP+ module until I can figure out a working solution for the fiber SFP+ modules.
by iklier1221 » Wed Sep 21, 2022 4:17 pm
Addendum of things I've learned so others have a place to start.
Setup details:
- Netgate 1537 running pfSense 22.05-RELEASE (amd64)
- WAN to Adtran 622 ONT via Ubiquiti 10GbE SFP+ module
- LAN to USW-Pro-Aggregation via SFP+ DAC cable
- WAN DHCP renewal is 3hrs

1. There is a bug (https://redmine.pfsense.org/issues/13217) in pfSense 22.05-RELEASE where the dhclient boot script writes the pid to /var/run/dhclient/ but doesn't ensure the dhclient sub-directory exists. As a result the system can't track the current pid, so multiple dhclient processes are created and WAN has issues at every WAN DHCP renewal.
Until the bug is fixed, you can use the workaround detailed in this thread to create the directory during boot: https://forum.netgate.com/topic/168172/dhclient-error-cannot-open-or-create-pidfile-no-such-file-or-directory
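
For anyone unfamiliar with this failure mode, here is a minimal Python sketch of the general idea behind the workaround: make sure the pidfile's parent directory exists before writing the pid. This is only an illustration, not the pfSense boot script and not the fix from the linked thread. The function names and the demo file name "demo.pid" are hypothetical, and a temporary directory stands in for /var/run.

```python
import os
import tempfile


def write_pidfile(pidfile_path: str, pid: int) -> None:
    """Write pid to pidfile_path, creating the parent directory first."""
    parent = os.path.dirname(pidfile_path)
    if parent:
        # Without this step, open() raises FileNotFoundError when the
        # sub-directory (the "dhclient" directory in the bug) is missing.
        os.makedirs(parent, exist_ok=True)
    with open(pidfile_path, "w") as handle:
        handle.write(f"{pid}\n")


def read_pidfile(pidfile_path: str) -> int:
    """Read back the pid so a later run can find the existing process."""
    with open(pidfile_path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    # Hypothetical demo path under a temp dir, standing in for /var/run/dhclient/.
    with tempfile.TemporaryDirectory() as run_dir:
        demo_path = os.path.join(run_dir, "dhclient", "demo.pid")
        write_pidfile(demo_path, os.getpid())
        print(read_pidfile(demo_path) == os.getpid())
```

If the pid can't be written, nothing can later look up and stop the old process, which is consistent with the duplicate dhclient processes described above.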

2. Not sure why, but after changing the LAN DHCP server range from 192.168.1.1/24 to 10.0.0.1/24, the LAN Net alias was still mapped to 192.168.1.1/24, causing all LAN-to-WAN traffic to be dropped. I ended up adding a LAN to any/any rule to restore routing.
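
To illustrate why a stale LAN Net alias would drop traffic, here is a small Python sketch using the standard ipaddress module. It assumes the LAN pass rule matches on source = LAN Net; the client address 10.0.0.50 and the function name are hypothetical, and this is not how pfSense evaluates rules internally.

```python
import ipaddress


def source_matches_alias(host_ip: str, alias_interface: str) -> bool:
    """Return True if host_ip is inside the network implied by alias_interface."""
    network = ipaddress.ip_interface(alias_interface).network
    return ipaddress.ip_address(host_ip) in network


if __name__ == "__main__":
    stale_alias = "192.168.1.1/24"  # what LAN Net was still mapped to
    new_lan = "10.0.0.1/24"         # the new LAN addressing from the post
    client = "10.0.0.50"            # hypothetical LAN client address

    print(source_matches_alias(client, stale_alias))  # False
    print(source_matches_alias(client, new_lan))      # True
```

A client on the new subnet never matches a rule whose source is still the old subnet, so it would fall through to the default deny. An any/any rule sidesteps the alias entirely, which fits what was observed.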
by iklier1221 » Fri Sep 23, 2022 10:36 am
Alright, I've isolated the issue further and I'm kind of stumped. It appears that when I push a significant amount of traffic in a short period of time, something in the Sonic network starts blocking my device.

Setup:
- Netgate 1537 box (22.05-RELEASE (amd64))
- LAN via DAC (ix1)
- WAN via Ubiquiti SFP+ (ix0)
- Gateway monitoring set to 1 hop into the Sonic network
- Disable Gateway Monitoring Action is checked.

With this setup, I can use the internet normally including "normal" higher bandwidth like downloading games from Steam, streaming content, etc. During this time max bandwidth is ~3Gbps down and the WAN link remains stable.

However, if I run a high-bandwidth speed test, a few minutes later I lose routing to the Internet.
9:50am - WAN is working fine, doing remote work over VPN with no issues.
9:50am-9:53am - Run 3 back-to-back speed tests (15 sec duration, 18 streams), hitting 8Gbps down and 4.5Gbps up using an internal work speed test tool.
9:55am - Internet routing goes down. WAN interface is up and I can ping two hops into the Sonic network, but pings to Internet hosts fail (e.g. 1.1.1.1, 8.8.8.8).

I tried the following, none of which resolved the issue:
- Took the WAN interface down, waited a few seconds and brought it back up. The interface transitions and pulls the same WAN address from DHCP when it comes up.
- Rebooted the pfSense box
- Hot-plugged the WAN SFP module

If I then take the same SFP module, still connected to the ONT, and plug it into the UDM Pro, Internet routing is re-established and I can work around the issue by running *shudders* double-NAT with the UDM Pro between the ONT and the pfSense box.

After about an hour I swapped the SFP back to the Netgate box and it is working fine again.

Is there some backend system detecting this as a threat and locking out that MAC for a fixed time?
by js9erfan » Tue Sep 27, 2022 6:57 am
So the interface is up when you lose routing? No flapping, etc.? Are you seeing any interface errors on WAN? Do you have multiple gateways (VPN, etc.)? Can you ping out from the WAN interface on pfSense? Is the ARP table still showing a WAN lease? Any related system logs under system general / system gateway? You might run a packet capture on WAN as well when doing the speed tests. I'd also look at your mbuf usage during the speed test and, depending on the number of firewall rules, your CPU usage as well.

If nothing is obvious and pings fail from the WAN interface, you might want to contact Netgate TAC. They provide 'zero to ping' for all customers who purchase Netgate boxes and can probably diagnose the issue.

On another note it appears v22.11 is getting released next month with some big changes. Perhaps some of these issues will also get cleaned up.
by iklier1221 » Wed Sep 28, 2022 1:03 pm
The WAN interface stays up the whole time. I looked through the logs and don't see anything out of the ordinary. When it is in this state I can ping a few hops into the traceroute, but can't hit anything outside the Sonic network (e.g. 1.1.1.1, 208.67.220.220 or Apple.com). If I fall back to the Sonic DNS in this state, I can still resolve hostnames; I just can't reach them.

Bouncing the interface, hot plugging the SFP+ module, and rebooting the pfSense box have no effect.

I didn't see any noticeably high system stats (e.g. mbuf utilization, CPU, MEM, states, etc)

If I take the SFP+ module from the pfSense WAN port and plug it into the UDM Pro, I get DHCP and internet access.
If I then wait maybe 30min to an hour and move the SFP+ module back to the pfSense box, it starts working again.

The fact that moving the same SFP+ module to another router, which causes the WAN-side MAC to change, magically makes things work has me thinking something in the Sonic network is detecting this huge burst of traffic (short runs are around 23GB, longer tests are 43GB) and locking out the MAC for a fixed amount of time.
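
As a rough sanity check on those volumes, here is a small Python sketch that converts the speed test rates quoted earlier in the thread (8Gbps down, 4.5Gbps up, 15 seconds) into data moved. It assumes both rates are sustained for the full duration and uses decimal gigabytes; the function name is just for illustration.

```python
def transferred_gigabytes(down_gbps: float, up_gbps: float, seconds: float) -> float:
    """Total data moved in both directions, in decimal gigabytes."""
    total_gigabits = (down_gbps + up_gbps) * seconds
    return total_gigabits / 8  # 8 bits per byte


if __name__ == "__main__":
    per_run = transferred_gigabytes(8.0, 4.5, 15)
    print(f"{per_run:.1f} GB per 15 s run")  # 23.4 GB per 15 s run
```

Under those assumptions a single 15-second run moves about 23.4GB, which lines up with the "around 23GB" figure for short runs.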

I want to reproduce this again and do a packet capture to see what is going on. I just need to find some time to take the Internet down without housemates freaking out.
by js9erfan » Wed Sep 28, 2022 8:35 pm
iklier1221 wrote:
The fact that moving the same SFP+ module to another router, which causes the WAN-side MAC to change, magically makes things work has me thinking something in the Sonic network is detecting this huge burst of traffic (short runs are around 23GB, longer tests are 43GB) and locking out the MAC for a fixed amount of time.


If you only encounter this issue on the pfSense box, you could always try spoofing the MAC address on the WAN interface...
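
If you want a throwaway address to test with rather than copying another device's MAC, one option is a random locally administered unicast address. The Python sketch below is only an illustration of how such an address is formed (second-lowest bit of the first octet set, lowest bit cleared); the function names are hypothetical, and whether a changed MAC actually helps here is untested.

```python
import secrets


def random_local_mac() -> str:
    """Generate a random locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] |= 0x02  # set the locally administered bit
    octets[0] &= 0xFE  # clear the multicast bit so the address is unicast
    return ":".join(f"{octet:02x}" for octet in octets)


def is_local_unicast(mac: str) -> bool:
    """Check the two flag bits in the first octet of a colon-separated MAC."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02) and not (first_octet & 0x01)


if __name__ == "__main__":
    mac = random_local_mac()
    print(mac, is_local_unicast(mac))
```

Using the locally administered range avoids colliding with a vendor-assigned address on the same segment.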