Inbound Fragments Dropped over 100Mbps

22 posts Page 1 of 3
by mattcorallo » Tue Apr 14, 2020 5:09 pm
On fiber, it seems that a UDP VPN which uses a full 1500 bytes internally trips some kind of anti-DoS appliance, which then drops all inbound fragments for 30 seconds any time the flow exceeds ~100Mbps. In-VPN pings continue fine, except those large enough to cause fragmentation, which pause for about 30 seconds after the filter kicks in. This breaks my work VPN, which is stuck in a catch-22: either you have MTU issues accessing things because the in-tunnel MTU is <1500 bytes (alas, the v6 world is screwed), or the connection drops for 30 seconds because Sonic starts dropping packets.

Is there some way to allow fragments on established flows? Fragment-based DDoS doesn't seem to be much of an issue these days with modern kernels on the other end :/.
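To make the catch-22 concrete, here's the rough arithmetic. The overhead numbers are assumptions for illustration (a WireGuard-style tunnel over UDP/IPv4); other VPNs have different per-packet overheads:

```python
# Sketch of the tunnel-MTU arithmetic. Overheads below are assumptions
# (WireGuard-style tunnel over UDP/IPv4), not measurements of my setup.
LINK_MTU = 1500        # MTU of the underlying fiber link
OUTER_IP = 20          # outer IPv4 header
OUTER_UDP = 8          # outer UDP header
VPN_OVERHEAD = 32      # assumed per-packet tunnel header/auth tag

# Largest inner packet that fits without fragmenting the outer packet:
inner_mtu = LINK_MTU - OUTER_IP - OUTER_UDP - VPN_OVERHEAD
print(inner_mtu)  # 1440: anything bigger fragments outside the tunnel

# A full 1500-byte inner packet therefore produces an outer packet of:
outer_size = 1500 + OUTER_IP + OUTER_UDP + VPN_OVERHEAD
print(outer_size > LINK_MTU)  # True -> outer packet must be fragmented
```

So either the inner MTU gets clamped below 1500 (breaking things that assume a full MTU), or the outer packets fragment and hit whatever is dropping fragments.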
by kgc » Thu Apr 23, 2020 2:54 pm
What router are you using? Have you tried plugging a computer directly into the ONT to see if the problem exists when directly connected?
Kelsey Cummings
System Architect, Sonic.net, Inc.
by mattcorallo » Fri Apr 24, 2020 10:37 am
Yes, this is after testing performed with a Linux box connected directly to the ONT. Seems to happen across several different inbound paths towards Sonic - at least the GTT and Telia links.
by tomoc » Sun Apr 26, 2020 12:42 pm
Other than shaping traffic to your contracted rate (1000mbps/1000mbps on Fusion Fiber), we don't implement any traffic manipulation on the access side, nor through the core of our network. Can you provide a couple traceroutes to help us figure out what's happening here?

Traceroute #1: To/from VPN concentrator without VPN enabled
Traceroute #2: To/from problem destination with VPN enabled, large packet size
Traceroute #3: To/from problem destination with VPN enabled, large packet size, while experiencing your large flow problem

If you can't provide any of these, that's fine; we'll do our best with what we can get.
Tomoc
Sonic NOC
by mattcorallo » Mon Apr 27, 2020 5:55 pm
The trace doesn't change between before and during the drops; e.g., here's one:

Code: Select all

traceroute to 198.27.190.XXX (198.27.190.XXX), 30 hops max, 60 byte packets
 1  * * *
 2  107.191.59.33 (107.191.59.33)  3.621 ms  6.600 ms  9.574 ms
 3  * * *
 4  * * *
 5  ae11-551.cr5-lax2.ip4.gtt.net (173.205.58.193)  0.320 ms  0.319 ms  0.257 ms
 6  ae6.cr6-sjc1.ip4.gtt.net (89.149.180.78)  21.300 ms  20.721 ms  20.663 ms
 7  as7065.xe-3-0-3.ar2.sjc1.us.as4436.gtt.net (69.22.130.110)  7.127 ms  8.299 ms  25.966 ms
 8  100.ae1.cr1.equinix-sj.sonic.net (75.101.33.186)  105.655 ms  104.624 ms  102.205 ms
 9  0.ae0.cr1.hywrca01.sonic.net (75.101.36.254)  87.581 ms  87.573 ms  87.563 ms
10  0.ae0.cr1.rcmdca11.sonic.net (157.131.209.74)  75.563 ms  216.230 ms  216.188 ms
11  0.ae1.cr1.snrfca01.sonic.net (157.131.209.138)  9.062 ms  9.069 ms  9.053 ms
12  0.ae2.cr2.snrfca01.sonic.net (157.131.209.170)  27.610 ms  93.338 ms  93.332 ms
13  0.ae0.cr1.snfcca05.sonic.net (198.27.244.50)  23.680 ms  107.220 ms  107.193 ms
14  300.ae0.bras1.snfcca05.sonic.net (198.27.186.140)  9.168 ms  9.164 ms  9.165 ms
15  198-27-190-XXX.fiber.dynamic.sonic.net (198.27.190.XXX)  10.077 ms  10.065 ms  9.796 ms
and here are two outbound traces (one before, one during the inbound drops):

Code: Select all

traceroute to vt-lax-prox.as397444.net (144.202.126.211), 30 hops max, 60 byte packets
 1  lo0.bras1.snfcca05.sonic.net (50.0.79.115)  1.373 ms  1.213 ms  0.924 ms
 2  301.irb.cr2.snfcca05.sonic.net (198.27.186.210)  21.177 ms  21.043 ms  20.777 ms
 3  * * *
 4  0.ae2.cr1.snrfca01.sonic.net (157.131.209.169)  2.628 ms  2.535 ms 0.ae1.cr1.colaca01.sonic.net (157.131.209.65)  7.082 ms
 5  0.ae0.cr1.lsatca11.sonic.net (157.131.209.86)  9.084 ms  8.970 ms  8.901 ms
 6  * * 0.ae1.cr1.snjsca11.sonic.net (157.131.209.149)  7.329 ms
 7  * * *
 8  * * *
 9  ae17.cr6-sjc1.ip4.gtt.net (69.22.130.109)  4.598 ms  3.181 ms 100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  2.770 ms
10  ae16.cr5-lax2.ip4.gtt.net (89.149.182.174)  10.110 ms  9.093 ms ae17.cr6-sjc1.ip4.gtt.net (69.22.130.109)  3.746 ms
11  ae16.cr5-lax2.ip4.gtt.net (89.149.182.174)  11.261 ms as20473-gw.lax20.ip4.gtt.net (173.205.58.194)  10.203 ms  10.036 ms
12  as20473-gw.lax20.ip4.gtt.net (173.205.58.194)  9.857 ms  9.955 ms  11.029 ms
13  * * *
14  * * *
15  * vt-lax-prox.as397444.net (144.202.126.211)  9.879 ms  9.428 ms

Code: Select all

traceroute to vt-lax-prox.as397444.net (144.202.126.211), 30 hops max, 60 byte packets
 1  lo0.bras1.snfcca05.sonic.net (50.0.79.115)  0.889 ms  0.367 ms  0.906 ms
 2  301.irb.cr2.snfcca05.sonic.net (198.27.186.210)  20.114 ms 300.irb.cr1.snfcca05.sonic.net (198.27.186.138)  12.889 ms  12.304 ms
 3  0.ae14.cr2.snrfca01.sonic.net (198.27.244.49)  24.841 ms 0.ae14.cr2.colaca01.sonic.net (198.27.244.57)  41.855 ms  41.406 ms
 4  0.ae1.cr1.colaca01.sonic.net (157.131.209.65)  39.654 ms  39.596 ms  39.586 ms
 5  0.ae1.cr1.rcmdca11.sonic.net (157.131.209.137)  13.696 ms  13.033 ms 0.ae0.cr1.snrfca01.sonic.net (157.131.209.82)  8.229 ms
 6  0.ae3.cr1.hywrca01.sonic.net (157.131.209.73)  6.947 ms  3.032 ms  2.978 ms
 7  0.ae3.cr1.hywrca01.sonic.net (157.131.209.73)  11.082 ms  8.113 ms  7.960 ms
 8  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  8.542 ms 0.ae0.cr1.equinix-sj.sonic.net (75.101.36.253)  9.796 ms  14.513 ms
 9  ae17.cr6-sjc1.ip4.gtt.net (69.22.130.109)  2.862 ms 100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  3.108 ms  3.031 ms
10  ae17.cr6-sjc1.ip4.gtt.net (69.22.130.109)  2.910 ms  3.099 ms  2.942 ms
11  as20473-gw.lax20.ip4.gtt.net (173.205.58.194)  11.141 ms  11.033 ms ae16.cr5-lax2.ip4.gtt.net (89.149.182.174)  9.386 ms
12  * * as20473-gw.lax20.ip4.gtt.net (173.205.58.194)  10.308 ms
13  * * *
14  * * *
15  vt-lax-prox.as397444.net (144.202.126.211)  9.715 ms * *
Both of the following pings were sent inside the tunnel at the same time. Note the drops between seq 158 and 189 on the 1448-byte pings (which blow up to >1500 bytes outside the tunnel), but no drops on the smaller packets. The traffic was generated with iperf -l 1410 -V -u -c IP_INSIDE_VPN -b 200M; the issue doesn't occur if the bandwidth is below roughly 100M. I can provide pcaps as well, but there's nothing surprising in them.

Code: Select all

1448 bytes from IP_INSIDE_VPN: icmp_seq=154 ttl=64 time=10.7 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=155 ttl=64 time=10.3 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=156 ttl=64 time=27.5 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=157 ttl=64 time=11.0 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=158 ttl=64 time=10.9 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=189 ttl=64 time=131 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=190 ttl=64 time=10.3 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=191 ttl=64 time=10.5 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=192 ttl=64 time=10.2 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=193 ttl=64 time=10.5 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=194 ttl=64 time=11.8 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=195 ttl=64 time=11.3 ms
1448 bytes from IP_INSIDE_VPN: icmp_seq=196 ttl=64 time=10.2 ms

Code: Select all

64 bytes from IP_INSIDE_VPN: icmp_seq=12 ttl=64 time=10.3 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=13 ttl=64 time=9.90 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=14 ttl=64 time=35.5 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=15 ttl=64 time=9.54 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=16 ttl=64 time=10.8 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=17 ttl=64 time=10.6 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=18 ttl=64 time=10.1 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=19 ttl=64 time=10.5 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=20 ttl=64 time=10.4 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=21 ttl=64 time=9.79 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=22 ttl=64 time=9.64 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=23 ttl=64 time=10.6 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=24 ttl=64 time=10.1 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=25 ttl=64 time=10.3 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=26 ttl=64 time=9.58 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=27 ttl=64 time=10.2 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=28 ttl=64 time=9.87 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=29 ttl=64 time=9.81 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=30 ttl=64 time=9.45 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=31 ttl=64 time=10.0 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=32 ttl=64 time=10.3 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=33 ttl=64 time=9.59 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=34 ttl=64 time=9.96 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=35 ttl=64 time=9.62 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=36 ttl=64 time=9.74 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=37 ttl=64 time=10.8 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=38 ttl=64 time=9.88 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=39 ttl=64 time=10.6 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=40 ttl=64 time=10.3 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=41 ttl=64 time=10.6 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=42 ttl=64 time=9.93 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=43 ttl=64 time=9.81 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=44 ttl=64 time=9.63 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=45 ttl=64 time=9.59 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=46 ttl=64 time=10.5 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=47 ttl=64 time=10.1 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=48 ttl=64 time=11.1 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=49 ttl=64 time=10.5 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=50 ttl=64 time=10.4 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=51 ttl=64 time=11.0 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=52 ttl=64 time=10.4 ms
64 bytes from IP_INSIDE_VPN: icmp_seq=53 ttl=64 time=9.53 ms
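For what it's worth, the drop window can be sized straight from the seq gap in the ping output above; at ping's default 1-second interval, missing seqs map directly to seconds. A quick sketch (sample lines abbreviated, with a placeholder address):

```python
import re

# Parse icmp_seq values out of ping output and report the largest gap.
# At ping's default 1s interval, N missing seqs ~= N seconds of loss.
sample = """\
1448 bytes from 10.0.0.1: icmp_seq=158 ttl=64 time=10.9 ms
1448 bytes from 10.0.0.1: icmp_seq=189 ttl=64 time=131 ms
1448 bytes from 10.0.0.1: icmp_seq=190 ttl=64 time=10.3 ms
"""

seqs = [int(m) for m in re.findall(r"icmp_seq=(\d+)", sample)]
gaps = [b - a - 1 for a, b in zip(seqs, seqs[1:])]
print(max(gaps))  # 30 missing replies between seq 158 and 189 -> ~30s outage
```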
by mattcorallo » Mon Apr 27, 2020 6:18 pm
While testing the above, it seems fragmented traffic from several sources towards my IP was dropped for the same time period, not just from this one, implying it's in Sonic's network.

Also note the one 35ms ping in the small-packet results - that also occurs when pinging the next hop/8.8.8.8/etc. directly without the VPN (with about 0.1-0.3% packet loss depending on time of day). I presume that's just normal GPON neighbor contention and not an issue with the line somehow?
by tomoc » Tue Apr 28, 2020 3:02 pm
Thanks for the good info - I only wish it were enough to get to the bottom of this.

We're still missing some insight on where in the path your traffic is getting dropped.
Do you have the ability to run a continuous trace with a utility like mtr that can show us hop-by-hop loss? I suspect if the problem is within Sonic, it will be on the access side (BRAS/OLT/ONT) and would like to make sure we put our efforts in the right spot when we dig in.

Do you experience the same problem with non-vpn fragmented traffic?

To address your next-hop latency/loss: you're correct, that's just related to GPON bandwidth shaping/packet prioritization. I took a quick look at your circuit to verify that your light levels are good and your connection is error-free.
Tomoc
Sonic NOC
by mattcorallo » Tue Apr 28, 2020 7:25 pm
Ah, sorry, I totally missed what you were going for. Indeed, it seems to be on the OLT/ONT end - the BRAS still pings. The packets don't show up in tcpdump, and there's no XDP program installed. I did a once-over to see if there's any rate-limiting on fragments on the driver side (Linux igb) and didn't see anything.

Code: Select all

~# traceroute -U -p4242 198.27.190.XXX 1520
traceroute to 198.27.190.XXX (198.27.190.XXX), 30 hops max, 1520 byte packets
...
 5  ae11-551.cr5-lax2.ip4.gtt.net (173.205.58.193)  4.832 ms  4.855 ms  4.834 ms
 6  ae6.cr6-sjc1.ip4.gtt.net (89.149.180.78)  7.965 ms  17.534 ms  17.496 ms
 7  as7065.xe-3-0-3.ar2.sjc1.us.as4436.gtt.net (69.22.130.110)  7.968 ms  8.028 ms  7.936 ms
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  300.ae0.bras1.snfcca05.sonic.net (198.27.186.140)  10.101 ms  10.084 ms  10.915 ms
15  198-27-190-XXX.fiber.dynamic.sonic.net (198.27.190.XXX)  11.469 ms  11.468 ms  11.459 ms
~# traceroute -U -p4242 198.27.190.XXX 1520
traceroute to 198.27.190.XXX (198.27.190.XXX), 30 hops max, 1520 byte packets
...
 5  ae11-551.cr5-lax2.ip4.gtt.net (173.205.58.193)  18.261 ms  18.308 ms  18.237 ms
 6  ae6.cr6-sjc1.ip4.gtt.net (89.149.180.78)  7.634 ms  7.834 ms  7.729 ms
 7  as7065.xe-3-0-3.ar2.sjc1.us.as4436.gtt.net (69.22.130.110)  11.261 ms  10.029 ms  7.899 ms
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  300.ae0.bras1.snfcca05.sonic.net (198.27.186.140)  9.862 ms  9.951 ms  9.860 ms
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
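For anyone following along: the check for fragments here is the one tcpdump's classic fragment filter performs, ip[6:2] & 0x3fff != 0, i.e. any packet with the MF bit set or a non-zero fragment offset. In Python, against a raw IPv4 header, that looks like (headers below are hand-built examples, not captures):

```python
import struct

def is_fragment(ip_header: bytes) -> bool:
    # Bytes 6-7 of the IPv4 header hold flags (3 bits) + fragment offset
    # (13 bits). A packet is a fragment if MF is set or the offset is
    # non-zero -- exactly tcpdump's `ip[6:2] & 0x3fff != 0`.
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    return (flags_frag & 0x3FFF) != 0

# Minimal hand-built 20-byte IPv4 headers for illustration:
unfragmented = bytes([0x45, 0, 0x05, 0xDC, 0, 1]) + struct.pack("!H", 0x4000) + bytes(12)
first_frag   = bytes([0x45, 0, 0x05, 0xDC, 0, 2]) + struct.pack("!H", 0x2000) + bytes(12)
print(is_fragment(unfragmented))  # False (only DF set)
print(is_fragment(first_frag))    # True (MF set, offset 0)
```

Note that non-first fragments carry no UDP/TCP header at all, which is presumably why a stateless filter can't tie them back to an established flow without reassembly.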
by tomoc » Wed Apr 29, 2020 5:22 pm
Well, the behavior you're seeing is definitely not expected, so we'll start looking into it and let you know if we need more diagnostic data.
Hopefully this will be easily reproduced in our lab.
Tomoc
Sonic NOC
by tomoc » Thu Apr 30, 2020 9:23 am
Will you clarify whether you experience this same behavior when you are NOT logged in to the VPN? Sorry if you already mentioned this, but I couldn't find a clear answer.

I don't think I made it clear before, but we do not have any DDoS appliances in our network that do packet-level filtering or could behave in this manner. If what you're seeing is caused by gear in our network, it's unintended behavior from a network device.
Tomoc
Sonic NOC