long SSH packet retransmission delays between home and work

by detrout » Mon Jul 29, 2019 10:53 pm
I tried support, but this is odd enough, and the AT&T Fusion diagnostics are limited enough, that the technician suggested I ask on the forums.

I get long packet retransmission delays when I connect directly from home to work and then send a large block of data. Simple interactive sessions work fine, but if I run dmesg, dump a large file, or tunnel Jupyter notebook sessions through ssh, I start getting delays of several minutes between packets.

An easy way for me to trigger it is to cat a large file: it'll scroll for a bit, then hang, and then, if I'm patient enough, it'll continue. It happens on all the servers at Caltech that I've tried logging into. Services hosted elsewhere, like YouTube and Netflix, seem to behave fine.
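To put numbers on the "scroll, hang, continue" pattern, the arrival time of each chunk of output can be logged and the long gaps picked out. This is only a minimal sketch: the function names, the 64 KiB chunk size, and the 5 second threshold are assumptions for illustration, not anything taken from the capture.

```python
import time


def find_stalls(timestamps, threshold=5.0):
    """Return (start_time, gap) pairs for gaps longer than `threshold` seconds.

    `timestamps` is a list of arrival times in seconds, in increasing order.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def record_arrivals(stream, chunk_size=65536):
    """Read a binary stream to EOF and return one monotonic timestamp per chunk."""
    arrivals = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means EOF
            break
        arrivals.append(time.monotonic())
    return arrivals
```

One way to use it would be a small script that calls `find_stalls(record_arrivals(sys.stdin.buffer))` and is fed by piping the output of the remote cat through ssh into it; the script name and host would be whatever applies locally.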

If I use a VPN to a different IP address range (Sonic's VPN or work's vpn.caltech.edu), or use a direct UDP-based OpenVPN tunnel before logging in, I don't have the retransmission delays.

Network path with the problem (direct connection):
1 _gateway (192.168.34.1) 17.605 ms 17.658 ms 19.153 ms
2 75-27-240-1.lightspeed.psdnca.sbcglobal.net (75.27.240.1) 19.538 ms 28.725 ms 29.365 ms
3 71.147.184.13 (71.147.184.13) 20.373 ms 20.825 ms 21.477 ms
4 12.123.136.190 (12.123.136.190) 28.424 ms 27.666 ms 28.214 ms
5 ggr2.la2ca.ip.att.net (12.122.128.101) 26.528 ms 27.267 ms 26.624 ms
6 ae-5.a00.lsanca20.us.bb.gin.ntt.net (129.250.9.61) 25.632 ms 20.678 ms 20.697 ms
7 ae-3.r00.lsanca20.us.bb.gin.ntt.net (129.250.2.253) 22.027 ms 22.036 ms ae-3.r01.lsanca20.us.bb.gin.ntt.net (129.250.2.233) 79.303 ms
8 ae-6.r22.lsanca07.us.bb.gin.ntt.net (129.250.6.46) 78.999 ms 78.689 ms ae-8.r23.lsanca07.us.bb.gin.ntt.net (129.250.6.48) 74.187 ms
9 ae-2.r01.lsanca07.us.bb.gin.ntt.net (129.250.4.107) 73.190 ms ae-1.r01.lsanca07.us.bb.gin.ntt.net (129.250.3.123) 73.159 ms 72.877 ms
10 ae-1.a02.lsanca07.us.bb.gin.ntt.net (129.250.3.234) 72.485 ms ae-0.a02.lsanca07.us.bb.gin.ntt.net (129.250.2.186) 71.864 ms 71.778 ms
11 ntt-los-nettos-usc.ln.net (165.254.21.242) 71.464 ms 20.332 ms 20.477 ms
12 * * *
13 booth-rsw.ilan.caltech.edu (131.215.254.253) 24.603 ms 24.687 ms 24.974 ms
14 cacr-imss.caltech.edu (131.215.5.146) 27.471 ms 27.620 ms 27.845 ms
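For comparing traceroutes like these side by side, the per-hop round-trip times can be pulled out of the text and the hops with widely varying probes flagged. This is a rough sketch: `parse_hops`, `jittery_hops`, and the 30 ms spread threshold are made-up names and values for illustration, and a large spread on an intermediate hop is only a hint, not proof of a problem on that hop.

```python
import re

# An RTT looks like "22.027 ms"; IP addresses are followed by ")" so they do not match.
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")
# A hop line starts with the hop number followed by whitespace.
HOP_RE = re.compile(r"^\s*(\d+)\s")


def parse_hops(text):
    """Map hop number -> list of RTTs in ms. Hops showing only '*' get an empty list."""
    hops = {}
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skip headers such as "traceroute to ..."
        hops[int(m.group(1))] = [float(x) for x in RTT_RE.findall(line)]
    return hops


def jittery_hops(hops, min_spread_ms=30.0):
    """Return hop numbers whose slowest and fastest probes differ by more than min_spread_ms."""
    return [
        n
        for n, rtts in sorted(hops.items())
        if rtts and max(rtts) - min(rtts) > min_spread_ms
    ]
```

Run against the direct trace above, hop 11 would be flagged, since its three probes range from about 20 ms to about 71 ms.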

Attached is a compressed Wireshark capture taken while I tried to zcat a large file.

The network topology used for the capture is laptop -> 8 port ethernet switch -> Netgear WDR3700 running openwrt -> AT&T gateway in

Since the same hardware works when I use a VPN, though, I think it's less likely that the problem is with my equipment.

Working network path using sonic openvpn:
traceroute to pongo.caltech.edu (131.215.148.81), 30 hops max, 60 byte packets
1 184-23-191-129.vpn.dynamic.sonic.net (184.23.191.129) 159.798 ms 160.017 ms 160.529 ms
2 vm-dist1-1.equinix-sj.sonic.net (157.131.0.2) 160.582 ms vm-dist1-2.equinix-sj.sonic.net (157.131.0.3) 160.557 ms vm-dist1-1.equinix-sj.sonic.net (157.131.0.2) 160.558 ms
3 308.ae4.gw.equinix-sj.sonic.net (209.148.113.209) 366.200 ms 366.156 ms 307.ae4.gw.equinix-sj.sonic.net (209.148.113.217) 365.775 ms
4 100gigabitethernet2-3.core1.sjc2.he.net (206.223.116.37) 160.433 ms 160.438 ms 160.419 ms
5 100ge15-1.core1.lax1.he.net (184.104.193.50) 160.422 ms 160.403 ms 160.404 ms
6 100ge14-1.core1.lax2.he.net (72.52.92.122) 160.396 ms 41.300 ms 62.881 ms
7 65.19.156.114 (65.19.156.114) 41.497 ms 42.463 ms 41.723 ms
8 * * *
9 booth-rsw.ilan.caltech.edu (131.215.254.253) 45.779 ms * 45.931 ms
10 booth-rsw.ilan.caltech.edu (131.215.254.253) 47.588 ms 47.742 ms cacr-imss.caltech.edu (131.215.5.146) 46.123 ms
11 * cacr-imss.caltech.edu (131.215.5.146) 46.392 ms *
12 * * *

Working network path using Caltech's VPN server (Cisco AnyConnect):
1 booth-rsw-304.caltech.edu (131.215.5.61) 20.796 ms 20.795 ms 20.919 ms
2 cacr-imss.caltech.edu (131.215.5.146) 21.429 ms 21.422 ms 21.412 ms^C

Though the route to the Caltech VPN server seems to be the same as the route for a direct connection.

I am puzzled.


by virtualmike » Tue Jul 30, 2019 8:27 pm
Possibly related to the issues described in this message thread? It seems to come and go for me.
by detrout » Wed Jul 31, 2019 8:08 pm
It might be. The behavior sounds similar.

One thing I noticed is that I was getting TCP retransmission timeouts with ssh, but the same two computers talking over an OpenVPN tunnel worked fine.

Then I noticed openvpn was using UDP, so it'll keep sending packets even if one of them stalls.
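As background on why a TCP stall can stretch to minutes: after each consecutive retransmission timeout, TCP doubles the timer before trying again (the backoff described in RFC 6298). The sketch below just adds up that schedule. The 1 second starting timeout and 120 second cap are assumed values for illustration; a real stack derives the starting timeout from the measured round-trip time, so the actual numbers on this link may differ.

```python
def backoff_schedule(initial_rto=1.0, max_rto=120.0, retries=10):
    """Return the wait (in seconds) before each successive retransmission.

    The timer doubles after every timeout and is capped at `max_rto`.
    All defaults are illustrative assumptions, not measured values.
    """
    waits = []
    rto = initial_rto
    for _ in range(retries):
        waits.append(rto)
        rto = min(rto * 2, max_rto)
    return waits


def total_stall(initial_rto=1.0, max_rto=120.0, retries=10):
    """Total time spent waiting if `retries` retransmissions in a row time out."""
    return sum(backoff_schedule(initial_rto, max_rto, retries))
```

Under these assumptions, seven lost retransmissions in a row already add up to about two minutes of waiting, which is in the same range as the hangs described here.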

Mosh is like ssh, but uses UDP packets. It works fine. Then I try with ssh and it hangs.

Things I've wondered about are AT&T's rumored bufferbloat or perhaps badly implemented network throttling.
by virtualmike » Wed Jul 31, 2019 9:22 pm
... or badly implemented routing?

AT&T updated the firmware in my router a few weeks ago, and the problem cleared up until earlier this week. It returned; a reboot of the router made things work a bit better, but not as well as right after the firmware update.