Network Engineer; Please Help.

Internet access discussion, including Fusion, IP Broadband, and Gigabit Fiber!
16 posts Page 1 of 2
by erickpk » Fri Mar 01, 2013 11:20 am
Hi, first off, loving the service and the support staff is great. Anytime I've ever had a question you guys have been on it. It is for this reason I hope you can help me out.

I play Tribes: Ascend, which is very ping dependent, and for the last year (since beta), the game has run very well. I should note that Tribes servers are virtualized and that they're hosted by Internap. There are two regions I play: West and Central. The West Coast servers are located in Santa Clara and the Central servers are in Dallas. Running a ping -t shows that I ping ~9ms West and ~51ms Central. Two months ago the game started lagging really badly and a lot of players asked the developer (Hi-Rez Studios) to look into it. Three weeks ago they asked us to run MTR reports to the IPs while playing to see what was happening. With three weeks' worth of MTR reports (game servers actually getting worse) we've still not been given a reason or resolution. They're in the middle of pushing out a new game at the moment, so I get that they're busy, which is why I was hoping a network engineer could look at a couple of my MTRs and tell me what's going on. Essentially they're all the same, but here are two from yesterday and the day before:

http://pastebin.com/3dpnRdqC

http://pastebin.com/TeXGPWS5

Consistently I have two 200ms+ ping spikes when I play West, and at least two 250ms+ spikes Central. More typically, however, the spikes will exceed the 300-600ms range. The game has become unplayable, not only for me but for everyone on my team spread out across the country; with all the stuttering going on, people are shooting forward and hitting themselves in the back. We'd like to help fix that, so if you could help me understand what those MTRs are saying, I will name my firstborn after you. :)

**edit**

I forgot to add that net-code has not changed.
Here's an MTR with ping -t running at the same time to the West Coast IP: http://pastebin.com/qv2sy5kW

What I hear, often, is that this is bad routing; that this is somehow my/our fault. Please dispel that notion for me.

***edit2x***

I just read on Reddit that Hi-Rez stated specifically that this is a client ISP problem. Please help me make heads or tails of this.


***edit 3x***

Just more info for you:
Two more MTRs today, with time, total number of players in the region, and ping -t for the duration:

West: http://pastebin.com/pYcJQ17m

Central: http://pastebin.com/Jp4gFhRD

Here's a small video clip of the warping that happens: http://www.youtube.com/watch?v=oPlT6RGw ... XY&index=6


Thanks in advance for your time.

Erick
by toast0 » Fri Mar 01, 2013 9:50 pm
Erick,

I'm a long time network enthusiast, and I've never been called a network engineer, but here's my take:

MTR is showing some variability in round trip times to intermediate routers, but not very much to the final destination, and your ping is showing very stable round trip times. Intermediate routers may deprioritize ICMP packets if they're busy (for whatever value of busy), so these observations alone might not indicate a problem; but you're seeing lag and exciting things like that, so there is a problem.

Try doing a ping with larger packets. If you can find out what an average-sized packet for your game is, use that size; I tried with 500 bytes (ping -l 500 -t X; there's an option for MTR as well), and I saw things like this:

Code:

Reply from 107.6.89.35: bytes=500 time=34ms TTL=121
Reply from 107.6.89.35: bytes=500 time=58ms TTL=121
Reply from 107.6.89.35: bytes=500 time=111ms TTL=121
Reply from 107.6.89.35: bytes=500 time=130ms TTL=121
Reply from 107.6.89.35: bytes=500 time=34ms TTL=121
Reply from 107.6.89.35: bytes=500 time=42ms TTL=121
Reply from 107.6.89.35: bytes=500 time=57ms TTL=121
Reply from 107.6.89.35: bytes=500 time=67ms TTL=121
Reply from 107.6.89.35: bytes=500 time=38ms TTL=121
Reply from 107.6.89.35: bytes=500 time=40ms TTL=121
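
A run like the one above can be summarized rather than eyeballed. Here is a minimal sketch in Python, assuming the replies are pasted in as text in the Windows "Reply from ..." format; the name summarize_ping and the choice of statistics are illustrative assumptions, not part of any tool mentioned in this thread:

```python
import re
import statistics

# Matches the RTT in lines such as
# "Reply from 107.6.89.35: bytes=500 time=34ms TTL=121" (also "time<1ms").
REPLY_RE = re.compile(r"time[=<](\d+)ms")


def summarize_ping(output: str) -> dict:
    """Return count, min, median, max and spread (max - min) in ms."""
    rtts = [int(match.group(1)) for match in REPLY_RE.finditer(output)]
    if not rtts:
        raise ValueError("no 'time=NNms' replies found")
    return {
        "count": len(rtts),
        "min": min(rtts),
        "median": statistics.median(rtts),
        "max": max(rtts),
        "spread": max(rtts) - min(rtts),
    }


# The ten 500-byte replies quoted above.
SAMPLE = """\
Reply from 107.6.89.35: bytes=500 time=34ms TTL=121
Reply from 107.6.89.35: bytes=500 time=58ms TTL=121
Reply from 107.6.89.35: bytes=500 time=111ms TTL=121
Reply from 107.6.89.35: bytes=500 time=130ms TTL=121
Reply from 107.6.89.35: bytes=500 time=34ms TTL=121
Reply from 107.6.89.35: bytes=500 time=42ms TTL=121
Reply from 107.6.89.35: bytes=500 time=57ms TTL=121
Reply from 107.6.89.35: bytes=500 time=67ms TTL=121
Reply from 107.6.89.35: bytes=500 time=38ms TTL=121
Reply from 107.6.89.35: bytes=500 time=40ms TTL=121
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

For the ten replies above this reports a minimum of 34 ms and a maximum of 130 ms, a spread of 96 ms.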


I assume you're connected directly to the modem, or at least on a wired connection; wireless is not the way to connect when you're diagnosing network problems. Ideally you'd also stop any other traffic on your connection, as other traffic may delay your probes going out or your probes coming back (I'm not able to do that right now, so that may be a factor in my pings).

My guess is that the Telia route from San Jose Equinix to San Jose Internap is congested/flapping/something (it might also be the reverse route), but I don't know enough BGP-fu to figure that out (also, the Sonic.net network looking glass seems to be dead).
by erickpk » Sat Mar 02, 2013 10:54 am
Hi toast0,

Thanks for the reply. I tried this this morning with 32, 64, 128, and 256 bytes; they all showed pings 2x higher than normal. I have another question. Dslreports seems to think this hop is a router: INTERNAP.TenGigabitEthernet1-2.ar3.DAL2. http://www.dslreports.com/routerwatch/I ... 2.gblx.net

If that's true, then doesn't that mean the hop right before it is inside Internap's building? It's the hop where I get my biggest spikes. I had one spike the other day of almost 2 seconds (1,971ms).



Code:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Windows\system32>ping 107.6.98.194 -l 256 -t

Pinging 107.6.98.194 with 256 bytes of data:
Reply from 107.6.98.194: bytes=256 time=57ms TTL=111
Reply from 107.6.98.194: bytes=256 time=57ms TTL=111
Reply from 107.6.98.194: bytes=256 time=57ms TTL=111
Reply from 107.6.98.194: bytes=256 time=57ms TTL=111
Reply from 107.6.98.194: bytes=256 time=130ms TTL=111
Reply from 107.6.98.194: bytes=256 time=127ms TTL=111
Reply from 107.6.98.194: bytes=256 time=129ms TTL=111
Reply from 107.6.98.194: bytes=256 time=128ms TTL=111
Reply from 107.6.98.194: bytes=256 time=128ms TTL=111
Reply from 107.6.98.194: bytes=256 time=130ms TTL=111

Ping statistics for 107.6.98.194:
    Packets: Sent = 10, Received = 10, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 57ms, Maximum = 130ms, Average = 100ms
Control-C
^C
C:\Windows\system32>ping 107.6.98.194 -l 128 -t

Pinging 107.6.98.194 with 128 bytes of data:
Reply from 107.6.98.194: bytes=128 time=125ms TTL=111
Reply from 107.6.98.194: bytes=128 time=124ms TTL=111
Reply from 107.6.98.194: bytes=128 time=126ms TTL=111
Reply from 107.6.98.194: bytes=128 time=123ms TTL=111
Reply from 107.6.98.194: bytes=128 time=124ms TTL=111
Reply from 107.6.98.194: bytes=128 time=126ms TTL=111
Reply from 107.6.98.194: bytes=128 time=126ms TTL=111
Reply from 107.6.98.194: bytes=128 time=129ms TTL=111
Reply from 107.6.98.194: bytes=128 time=125ms TTL=111
Reply from 107.6.98.194: bytes=128 time=122ms TTL=111

Ping statistics for 107.6.98.194:
    Packets: Sent = 10, Received = 10, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 122ms, Maximum = 129ms, Average = 125ms
Control-C
^C
C:\Windows\system32>ping 107.6.98.194 -l 64 -t

Pinging 107.6.98.194 with 64 bytes of data:
Reply from 107.6.98.194: bytes=64 time=123ms TTL=111
Reply from 107.6.98.194: bytes=64 time=125ms TTL=111
Reply from 107.6.98.194: bytes=64 time=124ms TTL=111
Reply from 107.6.98.194: bytes=64 time=120ms TTL=111
Reply from 107.6.98.194: bytes=64 time=127ms TTL=111
Reply from 107.6.98.194: bytes=64 time=125ms TTL=111
Reply from 107.6.98.194: bytes=64 time=125ms TTL=111
Reply from 107.6.98.194: bytes=64 time=128ms TTL=111
Reply from 107.6.98.194: bytes=64 time=53ms TTL=111
Reply from 107.6.98.194: bytes=64 time=124ms TTL=111
Reply from 107.6.98.194: bytes=64 time=124ms TTL=111
Reply from 107.6.98.194: bytes=64 time=124ms TTL=111

Ping statistics for 107.6.98.194:
    Packets: Sent = 12, Received = 12, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 53ms, Maximum = 128ms, Average = 118ms
Control-C
^C
C:\Windows\system32>ping 107.6.98.194 -t

Pinging 107.6.98.194 with 32 bytes of data:
Reply from 107.6.98.194: bytes=32 time=121ms TTL=111
Reply from 107.6.98.194: bytes=32 time=124ms TTL=111
Reply from 107.6.98.194: bytes=32 time=126ms TTL=111
Reply from 107.6.98.194: bytes=32 time=128ms TTL=111
Reply from 107.6.98.194: bytes=32 time=122ms TTL=111
Reply from 107.6.98.194: bytes=32 time=126ms TTL=111

Ping statistics for 107.6.98.194:
    Packets: Sent = 6, Received = 6, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 121ms, Maximum = 128ms, Average = 124ms
Control-C
^C
C:\Windows\system32>tracert 107.6.98.194

Tracing route to 107.6.98.194 over a maximum of 30 hops

  1    15 ms    18 ms    13 ms  70-36-141-1.dsl.dynamic.sonic.net [70.36.141.1]
  2    13 ms    13 ms    13 ms  gig1-30.cr1.colaca01.sonic.net [70.36.228.57]
  3    14 ms    13 ms    13 ms  po3.cr1.lsatca11.sonic.net [75.101.33.166]
  4    81 ms    84 ms    81 ms  0.xe-5-1-0.gw.pao1.sonic.net [69.12.211.1]
  5    85 ms    78 ms    84 ms  xe-1-0-6.ar1.pao1.us.nlayer.net [69.22.130.85]
  6    80 ms    79 ms    81 ms  ae0-90g.cr1.pao1.us.nlayer.net [69.22.153.18]
  7    14 ms    13 ms    16 ms  ae1-70g.cr1.sjc1.us.nlayer.net [69.22.143.165]
  8    80 ms    84 ms    81 ms  ae1-40g.ar2.sjc1.us.nlayer.net [69.22.143.118]
  9    81 ms    84 ms    87 ms  as3549.xe-8-0-4.ar2.sjc1.us.nlayer.net [69.22.130.142]
 10    85 ms    84 ms    84 ms  ae12-10G-scr4.SNV2.gblx.net [67.16.146.57]
 11    60 ms    59 ms    59 ms  po5.ar3.DAL2.gblx.net [67.16.138.209]
 12   125 ms   125 ms   123 ms  INTERNAP.TenGigabitEthernet1-2.ar3.DAL2.gblx.net [208.51.41.58]
 13    60 ms    59 ms    56 ms  border2.te4-1-bbnet2.dal006.pnap.net [216.52.191.67]
 14   130 ms   122 ms   128 ms  inapvoxcust-14.border2.dal006.pnap.net [74.201.53.94]
 15    54 ms    54 ms    54 ms  107.6.98.194

Trace complete.

C:\Windows\system32>
by erickpk » Sat Mar 02, 2013 12:56 pm
Using Wireshark, and just running through a quick 2 min online, I send between 56 and 89 bytes per frame. The game sends between 300 and 600 bytes per frame. Again though, this was just 2 min online and I didn't actually do much except fire a gun aimlessly.
by toast0 » Sat Mar 02, 2013 9:40 pm
Response times of the intermediate routers don't necessarily indicate an issue; those are busy routers (10Gigabit), and they may deprioritize sending back the TTL expired packets that make traceroute work.

Without data from both sides, it'll be hard to figure out what's going on; I would put money on this being close to the server though. Either a problem on the servers themselves, network capacity issues between the server and the directly upstream switch(es) (possibly only in bursts), network capacity issues between those switches and the border routers at internap, or capacity issues between internap and other networks (global crossing in the midwest case, teliasonera in the san jose case). It would be super nice to see what the reverse routes looked like, but I can't find a public internap traceroute (looking glass) site :(

If you can tell, I wonder if an unusual latency or packet loss incident induces the client or server to send more packets than usual (to resync). If so, and they're running close to the network capacity somewhere, having an incident will cause more traffic, which may cause more problems.
by erickpk » Mon Mar 04, 2013 10:08 am
Any way for a Sonic engineer to come in and give us an assist?
by erickpk » Fri Mar 08, 2013 11:10 am
erickpk wrote:
Any way for a Sonic engineer to come in and give us an assist?

please?
by mmerner » Fri Mar 08, 2013 12:03 pm
Erick,

I haven't played Tribes with any competitiveness since a time that all lag could firmly be blamed on a 56k dialup connection. I guess this day and age low latency and lack of lag is a much bigger deal :D. I'm going to try my best to explain what can be concluded from your network testing, though I suspect you may not like the end result.

First and foremost, you stated that the lag spikes and choppiness/warping are not a problem unique to yourself, but reported by a large subset of players. Unless all these players are on our network (I'd like to believe Sonic is that big, but sincerely doubt it), it leaves us with two options for the source of the lag. The first is that there is a shared upstream provider (large intermediary internet backbone) used by both us and some other ISPs which is having some sort of ongoing routing issue. The second is that there is an issue inside Hi-Rez's network or servers. I would suspect it is their servers, due to the fact that they are the single (geographically diverse) commonality between all the problems. It is possible that, if they were running identical network architecture in both their datacenters, a network issue could arise in tandem at both locations, but that is highly unlikely.

Your MTRs both seem accurate on latency to the given geographical locations of the 2 Tribes servers. As toast0 correctly stated, ICMP (the protocol for network testing by ping and traceroute) is the first thing to be deprioritized by all routers if they are busy for any reason. Thus when you see latency spikes on a few hops in a traceroute, it does not actually indicate any problem. For there to be an actual issue, the latency spike would have to persist through every subsequent hop to the end of the traceroute/MTR. All of the data you have provided shows perfectly normal and uncongested network paths. Now, forward and return paths almost always take different routes, so without return traceroutes from the Hi-Rez servers towards your IP, we have an incomplete view of your issue. If you can somehow get them to provide one to you, it could give more useful information, though I suspect that to be unlikely. The combination of forward and reverse traceroutes can be used to identify the exact location of a network issue when one exists, but as I don't see indication of one in the traceroutes you provided, trying to acquire a reverse would be a waste of time and energy.
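
The rule of thumb above (a spike only matters if it persists through every later hop to the destination) can be sketched as a small check. This is a rough illustration in Python, assuming one RTT per hop; the function names and the 100 ms threshold (about twice the ~51 ms Central baseline reported earlier) are hypothetical choices, and the hop data is the first RTT column of the tracert posted earlier in the thread:

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (hop label, round trip time in ms)


def sustained_start(hops: List[Hop], threshold_ms: float) -> Optional[int]:
    """Index of the first hop in the run of slow hops that reaches the destination.

    Walks backward from the destination. Returns None when the destination
    itself is below the threshold, i.e. no spike carries through to the end.
    """
    start = None
    for index in range(len(hops) - 1, -1, -1):
        if hops[index][1] >= threshold_ms:
            start = index
        else:
            break
    return start


def isolated_spikes(hops: List[Hop], threshold_ms: float) -> List[str]:
    """Labels of slow hops that are followed by at least one faster hop."""
    start = sustained_start(hops, threshold_ms)
    cutoff = len(hops) if start is None else start
    return [name for name, rtt in hops[:cutoff] if rtt >= threshold_ms]


# First RTT column of the tracert to 107.6.98.194 posted earlier.
TRACE: List[Hop] = [
    ("1 70-36-141-1.dsl.dynamic.sonic.net", 15),
    ("2 gig1-30.cr1.colaca01.sonic.net", 13),
    ("3 po3.cr1.lsatca11.sonic.net", 14),
    ("4 0.xe-5-1-0.gw.pao1.sonic.net", 81),
    ("5 xe-1-0-6.ar1.pao1.us.nlayer.net", 85),
    ("6 ae0-90g.cr1.pao1.us.nlayer.net", 80),
    ("7 ae1-70g.cr1.sjc1.us.nlayer.net", 14),
    ("8 ae1-40g.ar2.sjc1.us.nlayer.net", 80),
    ("9 as3549.xe-8-0-4.ar2.sjc1.us.nlayer.net", 81),
    ("10 ae12-10G-scr4.SNV2.gblx.net", 85),
    ("11 po5.ar3.DAL2.gblx.net", 60),
    ("12 INTERNAP.TenGigabitEthernet1-2.ar3.DAL2.gblx.net", 125),
    ("13 border2.te4-1-bbnet2.dal006.pnap.net", 60),
    ("14 inapvoxcust-14.border2.dal006.pnap.net", 130),
    ("15 107.6.98.194", 54),
]

if __name__ == "__main__":
    print("sustained from:", sustained_start(TRACE, 100))
    print("isolated spikes:", isolated_spikes(TRACE, 100))
```

With that trace, hops 12 and 14 come back as isolated spikes and nothing is sustained to the destination, which is the pattern described above as not indicating a problem on the forward path.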

If Hi-Rez provides any more specific information on what they believe the client ISP problem to be and/or how it manifests, I'd be more than happy to investigate more. As of right now, having reviewed your data, my personal opinion is that the problem likely lies with the servers themselves. The one key question that comes to mind is, does everyone on the server experience the chop/warp at exactly the same time?

Cheers,
Matt
--
Sonic.net NOC
by erickpk » Sat Mar 09, 2013 11:55 am
mmerner wrote:
The one key question that comes to mind is, does everyone on the server experience the chop/warp at exactly the same time?

Cheers,
Matt


Thanks Matt, I appreciate the reply... even if it's not really what I wanted to hear.

There will be times when everyone notices huge lag spikes at once. There will be a collective gasp over Mumble, "whoa, did you feel that one?" To which everyone replies, "yeah". Having said that, yes, everyone complains about lag. Literally no one on our team says game play is good, and we're spread out across the country, with no two guys having the same ISP.

The warping and stuttering in the video is rare, but over the course of a 25 minute match everyone will see it a few times. Most of the time I can't capture evidence on video because it's just pings spiking. Spikes are usually 3x of my baseline ping, so if I'm pinging 9 to the server, it will spike to 30; likewise, a ping of 50 will spike to 150.

These spikes of 3x occur on the last hop, on the server itself. I'll edit in an example once I get downstairs.

Should I ask Hi-Rez to ask internap to do reverse traceroutes? If it's pointless I won't press the issue.

***edit***

Here are two examples of pings spiking 3x on the final hop, on the server itself:

Code:

na central 6pm pst

|------------------------------------------------------------------------------------------|
|                                      WinMTR statistics                                   |
|                       Host              -   %  | Sent | Recv | Best | Avrg | Wrst | Last |
|------------------------------------------------|------|------|------|------|------|------|
|       70-36-141-1.dsl.dynamic.sonic.net -    0 |  922 |  922 |    7 |    9 |   17 |    9 |
|          gig1-30.cr1.colaca01.sonic.net -    0 |  922 |  922 |    7 |    8 |   97 |    7 |
|              po3.cr1.lsatca11.sonic.net -    1 |  918 |  917 |    7 |    8 |   46 |    8 |
|            0.xe-5-1-0.gw.pao1.sonic.net -    0 |  922 |  922 |    7 |    9 |   70 |    8 |
|         xe-1-0-6.ar1.pao1.us.nlayer.net -    0 |  922 |  922 |    8 |   10 |   38 |   10 |
|          ae0-90g.cr1.pao1.us.nlayer.net -    1 |  918 |  917 |    7 |   10 |   69 |    8 |
|          ae1-70g.cr1.sjc1.us.nlayer.net -    0 |  922 |  922 |    8 |   12 |   59 |    8 |
|          ae1-40g.ar2.sjc1.us.nlayer.net -    2 |  875 |  863 |    9 |   11 |   45 |   10 |
|  as3549.xe-8-0-4.ar2.sjc1.us.nlayer.net -    0 |  922 |  922 |   10 |   15 |   61 |   10 |
|             ae13-20G-scr3.SNV2.gblx.net -    1 |  918 |  917 |   10 |   11 |   55 |   11 |
|                   po4.ar3.DAL2.gblx.net -    1 |  918 |  917 |   52 |   59 |  259 |   54 |
|INTERNAP.TenGigabitEthernet1-2.ar3.DAL2.gblx.net -    1 |  918 |  917 |   52 |   52 |   78 |   52 |
|    border2.te3-1-bbnet1.dal006.pnap.net -    0 |  922 |  922 |   52 |   55 |  258 |   53 |
|  inapvoxcust-13.border1.dal006.pnap.net -    0 |  922 |  922 |   52 |   55 |   77 |   54 |
|                            107.6.98.194 -    0 |  922 |  922 |   52 |   55 |  123 |   53 |
|________________________________________________|______|______|______|______|______|______|
   WinMTR v0.92 GPL V2 by Appnor MSP - Fully Managed Hosting & Cloud Provider

na central 8pm pst

|------------------------------------------------------------------------------------------|
|                                      WinMTR statistics                                   |
|                       Host              -   %  | Sent | Recv | Best | Avrg | Wrst | Last |
|------------------------------------------------|------|------|------|------|------|------|
|       70-36-141-1.dsl.dynamic.sonic.net -    1 | 5654 | 5648 |    6 |    8 |  223 |    8 |
|          gig1-30.cr1.colaca01.sonic.net -    1 | 5669 | 5667 |    6 |    7 |   97 |    7 |
|              po3.cr1.lsatca11.sonic.net -    1 | 5662 | 5658 |    7 |    8 |  104 |    8 |
|            0.xe-5-1-0.gw.pao1.sonic.net -    0 | 5677 | 5677 |    7 |    9 |   99 |    8 |
|         xe-1-0-6.ar1.pao1.us.nlayer.net -    1 | 5657 | 5652 |    8 |   10 |  109 |    9 |
|          ae0-90g.cr1.pao1.us.nlayer.net -    1 | 5665 | 5662 |    7 |    9 |   88 |    8 |
|          ae1-70g.cr1.sjc1.us.nlayer.net -    0 | 5676 | 5676 |    7 |   12 |   95 |    9 |
|          ae1-40g.ar2.sjc1.us.nlayer.net -    1 | 5530 | 5493 |    9 |   11 |  107 |   10 |
|  as3549.xe-8-0-4.ar2.sjc1.us.nlayer.net -    0 | 5676 | 5676 |    9 |   15 |  103 |   10 |
|             ae13-20G-scr3.SNV2.gblx.net -    0 | 5676 | 5676 |   10 |   12 |  102 |   11 |
|                   po4.ar3.DAL2.gblx.net -    0 | 5676 | 5676 |   52 |   61 |  777 |   63 |
|INTERNAP.TenGigabitEthernet1-2.ar3.DAL2.gblx.net -    0 | 5676 | 5676 |   51 |   52 |  134 |   52 |
|    border1.te3-1-bbnet1.dal006.pnap.net -    0 | 5676 | 5676 |   52 |   55 |  258 |   52 |
|  inapvoxcust-14.border2.dal006.pnap.net -    0 | 5676 | 5676 |   52 |   55 |  269 |   60 |
|                            107.6.98.194 -    1 | 5669 | 5667 |   52 |   54 |  153 |   53 |
|________________________________________________|______|______|______|______|______|______|
   WinMTR v0.92 GPL V2 by Appnor MSP - Fully Managed Hosting & Cloud Provider
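
To compare hops in reports like the two above without reading each row by eye, the rows can be parsed and the worst-case latency expressed as a multiple of the best. A minimal sketch in Python, assuming the WinMTR text layout shown above; parse_winmtr and worst_to_best are hypothetical helper names, and the sample rows are copied from the 6pm report:

```python
import re
from typing import Dict, List

# One data row: | host - loss% | Sent | Recv | Best | Avrg | Wrst | Last |
# Header and separator lines do not match and are skipped.
ROW_RE = re.compile(
    r"^\|\s*(?P<host>\S+)\s+-\s+(?P<loss>\d+)\s*\|"
    r"\s*(?P<sent>\d+)\s*\|\s*(?P<recv>\d+)\s*\|"
    r"\s*(?P<best>\d+)\s*\|\s*(?P<avg>\d+)\s*\|"
    r"\s*(?P<worst>\d+)\s*\|\s*(?P<last>\d+)\s*\|"
)


def parse_winmtr(report: str) -> List[Dict]:
    """Parse WinMTR text export rows into dicts with integer fields."""
    rows = []
    for line in report.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            fields = match.groupdict()
            rows.append(
                {key: (val if key == "host" else int(val)) for key, val in fields.items()}
            )
    return rows


def worst_to_best(row: Dict) -> float:
    """Worst RTT as a multiple of the best RTT for one hop."""
    return row["worst"] / row["best"] if row["best"] else float("inf")


# Header plus the last hops of the "na central 6pm pst" report above.
SAMPLE = """\
|                       Host              -   %  | Sent | Recv | Best | Avrg | Wrst | Last |
|------------------------------------------------|------|------|------|------|------|------|
|                   po4.ar3.DAL2.gblx.net -    1 |  918 |  917 |   52 |   59 |  259 |   54 |
|INTERNAP.TenGigabitEthernet1-2.ar3.DAL2.gblx.net -    1 |  918 |  917 |   52 |   52 |   78 |   52 |
|                            107.6.98.194 -    0 |  922 |  922 |   52 |   55 |  123 |   53 |
"""

if __name__ == "__main__":
    for hop in parse_winmtr(SAMPLE):
        print(f"{hop['host']}: worst is {worst_to_best(hop):.2f}x best")
```

On the 6pm report the destination's worst reply is about 2.4 times its best (123 ms against 52 ms).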



ps... reddit loves you: http://www.reddit.com/r/Tribes/comments ... t_my_mtrs/
by toast0 » Sat Mar 09, 2013 8:13 pm
Re: http://www.reddit.com/r/Tribes/comments ... rs/c8spkye
Who/what is the "toast0" that he referred to?


I am amused :)