static IPs not being routed to if idle

by waynesung » Thu Feb 21, 2013 1:56 pm
I read through your original post once more to be sure I got the whole picture. My previous post regarding the two separate devices does not apply, since you have static IPs and don't need to NAT them. Can you run tcpdump on the affected machine to see whether packets from the outside arrive while the machine is in the unreachable condition?
by waynesung » Mon Feb 25, 2013 10:18 pm
Using tcpdump or an equivalent packet capture program will allow you to see which side is not behaving when your host becomes invisible to the internet. I'm assuming you have a way to ping your host from outside.

My router is 192.168.0.1; you will be looking for Sonic's default router address, which should be configured on your host. My test host is 192.168.0.249; substitute yours.

If Sonic's router does not have an arp for your host, then the first thing it should do is get that arp:
13:43:36.132561 arp who-has 192.168.0.249 tell 192.168.0.1
13:43:36.132606 arp reply 192.168.0.249 is-at xx:xx:xx:xx:xx:xx

If it already has the arp then the ping will proceed immediately:
13:43:36.132812 IP 192.168.0.1 > 192.168.0.249: ICMP echo request, id 5708, seq 0, length 64
13:43:36.132858 IP 192.168.0.249 > 192.168.0.1: ICMP echo reply, id 5708, seq 0, length 64

You said that after just a few minutes of inactivity the host will disappear. All the routers I work on normally have arp timeouts of hours, not minutes. If anyone from Sonic is reading this and can give us an exact number, that would be appreciated.
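
If the capture gets long, a small script can pick out the interesting lines. The following is only a sketch: it assumes the tcpdump text format shown in the sample lines above (other tcpdump versions may print ARP lines differently), and the names `classify` and `summarize` are made up for illustration.

```python
import re

# Patterns match the tcpdump text format shown in the sample lines above.
ARP_REQUEST = re.compile(r"arp who-has (\S+) tell (\S+)")
ARP_REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")
ICMP = re.compile(r"IP (\S+) > (\S+): ICMP echo (request|reply)")


def classify(line):
    """Return a (kind, field, field) tuple for one tcpdump line, or None."""
    m = ARP_REQUEST.search(line)
    if m:
        return ("arp-request", m.group(1), m.group(2))  # (target, asker)
    m = ARP_REPLY.search(line)
    if m:
        return ("arp-reply", m.group(1), m.group(2))  # (ip, mac)
    m = ICMP.search(line)
    if m:
        return ("icmp-" + m.group(3), m.group(1), m.group(2))  # (src, dst)
    return None


def summarize(lines, host_ip):
    """Count inbound events showing that something is still trying to reach host_ip."""
    counts = {"arp_for_host": 0, "echo_to_host": 0}
    for line in lines:
        event = classify(line)
        if event is None:
            continue
        kind, first, second = event
        if kind == "arp-request" and first == host_ip:
            counts["arp_for_host"] += 1
        elif kind == "icmp-request" and second == host_ip:
            counts["echo_to_host"] += 1
    return counts
```

If both counters stay at zero while an outside ping is running, nothing is arriving at the host at all, which points upstream of the machine doing the capture.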
by doctorfb » Tue Feb 26, 2013 5:31 pm
Yes, I have an external method of pinging to verify.
I've run wireshark (and tcpdump, though they essentially do the same thing) and verified that my hosts are not receiving any packets once the magic amount of time has passed (and I haven't done an explicit ping out).
I agree Sonic's routers should be doing discovery, but it does not appear that they are, for whatever reason. And once I ping out, there's no need for Sonic's router to do an arp who-has for a cache entry it already has (until it expires).
When I talked with a Sonic support rep he mentioned something on the order of 2-3 minutes. In actual testing, on average, it appears to be 60 seconds.
If this were a plain ol' Legacy DSL connection I'd plug in my old Zoom modem, but since I'm on Fusion (dual-bonded at that), I can't do that, and I'm not willing to go buy yet another ADSL modem just to verify what I suspect isn't something I can fix on my side of the line.
Any other suggestions?
by tdo » Wed Feb 27, 2013 12:49 pm
Hi,

I've taken a look at your line and I don't think this is an issue on our side. The router that acts as your default gateway has a 4 hour ARP table timeout. The DSLAM that sits between you and this router essentially does not time out MAC entries. If the entry on the router does happen to time out, the DSLAM will very quickly refresh the table on the router.

In normal operation, we should see an ARP who-has from your hosts at whatever interval they time out their cached entry of your default gateway's MAC address. I've been watching your port for the last hour and I have not seen a single ARP request from you. Either you have a static ARP entry for your default gateway programmed into all of your hosts or something else is going on. During the same time period we have not lost your MAC entry in our table and your modem remains ATM pingable.

My suggestion would be to hook a single non-virtualized host directly to your modem (double check that you are indeed in bridged mode), assign it one of your static IPs, and see if the same problem occurs. This should help isolate where the problem is occurring.

-Tim
by doctorfb » Wed Feb 27, 2013 5:33 pm
Greetings, Tim,
Thanks for looking into this. To answer some of your questions:
I have a ZyXEL P-663HN-51 which is a combo dual/bonded-ADSL + 4-port LAN router + Wireless AP.
It is configured in bridged mode in accordance with your wiki article, except it does not have a static IP assigned to it from my pool. It is configured with the standard 192.168.1.1 address for admin functions.
Since it is a combo-router there's no effective way to plug in directly to just the modem.
I've connected a non-vm machine to IP 173.228.5.243 (MAC:00:1a:92:78:2f:d4), plugged into LAN port #1 on the ZyXEL.
I have wireshark active and tracing the interface (eth1).
This machine has two interfaces, one of which goes to my internal network (eth0) and has the default route for the machine, and the other (eth1) temporarily connected to the ZyXEL.
For these tests eth1 is not the default route interface, so only specific traffic will get out through it. Thus, it's very quiet unless actively engaged for these tests.

When everything is fresh (i.e., I've just pinged out to the Sonic router), I run a traceroute test from http://net.bluemoon.net to 173.228.5.243 on another machine. Here's the output:

traceroute to 173.228.5.243 (173.228.5.243), 64 hops max, 40 byte packets
1 gatekeeper (64.200.84.2) 1.435 ms 1.346 ms 1.118 ms
2 250.ATM1-0.GW10.NYC9.ALTER.NET (63.125.96.5) 26.729 ms 29.115 ms 30.596 ms
3 545.at-6-0-0.XR2.NYC9.ALTER.NET (152.63.24.234) 30.097 ms 29.661 ms 31.483 ms
4 0.so-4-0-1.XT2.NYC9.ALTER.NET (152.63.9.90) 23.507 ms 26.102 ms 24.486 ms
5 0.xe-5-1-0.BR2.NYC4.ALTER.NET (152.63.21.221) 24.162 ms 27.506 ms 23.957 ms
6 te9-2-0d0.cir1.nyc-ny.us.xo.net (206.111.13.125) 28.639 ms 24.097 ms 25.965 ms
7 207.88.14.185.ptr.us.xo.net (207.88.14.185) 30.779 ms 37.133 ms 51.152 ms
8 te-11-0-0.rar3.sanjose-ca.us.xo.net (207.88.12.69) 96.549 ms 104.921 ms 110.687 ms
9 207.88.14.226.ptr.us.xo.net (207.88.14.226) 96.227 ms 100.337 ms 101.597 ms
10 0.xe-4-1-0.gw3.equinix-sj.sonic.net (216.156.84.102) 100.906 ms 105.718 ms 95.892 ms
11 tengig2-1.cr1.snjsca11.sonic.net (64.142.0.106) 106.891 ms 108.801 ms 109.155 ms
12 gig1-1-1.gw.snjsca11.sonic.net (70.36.230.6) 102.648 ms 100.160 ms 99.083 ms
13 gig1-1-2.gw.lsatca11.sonic.net (70.36.243.10) 99.299 ms 100.429 ms 106.379 ms
14 173-228-5-243.dsl.static.sonic.net (173.228.5.243) 124.054 ms 103.771 ms 131.828 ms

Looks good, and I see packets coming in via wireshark.
When everything is fresh, I see UDP requests coming from 64.200.84.10, as you'd expect for traceroute.

The only 'who-has' requests I see, however, are from my host, and only when I actively make a connection out. E.g., here's one when I ping out to net.bluemoon.net (64.200.84.10):

232 6710.081292000 AsustekC_78:2f:d4 Broadcast ARP 42 Who has 64.200.84.10? Tell 173.228.5.243
233 6710.091931000 Cisco_8b:52:c6 AsustekC_78:2f:d4 ARP 64 64.200.84.10 is at 64:16:8d:8b:52:c6 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

Here's another ping out doing a who-has for the Sonic default router (173.228.5.1):

283 7127.295263000 AsustekC_78:2f:d4 Cisco_8b:52:c6 ARP 42 Who has 173.228.5.1? Tell 173.228.5.243
284 7127.307748000 Cisco_8b:52:c6 AsustekC_78:2f:d4 ARP 64 173.228.5.1 is at 64:16:8d:8b:52:c6 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

'AsustekC_78:2f:d4' is my computer.
'Cisco_8b:52:c6' appears to be 173.228.5.1 (the Sonic router IP I've been told to use for a default router for my subnet.)
The 'FRAME CHECK SEQUENCE INCORRECT' message appears because the frame check sequence (the Ethernet checksum) on this reply from the Sonic router is all zeros rather than the expected value:
Frame check sequence: 0x00000000 [incorrect, should be 0x9290099d]
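
For anyone wondering where the "should be" value comes from: the frame check sequence is a CRC-32 checksum over the frame, not a counter. Here is a minimal sketch of the check, assuming the standard Ethernet CRC-32 (the same one Python's `zlib.crc32` computes) appended least-significant byte first. The helper names are made up, and whether the four trailing zero bytes in this capture are a real FCS or just padding left after the NIC stripped the real one is something I can't tell from the trace alone.

```python
import struct
import zlib


def ethernet_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the 4 FCS bytes as they would appear on the wire.

    Ethernet uses the same CRC-32 as zlib; the result is appended
    least-significant byte first.
    """
    return struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Check whether the last 4 bytes are a correct FCS for the rest of the frame."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == trailer
```

A trailer of all zeros will fail this check for essentially any real frame, which is consistent with the warning wireshark prints.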

I never see any 'who-has' requests coming from Sonic's side of things.
When everything magically times out (appears to be about 6 minutes with this machine) the above traceroute stops at hop #13:
...
12 gig1-1-1.gw.snjsca11.sonic.net (70.36.230.6) 105.351 ms 106.285 ms 101.635 ms
13 gig1-1-2.gw.lsatca11.sonic.net (70.36.243.10) 112.018 ms 104.457 ms 104.358 ms
14 * * *
15 * * *
16 * * *
17 * *

While a timeout was occurring, I started a traceroute and, in the middle of it, pinged out to the Sonic router. The result was this:
...
12 gig1-1-1.gw.snjsca11.sonic.net (70.36.230.6) 111.335 ms 98.924 ms 98.283 ms
13 gig1-1-2.gw.lsatca11.sonic.net (70.36.243.10) 98.439 ms 99.736 ms 106.464 ms
14 * * *
15 173-228-5-243.dsl.static.sonic.net (173.228.5.243) 103.684 ms 105.170 ms 105.414 ms

You made some comments:
> In normal operation, we should see an ARP who-has from your hosts at whatever interval they time out their cached entry of your default gateway's MAC address.

I don't see how this is likely. My hosts would have to be doing something active outbound in order to generate a who-has request. They would have no reason to refresh their cache otherwise. Again, outbound is not the problem. If my machines are quiet, they never receive any inbound requests.
Perhaps you were talking about the Sonic router doing a who-has to discover my machines? I don't see any such requests coming over the DSL line.

> I've been watching your port for the last hour and I have not seen a single ARP request from you.

Hmm... I have a ping out running every 60 seconds from two of my hosts. It's possible that this is causing implicit cache refresh on both ends, but I'm doing that specifically because I don't want the hosts to become unreachable. I did have these pings running every 120 seconds, but I found that there was still a variable window in which my hosts become unreachable, hence I ping every 60 seconds now.
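
A back-of-the-envelope way to see why the 120-second pings left a window: this assumes a simple model (unconfirmed, and only my guess at the behavior) in which something upstream forgets the host a fixed number of seconds after its last outbound packet. The function name and the model are illustrative only.

```python
def worst_case_gap(keepalive_interval_s, idle_timeout_s):
    """Seconds per keepalive cycle during which the host could be unreachable.

    Assumed model: reachability is lost idle_timeout_s after the last
    outbound packet and restored by the next keepalive.
    """
    return max(0.0, float(keepalive_interval_s) - float(idle_timeout_s))
```

With a timeout around 60 seconds, `worst_case_gap(120, 60)` comes out to 60 seconds per cycle, while `worst_case_gap(60, 60)` is zero but leaves no margin if the real timeout varies.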

> Either you have a static ARP entry for your default gateway programmed into all of your hosts or something else is going on.

No static ARP entries. I do have a default IP route going to the Sonic router, but no static ARP entries. My understanding of network protocol is that routers do segment discovery to find their endpoints. Endpoints (hosts) only do discovery when they need to make an out-bound connection. In this way, the routers know which host is on which segment and can route packets effectively. The hosts should not need to actively refresh their cache except when needing to send packets out for an unresolved IP. This is a bit simplistic, but essentially gets the point across.

> During the same time period we have not lost your MAC entry in our table and your modem remains ATM pingable.

I suspect your cache is implicitly being refreshed by me because I don't want my hosts to become unaddressable. I have 4 IPs. Two are actively pinging out. One runs in stealth and only does NTP requests. The last one is this non-vm test host.
For your testing, I'll leave up my host on .243. Feel free to do some testing of your own.
Note that since this interface is separated, it has only one subnet that it knows about: 173.228.5.0
I had to add an explicit route for the bluemoon hosts subnet to ensure ping responses went back out through the same interface. I can add some additional explicit routes to any Sonic subnets if you need me to.

Thanks in advance for anything you can discover about my problem!
by tdo » Wed Feb 27, 2013 9:53 pm
doctorfb wrote: The only 'who-has' requests I see, however, are from my host, and only when I actively make a connection out. E.g., here's one when I ping out to net.bluemoon.net (64.200.84.10):

232 6710.081292000 AsustekC_78:2f:d4 Broadcast ARP 42 Who has 64.200.84.10? Tell 173.228.5.243
233 6710.091931000 Cisco_8b:52:c6 AsustekC_78:2f:d4 ARP 64 64.200.84.10 is at 64:16:8d:8b:52:c6 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]
Something is not right here. Why is your computer sending an ARP request for 64.200.84.10? This implies that your computer thinks it's on the same layer 2 segment as this host, which it should not be.

That second line is actually our DSLAM responding to the ARP request with the MAC address of the gateway, on behalf of the gateway router. I know it seems a little strange but this is expected and valid behavior.
doctorfb wrote: I never see any 'who-has' requests coming from Sonic's side of things.
You likely won't ever see any ARP requests coming from us. Once your MAC address is in the DSLAM table, the DSLAM will handle any ARP who-has requests from our gateway router.
doctorfb wrote:
You made some comments:
> In normal operation, we should see an ARP who-has from your hosts at whatever interval they time out their cached entry of your default gateway's MAC address.

I don't see how this is likely. My hosts would have to be doing something active outbound in order to generate a who-has request. They would have no reason to refresh their cache otherwise. Again, outbound is not the problem. If my machines are quiet, they never receive any inbound requests.
Perhaps you were talking about the Sonic router doing a who-has to discover my machines? I don't see any such requests coming over the DSL line.
Most operating systems time out ARP cache entries at a set interval, usually some number of minutes, to prevent the cache from becoming stale.
doctorfb wrote:
> During the same time period we have not lost your MAC entry in our table and your modem remains ATM pingable.

I suspect your cache is implicitly being refreshed by me because I don't want my hosts to become unaddressable.
Again, I can assure you that our cache is not being updated by your pings; I did not see any ARP traffic from you for over an hour. This is not a problem, though, because the MAC entries on our side never time out. There is something else going on here. I encourage you to take a look at your ARP cache and routing table before and after the problem occurs.
by doctorfb » Thu Feb 28, 2013 11:00 am
tdo wrote:
doctorfb wrote: The only 'who-has' requests I see, however, are from my host, and only when I actively make a connection out. E.g., here's one when I ping out to net.bluemoon.net (64.200.84.10):

232 6710.081292000 AsustekC_78:2f:d4 Broadcast ARP 42 Who has 64.200.84.10? Tell 173.228.5.243
233 6710.091931000 Cisco_8b:52:c6 AsustekC_78:2f:d4 ARP 64 64.200.84.10 is at 64:16:8d:8b:52:c6 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]
Something is not right here. Why is your computer sending an ARP request for 64.200.84.10? This implies that your computer thinks it's on the same layer 2 segment as this host, which it should not be.
I believe I had done a 'ping net.bluemoon.net' during these tests. As I said, I have a manual route entry for that subnet (64.200.84.) to go through eth1 (otherwise it would have tried to go through the default route, which is on eth0 via my internal network, and would never get through my firewall because of tracking). Discovery necessarily will send a 'who-has' request on the closest segment (which is via eth1) to determine how to route packets for this address. The response seems appropriate, saying that the Sonic router is the closest route, since the ZyXEL is in bridge mode and not a formal route node in the network. This seems correct to me. Here's my arp and route tables after doing said ping:

% ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:1a:92:78:2f:d4
inet addr:173.228.5.243 Bcast:173.228.5.255 Mask:255.255.255.0
inet6 addr: fe80::21a:92ff:fe78:2fd4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2453 errors:0 dropped:0 overruns:0 frame:0
TX packets:514 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:283228 (276.5 KiB) TX bytes:43906 (42.8 KiB)
Interrupt:43 Base address:0x4000

% ping -c 1 net.bluemoon.net
PING net.bluemoon.net (64.200.84.10) 56(84) bytes of data.
64 bytes from net.bluemoon.net (64.200.84.10): icmp_req=1 ttl=51 time=484 ms

--- net.bluemoon.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 484.781/484.781/484.781/0.000 ms

% arp -a
net.bluemoon.net (64.200.84.10) at 64:16:8d:8b:52:c6 [ether] on eth1
ming.fruitbat.org (192.168.55.2) at 00:0c:29:61:f6:9d [ether] on eth0

% route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.55.1 0.0.0.0 UG 1 0 0 eth0
64.200.84.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
173.228.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.0.5 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vmnet1
192.168.55.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.245.0 0.0.0.0 255.255.255.0 U 0 0 0 vmnet8
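
For reference, here is a rough sketch of how a host picks the address it sends a who-has for, given a routing table like the one above. It is a simplified longest-prefix-match model, not the kernel's actual code; `arp_target` is a made-up name and only a few of the routes are copied in. The point it illustrates is that a route with gateway 0.0.0.0 is treated as directly connected, so the who-has goes out for the destination itself.

```python
import ipaddress

# (destination network, gateway, interface), copied from the `route -n` output above.
# A gateway of 0.0.0.0 means the network is treated as directly connected (on-link).
ROUTES = [
    ("0.0.0.0/0", "192.168.55.1", "eth0"),
    ("64.200.84.0/24", "0.0.0.0", "eth1"),
    ("173.228.5.0/24", "0.0.0.0", "eth1"),
    ("192.168.55.0/24", "0.0.0.0", "eth0"),
]


def arp_target(dest, routes=ROUTES):
    """Return (address to ARP for, interface) using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    best = None
    for network, gateway, iface in routes:
        net = ipaddress.ip_network(network)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        raise ValueError("no route to " + dest)
    _, gateway, iface = best
    # On-link route: ARP for the destination itself. Otherwise ARP for the gateway.
    next_hop = dest if gateway == "0.0.0.0" else gateway
    return next_hop, iface
```

With the table as shown, `arp_target("64.200.84.10")` gives `("64.200.84.10", "eth1")`, which matches the who-has in the trace. If the 64.200.84.0/24 route instead pointed at the Sonic gateway (a hypothetical entry such as `("64.200.84.0/24", "173.228.5.1", "eth1")`, which is not in my current table), the who-has would be for 173.228.5.1.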
tdo wrote: That second line is actually our DSLAM responding to the ARP request with the MAC address of the gateway, on behalf of the gateway router. I know it seems a little strange but this is expected and valid behavior.
doctorfb wrote: I never see any 'who-has' requests coming from Sonic's side of things.
tdo wrote: You likely won't ever see any ARP requests coming from us. Once your MAC address is in the DSLAM table, the DSLAM will handle any ARP who-has requests from our gateway router.
You stated earlier that your routers have a 4-hour cache timeout. Wouldn't they, then, do a who-has when a packet destined for my subnet comes in? How, exactly, does it build a route to my host initially? Are you expecting my host to generate some outbound traffic on a 4-hour interval in order for your router cache to be primed? Remember, I'm running a server here, so it will hardly ever generate any outbound traffic itself. There's simply no need for it to do that.

Here's an experiment for you: purge your ARP cache of my test machine. Then try and ping it. Does your router generate a who-has request? If not, then that's the problem.
tdo wrote: Most operating systems time out ARP cache entries at a set interval, usually some number of minutes, to prevent the cache from becoming stale.
Yes, but it does not re-populate itself arbitrarily. If I had an active outbound session and it sat idle longer than the cache timeout, the cache entry would be dropped. If I then did something to generate a packet on that session, the system would do a who-has request to rediscover the route for that address so the packet can be sent.
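
To make that concrete, here is a toy model of the behavior I'm describing. It is a deliberate simplification (real stacks keep more neighbor states than "present" and "expired"), and the class and its names are invented for illustration.

```python
class ArpCache:
    """Toy model: entries expire silently; lookups happen only on send."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.entries = {}   # ip -> time the entry was learned
        self.requests = []  # (time, ip) for every who-has this host would send

    def send_packet(self, now, ip):
        learned = self.entries.get(ip)
        if learned is None or now - learned >= self.timeout_s:
            # Missing or expired entry: resolve it now, because we need it now.
            self.requests.append((now, ip))
            self.entries[ip] = now
```

In this model, sending to 173.228.5.1 at t=0, t=30 and t=500 with a 60-second timeout produces a who-has only at t=0 and t=500. Nothing goes on the wire between the expiry and the next outbound packet.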
tdo wrote: Again, I can assure you that our cache is not being updated by your pings; I did not see any ARP traffic from you for over an hour. This is not a problem, though, because the MAC entries on our side never time out. There is something else going on here. I encourage you to take a look at your ARP cache and routing table before and after the problem occurs.
Well, I'm certainly open to suggestions as to where to look. From my perspective, I'm simply not receiving packets after some magical idle time.
by tdo » Thu Feb 28, 2013 12:33 pm
doctorfb wrote: I believe I had done a 'ping net.bluemoon.net' during these tests. As I said, I have a manual route entry for that subnet (64.200.84.) to go through eth1 (otherwise it would have tried to go through the default route, which is on eth0 via my internal network, and would never get through my firewall because of tracking). Discovery necessarily will send a 'who-has' request on the closest segment (which is via eth1) to determine how to route packets for this address. The response seems appropriate, saying that the Sonic router is the closest route, since the ZyXEL is in bridge mode and not a formal route node in the network. This seems correct to me. Here's my arp and route tables after doing said ping:

% ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:1a:92:78:2f:d4
inet addr:173.228.5.243 Bcast:173.228.5.255 Mask:255.255.255.0
inet6 addr: fe80::21a:92ff:fe78:2fd4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2453 errors:0 dropped:0 overruns:0 frame:0
TX packets:514 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:283228 (276.5 KiB) TX bytes:43906 (42.8 KiB)
Interrupt:43 Base address:0x4000

% ping -c 1 net.bluemoon.net
PING net.bluemoon.net (64.200.84.10) 56(84) bytes of data.
64 bytes from net.bluemoon.net (64.200.84.10): icmp_req=1 ttl=51 time=484 ms

--- net.bluemoon.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 484.781/484.781/484.781/0.000 ms

% arp -a
net.bluemoon.net (64.200.84.10) at 64:16:8d:8b:52:c6 [ether] on eth1
ming.fruitbat.org (192.168.55.2) at 00:0c:29:61:f6:9d [ether] on eth0

% route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.55.1 0.0.0.0 UG 1 0 0 eth0
64.200.84.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
173.228.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.0.5 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vmnet1
192.168.55.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.245.0 0.0.0.0 255.255.255.0 U 0 0 0 vmnet8
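For anyone following along, here is a minimal sketch of how a longest-prefix-match lookup would choose an outgoing interface from the routing table above. It is only an illustration of the general rule, not the kernel's actual code. The routes are copied from the `route -n` output; the address 203.0.113.7 is a hypothetical outside host that does not appear in this thread.

```python
import ipaddress

# Routes copied from the `route -n` output above: (destination, genmask, gateway, iface).
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "192.168.55.1", "eth0"),
    ("64.200.84.0", "255.255.255.0", "0.0.0.0", "eth1"),
    ("127.0.0.0", "255.0.0.0", "0.0.0.0", "lo"),
    ("173.228.5.0", "255.255.255.0", "0.0.0.0", "eth1"),
    ("192.168.0.0", "255.255.255.0", "0.0.0.0", "eth0"),
    ("192.168.0.5", "255.255.255.255", "0.0.0.0", "eth0"),
    ("192.168.2.0", "255.255.255.0", "0.0.0.0", "vmnet1"),
    ("192.168.55.0", "255.255.255.0", "0.0.0.0", "eth0"),
    ("192.168.245.0", "255.255.255.0", "0.0.0.0", "vmnet8"),
]


def lookup(dest):
    """Return (iface, gateway) for dest, preferring the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    best = None
    for network, mask, gateway, iface in ROUTES:
        net = ipaddress.ip_network(f"{network}/{mask}")
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface, gateway)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best[1], best[2]


if __name__ == "__main__":
    # 203.0.113.7 is a hypothetical outside host (documentation address range).
    for dest in ("64.200.84.10", "173.228.5.1", "203.0.113.7"):
        print(dest, "->", lookup(dest))
```

Under this table, only destinations in 64.200.84.0/24 and 173.228.5.0/24 leave through eth1; anything else from outside would be answered via the default route on eth0, which matches the reason given above for adding the explicit bluemoon route.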
Yeah that's definitely not the best way to test this. That's only working because our DSLAM is responding to the ARP request. Again, I encourage you to simplify your configuration here for testing, in order to prove that it's not our router or your modem doing something weird. Put a computer behind the modem with a single static IP assigned to one interface and a single default gateway entry towards the gateway IP address you were given. This should work all day long without you having to initiate any outbound traffic. If it doesn't, then we need to look at why.
doctorfb wrote:
tdo wrote: That second line is actually our DSLAM responding to the ARP request with the MAC address of the gateway, on behalf of the gateway router. I know it seems a little strange but this is expected and valid behavior.
doctorfb wrote: Here's another ping out doing a who-has for the Sonic default router (173.228.5.1):

283 7127.295263000 AsustekC_78:2f:d4 Cisco_8b:52:c6 ARP 42 Who has 173.228.5.1? Tell 173.228.5.243
284 7127.307748000 Cisco_8b:52:c6 AsustekC_78:2f:d4 ARP 64 173.228.5.1 is at 64:16:8d:8b:52:c6 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

'AsustekC_78:2f:d4' is my computer.
'Cisco_8b:52:c6' appears to be 173.228.5.1 (the Sonic router IP I've been told to use for a default router for my subnet.)
The 'FRAME CHECK SEQUENCE INCORRECT' message is because this reply packet from the Sonic router does not carry a correct frame check sequence (the checksum at the end of the Ethernet frame):
Frame check sequence: 0x00000000 [incorrect, should be 0x9290099d]
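As a side note, the frame check sequence is a CRC-32 checksum computed over the Ethernet frame, not a sequence number. The sketch below shows how such a checksum could be computed and verified, using the fact that `zlib.crc32` implements the same CRC-32 as IEEE 802.3. The MAC addresses and ARP EtherType come from the capture above; the 46 zero bytes of payload are made-up padding for illustration, so the result is not expected to match the value Wireshark reported.

```python
import struct
import zlib


def ethernet_fcs(frame: bytes) -> bytes:
    """Return the 4 FCS bytes for a frame (header + payload, no FCS), in wire order."""
    return struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """True if the trailing 4 bytes are the correct FCS for the rest of the frame."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


if __name__ == "__main__":
    # Destination MAC, source MAC, EtherType 0x0806 (ARP), then hypothetical zero padding.
    body = bytes.fromhex("001a92782fd4" "64168d8b52c6" "0806") + bytes(46)
    with_real_fcs = body + ethernet_fcs(body)
    with_zero_fcs = body + bytes(4)  # like the 0x00000000 field in the capture
    print(len(with_real_fcs), fcs_ok(with_real_fcs), fcs_ok(with_zero_fcs))
```

A frame whose FCS field is all zeros fails this check, which is what the Wireshark annotation is flagging; the sketch does not say anything about why the field was zero in the capture.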

I never see any 'who-has' requests coming from Sonic's side of things.
When everything magically times out (appears to be about 6 minutes with this machine) the above traceroute stops at hop #13:
...
12 gig1-1-1.gw.snjsca11.sonic.net (70.36.230.6) 105.351 ms 106.285 ms 101.635 ms
13 gig1-1-2.gw.lsatca11.sonic.net (70.36.243.10) 112.018 ms 104.457 ms 104.358 ms
14 * * *
15 * * *
16 * * *
17 * *
You likely won't ever see any ARP requests coming from us. Once your MAC address is in the DSLAM table, the DSLAM will handle any ARP who-has requests from our gateway router.
You stated earlier that your routers have a 4-hour cache timeout. Wouldn't they, then, do a who-has when a packet destined for my subnet comes in? How, exactly, does it build a route to my host initially? Are you expecting my host to generate some outbound traffic on a 4-hour interval in order for your router cache to be primed? Remember, I'm running a server here, so it will hardly ever generate any outbound traffic itself. There's simply no need for it to do that.

Here's an experiment for you: purge your ARP cache of my test machine. Then try and ping it. Does your router generate a who-has request? If not, then that's the problem.
Yes, but again the entry in the DSLAM table never times out, and this table is what keeps the router ARP cache fed with current entries. We expect your host to generate exactly one ARP who-has request, ever. This populates all the necessary tables and caches. Entries in the DSLAM table are never removed, only overwritten. Even if this was being caused by our router cache entry timing out, it sounds like you are seeing a problem in a matter of minutes, not after 4 hours.
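To make the behavior described here easier to follow, this is a toy model of it, assuming nothing beyond what is stated in this thread: the DSLAM learns a MAC address once, only ever overwrites it, and answers the gateway router's who-has on the host's behalf. The class and method names are hypothetical, and this is not how Sonic's equipment is actually implemented.

```python
class Dslam:
    """Toy subscriber MAC table: entries are overwritten but never removed."""

    def __init__(self):
        self.table = {}  # ip -> mac

    def learn(self, ip, mac):
        # Called when the host sends any ARP; replaces an older entry if present.
        self.table[ip] = mac

    def proxy_reply(self, ip):
        # Answer a who-has from the gateway without involving the host.
        return self.table.get(ip)


class GatewayRouter:
    """Toy router ARP cache that expires, then refills itself from the DSLAM."""

    def __init__(self, dslam, timeout):
        self.dslam = dslam
        self.timeout = timeout  # seconds; 4 hours is the figure quoted in the thread
        self.cache = {}         # ip -> (mac, time learned)

    def resolve(self, ip, now):
        entry = self.cache.get(ip)
        if entry is None or now - entry[1] >= self.timeout:
            mac = self.dslam.proxy_reply(ip)
            if mac is None:
                return None  # DSLAM has never seen this host
            self.cache[ip] = (mac, now)
        return self.cache[ip][0]


if __name__ == "__main__":
    dslam = Dslam()
    router = GatewayRouter(dslam, timeout=4 * 3600)
    print(router.resolve("173.228.5.243", now=0))         # nothing learned yet
    dslam.learn("173.228.5.243", "00:1a:92:78:2f:d4")     # the host's single ARP
    print(router.resolve("173.228.5.243", now=10))
    print(router.resolve("173.228.5.243", now=5 * 3600))  # after the router cache expired
```

In this model the host would never see a who-has from the ISP side, and an expired router cache entry would be refilled without any traffic from the host.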
doctorfb wrote:
tdo wrote:
doctorfb wrote:
While a timeout was occurring, I started a traceroute and, in the middle of it, did a ping out to the Sonic router. The result was this:
...
12 gig1-1-1.gw.snjsca11.sonic.net (70.36.230.6) 111.335 ms 98.924 ms 98.283 ms
13 gig1-1-2.gw.lsatca11.sonic.net (70.36.243.10) 98.439 ms 99.736 ms 106.464 ms
14 * * *
15 173-228-5-243.dsl.static.sonic.net (173.228.5.243) 103.684 ms 105.170 ms 105.414 ms

You have a comment that says:
> In normal operation, we should see an ARP who-has from your hosts at whatever interval they time out their cached entry of your default gateway's MAC address.

I don't see how this is likely. My hosts would have to be doing something active outbound in order to generate a who-has request. They would have no reason to refresh their cache otherwise. Again, outbound is not the problem. If my machines are quiet, they never receive any inbound requests.
Perhaps you were talking about the Sonic router doing a who-has to discover my machines? I don't see any such requests coming over the DSL line.
Most operating systems time out ARP cache entries at a set interval, usually some number of minutes, to prevent the cache from becoming stale.
Yes, but it does not re-populate itself arbitrarily. If I had an active outbound session and it sat idle longer than the cache timeout, the cache entry would be dropped. If I then did something to generate a packet on that session, the system would do a who-has request to rediscover the route for that address so the packet can be sent.
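The point being argued here can be sketched as a toy host-side ARP cache: entries expire after a timeout, but a new who-has is only sent when there is an outbound packet to deliver. This is a simplified model with hypothetical names and an assumed 300-second timeout; real stacks such as Linux use a more elaborate neighbour state machine.

```python
class ArpCache:
    """Toy host-side ARP cache: entries expire and are only re-learned on demand."""

    def __init__(self, timeout, resolver):
        self.timeout = timeout    # seconds an entry stays valid (assumed value)
        self.resolver = resolver  # callable ip -> mac; stands in for a who-has/reply exchange
        self.entries = {}         # ip -> (mac, time learned)
        self.requests_sent = 0    # how many who-has requests this host has issued

    def expire(self, now):
        """Drop entries older than the timeout; never triggers a new request."""
        self.entries = {ip: (mac, t) for ip, (mac, t) in self.entries.items()
                        if now - t < self.timeout}

    def send_packet(self, ip, now):
        """Resolve ip for an outbound packet, issuing a who-has only if needed."""
        self.expire(now)
        if ip not in self.entries:
            self.requests_sent += 1
            self.entries[ip] = (self.resolver(ip), now)
        return self.entries[ip][0]


if __name__ == "__main__":
    gateway = {"173.228.5.1": "64:16:8d:8b:52:c6"}  # values from the capture in this thread
    cache = ArpCache(timeout=300, resolver=gateway.__getitem__)
    cache.send_packet("173.228.5.1", now=0)      # first outbound packet: one who-has
    cache.expire(now=1000)                       # idle host: entry dropped, nothing sent
    print(cache.requests_sent, cache.entries)
    cache.send_packet("173.228.5.1", now=1000)   # next outbound packet: second who-has
    print(cache.requests_sent)
```

In this model an idle host sends no ARP traffic at all after its entry expires, which is why a quiet server would not refresh anything on its own.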
tdo wrote:
doctorfb wrote:
> I've been watching your port for the last hour and I have not seen a single ARP request from you.

Hmm... I have a ping out running every 60 seconds from two of my hosts. It's possible that this is causing implicit cache refresh on both ends, but I'm doing that specifically because I don't want the hosts to become unreachable. I did have these pings running every 120 seconds, but I found that there was still a variable window in which my hosts become unreachable, hence I ping every 60 seconds now.

> Either you have a static ARP entry for your default gateway programmed into all of your hosts or something else is going on.

No static ARP entries. I do have a default IP route going to the Sonic router, but no static ARP entries. My understanding of network protocol is that routers do segment discovery to find their endpoints. Endpoints (hosts) only do discovery when they need to make an out-bound connection. In this way, the routers know which host is on which segment and can route packets effectively. The hosts should not need to actively refresh their cache except when needing to send packets out for an unresolved IP. This is a bit simplistic, but essentially gets the point across.

> During the same time period we have not lost your MAC entry in our table and your modem remains ATM pingable.

I suspect your cache is implicitly being refreshed by me because I don't want my hosts to become unaddressable. I have 4 IPs. Two are actively pinging out. One runs in stealth and only does NTP requests. The last one is this non-vm test host.
For your testing, I'll leave up my host on .243. Feel free to do some testing of your own.
Note that since this interface is separated, it has only one subnet that it knows about: 173.228.5.0
I had to add an explicit route for the bluemoon hosts subnet to ensure ping responses went back out through the same interface. I can add some additional explicit routes to any Sonic subnets if you need me to.

Thanks in advance for anything you can discover about my problem!
Again, I can assure you that our cache is not being updated by your pings; I did not see any ARP traffic from you for over an hour. This is not a problem, though, because the MAC entries on our side never time out. There is something else going on here. I encourage you to take a look at your ARP cache and routing table before and after the problem is occurring.
Well, I'm certainly open to suggestions as to where to look. From my perspective, I'm simply not receiving packets after some magical idle time.
Again, I suggest testing with the simplest possible configuration and working up from there.
by doctorfb » Thu Feb 28, 2013 1:07 pm
(Apparently the forum only allows a max of 3 embedded quotes. :-(
tdo wrote: Yeah that's definitely not the best way to test this. That's only working because our DSLAM is responding to the ARP request. Again, I encourage you to simplify your configuration here for testing, in order to prove that it's not our router or your modem doing something weird. Put a computer behind the modem with a single static IP assigned to one interface and a single default gateway entry towards the gateway IP address you were given. This should work all day long without you having to initiate any outbound traffic. If it doesn't, then we need to look at why.
Hmm...that begs the question: does the DSLAM ever issue a who-has, or is it relying upon my host to send the initial gratuitous arp? For that matter, if I re-cycle my ZyXEL, does the DSLAM get notification of the line being re-established, and does it do any discovery thereafter? I've had a few cases where I had to recycle the ZyXEL to clear what appeared to be a DNS problem.
And, even though my current setup is not the most optimal configuration, it still illustrates the problem.
Tonight I will scrounge together a machine with a single network port and configure it solely to be on that IP address. I'll post to this thread with additional info once I have that up and running.
tdo wrote: Yes, but again the entry in the DSLAM table never times out, and this table is what keeps the router ARP cache fed with current entries. We expect your host to generate exactly one ARP who-has request, ever. This populates all the necessary tables and caches. Entries in the DSLAM table are never removed, only overwritten. Even if this was being caused by our router cache entry timing out, it sounds like you are seeing a problem in a matter of minutes, not after 4 hours.
Yes, it's certainly shorter than 4 hours (down to 60 seconds for the two virtual hosts). I don't suppose there is a way to track packets between your router and the DSLAM? I wish there was a way to track packets on my ZyXEL, but I don't see a way of doing that directly.
tdo wrote: Again, I suggest testing with the simplest possible configuration and working up from there.
I'll do that tonight and report here. I don't expect anything different to occur, however.
Hey, at least it's a consistent problem! Imagine if this were purely intermittent! :-)
Anyway, thanks for your continued interest in helping me!
by tdo » Thu Feb 28, 2013 4:12 pm
doctorfb wrote:(Apparently the forum only allows a max of 3 embedded quotes. :-(
tdo wrote: Yeah that's definitely not the best way to test this. That's only working because our DSLAM is responding to the ARP request. Again, I encourage you to simplify your configuration here for testing, in order to prove that it's not our router or your modem doing something weird. Put a computer behind the modem with a single static IP assigned to one interface and a single default gateway entry towards the gateway IP address you were given. This should work all day long without you having to initiate any outbound traffic. If it doesn't, then we need to look at why.
Hmm...that begs the question: does the DSLAM ever issue a who-has, or is it relying upon my host to send the initial gratuitous arp? For that matter, if I re-cycle my ZyXEL, does the DSLAM get notification of the line being re-established, and does it do any discovery thereafter? I've had a few cases where I had to recycle the ZyXEL to clear what appeared to be a DNS problem.
And, even though my current setup is not the most optimal configuration, it still illustrates the problem.
Tonight I will scrounge together a machine with a single network port and configure it solely to be on that IP address. I'll post to this thread with additional info once I have that up and running.
I don't have a good answer to that question because the DSLAM behavior in that scenario has changed and I can't remember exactly what it does. The point is, most hosts will either issue a gratuitous ARP or an ARP who-has for the default gateway IP address very quickly upon being connected to the Fusion line, and it only has to do this once to have your MAC address programmed into the table. The DSLAM will only ever overwrite this entry (in case your MAC address changes), it never removes it.
doctorfb wrote:
tdo wrote: Yes, but again the entry in the DSLAM table never times out, and this table is what keeps the router ARP cache fed with current entries. We expect your host to generate exactly one ARP who-has request, ever. This populates all the necessary tables and caches. Entries in the DSLAM table are never removed, only overwritten. Even if this was being caused by our router cache entry timing out, it sounds like you are seeing a problem in a matter of minutes, not after 4 hours.
Yes, it's certainly shorter than 4 hours (down to 60 seconds for the two virtual hosts). I don't suppose there is a way to track packets between your router and the DSLAM? I wish there was a way to track packets on my ZyXEL, but I don't see a way of doing that directly.
tdo wrote: Again, I suggest testing with the simplest possible configuration and working up from there.
I'll do that tonight and report here. I don't expect anything different to occur, however.
Hey, at least it's a consistent problem! Imagine if this were purely intermittent! :-)
Anyway, thanks for your continued interest in helping me!