New shell server transition

Advanced feature discussion, beta programs and unsupported "Labs" features.
297 posts Page 30 of 30
by scott » Thu Jun 07, 2018 3:41 pm
casner wrote:
yronwode wrote:
It seems that you're still working on it. Thanks, if so. I keep getting disconnected. It's not urgent, but I would prefer not having to keep reconnecting.

Scott mentioned turning off some keepalives. That may be counterproductive for this problem. So long as the underlying network connectivity is stable, sending keepalives avoids having NAT devices time out the connection.

[As an aside, I'll mention that back in the early days of the Internet (late 1970s and 1980s) there were folks working on packet radio who hated keepalives because their network connectivity was intermittent. Their TCP connections would break even if they weren't active during the time of a connectivity loss due to a keepalive being sent automatically. But that was before NATs created the converse problem.]


I would have thought that SSH (not TCP) keepalives would keep NAT devices from terminating the connection.

This is how it is set up in sshd_config:

ClientAliveInterval 60
ClientAliveCountMax 10
TCPKeepAlive no


If I'm correct, this should withstand an outage of up to 10 minutes. Am I out to lunch?
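The arithmetic behind the 10-minute figure: sshd sends a probe through the encrypted channel once ClientAliveInterval seconds pass with nothing received from the client, and it disconnects after ClientAliveCountMax consecutive probes go unanswered, so the tolerance is roughly the product of the two (60 × 10 = 600 seconds). The Python sketch below reads those two keywords from sshd_config-style text and multiplies them. The function names are made up for illustration; it assumes plain integer-second values and ignores Match blocks and Include files.

```python
from typing import Dict, Optional


def parse_sshd_config(text: str) -> Dict[str, str]:
    """Map lower-cased keywords to values for plain 'Keyword value' lines.

    Comments and blank lines are skipped. Match blocks and Include
    directives are not handled; the first occurrence of a keyword is kept.
    """
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd_config keywords are case-insensitive.
            options.setdefault(parts[0].lower(), parts[1].strip())
    return options


def unresponsive_timeout(options: Dict[str, str]) -> Optional[int]:
    """Approximate seconds of client silence before sshd disconnects.

    Returns None when the client-alive probes cannot end the session.
    Values are assumed to be plain integers (seconds / probe count).
    """
    interval = int(options.get("clientaliveinterval", "0"))  # default 0 = off
    count_max = int(options.get("clientalivecountmax", "3"))  # default 3
    if interval == 0:
        return None
    # Assumption: OpenSSH 8.2 and later treat ClientAliveCountMax 0 as
    # "never disconnect"; older releases behaved differently.
    if count_max == 0:
        return None
    return interval * count_max


config = """
ClientAliveInterval 60
ClientAliveCountMax 10
TCPKeepAlive no
"""
print(unresponsive_timeout(parse_sshd_config(config)))  # 600 seconds = 10 minutes
```

With only ClientAliveInterval 15 set, the default count of 3 gives about 45 seconds, which matches the example in the sshd_config(5) manual page.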

-Scott
by casner » Thu Jun 07, 2018 4:04 pm
scott wrote:
If I'm correct, this should withstand an outage of up to 10 minutes. Am I out to lunch?

-Scott

That should work because the ssh-level keepalive takes the place of the TCP-level keepalive.
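Both kinds of keepalive put packets on an otherwise idle connection, which is what keeps a NAT mapping from expiring. The difference is who sends them: ClientAliveInterval probes come from sshd inside the encrypted channel, while TCP keepalives are generated by the kernel, and TCPKeepAlive no turns those off for sshd. For comparison, here is a minimal Python sketch of enabling kernel keepalives on an ordinary socket. The helper name and timing values are arbitrary choices for illustration, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are only exposed on some platforms (Linux has all three), so each one is set only if present.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60,
                         interval: int = 60, count: int = 10) -> None:
    """Turn on kernel TCP keepalives for one socket.

    SO_KEEPALIVE is portable; the per-socket timing knobs are not,
    so each is applied only if this Python build exposes it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),      # idle time before first probe
                        ("TCP_KEEPINTVL", interval),  # gap between probes
                        ("TCP_KEEPCNT", count)):      # unanswered probes before reset
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    enable_tcp_keepalive(sock)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Unlike the ssh-level probes, these are not authenticated, which is one reason sshd offers its own in-channel mechanism.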
by scott » Thu Jun 07, 2018 6:12 pm
yronwode wrote:
It seems that you're still working on it. Thanks, if so. I keep getting disconnected. It's not urgent, but I would prefer not having to keep reconnecting.


If the disconnect happened around 3:50am, that's when I rebooted. The server wasn't allowing logins.

I've stopped using auditctl and am now using systemtap to monitor the unmounting of volumes. I'm hoping to catch it in the act next time the server falls over.
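SystemTap watches unmounts from inside the kernel. A much cruder user-space alternative (not what was used here) is to snapshot the mount table periodically and report mount points that have vanished. The Python sketch below assumes Linux's /proc/self/mounts format and that sshfs mounts show up with the filesystem type fuse.sshfs, as they typically do; the function names and the 5-second poll are illustrative choices.

```python
import time
from typing import Optional, Set


def read_mount_points(path: str = "/proc/self/mounts",
                      fstype: Optional[str] = None) -> Set[str]:
    """Return mount points from a Linux mounts file, optionally by fs type."""
    points: Set[str] = set()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue
            # Fields: device, mount point, fs type, options, ...
            # (spaces inside a mount point appear escaped as \040).
            if fstype is None or fields[2] == fstype:
                points.add(fields[1])
    return points


def vanished(before: Set[str], after: Set[str]) -> Set[str]:
    """Mount points present in the earlier snapshot but not the later one."""
    return before - after


def watch(fstype: Optional[str] = "fuse.sshfs", interval: float = 5.0) -> None:
    """Poll forever and print each matching mount point that disappears."""
    previous = read_mount_points(fstype=fstype)
    while True:
        time.sleep(interval)
        current = read_mount_points(fstype=fstype)
        for point in sorted(vanished(previous, current)):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "unmounted:", point)
        previous = current
```

Calling watch() blocks forever. Because it only compares snapshots, it can show when a volume went away (to within the polling interval) but not which process unmounted it, which is exactly the question a kernel-level tracer can answer.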

-Scott
by scott » Fri Jun 08, 2018 8:19 pm
scott wrote:
yronwode wrote:
It seems that you're still working on it. Thanks, if so. I keep getting disconnected. It's not urgent, but I would prefer not having to keep reconnecting.


If the disconnect happened around 3:50am, that's when I rebooted. The server wasn't allowing logins.

I've stopped using auditctl and am now using systemtap to monitor the unmounting of volumes. I'm hoping to catch it in the act next time the server falls over.

-Scott


I was up with the server at 2am and 3am this morning. Beforehand, I made a few changes that might be mitigating the problem.

It turns out sshfs was allocating a pty for every mount when it really didn't need to. I disabled that and changed when sshfs volumes get unmounted (not often). I'm going to have to rethink the mount management; we're already using pam_mount in a way its designers probably didn't expect.
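The post doesn't spell out the new unmount rule beyond "not often". One common way to get that behaviour is to count open sessions per user and unmount only after the last session has been gone for a while, so a quick logout and login doesn't tear down and rebuild the sshfs mount. The Python sketch below is hypothetical, not how pam_mount or this server actually does it; the class name, its methods and the 300-second grace period are invented for illustration, and the real mount and unmount commands are left to the caller.

```python
from typing import Dict, List, Set


class MountManager:
    """Hypothetical per-user session counter with a delayed unmount."""

    def __init__(self, grace_seconds: float = 300.0) -> None:
        self.grace = grace_seconds
        self.sessions: Dict[str, int] = {}      # user -> open session count
        self.mounted: Set[str] = set()          # users with a live mount
        self.idle_since: Dict[str, float] = {}  # user -> when last session closed

    def open_session(self, user: str) -> bool:
        """Record a login. Returns True if the caller should mount now."""
        self.sessions[user] = self.sessions.get(user, 0) + 1
        self.idle_since.pop(user, None)  # a new login cancels a pending unmount
        if user in self.mounted:
            return False
        self.mounted.add(user)
        return True

    def close_session(self, user: str, now: float) -> None:
        """Record a logout. Never unmounts directly; it only starts the timer."""
        remaining = self.sessions.get(user, 0) - 1
        if remaining > 0:
            self.sessions[user] = remaining
            return
        self.sessions.pop(user, None)
        if user in self.mounted:
            self.idle_since[user] = now

    def due_for_unmount(self, now: float) -> List[str]:
        """Users idle for at least the grace period; the caller unmounts them."""
        due = sorted(u for u, t in self.idle_since.items() if now - t >= self.grace)
        for user in due:
            del self.idle_since[user]
            self.mounted.discard(user)
        return due


mgr = MountManager(grace_seconds=300)
print(mgr.open_session("alice"))     # True: first login, mount the volume
print(mgr.open_session("alice"))     # False: already mounted
mgr.close_session("alice", now=20)
mgr.close_session("alice", now=30)   # last session closed at t=30
print(mgr.due_for_unmount(now=100))  # []: still inside the grace period
print(mgr.due_for_unmount(now=330))  # ['alice']
```

A periodic job would call due_for_unmount() and unmount whatever it returns; a login during the grace period simply cancels the pending unmount.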

How has the server been? Feedback appreciated. :)

-Scott
by netllama » Fri Jun 08, 2018 8:51 pm
No issues since the outage a few days ago.
by scott » Tue Jun 12, 2018 3:20 pm
Knock on wood, I think we may have done a lot to stabilize the new shell server:

15:19:37 up 5 days, 11:40, 22 users, load average: 0.39, 0.65, 0.73

-Scott
by scott » Fri Jun 15, 2018 2:04 am
scott wrote:
Knock on wood, I think we may have done a lot to stabilize the new shell server:

15:19:37 up 5 days, 11:40, 22 users, load average: 0.39, 0.65, 0.73


After being up for a little over a week, I rebooted it to clear out some stale cruft. I have more work to do on the mount manager so that it will be able to have multi-month uptimes.

-Scott