Discussion:
ssh session times out annoyingly fast, why?
Britton Kerin
2020-09-21 23:40:02 UTC
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).

For some reason ssh sessions seem to time out pretty quickly. I've
tried setting ClientAliveInterval and ClientAliveCountMax and also
ServerAliveInterval and ServerAliveCountMax, but it doesn't seem to
make any difference. Is there some other setting somewhere that
affects this?

Thanks,
Britton
Joseph Loo
2020-09-21 23:50:02 UTC
Have you tried the "ssh -Y" option?
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly. I've
tried setting ClientAliveInterval and ClientAliveCountMax and also
ServerAliveInterval and ServerAliveCountMax, but it doesn't seem to
make any difference. Is there some other setting somewhere that
affects this?
Thanks,
Britton
--
Joseph Loo
***@acm.org
Toni Mas Soler
2020-09-22 09:40:01 UTC
First, you should make sure it is not a network issue.
You could open a terminal and run, for example, the top program; its
constant output avoids any configured idle timeout. If the session
still drops, look for a network issue; otherwise we can look at
sshd's config file.

Toni Mas
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly. I've
tried setting ClientAliveInterval and ClientAliveCountMax and also
ServerAliveInterval and ServerAliveCountMax, but it doesn't seem to
make any difference. Is there some other setting somewhere that
affects this?
Thanks,
Britton
Gary Dale
2020-09-22 20:50:01 UTC
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly. I've
tried setting ClientAliveInterval and ClientAliveCountMax and also
ServerAliveInterval and ServerAliveCountMax, but it doesn't seem to
make any difference. Is there some other setting somewhere that
affects this?
Thanks,
Britton
My money is on a network issue. Lately my connection to a remote server
seems to lock up quickly while I have a stable connection to a local
server. Both servers are running Debian/Stable and I haven't fiddled
with the ssh settings in a long time.
Anssi Saari
2020-09-23 19:50:01 UTC
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly. I've
tried setting ClientAliveInterval and ClientAliveCountMax and also
ServerAliveInterval and ServerAliveCountMax, but it doesn't seem to
make any difference. Is there some other setting somewhere that
affects this?
Well, the keepalives themselves can cause a disconnect if the keepalive
messages are not reaching the other end, due to a bad connection for
example. It looks like by default on Debian the client sends keepalives
if the server is quiet, but the server doesn't send keepalives to the
client.

You could try mosh instead of ssh. Or enable some debug printouts with
ssh -v (or -vv, or -vvv for maximum verbosity) to see if that tells you
why the connection dropped.
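
For example, to capture a maximally verbose client-side log to a file
(the debug output goes to stderr; user and host are just placeholders):

    ssh -vvv pi@raspberrypi 2> ssh-debug.log

The last few lines before the disconnect usually say whether the peer
closed the connection or a timeout fired.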
Greg Wooledge
2020-09-23 19:50:02 UTC
Post by Anssi Saari
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly.
How quickly, exactly? What is the actual message/behavior you see when
it happens? Are they both on the same LAN, or is there some complexity
in between them (especially a NAT router)?
Post by Anssi Saari
Well, the keepalives themselves can cause a disconnect if the keepalive
messages are not reaching the other end, due to a bad connection for
example. It looks like by default on Debian the client sends keepalives
if the server is quiet, but the server doesn't send keepalives to the
client.
The normal reason people need to use ServerAlive or ClientAlive is NAT.
If your connection from ssh client to ssh server goes through a NAT
router, the router may keep track of activity on that connection, and
drop the translation when it goes idle for 5 minutes or so. Forcing the
*Alive packets to happen every few minutes prevents a NAT timeout.
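
For example, a minimal client-side snippet for ~/.ssh/config (the
numbers are just a sane starting point, nothing magic):

    Host *
        ServerAliveInterval 60
        ServerAliveCountMax 3

That makes the client send an application-level probe after 60 seconds
of silence and give up after 3 unanswered probes; the probes count as
traffic, so the NAT entry stays fresh.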

If there is no NAT involved, then I agree with the previous suggestion
that this might be a shell's TMOUT variable. Are you sitting at a shell
prompt when the "timeout" occurs? Does the timeout stop occurring when
you're inside a text editor, for example?
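
(A quick check from the remote shell, assuming bash/ksh:

    echo $TMOUT    # empty output means no auto-logout is set
    unset TMOUT    # clears it for this session, unless it's read-only

Some sites set TMOUT read-only in /etc/profile, in which case the
unset fails with an error.)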

Much more information is needed here.
Britton Kerin
2020-09-26 21:20:01 UTC
Post by Greg Wooledge
Post by Anssi Saari
Post by Britton Kerin
I'm using ssh from a Debian box to a Raspberry Pi (sorta Debian also :).
For some reason ssh sessions seem to time out pretty quickly.
How quickly, exactly? What is the actual message/behavior you see when
it happens? Are they both on the same LAN, or is there some complexity
in between them (especially a NAT router)?
Post by Anssi Saari
Well, the keepalives themselves can cause a disconnect if the keepalive
messages are not reaching the other end, due to a bad connection for
example. It looks like by default on Debian the client sends keepalives
if the server is quiet, but the server doesn't send keepalives to the
client.
The normal reason people need to use ServerAlive or ClientAlive is NAT.
If your connection from ssh client to ssh server goes through a NAT
router, the router may keep track of activity on that connection, and
drop the translation when it goes idle for 5 minutes or so. Forcing the
*Alive packets to happen every few minutes prevents a NAT timeout.
If there is no NAT involved, then I agree with the previous suggestion
that this might be a shell's TMOUT variable. Are you sitting at a shell
prompt when the "timeout" occurs? Does the timeout stop occurring when
you're inside a text editor, for example?
Looks like NAT was the culprit, because top kept it alive. The internet
has bogus advice on this one, because it suggests ServerAliveInterval
1200 or something, which I guess is larger than most firewall timeouts.

Thanks for all the help; good to see the Debian community is still so
good.

Britton
Greg Wooledge
2020-09-28 12:10:01 UTC
Post by Britton Kerin
Looks like NAT was the culprit, because top kept it alive. The internet
has bogus advice on this one, because it suggests ServerAliveInterval
1200 or something, which I guess is larger than most firewall timeouts.
Thanks for all the help; good to see the Debian community is still so
good.
I use "ServerAliveInterval 300" with my cheap consumer-grade router.
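
That is, in ~/.ssh/config:

    Host *
        ServerAliveInterval 300

The "Host *" stanza applies it to every connection; you can scope it
to a single troublesome host instead.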
Michael Stone
2020-09-29 12:20:02 UTC
Post by Greg Wooledge
The normal reason people need to use ServerAlive or ClientAlive is NAT.
If your connection from ssh client to ssh server goes through a NAT
router, the router may keep track of activity on that connection, and
drop the translation when it goes idle for 5 minutes or so. Forcing the
*Alive packets to happen every few minutes prevents a NAT timeout.
This is a stateful firewall thing, not a NAT thing
Gene Heskett
2020-09-29 12:50:01 UTC
Post by Michael Stone
Post by Greg Wooledge
The normal reason people need to use ServerAlive or ClientAlive is
NAT. If your connection from ssh client to ssh server goes through a
NAT router, the router may keep track of activity on that
connection, and drop the translation when it goes idle for 5 minutes
or so. Forcing the *Alive packets to happen every few minutes
prevents a NAT timeout.
This is a stateful firewall thing, not a NAT thing
This is likely quite true, Michael, but it also is only a hint as to
how to fix it for the OP. I maintain 8 to 12 such ssh connections here
to my other machines, establishing them at boot time, but all are local
192.168.xx.xx addresses, not NAT'd in either direction, so I am not
affected. I would be upset if I was.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>
Michael Stone
2020-09-29 13:00:01 UTC
Post by Gene Heskett
This is likely quite true, Michael, but it also is only a hint as to
how to fix it for the OP.
It was already fixed; ServerAliveInterval/ClientAliveInterval is the
right answer. I guess I can review: these options simply have the client
& server exchange an encrypted "are you here" message every N seconds to
prevent the firewall from timing out the connection. TCP keepalives
won't do that, as the firewall can see that there is no actual data
being transferred and may still time out idle connections.
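
For completeness, the server-side counterpart goes in
/etc/ssh/sshd_config; a sketch with illustrative values:

    ClientAliveInterval 60
    ClientAliveCountMax 3

With that, sshd itself probes a quiet client every 60 seconds, which
helps when you don't control the client's configuration. (Reload sshd
after editing.)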

All that aside, it's important to be precise about what functionality is
related to NAT and what functionality is related to firewalling.
Imprecision about these concepts leads to all sorts of (wrong) ideas
like "you need NAT to be secure".
Tom Dial
2020-09-30 03:00:01 UTC
Post by Michael Stone
Post by Gene Heskett
This is likely quite true, Michael, but it also is only a hint as to
how to fix it for the OP.
It was already fixed; ServerAliveInterval/ClientAliveInterval is the
right answer. I guess I can review: these options simply have the client
& server exchange an encrypted "are you here" message every N seconds to
prevent the firewall from timing out the connection. TCP keepalives
won't do that, as the firewall can see that there is no actual data
being transferred and may still time out idle connections.
All that aside, it's important to be precise about what functionality is
related to NAT and what functionality is related to firewalling.
Imprecision about these concepts leads to all sorts of (wrong) ideas
like "you need NAT to be secure".
+2

I use NAT for convenience, and a firewall (and other measures) for security.

And thank you for stating the distinction clearly; I sort of knew it,
but clarity always is a good thing.

Tom Dial

t***@tuxteam.de
2020-09-29 14:30:02 UTC
Post by Michael Stone
Post by Greg Wooledge
The normal reason people need to use ServerAlive or ClientAlive is NAT.
If your connection from ssh client to ssh server goes through a NAT
router, the router may keep track of activity on that connection, and
drop the translation when it goes idle for 5 minutes or so. Forcing the
*Alive packets to happen every few minutes prevents a NAT timeout.
This is a stateful firewall thing, not a NAT thing
That depends on what Greg means by "activity". NAT has to keep a
map of (internal IP, internal port) to external port in order to do
the translation (the so-called "translation table"). Since it would
grow without bounds whenever one side drops the connection, it's
customary to let NAT table entries expire after some inactivity
(typically 1h, but network admins are known to be a capricious
species ;-)
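
(On a Linux NAT box you can read that timeout directly, assuming the
conntrack module is loaded:

    sysctl net.netfilter.nf_conntrack_tcp_timeout_established

The stock kernel value is several days; routers and admins often turn
it way down.)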

So Greg is probably right. NAT is, in its own way, stateful.

Cheers
- t
t***@tuxteam.de
2020-09-29 14:40:02 UTC
On Tue, Sep 29, 2020 at 04:22:32PM +0200, ***@tuxteam.de wrote:

Following up on myself: I had exactly this case with an (outsourced)
data centre: they had NATs between different realms (you might ask
"why, oh why?" and you'd be right). The application server and the
database server were separated by a NAT. To add insult to injury,
ICMP "not reachable" packets were filtered. The database connection's
NAT entry timed out from time to time, and it took a timeout of
several minutes for the application to notice that and reconnect.

Lots of hilarity ensued.

Setting the socket option to keep alive "fixed" that.
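
(That's the SO_KEEPALIVE socket option. Mind that the kernel defaults
are leisurely; on Linux:

    sysctl net.ipv4.tcp_keepalive_time    # idle seconds before the first probe; default 7200

so against a 1h NAT timeout the probe interval has to be tuned down,
too.)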

Cheers
- t
Michael Stone
2020-09-29 14:50:03 UTC
Post by t***@tuxteam.de
Setting the socket option to keep alive "fixed" that.
You were lucky. ssh does that by default, so if ssh sessions are getting
killed these days it's because the firewall ignores TCP keepalives when
calculating timeouts. If you're in such an environment and can't fix the
firewall, then every application needs to be written to explicitly
exchange data when idle to keep connections alive.
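
(TCPKeepAlive defaults to "yes" in both ssh_config and sshd_config; to
see what your client actually ends up using, ssh -G prints the
effective options for a host, e.g.:

    ssh -G yourhost | grep -i alive

with "yourhost" being whatever you normally connect to.)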
t***@tuxteam.de
2020-09-29 15:00:02 UTC
Post by Michael Stone
Post by t***@tuxteam.de
Setting the socket option to keep alive "fixed" that.
You were lucky. ssh does that by default, so if ssh sessions are
getting killed these days it's because the firewall ignores TCP
keepalives when calculating timeouts. If you're in such an
environment and can't fix the firewall, then every application needs
to be written to explicitly exchange data when idle to keep
connections alive.
It wasn't ssh in this case. It was a (Perl DBI) database connection,
which, by default, is silent on inactivity. So after one hour, the
NAT dropped it.

To set the keepalive option, I had to convince the application
provider to update its (then already paleontological) Perl version
to one in which setting the keepalive socket option was possible.

In the end, that helped.

(I first tried to talk the customer into hitting their data centre
provider with a Thick Ethernet cable, but wasn't successful, alas).

That was another long story on its own :)

If the above NAT is killing entries which send keepalives, then a
Thick Ethernet cable probably won't help either. That's downright
malicious.

Cheers
- t
Michael Stone
2020-09-29 14:50:02 UTC
Post by t***@tuxteam.de
Post by Michael Stone
Post by Greg Wooledge
The normal reason people need to use ServerAlive or ClientAlive is NAT.
If your connection from ssh client to ssh server goes through a NAT
router, the router may keep track of activity on that connection, and
drop the translation when it goes idle for 5 minutes or so. Forcing the
*Alive packets to happen every few minutes prevents a NAT timeout.
This is a stateful firewall thing, not a NAT thing
That depends on what Greg means by "activity". NAT has to keep a
map of (internal IP, internal port) to external port in order to do
the translation (the so-called "translation table"). Since it would
grow without bounds whenever one side drops the connection, it's
customary to let NAT table entries expire after some inactivity
(typically 1h, but network admins are known to be a capricious
species ;-)
So Greg is probably right. NAT is, in its own way, stateful.
NAT is a special case of a stateful firewall. You can get rid of NAT,
but basically the entire modern internet has stateful firewalls, so
getting rid of NAT won't make the problem at hand go away. The basic
connection state tables and NAT state tables track essentially the same
information, use the same algorithms for session start & stop, have the
same issues with potentially leaking entries if hosts disappear, and
have the same strategy of expiring inactive entries.
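
(On a Linux box you can watch both kinds of state with conntrack-tools,
e.g.:

    conntrack -L -p tcp | grep dport=22

which lists tracked ssh connections, NAT'd or not.)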

In general it's kind of dumb on modern hardware to expire sessions that
are still exchanging TCP keepalives unless you're under extreme pressure
from a DoS attack or somesuch. (Modern devices just don't have the
memory constraints that were an issue 20 years ago and don't need to
aggressively prune sessions that are actively advertising that they're
alive.) But people rarely get to choose the other end's firewall
configuration, so enter kludges like the ssh protocol keepalives.
Stefan Monnier
2020-09-29 15:20:01 UTC
Post by Michael Stone
In general it's kind of dumb on modern hardware to expire sessions
that are still exchanging TCP keepalives unless you're under extreme
pressure from a DoS attack or somesuch.
Indeed, I'd be *very* surprised if a connection was dropped despite
exchange of TCP keepalives. It seems much more likely that the
keepalives aren't used by the application (quite common and normal) or
that they get filtered somewhere.
Post by Michael Stone
But people rarely get to choose the other end's firewall
configuration, so enter kludges like the ssh protocol keepalives.
According to `man ssh_config` and `man sshd_config`, one reason to use
SSH's `ClientAlive*` or `ServerAlive*` options is that, unlike TCP
keepalives, they can't be spoofed.


Stefan
Michael Stone
2020-09-29 15:30:02 UTC
Post by Stefan Monnier
Post by Michael Stone
In general it's kind of dumb on modern hardware to expire sessions
that are still exchanging TCP keepalives unless you're under extreme
pressure from a DoS attack or somesuch.
Indeed, I'd be *very* surprised if a connection was dropped despite
exchange of TCP keepalives. It seems much more likely that the
keepalives aren't used by the application (quite common and normal) or
that they get filtered somewhere.
Nope, it's reasonably common on the internet and a complete PITA.
Post by Stefan Monnier
Post by Michael Stone
But people rarely get to choose the other end's firewall
configuration, so enter kludges like the ssh protocol keepalives.
According to `man ssh_config` and `man sshd_config`, one reason to use
SSH's `ClientAlive*` or `ServerAlive*` options is that, unlike TCP
keepalives, they can't be spoofed.
The issue with spoofing is potentially *too much* keeping alive: if
you read further, that can be relevant when you need to know that an
ssh connection has died, but (e.g.) a malicious third party is using
TCP keepalives to prevent ssh from noticing that the other end is
down. If the problem you're trying to solve is not enough keeping
alive (that is, your ssh connection is dying) rather than too much,
this reason is irrelevant. The protocol keepalives *also* fix the
problem of firewalls timing out connections despite TCP keepalives.
I don't know why the man page doesn't just say that; maybe ideological
opposition to accommodating firewall stupidity.