SO_KEEPALIVE considered harmful?

Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!ucbvax!
EXPO.LCS.MIT.EDU!rws
From: r...@EXPO.LCS.MIT.EDU
Newsgroups: comp.protocols.tcp-ip
Subject: SO_KEEPALIVE considered harmful?
Message-ID: <8905231205.AA00500@expire.lcs.mit.edu>
Date: 23 May 89 12:04:55 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 9

I have a random question that I hope this illustrious audience can answer
definitively for me (or else point me to a definitive source).  Is the BSD
notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
the TCP specification?  If so, is its use to be encouraged?  Specifically,
it has been suggested that in the X Window System world, X libraries
should automatically be setting SO_KEEPALIVE on connections to X servers.  Is this
a reasonable thing to do?

[If this is a totally inappropriate forum for this question, I apologize.]
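For reference, turning the option on takes one `setsockopt` call per connection. Below is a minimal sketch using Python's `socket` module, which wraps the BSD calls. The helper names are hypothetical. `TCP_KEEPIDLE`, which sets a per-connection idle time, is a later and platform-specific option, so the code applies it only where the module exposes it.

```python
import socket
from typing import Optional


def enable_keepalive(sock: socket.socket, idle_seconds: Optional[int] = None) -> None:
    """Turn on SO_KEEPALIVE for one TCP connection (it is off by default).

    If idle_seconds is given and the platform offers TCP_KEEPIDLE, also set
    how long the connection must sit idle before the first probe.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


def keepalive_enabled(sock: socket.socket) -> bool:
    """Read the option back; any nonzero value means it is on."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

An X library following the suggestion would call `enable_keepalive` right after connecting to the server. X servers conventionally listen on TCP port 6000 plus the display number. Whether the library should make that call at all is the question the thread debates.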

Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!
ucbvax!AHWAHNEE.STANFORD.EDU!dcrocker
From: dcroc...@AHWAHNEE.STANFORD.EDU (Dave Crocker)
Newsgroups: comp.protocols.tcp-ip
Subject: Re:  SO_KEEPALIVE considered harmful?
Message-ID: <8905231641.AA25794@ucbvax.Berkeley.EDU>
Date: 23 May 89 14:57:06 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 18

The use of Keepalives is terrible, but sometimes necessary.  The key
word, here, is "sometimes".

The "terrible" is due to the fact that they add traffic to the net.  An
important point to keep in mind, with TCP connections, is that they may
span the globe, over thin wires.  Extra traffic can have a very serious
effect.  Further, they scale poorly.  The incremental traffic from one
connection may not be onerous, but what about 1000 connections?  Lastly,
of course, there is the small fact that there may be a charge for those
extra packets, such as may happen if one of the links along the path
is over a public X.25 network.

If the group proposing the use of Keepalives has already gone through the
exercise of convincing themselves that critical functionality will be
lost if they are not used, then I hope the next question was/is how
to minimize their use.

Dave

Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!NNSC.NSF.NET!
craig
From: cr...@NNSC.NSF.NET (Craig Partridge)
Newsgroups: comp.protocols.tcp-ip
Subject: re: SO_KEEPALIVE considered harmful?
Message-ID: <8905231944.AA06042@ucbvax.Berkeley.EDU>
Date: 23 May 89 16:41:15 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 75


> I have a random question that I hope this illustrious audience can answer
> definitively for me (or else point me to a definitive source).  Is the BSD
> notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
> the TCP specification?  If so, is its use to be encouraged?  Specifically,
> it has been suggested that in the X Window System world, X libraries
> should automatically be setting SO_KEEPALIVE on connections to X servers.  Is this
> a reasonable thing to do?

Oh what fun!  Keepalive wars return....

Well, I'm a firm hater of keep-alives, although Mike Karels has persuaded
me that in the current world they are a useful tool for catching clients
that go off into hyperspace without telling you.  I have lots of fellow
travellers (actually, I'm probably a fellow traveller with Phil Karn,
president of the "I hate keep-alives" party), witness the current host
requirements text, which is appended.

Craig

	Implementors MAY include "keep-alives" in their TCP           |
	implementations, although this practice is not universally    |
	accepted.  If keep-alives are included, the application MUST  |
	be able to turn them on or off for each TCP connection, and   |
	they MUST default to off.                                     |

	Keep-alive packets MUST NOT be sent when any data or          |
	acknowledgement packets have been received for the            |
	connection within a configurable interval; this interval      |
	MUST default to no less than two hours.                       |

	An implementation SHOULD send a keep-alive segment with no    |
	data; however, it MAY be configurable to send a keep-alive    |
	segment containing one garbage octet, for compatibility       |
	with erroneous TCP implementations.                           |
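Read as a decision procedure, the MUST/SHOULD rules above reduce to a few lines. This is a sketch under stated assumptions. The names `KeepaliveConfig`, `should_probe` and `probe_payload` are hypothetical. Times are in seconds. `last_heard` is taken to be the arrival time of the most recent data or acknowledgement segment on the connection.

```python
from dataclasses import dataclass

TWO_HOURS = 2 * 60 * 60  # seconds; the text's floor for the default interval


@dataclass
class KeepaliveConfig:
    """One instance per TCP connection, so each can be switched separately."""
    enabled: bool = False           # keep-alives MUST default to off
    idle_interval: int = TWO_HOURS  # configurable; default MUST be >= 2 hours
    garbage_octet: bool = False     # MAY send one octet for broken peers


def should_probe(cfg: KeepaliveConfig, now: float, last_heard: float) -> bool:
    """Probe only if keep-alives are on and no data or ACK segment has
    arrived within the configured idle interval."""
    return cfg.enabled and now - last_heard >= cfg.idle_interval


def probe_payload(cfg: KeepaliveConfig) -> bytes:
    """SHOULD carry no data; optionally one garbage octet for compatibility."""
    return b"\x00" if cfg.garbage_octet else b""
```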


	DISCUSSION:                                                   |
	     A "keep-alive" mechanism would periodically probe the    |
	     other end of a connection when the connection was        |
	     otherwise idle, even when there was no data to be sent.  |
	     The TCP specification does not include a keep-alive      |
	     mechanism because it could:  (1) cause perfectly good    |
	     connections to break during transient Internet           |
	     failures; (2) consume unnecessary bandwidth ("if no one  |
	     is using the connection, who cares if it is still        |
	     good?"); and (3) cost money for an Internet path that    |
	     charges for packets.                                     |

	     Some TCP implementations, however, have included a       |
	     keep-alive mechanism. To confirm that an idle            |
	     connection is still active, these implementations send   |
	     a probe segment designed to elicit a response from the   |
	     peer TCP.  Such a segment generally contains SEG.SEQ =   |
	     SND.NXT-1.  The segment may or may not contain one       |
	     garbage octet of data.  Note that on a quiet             |
	     connection, SND.NXT = RCV.NXT and SEG.SEQ will be        |
	     outside the window.  Therefore, the probe causes the     |
	     receiver to return an acknowledgment segment,            |
	     confirming that the connection is still live.  If the    |
	     peer has dropped the connection due to a network         |
	     partition or a crash, it will respond with a reset       |
	     instead of an acknowledgement.                           |

	     Unfortunately, some misbehaved TCP implementations fail  |
	     to respond to a segment with SEG.SEQ = SND.NXT-1 unless  |
	     the segment contains data.  Alternatively, an            |
	     implementation could determine whether a peer responded  |
	     correctly to keep-alive packets with no garbage data     |
	     octet.                                                   |

	     A TCP keep-alive mechanism should only be invoked in     |
	     network servers that might otherwise hang indefinitely   |
	     and consume resources unnecessarily if a client crashes  |
	     or aborts a connection during a network partition.       |
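The probe's behaviour follows from the segment acceptability test in the TCP specification (RFC 793). If a segment's sequence number lies outside the receive window, the receiver drops it and answers with an acknowledgement, unless the segment carries RST. A host with no record of the connection answers with a reset instead. The sketch below models this in modulo-2^32 sequence arithmetic. The function names are hypothetical, and real TCPs handle many more cases, such as the exact contents of the RST.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def seq_in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RCV.NXT =< seq < RCV.NXT + RCV.WND, with wraparound."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RFC 793 acceptability test for an incoming segment."""
    if rcv_wnd == 0:
        return seg_len == 0 and seg_seq == rcv_nxt
    if seg_len == 0:
        return seq_in_window(seg_seq, rcv_nxt, rcv_wnd)
    last = (seg_seq + seg_len - 1) % MOD
    return (seq_in_window(seg_seq, rcv_nxt, rcv_wnd)
            or seq_in_window(last, rcv_nxt, rcv_wnd))


def keepalive_probe_seq(snd_nxt: int) -> int:
    """SEG.SEQ = SND.NXT - 1: a byte the peer has already acknowledged
    on a quiet connection."""
    return (snd_nxt - 1) % MOD


def reply_to_probe(peer_has_connection: bool, seg_seq: int, seg_len: int,
                   rcv_nxt: int, rcv_wnd: int) -> str:
    """What the peer sends back for a probe without RST set."""
    if not peer_has_connection:
        return "RST"      # crashed and rebooted: connection unknown
    if not acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return "ACK"      # out of window: drop it and re-send the current ACK
    return "accept"
```

Consider a quiet connection where SND.NXT = RCV.NXT = 1000. Here `reply_to_probe(True, keepalive_probe_seq(1000), 0, 1000, 4096)` returns `"ACK"`. It does the same with one garbage octet (`seg_len=1`), because that octet's sequence number, 999, also lies below the window.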

Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!
ucbvax!AHWAHNEE.STANFORD.EDU!dcrocker
From: dcroc...@AHWAHNEE.STANFORD.EDU (Dave Crocker)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <8905250638.AA21706@ucbvax.Berkeley.EDU>
Date: 23 May 89 21:03:13 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 16

I tried to avoid saying that keepalives should be prohibited, except,
perhaps, from an aesthetic point of view.  Since aesthetics often are
altered by reality, it is no great concession to acknowledge the 
occasional need for the mechanism.

My point was that they are dangerous and therefore should be used VERY
judiciously.  Craig's note puts this point forward in more detail.

It is worth adding that the excessive use of keepalives has removed a
feature that used to be in TCP and has been recently re-documented by
Bob Braden:  TCP used to be remarkably robust against temporary
outages.  If you were willing to wait, so was TCP.  Now, an outage of
a very short time -- on some implementations, as short as 1-2 minutes --
will abort the connection.

Dave

Path: utzoo!attcan!uunet!lll-winken!ames!think!barmar
From: bar...@think.COM (Barry Margolin)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <20761@news.Think.COM>
Date: 25 May 89 16:32:31 GMT
References: <8905250638.AA21706@ucbvax.Berkeley.EDU>
Sender: n...@Think.COM
Reply-To: bar...@kulla.think.com.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge, MA
Lines: 49

In article <8905250638.AA21...@ucbvax.Berkeley.EDU> 
dcroc...@AHWAHNEE.STANFORD.EDU (Dave Crocker) writes:
>It is worth adding that the excessive use of keepalives has removed a
>feature that used to be in TCP and has been recently re-documented by
>Bob Braden:  TCP used to be remarkably robust against temporary
>outages.  If you were willing to wait, so was TCP.  Now, an outage of
>a very short time -- on some implementations, as short as 1-2 minutes --
>will abort the connection.

I dispute this claim.  TCP is only robust against temporary outages if
you don't try to use the connection during that period.  For instance,
if I'm using telnet, the connection will stay alive during outages if
I don't type anything to the client or the host doesn't try to send
any output.  If either end tries to use the connection, and the outage
is longer than the TCP acknowledgement timeout, then the connection
will die.  If I happen to know that the network is having trouble I
won't type anything, but how often is this the case?  What it mostly
means is that a temporary outage after I go home won't break my
connections.

TCP's robustness is still a good idea.  It's nice to be able to swap
Ethernet cables without causing all the network connections to die.
But in my experience (which, I admit, isn't all that extensive), any
connection that dies for more than a minute or two probably isn't
going to come back.

What I mostly care about, though, is that the other end definitely has
reinitialized, e.g. it has crashed and been rebooted.  If it's a
telnet server that crashed I can do this by typing into the client,
which will provoke a reset, and the client will abort.  But if it's
the telnet client or an X server that died, there's often no way to
force the other end to try to send something so it will get a reset.

I think the right solution is a compromise.  What's needed is a way to
send a segment with infinite (or near-infinite, e.g. hours or a day)
retransmissions and slow retransmit rate (one to two minutes).  This
would allow idle connections to stay up across most network failures,
but they will die within a minute or so of the other end rebooting.
And, of course, it should be optional, so that applications that
perform frequent output of their own need not compound their network
use (although since keepalives need only be sent when there are no
normal packets in the retransmit queue, any application whose output
rate is higher than the keepalive rate will never invoke the keepalive
mechanism).
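Barry's compromise can be sketched as a probe that is retransmitted slowly for a long time. The connection ends only when the peer answers with a reset. The 90-second retry interval (inside his one-to-two-minute range), the one-day limit and the function names are all assumptions. `peer_reply` stands in for the network so the logic can be exercised.

```python
from typing import Callable, Optional, Tuple


def keepalive_due(unacked_segments: int, idle_seconds: float,
                  idle_threshold: float) -> bool:
    """Probe only an idle connection with an empty retransmit queue;
    otherwise ordinary retransmissions already test the path."""
    return unacked_segments == 0 and idle_seconds >= idle_threshold


def run_probe(peer_reply: Callable[[int], Optional[str]],
              start: int = 0,
              retry_interval: int = 90,
              give_up_after: int = 24 * 3600) -> Tuple[str, int]:
    """Retransmit one probe every retry_interval seconds for up to
    give_up_after seconds.

    peer_reply(t) models the answer to the probe sent at time t: None while
    the network is down, "ACK" if the peer still holds the connection,
    "RST" if it has rebooted and forgotten it.
    Returns (outcome, time), outcome being "alive", "dead" or "gave_up".
    """
    t = start
    while t - start <= give_up_after:
        reply = peer_reply(t)
        if reply == "ACK":
            return "alive", t
        if reply == "RST":
            return "dead", t
        t += retry_interval
    return "gave_up", t
```

A three-hour outage then costs 120 unanswered probes, one every 90 seconds, and the connection survives. A peer that has rebooted is noticed on the first probe that gets through.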

Barry Margolin
Thinking Machines Corp.

bar...@think.com
{uunet,harvard}!think!barmar

Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!
rutgers!bellcore!jupiter!karn
From: karn@jupiter (Phil R. Karn)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <16423@bellcore.bellcore.com>
Date: 25 May 89 22:40:49 GMT
References: <8905250638.AA21706@ucbvax.Berkeley.EDU> <20761@news.Think.COM>
Sender: n...@bellcore.bellcore.com
Reply-To: k...@jupiter.bellcore.com (Phil R. Karn)
Organization: Bell Communications Research, Inc
Lines: 28

>>It is worth adding that the excessive use of keepalives has removed a
>>feature that used to be in TCP and has been recently re-documented by
>>Bob Braden:  TCP used to be remarkably robust against temporary
>>outages. [...]

>I dispute this claim.  TCP is only robust against temporary outages if
>you don't try to use the connection during that period.

TCP becomes quite robust against all outages (whether or not the
connection is idle) once you make a very simple change: get rid of TCP
level timeouts!

I feel very strongly that TCP should *never* just give up on its own
accord; that decision belongs to the application. And, in the event the
application is an interactive one, the decision to abort should be left
to the human user. If he's willing to wait, why shouldn't the system let
him? (The only case when TCP should abort a connection on its own is
when it has clear proof that the other end has crashed, i.e., by
receiving a valid RST.)

Users of my TCP/IP package on amateur packet radio occasionally report
cases of FTP transfers that resume automatically after network outages
lasting for *days* (e.g., those due to crashes of network nodes in
remote locations that require manual resets).  They are most happy to do
without TCP give-up timers, as long as TCP backs off its retransmissions
to avoid channel congestion.
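The "back off but never give up" behaviour Phil describes can be sketched as an unbounded generator of retransmission intervals. The doubling rule and the `initial_rto`/`max_rto` parameters are assumptions for illustration. The post does not say what backoff his package actually uses.

```python
from itertools import islice
from typing import Iterator, List


def backoff_intervals(initial_rto: float, max_rto: float) -> Iterator[float]:
    """Yield successive retransmission intervals without end.

    Each interval doubles the previous one until it reaches max_rto, so a
    dead path costs little channel time. There is no retry limit: the
    connection ends only on a valid RST or when the application (or the
    user) chooses to abort.
    """
    rto = initial_rto
    while True:
        yield rto
        rto = min(rto * 2, max_rto)


def first_intervals(initial_rto: float, max_rto: float, n: int) -> List[float]:
    """The first n intervals, for inspection."""
    return list(islice(backoff_intervals(initial_rto, max_rto), n))
```

With a 1-second start and a 64-second ceiling, `first_intervals(1, 64, 9)` gives `[1, 2, 4, 8, 16, 32, 64, 64, 64]`. After that, the sender keeps retrying once every 64 seconds for as long as the outage lasts.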

Phil

Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!
AHWAHNEE.STANFORD.EDU!dcrocker
From: dcroc...@AHWAHNEE.STANFORD.EDU (Dave Crocker)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <8906012254.AA23748@ucbvax.Berkeley.EDU>
Date: 26 May 89 13:28:24 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 9

Phil,

As a test-of-concept:  I assume that you have no objection to a TCP
implementation's being able to do keepalives, under the control of the
application, where both the fact of keepalives AND their periodicity
can be specified; and the effect of a timeout is a signal to the
application, not an abort?
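Dave's arrangement has three parts: the application switches keepalives on, it chooses their period, and a timeout produces a signal rather than an abort. One way that might look is sketched below. The class, its methods and the explicit `tick(now)` clock are hypothetical, chosen so the behaviour can be followed without real timers or sockets.

```python
from typing import Callable, Optional


class AppKeepalive:
    """Hypothetical application-controlled keepalive for one connection.

    The application decides whether probing is on (off by default) and how
    often it happens. If a probe goes unanswered for `timeout` seconds,
    `on_timeout` is called. The connection is not aborted; that decision
    stays with the application.
    """

    def __init__(self, period: float, timeout: float,
                 on_timeout: Callable[[float], None]) -> None:
        self.enabled = False
        self.period = period
        self.timeout = timeout
        self.on_timeout = on_timeout
        self.idle_since = 0.0
        self.probe_sent_at: Optional[float] = None

    def heard_from_peer(self, now: float) -> None:
        """Any segment from the peer (including a probe's ACK) resets the clock."""
        self.idle_since = now
        self.probe_sent_at = None

    def tick(self, now: float) -> bool:
        """Advance time; return True when a probe should be sent now."""
        if not self.enabled:
            return False
        if self.probe_sent_at is None:
            if now - self.idle_since >= self.period:
                self.probe_sent_at = now
                return True
            return False
        if now - self.probe_sent_at >= self.timeout:
            self.on_timeout(now)          # signal the application; no abort
            self.probe_sent_at = None     # start a fresh idle period
            self.idle_since = now
        return False
```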

Dave

Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!
ucbvax!THUMPER.BELLCORE.COM!karn
From: k...@THUMPER.BELLCORE.COM (Phil R. Karn)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <8905262347.AA11535@thumper.bellcore.com>
Date: 26 May 89 23:47:25 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 29

Dave,

Yes, that might be acceptable to me. I'd go a little further, though,
and say that a REMOTE USER (not just the application code) must always
be able to turn off keepalives, even on binary-only systems. It does no
good to say "the application must be able to disable keepalives" when
I'm having problems with a remote server that I have no administrative
control over.

Much of my animosity toward keepalives came from trying to make a Sun
workstation work properly over SLIP links and amateur packet radio. I
finally replaced the TCP object modules provided by Sun with ones
compiled from Van's latest TCP, which I had already edited to disable
keepalives.  Works like a charm.

At the last InterOp, I sat next to Dave Borman in a panel session on TCP
performance. Between us, we represented a "dynamic range" of about 6
orders of magnitude in TCP transfer rates (1200 bps amateur packet radio
to 500 Mbps between Crays). This is an exceptional achievement for a
single networking protocol, but it was possible only because TCP was
designed from the beginning to scale well over a wide network
performance range.

But broken mechanisms like keepalives threaten this. We need a big red
warning light that will flash whenever someone proposes to put a fixed
time interval into a protocol spec, because you can't scale protocols
that have arbitrary timers.
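The range Phil cites shows why no single fixed interval fits every path. As a rough illustration, compare the time needed just to clock one 576-byte IP datagram onto each of the two links he mentions. The datagram size is an assumption, chosen as a familiar IP figure, and propagation and queueing delays are ignored.

```python
import math


def serialization_seconds(nbytes: int, bits_per_second: float) -> float:
    """Time to clock nbytes onto a link, ignoring propagation delay,
    queueing and link-level framing."""
    return nbytes * 8 / bits_per_second


SLOW_BPS = 1200.0     # amateur packet radio, from the post
FAST_BPS = 500e6      # Cray to Cray, from the post

slow = serialization_seconds(576, SLOW_BPS)   # 3.84 s
fast = serialization_seconds(576, FAST_BPS)   # about 9.2 microseconds
orders = math.log10(FAST_BPS / SLOW_BPS)      # about 5.6, i.e. "about 6"
```

Consider a fixed 2-second timer. That is over 200,000 times longer than the Cray link needs to send the datagram, yet on the radio link it would expire before the datagram had even finished transmitting.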

Phil

Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!
ucbvax!A.ISI.EDU!CERF
From: C...@A.ISI.EDU
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <[A.ISI.EDU]28-May-89.14:23:38.CERF>
Date: 28 May 89 18:23:00 GMT
References: <2681@elxsi.UUCP>
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 24

When TCP was first designed, and for all subsequent versions, it was
thought inappropriate to impose any kind of semantics on the logical
connections established by TCP. In particular, no sense of absolute
timeout for the severing of a connection was desired. We thought that
such notions of "impatience" or "time to give up" ought to be the
choice of the upper level protocol using TCP as the basis merely for
reliable delivery.

A part of this view stemmed from the fact that the networks over which
TCP had to function, for the DoD applications we had in mind, were
potentially very unpredictable as to loss and delay. Mobile packet
radio systems had to function under jamming and radio shadow effects,
for instance. TCP never unilaterally severed connections but only
reported failure to achieve positive acknowledgement after a time
which could be controlled by the application or upper-level protocol.
It was up to the application to decide whether to sever the connection
and, even then, the choice to do so gracefully or abruptly was also
left to the application.

The use of a feature (X-level NOP) to test the liveness of a TCP
connection is consonant with the model against which the TCP was
designed. 
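Cerf's point is that the liveness test can live above TCP. The application sends a request that does no real work, and TCP's ordinary delivery machinery reports a dead peer through the write path. The sketch below is minimal. The one-byte `NOOP` is a hypothetical stand-in, not the actual X NoOperation request encoding. A local `socketpair` replaces a real X connection so the example runs without a network. Over real TCP, the first write to a rebooted peer usually succeeds, and only a later read or write reports the reset, so a client would also need to check for errors on subsequent I/O.

```python
import socket

NOOP = b"\x00"  # hypothetical application-level no-op message


def send_noop(sock: socket.socket) -> bool:
    """Send one no-op; return False if the write shows the peer is gone."""
    try:
        sock.sendall(NOOP)
        return True
    except OSError:   # e.g. BrokenPipeError or ConnectionResetError
        return False
```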

Vint Cerf

Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!
ucbvax!OKEEFFE.BERKELEY.EDU!karels
From: kar...@OKEEFFE.BERKELEY.EDU (Mike Karels)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: SO_KEEPALIVE considered harmful?
Message-ID: <8906082328.AA04514@okeeffe.Berkeley.EDU>
Date: 8 Jun 89 23:28:37 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 39

Sorry, I can't let this go by without commenting on Phil's message
and this discussion, even though the discussion has mostly died down.
(I haven't been reading tcp-ip very often, but noticed this subject
line going by.)

Last time Phil and I talked about keepalives in person, I asked him
whether he had problems with telnet/rlogin servers accumulating on
his systems if they didn't use keepalives.  We certainly accumulate
junk, including xterm programs, waiting for input from a half-open
connection.  Phil told me that he doesn't have problems, because
he runs a "wall" every night to force output to all users, which of
course breaks connections that have timed out.  In other words, Phil
violently objects to servers requesting keepalives from TCP, but
allows the system manager (himself) to force them above the application
level.  And before people jump up to point out the difference in time
scales, the current BSD code sends no keepalive packets until a connection
has been idle for 2 hr, and that interval is easily changeable.
One proposal for the Host Requirements document was to wait for 12 hr.
I think that's a bit high, but the difference is only a factor of 6.
Compare the number of keepalive packets with the number of packets
exchanged by an xterm and an X server over the course of a week
if used 4 hours a day!
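Mike's comparison can be roughed out for a single connection. Beyond the 2-hour and 12-hour figures, the following are assumptions for illustration:

- the xterm sits idle in one unbroken 20-hour stretch per day;
- probes repeat every interval while it stays idle;
- each probe plus its ACK counts as two packets.

```python
def keepalive_packets_per_week(idle_hours_per_day: float,
                               interval_hours: float,
                               days: int = 7) -> int:
    """Packets spent on keepalives over `days` days for one connection,
    counting one probe per full interval of idleness plus its ACK."""
    probes_per_day = int(idle_hours_per_day // interval_hours)
    return days * probes_per_day * 2


bsd_default = keepalive_packets_per_week(20, 2)    # 2-hour idle interval
proposed = keepalive_packets_per_week(20, 12)      # 12-hour proposal
```

Under these assumptions, the 2-hour default costs 140 packets a week and the 12-hour proposal costs 14. Rounding to whole probes turns the factor of 6 in the interval into a factor of 10 in packets. Either figure is what Mike asks the reader to set against a week of xterm traffic.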

Phil says:
	... I'd go a little further, though,
	and say that a REMOTE USER (not just the application code) must always
	be able to turn off keepalives, even on binary-only systems. It does no
	good to say "the application must be able to disable keepalives" when
	I'm having problems with a remote server that I have no administrative
	control over.

I'm sorry, Phil, but remote users have no more right to override system
management policies than do local users (at least on *our* systems!).
On some of the systems where I have guest accounts, local or remote
users are logged off if they aren't active for two hours.  I don't like
that, either, but I don't claim that the managers of those systems
have no right to enforce such a policy.

		Mike