* Hour long timeout to ssh/telnet/ftp to down host?
From: Rob Landley @ 2001-06-12 21:02 UTC
  To: linux-kernel

I have scripts that ssh into large numbers of boxes, which are sometimes
down. The timeout for figuring out the box is down is over an hour. This is
just insane.

Telnet and ftp behave similarly, or at least they lasted the 5 minutes I was
willing to wait, anyway. Basically anything that calls connect(). If the
box doesn't respond in 15 seconds, I want to give up.

Is this a problem with the kernel or with glibc? If it's the kernel, I'd
expect a /proc entry where I can set this, but I can't seem to find one. Is
there one? What would be involved in writing one?

If it's glibc I'm probably better off writing a wrapper to ping the
destination before trying to connect, or killing the connection after a
timeout with no traffic. But both of those are really ugly solutions.

Anybody have any light to shed on the situation?

Rob
* Re: Hour long timeout to ssh/telnet/ftp to down host?
From: Ben Greear @ 2001-06-13 2:18 UTC
  To: landley; +Cc: linux-kernel

Rob Landley wrote:
>
> I have scripts that ssh into large numbers of boxes, which are sometimes
> down. The timeout for figuring out the box is down is over an hour. This is
> just insane.
>
> Telnet and ftp behave similarly, or at least they lasted the 5 minutes I was
> willing to wait, anyway. Basically anything that calls connect(). If the
> box doesn't respond in 15 seconds, I want to give up.
>
> Is this a problem with the kernel or with glibc? If it's the kernel, I'd
> expect a /proc entry where I can set this, but I can't seem to find one. Is
> there one? What would be involved in writing one?
>

You can tune things by setting the tcp-timeout probably... I don't
know exactly where to set this.

You probably don't want all tcp to time out at 15 seconds anyway, so
I'd suggest either using non-blocking connect (if you have the code
that does the connect), or just set a timer (or use SIGALRM) when you
start the attempt, and fail the attempt if the timer or alarm signal
goes off.

> If it's glibc I'm probably better off writing a wrapper to ping the
> destination before trying to connect, or killing the connection after a
> timeout with no traffic. But both of those are really ugly solutions.

Ugly is relative, and don't use ping because there is still a race
condition (ping worked, but by the time you try tcp, the box is down.)

>
> Anybody have any light to shed on the situation?
>
> Rob

--
Ben Greear <greearb@candelatech.com>  <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear
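Ben's first suggestion, a non-blocking connect(), can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not code from the thread: connect_with_timeout() and try_connect() are hypothetical names, and it assumes a POSIX system with getaddrinfo() and poll(). The socket is put in O_NONBLOCK mode so connect() returns EINPROGRESS at once, poll() waits for writability up to the caller's limit, and SO_ERROR then says whether the handshake really succeeded. A poll() interrupted by a signal is simply treated as a failure here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try one resolved address; return a connected, blocking fd or -1. */
static int try_connect(const struct addrinfo *ai, int timeout_ms)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, ai->ai_addr, ai->ai_addrlen) < 0) {
        if (errno != EINPROGRESS)
            goto fail;                 /* immediate error, e.g. ECONNREFUSED */

        /* Wait until the handshake finishes (either way) or time runs out. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) != 1)
            goto fail;                 /* 0 = timed out, -1 = poll error */

        /* Writable does not mean connected: check the pending error. */
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
            goto fail;                 /* e.g. ECONNREFUSED, EHOSTUNREACH */
    }

    if (fcntl(fd, F_SETFL, flags) < 0) /* restore blocking mode for caller */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Hypothetical helper: resolve host/port and give up on each address after
 * timeout_ms milliseconds instead of waiting out every SYN retransmission.
 * Returns a connected socket or -1. */
int connect_with_timeout(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (const struct addrinfo *ai = res; ai != NULL && fd < 0; ai = ai->ai_next)
        fd = try_connect(ai, timeout_ms);

    freeaddrinfo(res);
    return fd;
}
```

Something like connect_with_timeout("somehost", "22", 15000) would give up after about 15 seconds per resolved address, which is the behavior Rob asked for. It only helps for code you can change, though, not for an off-the-shelf ssh binary.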
* Re: Hour long timeout to ssh/telnet/ftp to down host?
From: Rob Landley @ 2001-06-12 21:47 UTC
  To: Ben Greear, landley; +Cc: linux-kernel

>You can tune things by setting the tcp-timeout probably... I don't
>know exactly where to set this.

Aha, found it. /proc/sys/net/ipv4/tcp_syn_retries

I am a victim of the exponential retry falloff, it would seem. A syn_retries
of 1 takes a few seconds, 3 takes less than half a minute, and 5 takes
several minutes. The default value of 10 is what's giving me the problem
(something like 20 minutes to time out, according to my earlier tests).
On top of that, ssh re-attempts the connection four times before actually
failing, which is where my hour-and-change timeout came from. ("ssh -v -v -v"
comes in handy...)

Fun.

Can we change the default value for this to something more sane, like 5?
Exponential falloff is not good when your order of magnitude hits double
digits.

> You probably don't want all tcp to time out at 15 seconds anyway, so

Just connection initiation. (If their IP stack hasn't replied to me by then,
I doubt it's going to.)

> I'd suggest either using non-blocking connect (if you have the code
> that does the connect), or just set a timer (or use SIGALRM) when you
> start the attempt, and fail the attempt if the timer or alarm signal
> goes off.

Except I'm using off-the-shelf ssh. (I asked them about this problem a month
ago, and there was some discussion of a workaround on their mailing list,
but 2.9 came out and still had the same behavior. Apparently, if it's not a
problem in OpenBSD, it's not a problem in OpenSSH...)

> > If it's glibc I'm probably better off writing a wrapper to ping the
> > destination before trying to connect, or killing the connection after a
> > timeout with no traffic. But both of those are really ugly solutions.
>
> Ugly is relative, and don't use ping because there is still a race
> condition (ping worked, but by the time you try tcp, the box is down.)

Yeah, but it would eventually time out and recover. I've got ten threads
out querying boxes, so that's a really rare race condition. And I already
acknowledged it was ugly. :)

So the problem is just that tcp_syn_retries' default value of 10 is way too
high, due to the exponentially increasing gap between each retry.

Rob
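Rob's diagnosis, that the exponentially growing gap between SYN retries is what makes a high retry count so slow, can be checked with a little arithmetic. This sketch sums the wait after the initial SYN and after each retransmission, doubling each time up to a cap. The initial retransmission timeout and the cap are parameters because the thread doesn't state them; syn_timeout() is a hypothetical name, and any values plugged in are assumptions that vary by kernel version.

```c
/* Hypothetical model of how long an unanswered connect() waits:
 * the initial SYN plus `retries` retransmissions, where each wait is
 * twice the previous one (exponential backoff) but never more than
 * rto_max. rto_init and rto_max are ASSUMED inputs; real values depend
 * on the kernel version. All times are in seconds. */
unsigned syn_timeout(unsigned retries, unsigned rto_init, unsigned rto_max)
{
    unsigned total = 0;
    unsigned rto = rto_init < rto_max ? rto_init : rto_max;

    for (unsigned i = 0; i <= retries; i++) {
        total += rto;                                   /* wait for a SYN-ACK */
        rto = (rto * 2 < rto_max) ? rto * 2 : rto_max;  /* back off, capped */
    }
    return total;
}
```

For example, with a hypothetical 3 s initial timeout and 120 s cap, syn_timeout(5, 3, 120) is 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds, and every retry beyond that adds another full 120 s. Since ssh repeats the whole connect() several times, the per-attempt figure is multiplied again.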
* Re: Hour long timeout to ssh/telnet/ftp to down host?
From: Luigi Genoni @ 2001-06-13 9:40 UTC
  To: Ben Greear; +Cc: landley, linux-kernel

On Tue, 12 Jun 2001, Ben Greear wrote:

> Rob Landley wrote:
> >
> > I have scripts that ssh into large numbers of boxes, which are sometimes
> > down. The timeout for figuring out the box is down is over an hour. This is
> > just insane.
> >
> > Telnet and ftp behave similarly, or at least they lasted the 5 minutes I was
> > willing to wait, anyway. Basically anything that calls connect(). If the
> > box doesn't respond in 15 seconds, I want to give up.
> >
> > Is this a problem with the kernel or with glibc? If it's the kernel, I'd
> > expect a /proc entry where I can set this, but I can't seem to find one. Is
> > there one? What would be involved in writing one?
> >
>
> You can tune things by setting the tcp-timeout probably... I don't
> know exactly where to set this.

/proc/sys/net/ipv4/tcp_fin_timeout

default is 60.
* Re: Hour long timeout to ssh/telnet/ftp to down host?
From: Rob Landley @ 2001-06-13 10:07 UTC
  To: Luigi Genoni, Ben Greear; +Cc: landley, linux-kernel

On Wednesday 13 June 2001 05:40, Luigi Genoni wrote:
> On Tue, 12 Jun 2001, Ben Greear wrote:
> > You can tune things by setting the tcp-timeout probably... I don't
> > know exactly where to set this.
>
> /proc/sys/net/ipv4/tcp_fin_timeout
>
> default is 60.

Never got that far. My problem was actually tcp_syn_retries. Remember, I was
talking to a host that was unplugged. (I wasn't even getting "host
unreachable" messages; the packets were just disappearing.) The default
timeout in that case is ridiculous due to the exponentially increasing
delays between retries. 10 retries wound up being something like 20 minutes.

I set it to 5 and everything works beautifully now. ssh (which retries the
connection 4 times, and used to take over an hour to time out) now takes
just over 3 minutes, which I can live with.

Rob
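Rob's fix was a one-value change to the /proc entry he found. A program that wants to check or set it can treat the file as a text file holding one integer, as in this minimal sketch. read_sysctl_int() and write_sysctl_int() are hypothetical helper names, and the path is a parameter so the helpers can be tried on an ordinary file. Writing the real /proc/sys/net/ipv4/tcp_syn_retries needs root. The sketch uses -1 as its error value, which is workable here only because a retry count is never negative.

```c
#include <stdio.h>

#define TCP_SYN_RETRIES_PATH "/proc/sys/net/ipv4/tcp_syn_retries"

/* Read a single integer from a sysctl-style file. Returns -1 on error. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Write a single integer to a sysctl-style file (root is required for the
 * real /proc path). Returns 0 on success, -1 on error. The buffered write
 * is only flushed at fclose(), so its result is checked as well. */
int write_sysctl_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Run as root, write_sysctl_int(TCP_SYN_RETRIES_PATH, 5) has the same effect as `echo 5 > /proc/sys/net/ipv4/tcp_syn_retries` or `sysctl -w net.ipv4.tcp_syn_retries=5`. The setting applies to every outgoing TCP connection on the machine and is lost on reboot, which is the system-wide trade-off Ben points out.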