* Rsync SSH session hang, AGAIN - Help! Deadlock debugging needed.
From: Stephen D. Williams @ 2002-10-01 14:49 UTC
To: linux-kernel
This has been a recurring problem for a couple of years, one that I and
others have experienced. I was free from it for a while, but after upgrading
OpenSSL/OpenSSH to avoid the recent exploit it is back and highly
repeatable. It has been persistent enough that I am going to start
with the assumption that it may be a kernel bug, or at least is probably
debuggable definitively only by a proficient kernel developer. We have
got to squash this once and for all; SSH is used everywhere and it needs
to be reliable. There is probably a race condition in ssh, as mentioned
in the references below, but it must be subtle.
rsync/ssh transfers from local system to local system work perfectly.
Between the systems, there are nearly always long delays at certain
times and usually a complete hang. After a long period, this often
produces a timeout. These systems are on 100baseT on the same switch.
One system appears to have mild packet loss (400 out of 400,000, or
about 0.1%, on both send and receive, as frame/carrier errors). BTW, running
a cpio through the SSH connection causes a nearly immediate hang, so it is
unlikely to be a problem with rsync.
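To put numbers on an error rate like the one above, something along these lines can help. It is only a sketch: it assumes a Linux /proc/net/dev with the usual column order (packets, then frame errors on the receive side; packets, then carrier errors on the transmit side), and the function names error_rate and iface_counters are made up for illustration.

```shell
# Print errors as a percentage of packets, to four decimal places.
error_rate() {
    awk -v e="$1" -v p="$2" \
        'BEGIN { if (p > 0) printf "%.4f\n", 100 * e / p; else print "n/a" }'
}

# Print "rx_packets rx_frame_errors tx_packets tx_carrier_errors" for one
# interface, read from a file in /proc/net/dev format (default: /proc/net/dev).
iface_counters() {
    iface=$1
    file=${2:-/proc/net/dev}
    # The colon can be glued to the first counter ("eth0:1234"), so turn it
    # into a space before splitting the line into fields.
    sed 's/:/ /' "$file" | awk -v i="$iface" '$1 == i { print $3, $7, $11, $16 }'
}
```

With the figures quoted above, `error_rate 400 400000` prints 0.1000. Running `iface_counters eth0` before and after a transfer that hangs shows whether the frame/carrier counters actually move with each hang.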
Both systems work fine receiving rsync/ssh from my laptop over a 400Kb
DSL connection with:
OpenSSH 3.1p1
openssl 0.9.6c
rsync 2.5.4
gcc 2.96
kernel 2.4.19
(systems are a combination of Suse and Redhat 7.3, upgraded variously by
hand)
My standard rsync/ssh script looks like:
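brsyncndz (backup rsync, no delete or compression):
#!/bin/sh
# Default to the standard SSH port unless PORT is set in the environment.
if [ -z "$PORT" ]; then PORT=22; fi
# "$@" keeps arguments containing spaces intact; a bare $* would split them.
rsync -vv -HpogDtSxlra --partial --progress --stats -e "ssh -p $PORT" "$@"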
On both sides:
OpenSSL-0.9.6g
Openssh-3.4p1
rsync-2.5.5
On 'old' system:
gcc 2.95.2
kernel 2.4.3
On 'new' system:
gcc 2.96
kernel 2.4.20-pre8
References to past discussions (I have tried the TCP buffer tuning):
http://lists.insecure.org/linux-kernel/2001/Mar/0374.html
http://lists.insecure.org/linux-kernel/2001/Mar/0380.html
http://lists.insecure.org/linux-kernel/2001/Mar/0400.html
Haven't tried this code yet:
http://lists.insecure.org/linux-kernel/2001/Mar/0652.html
Thanks!
sdw
--
sdw@lig.net http://sdw.st
Stephen D. Williams 43392 Wayside Cir,Ashburn,VA 20147-4622
703-724-0118W 703-995-0407Fax Dec2001
* Re: Rsync SSH session hang, AGAIN - cancel
From: Stephen D. Williams @ 2002-10-04 5:12 UTC
To: Stephen D. Williams; +Cc: linux-kernel
I didn't think any of the tools that I know, trust, and have used for
years could be failing me, but I hadn't seen quite this failure mode before.
Even though the 'old' system below was reachable via web, ssh command
line sessions, etc., any connection between it and another box on the
same network at the ISP failed after 100-200KB. While there were a few
ethernet errors, nothing seemed to increase much or at all with each ssh
hang. Nevertheless, it was either a cable or a hub/switch at the ISP that
was bad. I now theorize that the hub/switch involved, seeing a noisy port,
would judge it unacceptably bad and shut it down, but only
when burst transfers ran between local boxes. Traffic to and from remote
hosts was spaced out enough to stay below that threshold. After a few seconds
of no activity, the port was apparently re-enabled. I knew that switches had
bad-port sensing, but had never seen a system on the fence like this.
It didn't help that these machines were 500 miles away in a colo with
only daytime staff.
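The burst-versus-paced theory above could be checked with something like the following. This is only a sketch, not what was actually run: paced_send is a made-up helper, and the hostname oldbox in the usage note is a placeholder.

```shell
# Write COUNT chunks of SIZE_KB kilobytes of zeros to stdout, sleeping
# PAUSE seconds after each chunk. A PAUSE of 0 gives one continuous burst.
paced_send() {
    count=$1
    size_kb=$2
    pause=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/zero bs=1024 count="$size_kb" 2>/dev/null
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, `paced_send 200 16 0 | ssh oldbox 'cat > /dev/null'` pushes about 3 MB as one burst, while `paced_send 200 16 1` sends the same data with a one-second gap between 16 KB chunks. If only the burst stalls after 100-200KB, the link rather than rsync or ssh is the likely suspect.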
Apologies for the premature post.
sdw