linux-kernel.vger.kernel.org archive mirror
* Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)
@ 2001-03-02  0:41 Scott Laird
From: Scott Laird @ 2001-03-02  0:41 UTC
  To: linux-kernel


I have a repeatable rsync over ssh stall that I'm seeing between
two Linux boxes, both running identical 2.4.1 kernels.  The stall is
easy to reproduce in our environment -- it usually happens at least
once per minute, and can happen several times per minute.  It
doesn't seem to be data-sensitive.  The stall will last until the
session times out *unless* I take one of two steps to "unstall" it.  The
easiest way to do this is to run 'strace -p $PID' against the sending ssh
process.  As soon as the strace is started, rsync starts working again,
but will stall again (even with strace still running) after a short period
of time.

We've seen this bug (or a *very* similar one) with 2.2.16 and 2.4.[01].  I
haven't tried a newer 2.2.x or 2.4.2 or -acX.


One system is a P2/400, the other is a P3/800.  The two boxes are
communicating over a mostly idle Ethernet, through 3 switches.  One end is
an EEPro 100, the other end is an Acenic, although that shouldn't matter.

During a stall, the sending end shows a lot of data stuck in the Recv-Q:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp    72848      0 ref.lab.ocp.interna:840 ref-0.sys.pnap.net:ssh  ESTABLISHED

The receiving end shows a similar problem, but in the Send-Q:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0  28960 ref-0.sys.pnap.net:ssh  ref.lab.ocp.interna:840 ESTABLISHED
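
For anyone who wants to watch for this condition automatically, here is a
minimal sketch that picks the backlogged connections out of netstat-style
rows.  The function names and the threshold parameter are made up for
illustration; the sample rows are the two shown above.  It only parses
text in this column layout and does not call netstat itself.

```python
def parse_netstat_line(line):
    """Split one netstat-style TCP row into a dict; return None for other rows."""
    fields = line.split()
    if len(fields) < 6 or not fields[0].startswith("tcp"):
        return None  # header line or a non-TCP row
    return {
        "recv_q": int(fields[1]),
        "send_q": int(fields[2]),
        "local": fields[3],
        "remote": fields[4],
        "state": fields[5],
    }


def backlogged(lines, threshold=0):
    """Return the rows whose Recv-Q or Send-Q exceeds the (hypothetical) threshold."""
    rows = [parse_netstat_line(line) for line in lines]
    return [r for r in rows
            if r and (r["recv_q"] > threshold or r["send_q"] > threshold)]


SAMPLE = [
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
    "tcp    72848      0 ref.lab.ocp.interna:840 ref-0.sys.pnap.net:ssh  ESTABLISHED",
    "tcp        0  28960 ref-0.sys.pnap.net:ssh  ref.lab.ocp.interna:840 ESTABLISHED",
]

for row in backlogged(SAMPLE):
    print(row["local"], "->", row["remote"],
          "Recv-Q:", row["recv_q"], "Send-Q:", row["send_q"])
```

Run periodically against live netstat output, something like this would
show whether the queues stay pinned at the same values for the length of
a stall.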

As I said, I don't believe that this is a network issue, because I can
un-stall the rsync by either stracing the *sending* ssh process, or by
putting the sending rsync into the background with ^Z and then popping it
back into the foreground.  I have tcpdumps that I can send, but they look
pretty straightforward to me -- the window fills, so data stops flowing.

Strace doesn't seem to be particularly informative:

<blocked, strace starts>
select(4, [0], [1], NULL, NULL) = 1 (out [1])
write(1, "xxxxxxxxxxxxx"..., 66156) = 66156
...
select(4, [0], [1 3], NULL, NULL)       = 2 (out [1 3])
write(1, "\0\0\0\0\274\2\0\0\0\0\0\0\271\30\0\0\0\0\0\0\274\2\0\0"..., 69526
<blocked again>

Strace on the receiving end shows the obvious -- it's sitting in select
waiting for data to arrive.

According to 'ps l', the ssh process is waiting in 'sock_wait_for_wmem'.
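
As background on the select/write pattern in the trace (a general point
about select(), not a diagnosis of this stall): select() reporting a
descriptor as writable only means that some data can be written, not that
a write of any size will complete.  The sketch below illustrates that with
a local socket pair standing in for the real connection, using a
non-blocking send so that the script itself cannot hang.  The function
name and the 8 MB payload size are arbitrary choices for illustration.

```python
import select
import socket


def try_large_write(payload_size=8 * 1024 * 1024):
    """Send a large buffer right after select() reports the socket writable.

    Returns (bytes_accepted, bytes_requested).
    """
    a, b = socket.socketpair()  # stand-in for the real connection
    try:
        a.setblocking(False)  # non-blocking, so a full buffer cannot hang us
        _, writable, _ = select.select([], [a], [])
        assert a in writable  # a fresh socket has room for *some* data
        payload = b"x" * payload_size
        sent = a.send(payload)  # accepts only what fits in the buffers
        return sent, len(payload)
    finally:
        a.close()
        b.close()


sent, wanted = try_large_write()
print(f"select() said writable; send() accepted {sent} of {wanted} bytes")
```

With a blocking descriptor, the same oversized write would instead sleep
until the peer drained enough data for the rest to fit.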

We've tried changing versions of rsync and ssh without any success.  FWIW,
this kernel was compiled with GCC 2.95.2, from Debian potato.


Scott

