linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* (no subject)
@ 2005-06-16 23:08 trmcneal
  2005-06-16 23:32 ` your mail Chris Wedgwood
  0 siblings, 1 reply; 3+ messages in thread
From: trmcneal @ 2005-06-16 23:08 UTC (permalink / raw)
  To: linux-kernel

I posted this on linux-net, and got only one reply, saying that they
had run into this as well.  Not much help.  I'm thinking that I need to 
tune the kernel somehow to prevent this, but I'm not sure how.  It doesn't 
happen on Solaris, and seems to be a resource issue, but its probably 
not memory. Here are the details:

> I've been working with some tcp network test programs that have
> multiple clients opening nonblocking sockets to a single server port,
> sending a short message, and then closing the socket, 100,000 times.
> Since the socket is non-blocking, it generally tries to connect and then
> does a poll since the socket is busy.  The test fails if the poll times out in
> 10 seconds.  It fails consistently on Linux servers but succeeds on Solaris
> servers; the client is a non-issue unless its loopback on the Linux server.

> This issue crops up both on 2.4 and 2.6 kernels; the main feature seen
> in tcp dumps is that the server gets inundated with retrys, and then starts
> sending RST responses to everything (labelled as ZeroWindow in a
> Ethereal dump).  One interesting feature is that many clients thinks the
> 3-way connection handshake is complete, when the server actually
> doesn't get the final ACK; The client starts sending and then retransmitting
> the data, and the server keeps sending back the SYN/ACK part of the
> handshake.  Another interesting feature is a group of retries from various
> client ports, followed by a group of ACK,RST responses from the server,
> followed by the ZeroWindow RST to everything.

> On Linux, the only way to succeed is to use remote clients - thus avoiding
> extra work on the server -and changing test parameters to put in a longer
> server delay, which is inserted between the closing of one connection and
> the opening of the next. I'm assuming that the tcp network queues are just
> getting overloaded with the data retransmissions, but I don't know enough
> about the queue management yet.  Any suggestions/pointers/fixes?

Any ideas about tuning, or what resource I'm running out of?  

Regards -

Tom McNeal

--
Tom McNeal
(650)906-0761(cell)
(650)964-8459(fax)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: your mail
  2005-06-16 23:08 trmcneal
@ 2005-06-16 23:32 ` Chris Wedgwood
  2005-06-17  1:46   ` Tom McNeal
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Wedgwood @ 2005-06-16 23:32 UTC (permalink / raw)
  To: trmcneal; +Cc: linux-kernel

On Thu, Jun 16, 2005 at 11:08:28PM +0000, trmcneal@comcast.net wrote:

> > I've been working with some tcp network test programs that have
> > multiple clients opening nonblocking sockets to a single server
> > port, sending a short message, and then closing the socket,
> > 100,000 times.  Since the socket is non-blocking, it generally
> > tries to connect and then does a poll since the socket is busy.
> > The test fails if the poll times out in 10 seconds.  It fails
> > consistently on Linux servers but succeeds on Solaris servers; the
> > client is a non-issue unless its loopback on the Linux server.

where is the code for this?  are you sure you're not overflowing the
listen backlog somewhere?  that would show up in some cases but not
all depending on latencies and local scheduler behavior

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: your mail
  2005-06-16 23:32 ` your mail Chris Wedgwood
@ 2005-06-17  1:46   ` Tom McNeal
  0 siblings, 0 replies; 3+ messages in thread
From: Tom McNeal @ 2005-06-17  1:46 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

I'll look at that.  This occurs on all Linux platforms, including a generic
2.4.31 I downloaded from kernel.org. The user test is trivial, just doing
the nonblocking connect, the poll, the send, and then the close, in that loop.

Tom

Chris Wedgwood wrote:
> On Thu, Jun 16, 2005 at 11:08:28PM +0000, trmcneal@comcast.net wrote:
> 
> 
>>>I've been working with some tcp network test programs that have
>>>multiple clients opening nonblocking sockets to a single server
>>>port, sending a short message, and then closing the socket,
>>>100,000 times.  Since the socket is non-blocking, it generally
>>>tries to connect and then does a poll since the socket is busy.
>>>The test fails if the poll times out in 10 seconds.  It fails
>>>consistently on Linux servers but succeeds on Solaris servers; the
>>>client is a non-issue unless its loopback on the Linux server.
> 
> 
> where is the code for this?  are you sure you're not overflowing the
> listen backlog somewhere?  that would show up in some cases but not
> all depending on latencies and local scheduler behavior
> 

-- 
Tom McNeal
(650)906-0761(cell)
(650)964-8459(fax)
Email: trmcneal@comcast.net

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2005-06-17  1:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-16 23:08 trmcneal
2005-06-16 23:32 ` your mail Chris Wedgwood
2005-06-17  1:46   ` Tom McNeal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).