All of lore.kernel.org
 help / color / mirror / Atom feed
* bug in select(2) regarding non-blocking connect(2) completion?
@ 2011-05-07 10:51 Michael Shuldman
  2011-05-07 12:12 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Shuldman @ 2011-05-07 10:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: David S. Miller, karls


Hello, I am occasionally encountering what I belive is a bug in the
kernel.

Below is a strace that I believe shows how the bug manifests itself,
with my comments.


# first select.  All fd's in the write set ([15 17 ... 51 55]) are 
# non-blocking sockets that have had a connect(2) previously issued on
# them, and which have yet to finish connecting as far as we know
# at the time we call select(2).
03:55:31.808548 select(58, [4 8 11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31
32 33 34 35 36 37 39 40 41 43 44 46 48 49 50 52 53 54 57], [15 17 25 29 45 47 51
 55], [11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31 32 33 34 35 36 37 39 40 4
1 43 44 46 48 49 50 52 53 54 57], {1, 0}) = 3 (in [16 26], out [51], left {1, 0}
)

# As indicated by the results returned by the above select(2), fd 51 should
# have finished the connect attempt, but when we try to find out whether 
# the connect(2) succeeded or failed, the results are conflicting.

03:55:31.808622 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
dpoint is not connected)
03:55:31.808900 getsockopt(51, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

# getpeername(2) failing on a socket that has finished connecting should 
# indicate that the connect(2) failed.  Yet when we try to fetch the
# SO_ERROR of the socket, it says no error is currently set.
# We then loop around with select(2) again, and again the same thing
# happens:

03:55:31.809259 select(58, [4 8 11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31
32 33 34 35 36 37 39 40 41 43 44 46 48 49 50 52 53 54 57], [15 17 25 29 45 47 51
 55], [11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31 32 33 34 35 36 37 39 40 4
1 43 44 46 48 49 50 52 53 54 57], {1, 0}) = 3 (in [16 26], out [51], left {1, 0}
)
03:55:31.809329 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
dpoint is not connected)
03:55:31.809640 getsockopt(51, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

...

# finally, getsockopt(2) returns that the connect(2) failed.
03:55:32.521146 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
dpoint is not connected)
03:55:32.521614 getsockopt(51, SOL_SOCKET, SO_ERROR, [101], [4]) = 0

In other words, select(2) says the socket has finished connecting,
getpeername(2) neither confirms nor denies this (it can only confirm
if the connect finished successfully).  getsockopt(2) and SO_ERROR
however says there is no error on the socket, which coupled
with getpeername(2) failing, indicates that the connect(2) has
not yet finished



This does not happen all the time.  E.g., I watched the system for
an hour yesterday, as things were staring up and the number of
concurrent tcp clients gradually increased from zero to around 700,
with no observable problems.  However after a while, the problem
starts occurring, related to an increasing number of clients or
something else, I do not know.

Currently the system has a little over 3,000 clients and the problem
occurs now and then, sometimes several times a minute, while sometimes
it can take dozens of minutes between each time.  At the moment,
the last time the problem was detected was 40 minutes ago.

The software the above strace is related to is a proxy server, and
if there are 3000 clients (incoming TCP sessions), there would
normally be 3000 outgoing TCP sessions also.  

uname -a on the system in question reports 
2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64
x86_64 GNU/Linux

Thankful for any hints or pointers related to this problem.
With kind regards,

-- 
  _ // 
  \X/ -- Michael Shuldman 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: bug in select(2) regarding non-blocking connect(2) completion?
  2011-05-07 10:51 bug in select(2) regarding non-blocking connect(2) completion? Michael Shuldman
@ 2011-05-07 12:12 ` Eric Dumazet
  2011-05-07 16:06   ` Michael Shuldman
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2011-05-07 12:12 UTC (permalink / raw)
  To: Michael Shuldman; +Cc: linux-kernel, David S. Miller, karls, netdev

Le samedi 07 mai 2011 à 12:51 +0200, Michael Shuldman a écrit :
> Hello, I am occasionally encountering what I belive is a bug in the
> kernel.
> 
> Below is a strace that I believe shows how the bug manifests itself,
> with my comments.
> 
> 
> # first select.  All fd's in the write set ([15 17 ... 51 55]) are 
> # non-blocking sockets that have had a connect(2) previously issued on
> # them, and which have yet to finish connecting as far as we know
> # at the time we call select(2).

We dont see the return from connect() : maybe the error was already
returned there.

Only EINPROGRESS is valid here (or fd should be closed right now)

> 03:55:31.808548 select(58, [4 8 11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31
> 32 33 34 35 36 37 39 40 41 43 44 46 48 49 50 52 53 54 57], [15 17 25 29 45 47 51
>  55], [11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31 32 33 34 35 36 37 39 40 4
> 1 43 44 46 48 49 50 52 53 54 57], {1, 0}) = 3 (in [16 26], out [51], left {1, 0}
> )
> 
> # As indicated by the results returned by the above select(2), fd 51 should
> # have finished the connect attempt, but when we try to find out whether 
> # the connect(2) succeeded or failed, the results are conflicting.
> 

If connect() attempt is rejected by remote peer, then select() says your
fd is 'writeable', in the sense you have the definitive answer to your
non blocking connect().

> 03:55:31.808622 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
> dpoint is not connected)

This means end point is non connected : other peer sent RST or no answer
to SYN packets.


> 03:55:31.808900 getsockopt(51, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> 

Hmm, interesting... Are you sure a previous call was not already done
(since this clears the error) ?

> # getpeername(2) failing on a socket that has finished connecting should 
> # indicate that the connect(2) failed.  Yet when we try to fetch the
> # SO_ERROR of the socket, it says no error is currently set.
> # We then loop around with select(2) again, and again the same thing
> # happens:
> 
> 03:55:31.809259 select(58, [4 8 11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31
> 32 33 34 35 36 37 39 40 41 43 44 46 48 49 50 52 53 54 57], [15 17 25 29 45 47 51
>  55], [11 12 13 14 16 18 19 20 21 22 23 24 26 27 30 31 32 33 34 35 36 37 39 40 4
> 1 43 44 46 48 49 50 52 53 54 57], {1, 0}) = 3 (in [16 26], out [51], left {1, 0}
> )
> 03:55:31.809329 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
> dpoint is not connected)
> 03:55:31.809640 getsockopt(51, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> 

Well, if you missed the original error report, all next getpeername()
and SO_ERROR will do the same, and select() says fd is ready for 'write'

> ...
> 
> # finally, getsockopt(2) returns that the connect(2) failed.
> 03:55:32.521146 getpeername(51, 0x7fff5d2eaa8c, [0]) = -1 ENOTCONN (Transport en
> dpoint is not connected)
> 03:55:32.521614 getsockopt(51, SOL_SOCKET, SO_ERROR, [101], [4]) = 0
> 
> In other words, select(2) says the socket has finished connecting,
> getpeername(2) neither confirms nor denies this (it can only confirm
> if the connect finished successfully).  getsockopt(2) and SO_ERROR
> however says there is no error on the socket, which coupled
> with getpeername(2) failing, indicates that the connect(2) has
> not yet finished
> 
> 
> 
> This does not happen all the time.  E.g., I watched the system for
> an hour yesterday, as things were staring up and the number of
> concurrent tcp clients gradually increased from zero to around 700,
> with no observable problems.  However after a while, the problem
> starts occurring, related to an increasing number of clients or
> something else, I do not know.
> 
> Currently the system has a little over 3,000 clients and the problem
> occurs now and then, sometimes several times a minute, while sometimes
> it can take dozens of minutes between each time.  At the moment,
> the last time the problem was detected was 40 minutes ago.
> 
> The software the above strace is related to is a proxy server, and
> if there are 3000 clients (incoming TCP sessions), there would
> normally be 3000 outgoing TCP sessions also.  
> 
> uname -a on the system in question reports 
> 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64
> x86_64 GNU/Linux
> 
> Thankful for any hints or pointers related to this problem.
> With kind regards,
> 

Make sure you dont miss an error in connect() system call.




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: bug in select(2) regarding non-blocking connect(2) completion?
  2011-05-07 12:12 ` Eric Dumazet
@ 2011-05-07 16:06   ` Michael Shuldman
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Shuldman @ 2011-05-07 16:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, David S. Miller, karls, netdev

Eric Dumazet wrote,
> Well, if you missed the original error report, all next getpeername()
> and SO_ERROR will do the same, and select() says fd is ready for 'write'

Many thanks for that.  I was not aware that SO_ERROR would not get
set if the connect(2) failed "permanently" immediately, but have
now tested that it does.  The other "strangeness" might very well
be caused by bugs related to this, so apologies for the noise and
many thanks for the help.

With kind regards,

-- 
  _ // 
  \X/ -- Michael Shuldman 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-05-07 16:06 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-07 10:51 bug in select(2) regarding non-blocking connect(2) completion? Michael Shuldman
2011-05-07 12:12 ` Eric Dumazet
2011-05-07 16:06   ` Michael Shuldman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.