All of lore.kernel.org
 help / color / mirror / Atom feed
* Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
@ 2007-02-16  0:04 Stephen Hemminger
  2007-02-16 15:25 ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2007-02-16  0:04 UTC (permalink / raw)
  To: netdev

Someone want to take a stab at fixing this??

Begin forwarded message:

Date: Wed, 14 Feb 2007 19:32:52 -0800
From: bugme-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 8013] New: select for write hangs on a socket after write
returned ECONNRESET


http://bugzilla.kernel.org/show_bug.cgi?id=8013

           Summary: select for write hangs on a socket after write
returned ECONNRESET
    Kernel Version: 2.6.16
            Status: NEW
          Severity: normal
             Owner: shemminger@osdl.org
         Submitter: ajd@gentrack.com


Distribution: Debian
Also reproduced on: 2.4 based Redhat.

Hardware Environment:
i686/Xeon
Problem Description:

If you write() to a disconnected socket, write returns ECONNRESET.
If you then select() on that socket, checking for write, the select
never returns.

For example from strace:
write(4, "fred", 4)                     = 4
...
write(4, "fred", 4)                     = -1 ECONNRESET (Connection
reset by peer)
select(5, NULL, [4], NULL, NULL ... hung in select

The select documentation says "those in writefds will be watched to see
if a write will not block".
A write on this socket will not block, therefore select should return 
immediately.

When the program is run on Solaris, AIX and HPUX, the select returns 
immediately.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16  0:04 Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET Stephen Hemminger
@ 2007-02-16 15:25 ` Evgeniy Polyakov
  2007-02-16 15:28   ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 15:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 506 bytes --]

On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger (shemminger@linux-foundation.org) wrote:
> Someone want to take a stab at fixing this??

Works for me with attached application - select on connreset socket
immediately returns one ready descriptor.

$ ./a.out 192.168.0.48 22
select: err: 1, errno: 0.
write: err: 128, errno: 0.
select: err: 1, errno: 0.
write: err: 128, errno: 0.
select: err: 1, errno: 0.
write: err: -1, errno: 104.
on exit: select: err: 1, errno: 104.

-- 
	Evgeniy Polyakov

[-- Attachment #2: select.c --]
[-- Type: text/plain, Size: 1789 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/poll.h>
#include <sys/sendfile.h>

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <ctype.h>
#include <netdb.h>

#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog_err(f, a...) ulog(f ": %s [%d].\n", ##a, strerror(errno), errno)

int main(int argc, char *argv[])
{
	struct hostent *h;
	int s, err;
	char buf[128];
	struct sockaddr_in sa;
	fd_set rfds;
	struct timeval tv;

	if (argc != 3) {
		ulog("Usage: %s <addr> <port>\n", argv[0]);
		return -1;
	}
	
	h = gethostbyname(argv[1]);
	if (!h) {
		ulog_err("%s: Failed to get address of %s.\n", __func__, argv[1]);
		return -1;
	}

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s == -1) {
		ulog_err("%s: Failed to create server socket", __func__);
		return -1;
	}
	
	memcpy(&(sa.sin_addr.s_addr), h->h_addr_list[0], 4);
	sa.sin_port = htons(atoi(argv[2]));
	sa.sin_family = AF_INET;

	if (connect(s, (struct sockaddr *)&sa, sizeof(sa))) {
		ulog_err("%s: Failed to connect to %s:%s", __func__, argv[1], argv[2]);
		return -1;
	}

	do {
		FD_ZERO(&rfds);
		FD_SET(s, &rfds);

		tv.tv_sec = 500;
		tv.tv_usec = 0;

		err = select(s+1, &rfds, NULL, NULL, &tv);
		ulog("select: err: %d, errno: %d.\n", err, errno);

		err = write(s, buf, sizeof(buf));
		ulog("write: err: %d, errno: %d.\n", err, errno);
	} while (err > 0);

	FD_ZERO(&rfds);
	FD_SET(s, &rfds);

	tv.tv_sec = 500;
	tv.tv_usec = 0;

	err = select(s+1, &rfds, NULL, NULL, &tv);

	ulog("on exit: select: err: %d, errno: %d.\n", err, errno);
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 15:25 ` Evgeniy Polyakov
@ 2007-02-16 15:28   ` Evgeniy Polyakov
  2007-02-16 15:39     ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 15:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger (shemminger@linux-foundation.org) wrote:
> > Someone want to take a stab at fixing this??
> 
> Works for me with attached application - select on connreset socket
> immediately returns one ready descriptor.
> 
> $ ./a.out 192.168.0.48 22
> select: err: 1, errno: 0.
> write: err: 128, errno: 0.
> select: err: 1, errno: 0.
> write: err: 128, errno: 0.
> select: err: 1, errno: 0.
> write: err: -1, errno: 104.
> on exit: select: err: 1, errno: 104.

Nope, fails if file descriptor is checked for write.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 15:28   ` Evgeniy Polyakov
@ 2007-02-16 15:39     ` Evgeniy Polyakov
  2007-02-16 15:54       ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 15:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Fri, Feb 16, 2007 at 06:28:58PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger (shemminger@linux-foundation.org) wrote:
> > > Someone want to take a stab at fixing this??

Ok, I've fund a reason - select() only checks if valid output condition
is 
#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)

Linux never sets POLLERR on shutdown - only on socket error, on shutdown
POLLHUP is setup instead, so reading always shows that:

#define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
POLLERR)

A 'fix' can be to add POLLHUP into POLLOUT_SET or workaround that in
application to check both input and output events in select() or use
poll and explicitly check returned mask.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 15:39     ` Evgeniy Polyakov
@ 2007-02-16 15:54       ` Evgeniy Polyakov
  2007-02-16 16:10         ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 15:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Fri, Feb 16, 2007 at 06:39:58PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Fri, Feb 16, 2007 at 06:28:58PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > > On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger (shemminger@linux-foundation.org) wrote:
> > > > Someone want to take a stab at fixing this??
> 
> Ok, I've fund a reason - select() only checks if valid output condition
> is 
> #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
> 
> Linux never sets POLLERR on shutdown - only on socket error, on shutdown
> POLLHUP is setup instead, so reading always shows that:
> 
> #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
> POLLERR)
> 
> A 'fix' can be to add POLLHUP into POLLOUT_SET or workaround that in
> application to check both input and output events in select() or use
> poll and explicitly check returned mask.

Here is mask returned from poll() in the described case:

pollin
[ 2000.564000] do_select: 2076: mask: 104, i: 3/4, j: 3/32, ret: 0, in:

pollin/pollout
[ 2000.572000] do_select: 2076: mask: 145, i: 3/4, j: 3/32, ret: 1, in:
[ 2000.572000] do_select: 2076: mask: 145, i: 3/4, j: 3/32, ret: 1, in:

error is set! but only once since sock_error() replaces sk->sk_err with
zero. Also pollhup is set.
[ 2000.572000] do_select: 2076: mask: 2059, i: 3/4, j: 3/32, ret: 1, in:

Only pollhup is set without pollerr.
[ 2000.572000] do_select: 2076: mask: 2051, i: 3/4, j: 3/32, ret: 0, in:
[ 2014.400000] do_select: 2076: mask: 2051, i: 3/4, j: 3/32, ret: 0, in:

Above sock_error() is called through the following codepath:

write() -> tcp_sendmsg() -> disconnect detected -> sk_stream_error() ->
sock_error() ==> ends up with zero sk->sk_err.

select() -> tcp_poll() ==> check for sk->sk_err fails and thus poller is
not set and tcp_poll() does not return that bit, so select() fails to
say that file descriptor is ready for output.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 15:54       ` Evgeniy Polyakov
@ 2007-02-16 16:10         ` Evgeniy Polyakov
  2007-02-16 18:29           ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 16:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

one of the possible fixes for select() after write() after ECONNRESET.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b67e0dd..661ca0c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	 * blocking on fresh not-connected or disconnected socket. --ANK
 	 */
 	if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
-		mask |= POLLHUP;
+		mask |= POLLHUP | POLLERR;
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
 		mask |= POLLIN | POLLRDNORM | POLLRDHUP;
 

-- 
	Evgeniy Polyakov

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 16:10         ` Evgeniy Polyakov
@ 2007-02-16 18:29           ` Stephen Hemminger
  2007-02-16 18:34             ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2007-02-16 18:29 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev

On Fri, 16 Feb 2007 19:10:45 +0300
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> one of the possible fixes for select() after write() after ECONNRESET.
> 
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index b67e0dd..661ca0c 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>  	 * blocking on fresh not-connected or disconnected socket. --ANK
>  	 */
>  	if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
> -		mask |= POLLHUP;
> +		mask |= POLLHUP | POLLERR;
>  	if (sk->sk_shutdown & RCV_SHUTDOWN)
>  		mask |= POLLIN | POLLRDNORM | POLLRDHUP;
>  
> 

No, that would end up setting error bit on normal shutdown. This is incorrect 
behaviour and might even be described in SUSE standard.

There might even be LSB tests on this?

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 18:29           ` Stephen Hemminger
@ 2007-02-16 18:34             ` Evgeniy Polyakov
  2007-02-17 16:25               ` Evgeniy Polyakov
  2007-02-22  6:45               ` David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-16 18:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Fri, Feb 16, 2007 at 10:29:14AM -0800, Stephen Hemminger (shemminger@linux-foundation.org) wrote:
> On Fri, 16 Feb 2007 19:10:45 +0300
> Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > one of the possible fixes for select() after write() after ECONNRESET.
> > 
> > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index b67e0dd..661ca0c 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
> >  	 * blocking on fresh not-connected or disconnected socket. --ANK
> >  	 */
> >  	if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
> > -		mask |= POLLHUP;
> > +		mask |= POLLHUP | POLLERR;
> >  	if (sk->sk_shutdown & RCV_SHUTDOWN)
> >  		mask |= POLLIN | POLLRDNORM | POLLRDHUP;
> >  
> > 
> 
> No, that would end up setting error bit on normal shutdown. This is incorrect 
> behaviour and might even be described in SUSE standard.
> 
> There might even be LSB tests on this?

Maybe - I do not have it handy.
Otherwise we can extend select output mask to include hungup too
(getting into account that hungup is actually output event).

> -- 
> Stephen Hemminger <shemminger@linux-foundation.org>

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 18:34             ` Evgeniy Polyakov
@ 2007-02-17 16:25               ` Evgeniy Polyakov
  2007-02-19 14:19                 ` Jarek Poplawski
  2007-02-22  6:47                 ` David Miller
  2007-02-22  6:45               ` David Miller
  1 sibling, 2 replies; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-17 16:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> Otherwise we can extend select output mask to include hungup too
> (getting into account that hungup is actually output event).

This is another possible way to fix select after write after connection
reset.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/fs/select.c b/fs/select.c
index fe0893a..f2a8e46 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -186,7 +186,7 @@ get_max:
 #define SET(i,m)	(*(m) |= (i))
 
 #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR)
-#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
+#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR | POLLHUP)
 #define POLLEX_SET (POLLPRI)
 
 int do_select(int n, fd_set_bits *fds, s64 *timeout)

-- 
	Evgeniy Polyakov

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-17 16:25               ` Evgeniy Polyakov
@ 2007-02-19 14:19                 ` Jarek Poplawski
  2007-02-22  6:47                 ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: Jarek Poplawski @ 2007-02-19 14:19 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Stephen Hemminger, netdev

On 17-02-2007 17:25, Evgeniy Polyakov wrote:
> On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
>> Otherwise we can extend select output mask to include hungup too
>> (getting into account that hungup is actually output event).
> 
> This is another possible way to fix select after write after connection
> reset.

I hope you know what you are doing and that this will
change functionality for some users.

In my opinion it looks like a problem with interpretation
and not a bug. From tcp.c:

"
       * Some poll() documentation says that POLLHUP is incompatible
       * with the POLLOUT/POLLWR flags, so somebody should check this
       * all. But careful, it tends to be safer to return too many
       * bits than too few, and you can easily break real applications
       * if you don't tell them that something has hung up!
...
       * Actually, it is interesting to look how Solaris and DUX
       * solve this dilemma. I would prefer, if PULLHUP were maskable,
       * then we could set it on SND_SHUTDOWN. BTW examples given
       * in Stevens' books assume exactly this behaviour, it explains
       * why PULLHUP is incompatible with POLLOUT.    --ANK
       *
       * NOTE. Check for TCP_CLOSE is added. The goal is to prevent
       * blocking on fresh not-connected or disconnected socket. --ANK
       */"

So it seems ANK hesitated and somebody choose not to do
this - maybe for some reason... 

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-16 18:34             ` Evgeniy Polyakov
  2007-02-17 16:25               ` Evgeniy Polyakov
@ 2007-02-22  6:45               ` David Miller
  2007-02-22 12:14                 ` Evgeniy Polyakov
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2007-02-22  6:45 UTC (permalink / raw)
  To: johnpol; +Cc: shemminger, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 16 Feb 2007 21:34:27 +0300

> Otherwise we can extend select output mask to include hungup too
> (getting into account that hungup is actually output event).

POLLHUP is non-maskable and this is very clearly defined in
just about every Unix definition of poll().

This non-maskability leads to all the strange semantics in this area.

There is simply no way for sockets to return HUP for someone who
is only waiting to write.  Definitions of POLLHUP include language
such as "This event and POLLOUT are mutually exclusive".

I reviewed the current code and the many (and I mean MANY) threads on
this topic that have occurred over the years going back to even 1999,
and what we're doing now is the best possible set of semantics we can
provide without breaking any existing applications.

How many others have gone into the archives and actually looked at
the discussions?   That appears to be going out of style :-/

Now, on the read side, for the epoll folks, we added a POLLRDHUP
to handle that case.  It's not relevant here, but I'm mentioning it
in passing since it's roughly related and it shows part of the devil
in dealing with POLLHUP semantics.

POLLHUP is a complete mess, because it means different things on
different kinds of fds.  And exactly, because it is not maskable,
this makes it totally useless for sockets.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-17 16:25               ` Evgeniy Polyakov
  2007-02-19 14:19                 ` Jarek Poplawski
@ 2007-02-22  6:47                 ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2007-02-22  6:47 UTC (permalink / raw)
  To: johnpol; +Cc: shemminger, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Sat, 17 Feb 2007 19:25:33 +0300

> On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > Otherwise we can extend select output mask to include hungup too
> > (getting into account that hungup is actually output event).
> 
> This is another possible way to fix select after write after connection
> reset.
> 
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
 ...
>  #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR)
> -#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
> +#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR | POLLHUP)
>  #define POLLEX_SET (POLLPRI)

POLLHUP is absolutely not maskable, this is very deeply entrenched
in the poll() semantics and definitions.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-22  6:45               ` David Miller
@ 2007-02-22 12:14                 ` Evgeniy Polyakov
  2007-02-22 13:31                   ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 12:14 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev

On Wed, Feb 21, 2007 at 10:45:43PM -0800, David Miller (davem@davemloft.net) wrote:
> POLLHUP is a complete mess, because it means different things on
> different kinds of fds.  And exactly, because it is not maskable,
> this makes it totally useless for sockets.

POLLHUP does show that socket can not be used for transfer, that is what
select() expects to receive. Given select() semantic and the fact, that
socket error is cleared, there is no simple way to say that socket is
not suitable for output - actually not - all reset events ends up with
tcp_done() which marks socket as close, so if in polling time we have
closed socket, it can be considered as error and thus we can add POLLERR
into the mask - so we can extend following code in poll:

if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
	mask |= POLLHUP;

to

if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
	mask |= POLLHUP | (sk->sk_state == TCP_CLOSE)?POLLERR:0;


Thoughts?

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-22 12:14                 ` Evgeniy Polyakov
@ 2007-02-22 13:31                   ` David Miller
  2007-02-22 13:46                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2007-02-22 13:31 UTC (permalink / raw)
  To: johnpol; +Cc: shemminger, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Thu, 22 Feb 2007 15:14:27 +0300

> if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
> 	mask |= POLLHUP;
> 
> to
> 
> if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
> 	mask |= POLLHUP | (sk->sk_state == TCP_CLOSE)?POLLERR:0;
> 
> 
> Thoughts?

TCP_CLOSE is where we end up on a non-error close too, this has
the same kind of bug as your previous attempt to set POLLERR
here.

One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE.

It really is not possible to change current semantics, they are the
best possible unfortunately.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-22 13:31                   ` David Miller
@ 2007-02-22 13:46                     ` Evgeniy Polyakov
  2007-02-22 14:08                       ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 13:46 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev

On Thu, Feb 22, 2007 at 05:31:38AM -0800, David Miller (davem@davemloft.net) wrote:
> TCP_CLOSE is where we end up on a non-error close too, this has
> the same kind of bug as your previous attempt to set POLLERR
> here.
> 
> One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE.
> 
> It really is not possible to change current semantics, they are the
> best possible unfortunately.

AS a last attempt - we can have a sockt flag set when sk-err is
installed in tcp_reset(), and tcp_poll() will set POLLERR if that flag
exist (just like it checks for sk_err, which is cleared in tcp_sendmsg()
when error is being returned).

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-22 13:46                     ` Evgeniy Polyakov
@ 2007-02-22 14:08                       ` David Miller
  2007-02-22 18:24                         ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2007-02-22 14:08 UTC (permalink / raw)
  To: johnpol; +Cc: shemminger, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Thu, 22 Feb 2007 16:46:21 +0300

> On Thu, Feb 22, 2007 at 05:31:38AM -0800, David Miller (davem@davemloft.net) wrote:
> > TCP_CLOSE is where we end up on a non-error close too, this has
> > the same kind of bug as your previous attempt to set POLLERR
> > here.
> > 
> > One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE.
> > 
> > It really is not possible to change current semantics, they are the
> > best possible unfortunately.
> 
> AS a last attempt - we can have a sockt flag set when sk-err is
> installed in tcp_reset(), and tcp_poll() will set POLLERR if that flag
> exist (just like it checks for sk_err, which is cleared in tcp_sendmsg()
> when error is being returned).

Oh is that the problem?  Someone sees a fatal connection error from
write() then attempts to poll() the socket?

That is illegal.

Socket is dead, you cannot do anything reasonable with it and you know
the socket is errored so there is nothing you can possibly try to
poll() on it for.

One should close() the file descriptor at this point.  Even
getpeername() cannot work at this point, since socket is closed and
has lost identity.

Socket errors are delivered as unique events, once error is delivered
the socket is not in error state any more, it is instead closed.
That's why we clear sk->sk_err after error delivery.

BTW, there was a query about this back in Feb. 2006 on linux-kernel,
nobody replied, he reposted to linux-net in September 2006 and this
is likely where this kernel bugzilla comes from :-)

This is not a kernel bug, let's close this and move on.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
  2007-02-22 14:08                       ` David Miller
@ 2007-02-22 18:24                         ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2007-02-22 18:24 UTC (permalink / raw)
  To: David Miller; +Cc: johnpol, netdev


> BTW, there was a query about this back in Feb. 2006 on linux-kernel,
> nobody replied, he reposted to linux-net in September 2006 and this
> is likely where this kernel bugzilla comes from :-)
> 
> This is not a kernel bug, let's close this and move on.

Agreed closed

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2007-02-22 18:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-16  0:04 Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET Stephen Hemminger
2007-02-16 15:25 ` Evgeniy Polyakov
2007-02-16 15:28   ` Evgeniy Polyakov
2007-02-16 15:39     ` Evgeniy Polyakov
2007-02-16 15:54       ` Evgeniy Polyakov
2007-02-16 16:10         ` Evgeniy Polyakov
2007-02-16 18:29           ` Stephen Hemminger
2007-02-16 18:34             ` Evgeniy Polyakov
2007-02-17 16:25               ` Evgeniy Polyakov
2007-02-19 14:19                 ` Jarek Poplawski
2007-02-22  6:47                 ` David Miller
2007-02-22  6:45               ` David Miller
2007-02-22 12:14                 ` Evgeniy Polyakov
2007-02-22 13:31                   ` David Miller
2007-02-22 13:46                     ` Evgeniy Polyakov
2007-02-22 14:08                       ` David Miller
2007-02-22 18:24                         ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.