linux-kernel.vger.kernel.org archive mirror
From: Davide Libenzi <davidel@xmailserver.org>
To: Eric Varsanyi <e0206@foo21.com>
Cc: David Miller <davem@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [Patch][RFC] epoll and half closed TCP connections
Date: Sat, 12 Jul 2003 13:01:21 -0700 (PDT)
Message-ID: <Pine.LNX.4.55.0307121256200.4720@bigblue.dev.mcafeelabs.com>
In-Reply-To: <20030712181654.GB15643@srv.foo21.com>


[Cc:ing DaveM]


On Sat, 12 Jul 2003, Eric Varsanyi wrote:

> I'm proposing adding a new POLL event type (POLLRDHUP) as a way to solve
> a new race introduced by having an edge triggered event mechanism
> (epoll). The problem occurs when a client writes data and then does a
> write side shutdown(). The server (using epoll) sees only one event for
> the read data ready and the read EOF condition and has no way to tell
> that an EOF occurred.
>
> -Eric Varsanyi
>
> Details
> -----------
> 	- remote sends data and does a shutdown
> 	   [ we see a data bearing packet and FIN from client on the wire ]
>
> 	- user mode server gets around to doing accept() and registers
> 	  for EPOLLIN events (along with HUP and ERR which are forced on)
>
> 	- epoll_wait() returns a single EPOLLIN event on the FD representing
> 	  both the 1/2 shutdown state and data available
>
> At this point there is no way the app can tell if there is a half closed
> connection so it may issue a close() back to the client after writing
> results. Normally the server would distinguish these events by assuming
> EOF if it got a read ready indication and the first read returned 0 bytes,
> or would issue read calls until less data was returned than was asked for.
>
> In a level triggered world this all just works because the read ready
> indication is driven back to the app as long as the socket state is half
> closed. The event driven epoll mechanism folds these two indications
> together and thus loses one 'edge'.
>
> One would be tempted to issue an extra read() after getting back less than
> expected, but this is an extra system call on every read event and you get
> back the same '0' bytes that you get if the buffer is just empty. The only
> sure bet seems to be CTL_MODding the FD to force a re-poll (which would
> cost a syscall and hash-lookup in eventpoll for every read event).
>

Yes, this is overhead that should be avoided. You could fall back to
level-triggered events, but an application structured around edge-triggered
events should be able to solve this without extra overhead.
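
To make the folded event concrete, here is a minimal sketch of the scenario
described above. It rests on several assumptions that are not part of the
patch in this thread: it uses EPOLLRDHUP, the name under which mainline epoll
later exposed a half-close event (Linux 2.6.17 and newer), it stands in an
AF_UNIX socketpair() for the TCP connection, and probe_half_close() is a
hypothetical helper name chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper. The "client" end writes data and half-closes
 * BEFORE the "server" end is registered edge-triggered with `interest`.
 * Returns the event mask from the first epoll_wait() (0 on error or if
 * nothing was reported). *second_wait receives the return value of an
 * immediate second epoll_wait(), or -1 if that call was never reached.
 */
static uint32_t probe_half_close(uint32_t interest, int *second_wait)
{
    int sv[2];
    int ep;
    uint32_t seen = 0;
    struct epoll_event ev, got;

    assert(second_wait != NULL);
    *second_wait = -1;

    /* Assumption: a stream socketpair stands in for the TCP connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    ep = epoll_create1(0);
    if (ep < 0) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    /* Client: send a request, then shut down the write side. */
    if (write(sv[1], "ping", 4) != 4 || shutdown(sv[1], SHUT_WR) < 0)
        goto done;

    /* Server: register only now, edge-triggered. */
    memset(&ev, 0, sizeof ev);
    ev.events = interest | EPOLLET;
    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto done;

    /* Data and EOF arrive folded into a single event. */
    if (epoll_wait(ep, &got, 1, 0) == 1)
        seen = got.events;

    /* Nothing changed since, so no further edge should be reported. */
    *second_wait = epoll_wait(ep, &got, 1, 0);

done:
    close(ep);
    close(sv[0]);
    close(sv[1]);
    return seen;
}
```

Called with EPOLLIN alone, the returned mask should carry no hint of the
half close and the second wait should report nothing, which is the lost
'edge'. Called with EPOLLIN | EPOLLRDHUP, the same single event should also
carry the half-close bit, so the app needs neither an extra read() nor a
CTL_MOD to learn about the EOF.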



> 	2) add a new 1/2 closed event type that a poll routine can return
>
> The implementation is trivial, a patch is included below. If this idea sees
> favor I'll fix the other architectures, ipv6, epoll.h, and make a 'real'
> patch. I do not believe any drivers deserve to be modified to return this
> new event.

This looks good to me. David, what do you think?



> diff -Naur linux-2.4.20/include/asm-i386/poll.h linux-2.4.20_ev/include/asm-i386/poll.h
> --- linux-2.4.20/include/asm-i386/poll.h	Thu Jan 23 13:01:28 1997
> +++ linux-2.4.20_ev/include/asm-i386/poll.h	Sat Jul 12 12:29:11 2003
> @@ -15,6 +15,7 @@
>  #define POLLWRNORM	0x0100
>  #define POLLWRBAND	0x0200
>  #define POLLMSG		0x0400
> +#define POLLRDHUP	0x0800
>
>  struct pollfd {
>  	int fd;
> diff -Naur linux-2.4.20/net/ipv4/tcp.c linux-2.4.20_ev/net/ipv4/tcp.c
> --- linux-2.4.20/net/ipv4/tcp.c	Tue Jul  8 09:40:42 2003
> +++ linux-2.4.20_ev/net/ipv4/tcp.c	Sat Jul 12 12:29:56 2003
> @@ -424,7 +424,7 @@
>  	if (sk->shutdown == SHUTDOWN_MASK || sk->state == TCP_CLOSE)
>  		mask |= POLLHUP;
>  	if (sk->shutdown & RCV_SHUTDOWN)
> -		mask |= POLLIN | POLLRDNORM;
> +		mask |= POLLIN | POLLRDNORM | POLLRDHUP;
>
>  	/* Connected? */
>  	if ((1 << sk->state) & ~(TCPF_SYN_SENT|TCPF_SYN_RECV)) {
>
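
On the userspace side, a poll() caller could test the new bit roughly as
sketched below. This is an illustration under assumptions, not part of the
patch: it relies on a libc that exposes POLLRDHUP in <poll.h> (glibc does so
under _GNU_SOURCE on kernels that later gained the flag), and
peer_half_closed() is a hypothetical helper name.

```c
#define _GNU_SOURCE /* assumption: glibc exposes POLLRDHUP only with this */
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: non-blocking check of whether the peer has shut
 * down its write side. Returns 1 if so, 0 if not, -1 if poll() failed.
 */
static int peer_half_closed(int fd)
{
    struct pollfd p;

    assert(fd >= 0);
    p.fd = fd;
    p.events = POLLIN | POLLRDHUP; /* ask for the half-close bit too */
    p.revents = 0;

    if (poll(&p, 1, 0) < 0)
        return -1;
    return (p.revents & POLLRDHUP) ? 1 : 0;
}
```

A server could call this after a short read to decide whether more data can
still arrive, instead of issuing another read() just to see 0 bytes.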



- Davide

